CN117076388A - File processing method and device, storage medium and electronic equipment - Google Patents

File processing method and device, storage medium and electronic equipment

Info

Publication number
CN117076388A
Authority
CN
China
Prior art keywords
file
compressed
data dictionary
block
blocks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311316367.XA
Other languages
Chinese (zh)
Inventor
房毅
高彪
翟玮楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Information Industry Innovation Technology Beijing Co ltd
Original Assignee
Zhongke Information Industry Innovation Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Information Industry Innovation Technology Beijing Co ltd
Priority to CN202311316367.XA
Publication of CN117076388A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Some embodiments of the present application provide a file processing method and apparatus, a storage medium, and an electronic device. The method includes: partitioning a file to be compressed to obtain a plurality of file blocks to be compressed; confirming the file directory to which each of the file blocks to be compressed belongs; and compressing each file block to be compressed using the data dictionary corresponding to its file directory to obtain a compressed file block. Some embodiments of the application can compress files efficiently and reduce the storage space they occupy.

Description

File processing method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of file processing technologies, and in particular, to a method and an apparatus for file processing, a storage medium, and an electronic device.
Background
As monitoring becomes more widespread, the resolution of the pictures collected by monitoring equipment keeps increasing, and so does the storage space required for the picture files.
At present, the demand for picture storage is usually met by adding storage devices, but both the up-front investment and the later maintenance cost are high. Alternatively, picture files are compressed in batches, which improves the compression rate to some extent, but the storage space occupied by the compressed pictures is not significantly reduced.
Therefore, how to provide a file processing method with a higher compression rate is a technical problem to be solved.
Disclosure of Invention
The application provides a file processing method, a file processing device, a storage medium and electronic equipment.
In a first aspect, some embodiments of the present application provide a method for file processing, including: partitioning the file to be compressed to obtain a plurality of file blocks to be compressed; confirming each file directory to which each file block to be compressed belongs in the plurality of file blocks to be compressed; and compressing each file block to be compressed by utilizing each data dictionary corresponding to each file directory to obtain each compressed file block.
In some embodiments of the application, after the file to be compressed is cut into a plurality of file blocks to be compressed, each file block is compressed using the data dictionary corresponding to it to obtain the compressed file blocks. Processing the file by cutting it and compressing each block with its corresponding data dictionary can effectively improve the compression rate and reduce the space occupied by the compressed file blocks.
In some embodiments, partitioning the file to be compressed to obtain a plurality of file blocks to be compressed includes: cutting the file to be compressed according to a set slicing value to obtain the plurality of file blocks to be compressed.
In some embodiments of the application, cutting the file to be compressed according to the set slicing value yields the plurality of file blocks to be compressed, which provides the data basis for the subsequent improvement in compression rate.
In some embodiments, before compressing each file block to be compressed using the data dictionary corresponding to each file directory, the method further includes: decompressing each compressed data dictionary using the zlib algorithm to obtain each data dictionary.
In some embodiments of the application, each compressed data dictionary is decompressed with the zlib algorithm to obtain the corresponding data dictionary; because the data dictionaries are stored in compressed form, the storage space they occupy is reduced.
In some embodiments, before said compressing each file block to be compressed using each data dictionary corresponding to each file directory, the method further comprises: splitting a plurality of original file samples to obtain a plurality of original file blocks corresponding to each original file sample in the plurality of original file samples; storing each original file block in the plurality of original file blocks into the corresponding file directory; training each file directory to obtain each data dictionary corresponding to each file directory; and compressing each data dictionary to obtain each compressed data dictionary.
According to some embodiments of the application, the data dictionary is trained through a plurality of original file samples, so that each compressed data dictionary corresponding to each file directory is obtained, effective support is provided for subsequent file compression, and the occupied memory of the compressed file is effectively reduced.
In a second aspect, some embodiments of the present application provide a method for file processing, including: obtaining a compressed file obtained by any method embodiment provided in the first aspect, where the compressed file includes a plurality of compressed file blocks; determining each file directory to which each compressed file block in the plurality of compressed file blocks belongs; decompressing each compressed file block by utilizing each data dictionary corresponding to each file directory to obtain each decompressed file block; and merging all the decompressed file blocks to obtain the target file.
Some embodiments of the application determine the attributes of a plurality of compressed file blocks in the compressed file, then obtain a corresponding data dictionary, and decompress and merge the compressed file blocks through the data dictionary to obtain the target file. The compressed file blocks occupy less memory, have higher compression rate and improve decompression efficiency.
In some embodiments, each of the data dictionaries is obtained by decompressing each of the compressed data dictionaries; each of the compressed data dictionaries is trained by a method embodiment as in the first aspect.
According to some embodiments of the application, the data dictionary is obtained through training and stored in the compressed data dictionary form, so that the occupied space is reduced.
In a third aspect, some embodiments of the present application provide an apparatus for file processing, including: the partitioning module is configured to partition the file to be compressed to obtain a plurality of file blocks to be compressed; the confirming module is configured to confirm each file directory to which each file block to be compressed belongs in the plurality of file blocks to be compressed; and the compression module is configured to compress each file block to be compressed by utilizing each data dictionary corresponding to each file directory to obtain each compressed file block.
In a fourth aspect, some embodiments of the present application provide an apparatus for file processing, including: an obtaining module, configured to obtain the compressed file obtained by any one of the method embodiments provided in the first aspect, where the compressed file includes a plurality of compressed file blocks; the determining module is configured to determine each file directory to which each compressed file block belongs in the plurality of compressed file blocks; the decompression module is configured to decompress each compressed file block by utilizing each data dictionary corresponding to each file directory to obtain each decompressed file block; and the merging module is configured to merge all the decompressed file blocks to obtain the target file.
In a fifth aspect, some embodiments of the application provide a computer readable storage medium having stored thereon a computer program which when executed by a processor performs a method according to any of the embodiments of the first aspect.
In a sixth aspect, some embodiments of the application provide an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor is capable of implementing a method according to any of the embodiments of the first aspect when executing the program.
In a seventh aspect, some embodiments of the application provide a computer program product comprising a computer program, wherein the computer program, when executed by a processor, is adapted to carry out the method according to any of the embodiments of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions of some embodiments of the present application, the drawings that are required to be used in some embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be construed as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort to those of ordinary skill in the art.
FIG. 1 is a system diagram of file processing provided by some embodiments of the present application;
FIG. 2 is a flowchart of a method for obtaining a data dictionary provided by some embodiments of the present application;
FIG. 3 is a schematic diagram of training data dictionaries provided by some embodiments of the present application;
FIG. 4 is a flowchart of a file processing method for file compression provided by some embodiments of the present application;
FIG. 5 is a schematic diagram of file compression provided by some embodiments of the present application;
FIG. 6 is a flowchart of a file processing method for file decompression provided by some embodiments of the present application;
FIG. 7 is a schematic diagram of file decompression provided by some embodiments of the present application;
FIG. 8 is a block diagram of a file processing apparatus for file compression provided by some embodiments of the present application;
FIG. 9 is a block diagram of a file processing apparatus for file decompression provided by some embodiments of the present application;
FIG. 10 is a schematic diagram of an electronic device provided by some embodiments of the present application.
Detailed Description
The technical solutions of some embodiments of the present application will be described below with reference to the drawings in some embodiments of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
In the related art, satellite imaging equipment generates a large number of high-definition large pictures of the same size every day. These pictures are characterized by node image content that changes little over time, and images produced by high-definition equipment generally occupy a large amount of storage space. Although the capacity of storage devices keeps growing, the equipment captures images so frequently that the storage space is quickly used up. To address insufficient storage space, the prior art generally adds storage devices; however, these occupy a large amount of machine room space and bring additional hardware expenditure, power consumption and other energy costs, and later hardware maintenance is also a considerable expense. Alternatively, each original picture is compressed directly; but because a picture contains a large amount of information and little repeated data, the compression rate achieved when compressing each picture individually is not high. In addition, format conversion can store less important image resources in a format that occupies less space at the expense of picture definition, but this may lose important information in the picture.
As the related art shows, storing large pictures in the prior art occupies a large amount of storage space and is costly.
In view of this, some embodiments of the present application provide a file processing method that first cuts the file to be compressed into a plurality of file blocks to be compressed, then obtains the corresponding data dictionary by confirming the file directory to which each file block to be compressed belongs, and finally compresses each file block to be compressed with its data dictionary to obtain the compressed file blocks. Some embodiments of the application can compress the file to be compressed efficiently, improve the compression rate, and reduce the storage space occupied by the compressed file, at low cost and with high practicability.
The overall composition of a file processing system provided in accordance with some embodiments of the present application is described below by way of example with reference to fig. 1.
As shown in fig. 1, some embodiments of the present application provide a file processing system that includes a terminal 100 and a file processing end 200. The terminal 100 may acquire a high-definition image file photographed by a satellite (as a specific example of a file to be compressed). The file processing end 200 may obtain the high-definition image file by reading or receiving it. The file processing end 200 then splits the high-definition image file into a plurality of file blocks to be compressed and determines the file directory to which each file block to be compressed belongs. Each file block to be compressed is then compressed using the data dictionary corresponding to its file directory to obtain a compressed file block. Correspondingly, the file processing end 200 may decompress each compressed file block using the corresponding data dictionary to obtain the decompressed file blocks (that is, the file blocks to be compressed as they were before compression), and finally merge all the decompressed file blocks to obtain the original high-definition image file.
In some embodiments of the present application, the file to be compressed may be a high-definition image shot by the satellite, or may be a Word, PDF document, etc. that occupies a relatively large memory. The embodiments of the present application are not limited thereto.
In some embodiments of the present application, the terminal 100 may be a mobile terminal or a non-portable computer terminal, and embodiments of the present application are not limited herein. In addition, in other embodiments of the present application, if the terminal 100 has the function of the file processing end 200 for blocking and compressing the file to be compressed, the file processing end 200 may not be provided. Specifically, the configuration may be set according to actual situations, and the embodiment of the present application is not limited herein.
It should be noted that each data dictionary is obtained in advance by training on a plurality of original file samples and is stored at the file processing end 200 in compressed form. To compress a file efficiently, the data dictionaries must be obtained first; the process by which the file processing end 200 obtains them, according to some embodiments of the present application, is therefore described below by way of example with reference to fig. 2.
Referring to fig. 2, fig. 2 is a flowchart of a method for obtaining a data dictionary according to some embodiments of the present application, where the method for obtaining a data dictionary includes:
S210, segmenting the plurality of original file samples to obtain a plurality of original file blocks corresponding to each original file sample in the plurality of original file samples.
For example, in some embodiments of the present application, a certain number of files are selected from the original files as training data. All original file samples in the training data are then cut according to the slicing method to obtain a plurality of original file blocks. For example, the training architecture shown in fig. 3 includes n original file samples (i.e., original file a, original file b, ..., original file n in fig. 3). Original file a is split at the service layer to obtain a-file block 1, ..., a-file block n, and original file b through original file n are split in the same way to obtain their corresponding original file blocks. For example, 100 original file samples of 2G each are used as training data, and each original file sample is first cut by 1M to obtain the original file blocks.
S220, storing each original file block in the plurality of original file blocks into the corresponding file directory.
For example, in some embodiments of the present application, each original file block is placed in the file directory corresponding to its block number, which ensures that the small files in the file directory for a given block number all have the same size after cutting. Each file directory is used as a training set. For example, as can be seen from fig. 3, file block 1 of all files corresponds to dictionary 1 (as a specific example of each data dictionary), and so on, until file block n of all files corresponds to dictionary n. For example, after each 1M file (as a specific example of each original file block) is further cut by 1250, n file directories are generated, one per block number, and each file directory stores the 100 corresponding original file blocks cut from the large files.
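Steps S210 and S220 can be sketched in a few lines. The following is a minimal illustration, not the applicant's implementation: the function names `split_into_blocks` and `group_by_block_number` are hypothetical, in-memory lists stand in for the on-disk file directories, and a 4-byte block size replaces the 1M cut used in the example above.

```python
from collections import defaultdict


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into consecutive blocks of block_size bytes (the last one may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def group_by_block_number(samples: list[bytes], block_size: int) -> dict[int, list[bytes]]:
    """Map each block number to the blocks with that number taken from every sample.

    Each value plays the role of one file directory, i.e. one training set.
    """
    groups: dict[int, list[bytes]] = defaultdict(list)
    for sample in samples:
        for number, block in enumerate(split_into_blocks(sample, block_size)):
            groups[number].append(block)
    return dict(groups)


# Three toy "original file samples" whose content differs only slightly.
samples = [b"AAAABBBBCCCC", b"AAAXBBBXCCCX", b"AAAYBBBYCCCY"]
training_sets = group_by_block_number(samples, block_size=4)
```

Here `training_sets[0]` holds block 0 of every sample, which mirrors the idea that blocks at the same position in similar files are likely to share content.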
S230, training each file directory to obtain each data dictionary corresponding to each file directory.
For example, in some embodiments of the present application, the training set of each file directory is trained to generate a corresponding data dictionary, so that in the end one file directory corresponds to one data dictionary. For example, cutting by 1250 generates as many data dictionaries as there are file directories.
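The text does not spell out the training algorithm. As a rough illustration only, the sketch below builds a dictionary from the byte sequences that recur across the blocks of one file directory. It is a naive stand-in written for this example, not a zstd dictionary trainer and not the applicant's method; `build_naive_dictionary`, `ngram` and `max_size` are hypothetical names.

```python
from collections import Counter


def build_naive_dictionary(blocks: list[bytes], ngram: int = 8, max_size: int = 1024) -> bytes:
    """Collect byte sequences of length `ngram` that occur in more than one block."""
    counts: Counter = Counter()
    for block in blocks:
        # Count each distinct n-gram once per block, so a count is
        # "the number of blocks that contain this n-gram".
        distinct = {block[i:i + ngram] for i in range(len(block) - ngram + 1)}
        counts.update(distinct)

    # Most widely shared first; ties are broken by byte order for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

    chosen: list[bytes] = []
    size = 0
    for gram, count in ranked:
        if count < 2 or size + len(gram) > max_size:
            break
        chosen.append(gram)
        size += len(gram)

    # Preset-dictionary compressors such as zlib favour frequent strings near the end.
    return b"".join(reversed(chosen))


# One toy training set: block k of three similar files.
training_set = [b"HEADER-v1:alpha", b"HEADER-v1:beta", b"HEADER-v1:gamma"]
dictionary = build_naive_dictionary(training_set)
```

A production system would more likely rely on a dedicated trainer from a compression library; the point here is only that the dictionary captures content shared by the blocks in one directory.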
S240, compressing each data dictionary to obtain each compressed data dictionary.
For example, in some embodiments of the present application, zlib compression storage is performed on each data dictionary to save storage space.
It should be noted that a data dictionary (also called a static dictionary) is generated from the training set and establishes a mapping between the patterns and symbols that appear in the original file samples. A data dictionary is not modified after it is generated, so each data dictionary can be zlib-compressed and stored independently to save more storage space.
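Storing a finished dictionary in zlib-compressed form and restoring it later can be sketched with Python's standard `zlib` module. The helper names `pack_dictionary` and `unpack_dictionary`, the compression level 9 and the sample dictionary bytes are assumptions made for this illustration.

```python
import zlib


def pack_dictionary(dictionary: bytes) -> bytes:
    """Compress a finished data dictionary for storage (level 9 is an assumed choice)."""
    return zlib.compress(dictionary, 9)


def unpack_dictionary(packed: bytes) -> bytes:
    """Restore the data dictionary before it is used for compression or decompression."""
    return zlib.decompress(packed)


# A deliberately repetitive toy dictionary, so the saving is easy to see.
dictionary = b"sensor=alpha;mode=high-definition;tile=" * 50
packed = pack_dictionary(dictionary)
restored = unpack_dictionary(packed)
```

Because a dictionary never changes after training, it only needs to be packed once and can be unpacked each time it is used.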
The following is an exemplary description of a specific process of file processing in a file compression process according to some embodiments of the present application with reference to fig. 4.
Referring to fig. 4, fig. 4 is a flowchart of a method for processing a file according to some embodiments of the present application, where the method for processing a file includes:
S410, partitioning the file to be compressed to obtain a plurality of file blocks to be compressed.
For example, in some embodiments of the present application, the file to be compressed is partitioned in the same manner in which the original file samples were split when the data dictionaries were obtained in fig. 2, so as to obtain the file blocks to be compressed.
In some embodiments of the present application, S410 may include: cutting the file to be compressed according to the set slicing value to obtain the plurality of file blocks to be compressed.
For example, in some embodiments of the present application, repeated trials showed that the better the slicing, the better the compression rate, which is important for the subsequent zstd compression. Therefore, the file to be compressed is first cut by 1M, and each 1M file is then cut according to a preset value (e.g. 1250) to obtain the plurality of file blocks to be compressed. As shown in fig. 5, the file to be compressed is partitioned to obtain file block 1, file block 2, ..., file block N.
S420, confirming each file directory to which each file block to be compressed belongs in the file blocks to be compressed.
For example, in some embodiments of the present application, each corresponding compressed data dictionary may be obtained by validating the file directory to which each file block to be compressed belongs. Wherein each compressed data dictionary is pre-trained by the method embodiment shown in fig. 2.
In some embodiments of the present application, before performing S430, the method of file processing may further include: and decompressing each compressed data dictionary by utilizing a zlib algorithm to obtain each data dictionary.
For example, in some embodiments of the present application, since the data dictionaries are stored in compressed form, they must be decompressed with the zlib algorithm to obtain the data dictionary corresponding to each file directory. As shown in fig. 5, each file block has a corresponding data dictionary: data dictionary 1 corresponds to file block 1, data dictionary 2 corresponds to file block 2, and so on.
S430, compressing each file block to be compressed by utilizing each data dictionary corresponding to each file directory to obtain each compressed file block.
For example, in some embodiments of the present application, each file block to be compressed is compressed with its corresponding data dictionary, resulting in a plurality of compressed small file blocks (as a specific example of each compressed file block). Finally, the cut file blocks to be compressed are deleted. As shown in fig. 5, file block 1 is compressed using data dictionary 1 to obtain compressed file 1 (i.e., a small file block), and so on for compressed file 2 and the remaining file blocks.
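The per-block compression of S410 to S430 can be sketched as follows. The description mentions zstd for this step; to stay within the Python standard library, this sketch substitutes zlib's preset-dictionary feature (`zdict`), which is a different algorithm used here only to show the flow. The function names, the 35-byte block size and the dictionary contents are hypothetical.

```python
import zlib


def compress_block(block: bytes, dictionary: bytes) -> bytes:
    """Compress one block against a preset dictionary (zlib stand-in for zstd)."""
    compressor = zlib.compressobj(level=9, zdict=dictionary)
    return compressor.compress(block) + compressor.flush()


def compress_file(data: bytes, block_size: int, dictionaries: dict[int, bytes]) -> list[bytes]:
    """Cut `data` into blocks and compress block k with dictionary k."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return [compress_block(block, dictionaries[number])
            for number, block in enumerate(blocks)]


# Hypothetical dictionaries keyed by block number (one per file directory).
dictionaries = {
    0: b"sensor=alpha;mode=high-definition;",
    1: b"tile=0000;checksum=00000000;status=",
}
data = b"sensor=alpha;mode=high-definition;!tile=0042;checksum=00000000;status="
compressed_blocks = compress_file(data, block_size=35, dictionaries=dictionaries)
```

Because each block closely resembles its dictionary, the dictionary-based output is smaller than what plain `zlib.compress` produces for the same block, which is the effect the method relies on.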
The specific process of file processing during file decompression, according to some embodiments of the present application, is described below by way of example with reference to fig. 6.
Referring to fig. 6, fig. 6 is a flowchart of a method for processing a file in a file decompression process according to some embodiments of the present application, where the method for processing a file includes:
S610, obtaining a compressed file, wherein the compressed file comprises a plurality of compressed file blocks.
For example, in some embodiments of the application, a compressed file produced by the method embodiment shown in fig. 4 is obtained. The compressed file includes a plurality of compressed small file blocks, i.e., compressed file 1, compressed file 2, ..., compressed file N, as shown in fig. 7.
S620, determining each file directory to which each compressed file block in the plurality of compressed file blocks belongs.
For example, in some embodiments of the present application, the corresponding compressed data dictionary can be obtained by confirming the file directory to which each small file block belongs. Each compressed data dictionary is trained in advance by the method embodiment shown in fig. 2.
S630, decompressing each compressed file block by utilizing each data dictionary corresponding to each file directory to obtain each decompressed file block.
For example, in some embodiments of the present application, each small file block is decompressed with its corresponding data dictionary to obtain the corresponding decompressed file block. For example, as shown in fig. 7, compressed file 1, compressed file 2, ..., compressed file N are decompressed with data dictionary 1, data dictionary 2, ..., data dictionary N, respectively, to obtain file block 1, file block 2, ..., file block N.
S640, merging all the decompressed file blocks to obtain the target file.
For example, in some embodiments of the present application, all decompressed file blocks are merged to obtain the target file. That is, all file blocks are read into one input stream, and the contents of the input stream are then all written to the same file output stream. For example, as shown in fig. 7, file block 1, file block 2, ..., file block N are merged to obtain the target file.
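Steps S630 and S640 can be sketched as a loop that decompresses the blocks in block-number order and writes them to a single output stream. As an assumption for this illustration, zlib's preset dictionary (`zdict`) again stands in for the dictionary-based compressor, an in-memory `io.BytesIO` stands in for the file output stream, and all function names and sample data are hypothetical.

```python
import io
import zlib


def compress_block(block: bytes, dictionary: bytes) -> bytes:
    """Helper used only to produce input data for this sketch."""
    compressor = zlib.compressobj(zdict=dictionary)
    return compressor.compress(block) + compressor.flush()


def decompress_block(packed: bytes, dictionary: bytes) -> bytes:
    """Decompress one block with the same dictionary that was used to compress it."""
    decompressor = zlib.decompressobj(zdict=dictionary)
    return decompressor.decompress(packed) + decompressor.flush()


def restore_file(compressed_blocks: list[bytes], dictionaries: dict[int, bytes]) -> bytes:
    """Decompress the blocks in block-number order and write them to one output stream."""
    output = io.BytesIO()
    for number, packed in enumerate(compressed_blocks):
        output.write(decompress_block(packed, dictionaries[number]))
    return output.getvalue()


dictionaries = {0: b"row=0000;value=", 1: b"row=0001;value="}
original_blocks = [b"row=0000;value=17", b"row=0001;value=23"]
compressed_blocks = [compress_block(block, dictionaries[number])
                     for number, block in enumerate(original_blocks)]
target_file = restore_file(compressed_blocks, dictionaries)
```

The block order matters: the target file is only identical to the original if the blocks are written back in the order in which they were cut.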
In some embodiments of the present application, each of the data dictionaries is obtained by decompressing each of the compressed data dictionaries; each compressed data dictionary is obtained through training.
For example, in some embodiments of the application, each data dictionary is stored in compressed form, which is trained in accordance with the method shown in FIG. 2.
Referring to fig. 8, fig. 8 is a block diagram of an apparatus for file processing according to some embodiments of the present application. It should be understood that the apparatus corresponds to the above method embodiments and can perform the steps involved in them; for its specific functions, refer to the description above. Detailed descriptions are omitted here where appropriate to avoid repetition.
The apparatus for file processing of fig. 8 includes at least one software functional module that can be stored in a memory in the form of software or firmware, or be built into the apparatus. The apparatus includes: a partitioning module 810 configured to partition a file to be compressed to obtain a plurality of file blocks to be compressed; a confirmation module 820 configured to confirm each file directory to which each file block to be compressed belongs in the plurality of file blocks to be compressed; and a compression module 830 configured to compress each file block to be compressed by using each data dictionary corresponding to each file directory, so as to obtain each compressed file block.
In some embodiments of the present application, the partitioning module 810 is configured to partition the file to be compressed according to a set partition value, so as to obtain the plurality of file blocks to be compressed.
In some embodiments of the present application, the compression module 830 is configured to decompress each compressed data dictionary using zlib algorithm to obtain the each data dictionary.
In some embodiments of the present application, the apparatus for file processing further includes, prior to the compression module 830: a training module (not shown in the figure) configured to segment a plurality of original file samples to obtain a plurality of original file blocks corresponding to each of the plurality of original file samples; storing each original file block in the plurality of original file blocks into the corresponding file directory; training each file directory to obtain each data dictionary corresponding to each file directory; and compressing each data dictionary to obtain each compressed data dictionary.
Referring to fig. 9, fig. 9 is a block diagram of an apparatus for file processing according to some embodiments of the present application. It should be understood that the apparatus corresponds to the above method embodiments and can perform the steps involved in them; for its specific functions, refer to the description above. Detailed descriptions are omitted here where appropriate to avoid repetition.
The apparatus for file processing of fig. 9 includes at least one software functional module that can be stored in a memory in the form of software or firmware, or be built into the apparatus. The apparatus includes: an obtaining module 910 configured to obtain a compressed file, where the compressed file includes a plurality of compressed file blocks; a determining module 920 configured to determine each file directory to which each compressed file block of the plurality of compressed file blocks belongs; a decompression module 930 configured to decompress each of the compressed file blocks by using each of the data dictionaries corresponding to each of the file directories, so as to obtain each decompressed file block; and a merging module 940 configured to merge all the decompressed file blocks to obtain the target file.
In some embodiments of the present application, each of the data dictionaries is obtained by decompressing each of the compressed data dictionaries; each compressed data dictionary is obtained through training.
Those skilled in the art will understand that, for convenience and brevity of description, the specific working process of the apparatus described above may be found in the corresponding process of the foregoing method and is not repeated here.
Some embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the operations of any of the methods provided by the above embodiments.
Some embodiments of the present application also provide a computer program product including a computer program which, when executed by a processor, implements the operations of any of the methods provided by the above embodiments.
As shown in fig. 10, some embodiments of the present application provide an electronic device 1000, the electronic device 1000 comprising: memory 1010, processor 1020, and a computer program stored on memory 1010 and executable on processor 1020, wherein processor 1020 reads the program from memory 1010 via bus 1030 and executes the program to implement the method of any of the embodiments described above.
The processor 1020 may process digital signals and may include various computing structures, such as a complex instruction set computer architecture, a reduced instruction set computer architecture, or an architecture that implements a combination of instruction sets. In some examples, the processor 1020 may be a microprocessor.
Memory 1010 may be used for storing instructions to be executed by processor 1020 or data related to execution of the instructions. Such instructions and/or data may include code to implement some or all of the functions of one or more of the modules described in embodiments of the present application. The processor 1020 of the disclosed embodiments may be configured to execute instructions in the memory 1010 to implement the methods shown above. Memory 1010 includes dynamic random access memory, static random access memory, flash memory, optical memory, or other memory known to those skilled in the art.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It is noted that relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method of file processing, comprising:
partitioning the file to be compressed to obtain a plurality of file blocks to be compressed;
confirming each file directory to which each file block to be compressed belongs in the plurality of file blocks to be compressed;
and compressing each file block to be compressed by utilizing each data dictionary corresponding to each file directory to obtain each compressed file block.
2. The method of claim 1, wherein the partitioning the file to be compressed to obtain a plurality of file blocks to be compressed comprises:
and partitioning the file to be compressed according to a set slicing value to obtain the plurality of file blocks to be compressed.
3. The method of claim 1 or 2, wherein prior to said compressing said each file block to be compressed with each data dictionary corresponding to said each file directory, said method further comprises:
and decompressing each compressed data dictionary by utilizing a zlib algorithm to obtain each data dictionary.
4. The method of claim 1 or 2, wherein prior to said compressing said each file block to be compressed with each data dictionary corresponding to said each file directory, said method further comprises:
splitting a plurality of original file samples to obtain a plurality of original file blocks corresponding to each original file sample in the plurality of original file samples;
storing each original file block in the plurality of original file blocks into the corresponding file directory;
training each file directory to obtain each data dictionary corresponding to each file directory;
and compressing each data dictionary to obtain each compressed data dictionary.
5. A method of file processing, comprising:
obtaining a compressed file obtained by the method of any one of claims 1 to 4, wherein the compressed file comprises a plurality of compressed file blocks;
determining each file directory to which each compressed file block in the plurality of compressed file blocks belongs;
decompressing each compressed file block by utilizing each data dictionary corresponding to each file directory to obtain each decompressed file block;
and merging all the decompressed file blocks to obtain the target file.
6. The method of claim 5, wherein each data dictionary is obtained by decompressing each compressed data dictionary; each compressed data dictionary is obtained through training.
7. An apparatus for file processing, comprising:
the partitioning module is configured to partition the file to be compressed to obtain a plurality of file blocks to be compressed;
the confirming module is configured to confirm each file directory to which each file block to be compressed belongs in the plurality of file blocks to be compressed;
and the compression module is configured to compress each file block to be compressed by utilizing each data dictionary corresponding to each file directory to obtain each compressed file block.
8. An apparatus for file processing, comprising:
an acquisition module configured to acquire a compressed file obtained by the method of any one of claims 1 to 4, wherein the compressed file includes a plurality of compressed file blocks;
the determining module is configured to determine each file directory to which each compressed file block belongs in the plurality of compressed file blocks;
the decompression module is configured to decompress each compressed file block by utilizing each data dictionary corresponding to each file directory to obtain each decompressed file block;
and the merging module is configured to merge all the decompressed file blocks to obtain the target file.
9. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program, wherein the computer program when run by a processor performs the method according to any of claims 1-6.
10. An electronic device comprising a memory, a processor, and a computer program stored on the memory and running on the processor, wherein the computer program when run by the processor performs the method of any one of claims 1-6.
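The compression path recited in claims 1 to 3 might look roughly like the following sketch. Several details are assumptions made only for illustration: the slice size, the `classify` rule for choosing a file directory (the claims do not state how the directory is confirmed), the directory names, and the use of zlib preset dictionaries for the block compression itself. Only the zlib decompression of the compressed data dictionaries is taken from claim 3.

```python
import zlib

SLICE_SIZE = 32  # hypothetical "set slicing value" in bytes


def partition(data: bytes, slice_size: int = SLICE_SIZE) -> list[bytes]:
    """Cut the file to be compressed into blocks of at most slice_size bytes."""
    return [data[i:i + slice_size] for i in range(0, len(data), slice_size)]


def classify(block: bytes) -> str:
    """Hypothetical rule for confirming the file directory of a block."""
    return "json" if block.lstrip().startswith((b"{", b"[")) else "text"


def compress_file(
    data: bytes,
    compressed_dictionaries: dict[str, bytes],
    slice_size: int = SLICE_SIZE,
) -> list[tuple[str, bytes]]:
    """Return (directory, compressed block) pairs in the original block order."""
    # Claim 3: restore each data dictionary from its zlib-compressed form.
    dictionaries = {
        name: zlib.decompress(blob)
        for name, blob in compressed_dictionaries.items()
    }
    result = []
    for block in partition(data, slice_size):
        directory = classify(block)
        compressor = zlib.compressobj(zdict=dictionaries[directory])
        result.append((directory, compressor.compress(block) + compressor.flush()))
    return result
```

Recording the directory name next to each compressed block is one simple way to let a decompressor select the matching dictionary later; other layouts are equally possible.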

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311316367.XA CN117076388A (en) 2023-10-12 2023-10-12 File processing method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311316367.XA CN117076388A (en) 2023-10-12 2023-10-12 File processing method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN117076388A (en) 2023-11-17

Family

ID=88717267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311316367.XA Pending CN117076388A (en) 2023-10-12 2023-10-12 File processing method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117076388A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110285556A1 (en) * 2010-05-19 2011-11-24 Red Hat, Inc. Parallel Compression for Dictionary-Based Sequential Coders
CN102891999A (en) * 2012-09-26 2013-01-23 南昌大学 Combined image compression/encryption method based on compressed sensing
CN110532235A (en) * 2019-08-06 2019-12-03 苏州浪潮智能科技有限公司 A kind of compressing file, decompression method and device
CN110851409A (en) * 2019-11-06 2020-02-28 南京星环智能科技有限公司 Log compression and decompression method, device and storage medium
CN111767258A (en) * 2020-06-30 2020-10-13 深圳前海微众银行股份有限公司 File compression method, device, equipment and storage medium applied to mass files
CN114328400A (en) * 2020-09-29 2022-04-12 华为技术有限公司 Data processing method and related equipment
CN114449579A (en) * 2020-11-03 2022-05-06 大唐移动通信设备有限公司 Method, device and equipment for data compression
CN115208414A (en) * 2022-09-15 2022-10-18 本原数据(北京)信息技术有限公司 Data compression method, data compression device, computer device and storage medium
CN115774699A (en) * 2023-01-30 2023-03-10 本原数据(北京)信息技术有限公司 Database shared dictionary compression method and device, electronic equipment and storage medium
CN116566396A (en) * 2022-01-28 2023-08-08 华为云计算技术有限公司 Data compression method, device, storage medium, device cluster and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination