CN117076388A - File processing method and device, storage medium and electronic equipment - Google Patents
- Publication number
- CN117076388A (application number CN202311316367.XA)
- Authority
- CN
- China
- Prior art keywords
- file
- compressed
- data dictionary
- block
- blocks
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/172—Caching, prefetching or hoarding of files
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1744—Redundancy elimination performed by the file system using compression, e.g. sparse files
Abstract
Some embodiments of the present application provide a file processing method and apparatus, a storage medium, and an electronic device. The method includes: partitioning a file to be compressed to obtain a plurality of file blocks to be compressed; determining the file directory to which each of the file blocks to be compressed belongs; and compressing each file block to be compressed with the data dictionary corresponding to its file directory to obtain the compressed file blocks. Some embodiments of the application achieve efficient file compression and reduce the storage space required.
Description
Technical Field
The present application relates to the field of file processing technologies, and in particular, to a method and an apparatus for file processing, a storage medium, and an electronic device.
Background
As video monitoring becomes more widespread, the resolution of the pictures captured by monitoring equipment keeps increasing, and so does the storage space required for the picture files.
At present, the storage requirement for picture files is usually met by adding storage devices, but both the initial investment and the later maintenance cost are high. Alternatively, picture files are compressed in batches, which improves the compression rate to some extent, but the storage space occupied by the compressed pictures is not significantly reduced.
Therefore, providing a file processing method with a higher compression rate is a technical problem to be solved.
Disclosure of Invention
The application provides a file processing method, a file processing device, a storage medium and electronic equipment.
In a first aspect, some embodiments of the present application provide a file processing method, including: partitioning a file to be compressed to obtain a plurality of file blocks to be compressed; determining the file directory to which each of the plurality of file blocks to be compressed belongs; and compressing each file block to be compressed with the data dictionary corresponding to its file directory to obtain each compressed file block.
According to some embodiments of the application, after a file to be compressed is cut into a plurality of file blocks to be compressed, each block is compressed with its corresponding data dictionary to obtain the compressed file blocks. Processing a file by cutting it into blocks and compressing each block with its corresponding data dictionary effectively improves the compression rate and reduces the space occupied by the compressed file blocks.
In some embodiments, partitioning the file to be compressed to obtain a plurality of file blocks to be compressed includes: cutting the file to be compressed according to a set slicing value to obtain the plurality of file blocks to be compressed.
According to some embodiments of the application, the file to be compressed is cut into a plurality of file blocks according to a set slicing value, which lays the groundwork for the subsequent improvement in compression rate.
In some embodiments, before compressing each file block to be compressed with the data dictionary corresponding to each file directory, the method further includes: decompressing each compressed data dictionary with the zlib algorithm to obtain each data dictionary.
According to some embodiments of the application, the data dictionaries are stored in compressed form and decompressed with the zlib algorithm when needed, which reduces the storage they occupy.
In some embodiments, before compressing each file block to be compressed with the data dictionary corresponding to each file directory, the method further includes: splitting a plurality of original file samples to obtain a plurality of original file blocks for each of the original file samples; storing each original file block in its corresponding file directory; training on the contents of each file directory to obtain the data dictionary corresponding to that file directory; and compressing each data dictionary to obtain each compressed data dictionary.
According to some embodiments of the application, the data dictionaries are trained on a plurality of original file samples to obtain a compressed data dictionary for each file directory. This provides effective support for subsequent file compression and effectively reduces the space occupied by the compressed file.
In a second aspect, some embodiments of the present application provide a file processing method, including: obtaining a compressed file produced by any method embodiment of the first aspect, where the compressed file includes a plurality of compressed file blocks; determining the file directory to which each of the compressed file blocks belongs; decompressing each compressed file block with the data dictionary corresponding to its file directory to obtain each decompressed file block; and merging all the decompressed file blocks to obtain the target file.
Some embodiments of the application determine the file directory to which each compressed file block in the compressed file belongs, obtain the corresponding data dictionaries, and use them to decompress the compressed file blocks, which are then merged into the target file. Because of the higher compression rate, the compressed file blocks occupy less space, and decompression efficiency is improved.
In some embodiments, each of the data dictionaries is obtained by decompressing each of the compressed data dictionaries; each of the compressed data dictionaries is trained by a method embodiment as in the first aspect.
According to some embodiments of the application, the data dictionaries are obtained through training and stored as compressed data dictionaries, which reduces the space they occupy.
In a third aspect, some embodiments of the present application provide an apparatus for file processing, including: the partitioning module is configured to partition the file to be compressed to obtain a plurality of file blocks to be compressed; the confirming module is configured to confirm each file directory to which each file block to be compressed belongs in the plurality of file blocks to be compressed; and the compression module is configured to compress each file block to be compressed by utilizing each data dictionary corresponding to each file directory to obtain each compressed file block.
In a fourth aspect, some embodiments of the present application provide an apparatus for file processing, including: an obtaining module, configured to obtain the compressed file obtained by any one of the method embodiments provided in the first aspect, where the compressed file includes a plurality of compressed file blocks; the determining module is configured to determine each file directory to which each compressed file block belongs in the plurality of compressed file blocks; the decompression module is configured to decompress each compressed file block by utilizing each data dictionary corresponding to each file directory to obtain each decompressed file block; and the merging module is configured to merge all the decompressed file blocks to obtain the target file.
In a fifth aspect, some embodiments of the application provide a computer readable storage medium having stored thereon a computer program which when executed by a processor performs a method according to any of the embodiments of the first aspect.
In a sixth aspect, some embodiments of the application provide an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor is capable of implementing a method according to any of the embodiments of the first aspect when executing the program.
In a seventh aspect, some embodiments of the application provide a computer program product comprising a computer program, wherein the computer program, when executed by a processor, is adapted to carry out the method according to any of the embodiments of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions of some embodiments of the present application, the drawings that are required to be used in some embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be construed as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort to those of ordinary skill in the art.
FIG. 1 is a system diagram of file processing provided by some embodiments of the present application;
FIG. 2 is a flow chart of a method for obtaining a data dictionary according to some embodiments of the present application;
FIG. 3 is a diagram of a training data dictionary provided in some embodiments of the present application;
FIG. 4 is a flowchart of a file processing method used in file compression according to some embodiments of the present application;
FIG. 5 is a diagram illustrating file compression according to some embodiments of the present application;
FIG. 6 is a flowchart of a file processing method used in file decompression according to some embodiments of the present application;
FIG. 7 is a schematic diagram of file decompression provided in some embodiments of the present application;
FIG. 8 is a block diagram of an apparatus for file processing (compression side) according to some embodiments of the present application;
FIG. 9 is a block diagram of an apparatus for file processing (decompression side) provided in some embodiments of the present application;
FIG. 10 is a schematic diagram of an electronic device according to some embodiments of the present application.
Detailed Description
The technical solutions of some embodiments of the present application will be described below with reference to the drawings in some embodiments of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
In the related art, satellite imaging equipment generates a large number of high-definition pictures of the same size every day. The content of images of the same scene changes little over time, and images produced by high-definition equipment generally occupy a large amount of storage space. Although storage devices keep growing in capacity, the equipment captures images so frequently that the storage space is quickly used up. The usual way to address insufficient storage space is to add storage devices; however, these occupy a large amount of room space and bring additional hardware expenditure, power consumption and other energy costs, and later hardware maintenance is also a considerable expense. Alternatively, each original picture is compressed directly; because a picture contains a large amount of information and little repeated data, the compression rate achieved for each picture is not high. Format conversion can also be used to store unimportant image resources in a format that occupies less space at the cost of picture definition, but this may lose important information in the picture.
In summary, the related art occupies a large amount of storage space and is costly when storing large pictures.
In view of this, some embodiments of the present application provide a file processing method that cuts a file to be compressed into a plurality of file blocks to be compressed, determines the file directory to which each block belongs to obtain the corresponding data dictionary, and finally compresses each file block with its data dictionary to obtain the compressed file blocks. The method compresses files efficiently, improves the compression rate, and reduces the space occupied by the compressed files, at low cost and with high practicality.
The overall composition of a file processing system provided by some embodiments of the present application is described below by way of example with reference to FIG. 1.
As shown in FIG. 1, some embodiments of the present application provide a file processing system including a terminal 100 and a file processing end 200. The terminal 100 may acquire a high-definition image file captured by a satellite (a specific example of a file to be compressed), and the file processing end 200 may acquire the high-definition image file by reading or receiving it. The file processing end 200 then cuts the high-definition image file into a plurality of file blocks to be compressed, determines the file directory to which each block belongs, and compresses each block with the data dictionary corresponding to that file directory to obtain the compressed file blocks. Correspondingly, the file processing end 200 may decompress each compressed file block with its data dictionary to obtain the decompressed file blocks (each identical to the corresponding file block before compression), and finally merge all the decompressed file blocks to recover the original high-definition image file.
In some embodiments of the present application, the file to be compressed may be a high-definition image captured by a satellite, or another large file such as a Word or PDF document. The embodiments of the present application are not limited thereto.
In some embodiments of the present application, the terminal 100 may be a mobile terminal or a non-portable computer terminal; the embodiments of the present application are not limited herein. In other embodiments of the present application, if the terminal 100 itself has the functions of the file processing end 200 for partitioning and compressing the file to be compressed, the file processing end 200 may be omitted. This can be configured according to actual circumstances, and the embodiments of the present application are not limited herein.
It should be noted that each data dictionary is trained in advance on a plurality of original file samples and stored at the file processing end 200 as a compressed file. Because efficient compression of a file to be compressed requires the data dictionaries to be available first, the way the file processing end 200 obtains the data dictionaries in some embodiments of the present application is described below by way of example with reference to FIG. 2.
Referring to fig. 2, fig. 2 is a flowchart of a method for obtaining a data dictionary according to some embodiments of the present application, where the method for obtaining a data dictionary includes:
S210, splitting the plurality of original file samples to obtain a plurality of original file blocks for each of the original file samples.
For example, in some embodiments of the present application, a certain number of files are selected from the original files as training data, and all original file samples in the training data are cut by the slicing method to obtain a plurality of original file blocks. For example, the training architecture shown in FIG. 3 includes n original file samples (original file a, original file b, ..., original file n). Original file a is split into a-file block 1, ..., a-file block n, and original files b to n are split into their corresponding original file blocks in the same way. For example, 100 original file samples of 2G each are used as training data, and each sample is first cut into 1M pieces to obtain the original file blocks.
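The slicing in S210 can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the helper name split_into_blocks is hypothetical, and the sketch assumes the last block is simply left shorter when the file size is not a multiple of the slicing value (the source does not say how a remainder is handled).

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into consecutive blocks of block_size bytes (hypothetical helper).

    The final block is shorter when len(data) is not a multiple of block_size.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


if __name__ == "__main__":
    sample = bytes(range(256)) * 10            # 2560 bytes of toy data
    blocks = split_into_blocks(sample, 1000)
    print([len(b) for b in blocks])            # [1000, 1000, 560]
    assert b"".join(blocks) == sample          # slicing loses nothing
```

The same helper would serve for the file to be compressed in S410, since the source states that it is cut in the same way as the training samples.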
S220, storing each original file block in the plurality of original file blocks into the corresponding file directory.
For example, in some embodiments of the present application, each original file block is placed in the file directory corresponding to its block number, so that all the small files in a given block-number directory have the same size after cutting. Each file directory is used as one training set. As can be seen from FIG. 3, file block 1 of all files corresponds to dictionary 1 (a specific example of a data dictionary), and so on, up to file block n of all files, which corresponds to dictionary n. For example, when each 1M piece (a specific example of an original file block) is further cut according to a slicing value of 1250, that is, into 1250-byte blocks, $n = 1024^2 / 1250$ file directories are generated, and each file directory stores the 100 corresponding blocks cut from the 100 large files.
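The directory layout of S220 and the resulting directory count can be sketched as follows. The sketch assumes block k of every sample is written to root/<k>/<sample>.bin; this naming, the function store_blocks_by_index, and the choice to give the final partial block its own directory are hypothetical. Under that last assumption, 1024^2 / 1250 (about 838.86) rounds up to 839 directories per 1M piece.

```python
import math
import tempfile
from pathlib import Path


def store_blocks_by_index(samples: dict[str, bytes], root: Path, block_size: int) -> int:
    """Write block k of every sample into directory root/<k>/ (hypothetical layout).

    Returns the number of block-number directories used.
    """
    directories = 0
    for name, data in samples.items():
        for index, start in enumerate(range(0, len(data), block_size)):
            directory = root / str(index)              # one directory per block number
            directory.mkdir(parents=True, exist_ok=True)
            (directory / f"{name}.bin").write_bytes(data[start:start + block_size])
            directories = max(directories, index + 1)
    return directories


if __name__ == "__main__":
    # Directories per 1M piece if the last partial block gets its own directory.
    print(math.ceil(1024 ** 2 / 1250))                 # 839
    with tempfile.TemporaryDirectory() as tmp:
        n = store_blocks_by_index({"a": b"x" * 25, "b": b"y" * 25}, Path(tmp), 10)
        print(n)                                       # 3 (blocks of 10, 10 and 5 bytes)
```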
And S230, training each file directory to obtain each data dictionary corresponding to each file directory.
For example, in some embodiments of the present application, the training set in each file directory is trained to generate the corresponding data dictionary, so that each file directory corresponds to exactly one data dictionary. For example, with a slicing value of 1250, $1024^2 / 1250$ data dictionaries are generated.
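The source does not name the training algorithm. zstd ships a dictionary trainer, but to keep the sketch runnable with the Python standard library alone, the example below substitutes a deliberately naive trainer: it concatenates the blocks of one directory and keeps the tail, which is the part zlib's preset-dictionary (zdict) mechanism can use. The function names and the *.bin layout are hypothetical, and a real trainer would select frequently recurring patterns rather than raw bytes.

```python
import tempfile
from pathlib import Path

# With zlib's default 15-bit window, only the last 32 KiB of a preset
# dictionary can be referenced, so longer dictionaries are trimmed.
ZLIB_WINDOW = 32 * 1024


def train_naive_dictionary(directory: Path, max_size: int = ZLIB_WINDOW) -> bytes:
    """Naive stand-in for dictionary training on one block-number directory.

    Concatenates the training blocks in name order and keeps the last
    max_size bytes, since zlib advises putting the most useful content
    at the end of zdict.
    """
    blob = b"".join(p.read_bytes() for p in sorted(directory.glob("*.bin")))
    return blob[max(0, len(blob) - max_size):]


def train_all(root: Path) -> dict[int, bytes]:
    """Train one dictionary per block-number directory under root."""
    return {
        int(d.name): train_naive_dictionary(d)
        for d in root.iterdir()
        if d.is_dir() and d.name.isdigit()
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "0").mkdir()
        (root / "0" / "a.bin").write_bytes(b"hello ")
        (root / "0" / "b.bin").write_bytes(b"world")
        print(train_all(root))                         # {0: b'hello world'}
```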
S240, compressing each data dictionary to obtain each compressed data dictionary.
For example, in some embodiments of the present application, each data dictionary is compressed with zlib before being stored, to save storage space.
It should be noted that a data dictionary (also called a static dictionary) is generated from a training set and establishes a mapping between the patterns appearing in the original file samples and symbols. Because a data dictionary is not modified after it is generated, each data dictionary can be compressed independently with zlib and stored, which saves storage space.
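Step S240 amounts to one zlib call per dictionary. A minimal sketch, assuming each compressed dictionary is written to a file named dict.zlib inside its block-number directory; that file name and the function save_compressed_dictionary are hypothetical, since the source does not describe the on-disk format.

```python
import tempfile
import zlib
from pathlib import Path


def save_compressed_dictionary(directory: Path, dictionary: bytes) -> Path:
    """zlib-compress a trained (static) dictionary and store it in its directory."""
    path = directory / "dict.zlib"                 # hypothetical file name
    path.write_bytes(zlib.compress(dictionary, 9))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dictionary = b"SAT-IMG HEADER " * 200      # toy, highly repetitive dictionary
        path = save_compressed_dictionary(Path(tmp), dictionary)
        print(len(dictionary), "->", path.stat().st_size, "bytes on disk")
```

Because the dictionary is static, it only needs to be compressed once, right after training; every later use only pays for decompression.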
The specific file processing procedure used during file compression in some embodiments of the present application is described below by way of example with reference to FIG. 4.
Referring to FIG. 4, FIG. 4 is a flowchart of a file processing method according to some embodiments of the present application. The file processing method includes:
S410, partitioning the file to be compressed to obtain a plurality of file blocks to be compressed.
For example, in some embodiments of the present application, the file to be compressed is partitioned in the same way the original file samples were split when obtaining the data dictionaries in FIG. 2, to obtain the file blocks to be compressed.
In some embodiments of the present application, S410 may include: cutting the file to be compressed according to the set slicing value to obtain the plurality of file blocks to be compressed.
For example, in some embodiments of the present application, trial and error shows that the better the dicing, the higher the compression rate, which matters for the subsequent zstd compression. Therefore, the file to be compressed is first cut into 1M pieces, and each 1M piece is then cut according to a preset value (e.g. 1250) to obtain a plurality of file blocks to be compressed. As shown in FIG. 5, partitioning the file to be compressed yields file block 1, file block 2, ..., file block N.
S420, determining the file directory to which each of the file blocks to be compressed belongs.
For example, in some embodiments of the present application, the compressed data dictionary corresponding to each file block to be compressed is obtained by determining the file directory to which that block belongs. Each compressed data dictionary is trained in advance by the method embodiment shown in FIG. 2.
In some embodiments of the present application, before S430 is performed, the file processing method may further include: decompressing each compressed data dictionary with the zlib algorithm to obtain each data dictionary.
For example, in some embodiments of the present application, because the data dictionaries are stored in compressed form, they are decompressed with the zlib algorithm to obtain the data dictionary corresponding to each file directory. As shown in FIG. 5, each file block has a corresponding data dictionary: data dictionary 1 for file block 1, data dictionary 2 for file block 2, and so on.
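S420 and the dictionary decompression before S430 can be sketched together: the position of a block selects its directory, and the dictionary stored there is inflated with zlib. This assumes the hypothetical root/<k>/dict.zlib layout used for illustration here; load_dictionary is likewise a hypothetical name, not an API from the source.

```python
import tempfile
import zlib
from pathlib import Path


def load_dictionary(root: Path, block_index: int) -> bytes:
    """Return the data dictionary for the block at position block_index.

    Block k belongs to directory root/<k>; its dictionary is stored there
    zlib-compressed under the hypothetical name dict.zlib.
    """
    path = root / str(block_index) / "dict.zlib"
    return zlib.decompress(path.read_bytes())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "7").mkdir()
        (root / "7" / "dict.zlib").write_bytes(zlib.compress(b"dictionary for block 7"))
        print(load_dictionary(root, 7))            # b'dictionary for block 7'
```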
And S430, compressing each file block to be compressed by utilizing each data dictionary corresponding to each file directory to obtain each compressed file block.
For example, in some embodiments of the present application, each file block to be compressed is compressed with its data dictionary, yielding a plurality of compressed small file blocks (a specific example of the compressed file blocks). Finally, the cut file blocks to be compressed are deleted. As shown in FIG. 5, file block 1 is compressed with data dictionary 1 to obtain compressed file 1 (a small file block), file block 2 is compressed with data dictionary 2 to obtain compressed file 2, and so on.
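S430 can be sketched with zlib's preset-dictionary support (the zdict argument of zlib.compressobj), used here as a standard-library stand-in for the zstd dictionary compression mentioned for S410; the source does not give the exact zstd calls. compress_block and compress_blocks are hypothetical names, and the sketch assumes the dictionaries are keyed by block position.

```python
import os
import zlib


def compress_block(block: bytes, dictionary: bytes, level: int = 9) -> bytes:
    """Deflate one block using its data dictionary as a zlib preset dictionary."""
    compressor = zlib.compressobj(level=level, zdict=dictionary)
    return compressor.compress(block) + compressor.flush()


def compress_blocks(blocks: list[bytes], dictionaries: dict[int, bytes]) -> list[bytes]:
    """Compress block k with dictionary k (block k belongs to directory k)."""
    return [compress_block(block, dictionaries[k]) for k, block in enumerate(blocks)]


if __name__ == "__main__":
    block = os.urandom(1000)                   # incompressible on its own
    plain = zlib.compress(block, 9)
    with_dict = compress_block(block, dictionary=block)
    print(len(plain), len(with_dict))          # the dictionary version is far smaller
```

The demo is an idealised best case in which the dictionary already contains the block. With real images, the gain depends on how similar the blocks at the same position are across pictures, which is the premise behind training one dictionary per block position.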
The following is an exemplary description of the specific procedures for file processing provided by some embodiments of the present application in connection with fig. 6.
Referring to FIG. 6, FIG. 6 is a flowchart of a file processing method used during file decompression according to some embodiments of the present application. The file processing method includes:
S610, obtaining a compressed file, where the compressed file includes a plurality of compressed file blocks.
For example, in some embodiments of the application, the compressed file produced by the method embodiment shown in FIG. 4 is obtained. As shown in FIG. 7, the compressed file includes a plurality of compressed small file blocks, namely compressed file 1, compressed file 2, ..., compressed file N.
S620, determining each file directory to which each compressed file block in the plurality of compressed file blocks belongs.
For example, in some embodiments of the present application, determining the file directory to which each small file block belongs identifies its corresponding compressed data dictionary. Each compressed data dictionary is trained in advance by the method embodiment shown in FIG. 2.
And S630, decompressing each compressed file block by utilizing each data dictionary corresponding to each file directory to obtain each decompressed file block.
For example, in some embodiments of the present application, each small file block is decompressed with its corresponding data dictionary to obtain the corresponding decompressed file block. For example, as shown in FIG. 7, compressed file 1, compressed file 2, ..., compressed file N are decompressed with data dictionary 1, data dictionary 2, ..., data dictionary N, respectively, to obtain file block 1, file block 2, ..., file block N.
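The inverse step S630 can be sketched with zlib.decompressobj and the same preset dictionary, again as a standard-library stand-in for zstd. The zlib format records the dictionary's Adler-32 checksum in the stream header, so a block paired with the wrong dictionary is rejected rather than silently decoded. decompress_block and decompress_blocks are hypothetical names.

```python
import zlib


def decompress_block(payload: bytes, dictionary: bytes) -> bytes:
    """Inflate one block that was deflated with the same preset dictionary."""
    decompressor = zlib.decompressobj(zdict=dictionary)
    data = decompressor.decompress(payload) + decompressor.flush()
    if not decompressor.eof:
        raise ValueError("compressed block is truncated")
    return data


def decompress_blocks(payloads: list[bytes], dictionaries: dict[int, bytes]) -> list[bytes]:
    """Decompress block k with dictionary k, the inverse of the compression step."""
    return [decompress_block(p, dictionaries[k]) for k, p in enumerate(payloads)]


if __name__ == "__main__":
    dictionary = b"frame header v1 " * 8
    compressor = zlib.compressobj(zdict=dictionary)
    payload = compressor.compress(dictionary + b"pixels") + compressor.flush()
    print(decompress_block(payload, dictionary) == dictionary + b"pixels")   # True
```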
And S640, merging all the decompressed file blocks to obtain the target file.
For example, in some embodiments of the present application, all decompressed file blocks are merged to obtain the target file: all file blocks are read into one input stream, and the entire contents of that input stream are written to a single file output stream. For example, as shown in FIG. 7, file block 1, file block 2, ..., file block N are merged into the target file.
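The stream-based merge of S640 can be sketched with shutil.copyfileobj, which copies each decompressed block into one output file without loading the whole target into memory. merge_blocks and ordered_block_paths are hypothetical names, and the sketch assumes decompressed blocks are stored as files named by block number, so they are sorted numerically (a plain string sort would place 10 before 2).

```python
import shutil
import tempfile
from pathlib import Path


def merge_blocks(block_paths: list[Path], target: Path) -> None:
    """Append each decompressed block, in order, to a single target file."""
    with target.open("wb") as out:
        for path in block_paths:
            with path.open("rb") as src:
                shutil.copyfileobj(src, out)       # streamed copy, block by block


def ordered_block_paths(directory: Path) -> list[Path]:
    """List block files named <k>.bin in numeric block-number order."""
    return sorted(directory.glob("*.bin"), key=lambda p: int(p.stem))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        blocks_dir = Path(tmp)
        for k in range(12):
            (blocks_dir / f"{k}.bin").write_bytes(str(k).encode())
        target = blocks_dir / "target.out"
        merge_blocks(ordered_block_paths(blocks_dir), target)
        print(target.read_bytes())                 # b'01234567891011'
```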
In some embodiments of the present application, each of the data dictionaries is obtained by decompressing each of the compressed data dictionaries; each compressed data dictionary is obtained through training.
For example, in some embodiments of the application, each data dictionary is stored in compressed form, which is trained in accordance with the method shown in FIG. 2.
Referring to FIG. 8, FIG. 8 is a block diagram of an apparatus for file processing according to some embodiments of the present application. It should be understood that this apparatus corresponds to the method embodiments above and can perform the steps involved in them; for its specific functions, refer to the description above. Detailed descriptions are omitted here to avoid redundancy.
The apparatus for file processing in FIG. 8 includes at least one software functional module that can be stored in a memory in the form of software or firmware, or embedded in the apparatus for file processing. The apparatus for file processing includes: the partitioning module 810, configured to partition a file to be compressed to obtain a plurality of file blocks to be compressed; the confirmation module 820, configured to determine the file directory to which each of the plurality of file blocks to be compressed belongs; and the compression module 830, configured to compress each file block to be compressed with the data dictionary corresponding to its file directory to obtain each compressed file block.
In some embodiments of the present application, the partitioning module 810 is configured to partition the file to be compressed according to a set partition value, so as to obtain the plurality of file blocks to be compressed.
In some embodiments of the present application, the compression module 830 is configured to decompress each compressed data dictionary with the zlib algorithm to obtain each data dictionary.
In some embodiments of the present application, the apparatus for file processing further includes a training module (not shown in the figure) that operates before the compression module 830 and is configured to: split a plurality of original file samples to obtain a plurality of original file blocks for each of the original file samples; store each original file block in its corresponding file directory; train on the contents of each file directory to obtain the data dictionary corresponding to that file directory; and compress each data dictionary to obtain each compressed data dictionary.
Referring to FIG. 9, FIG. 9 is a block diagram of an apparatus for file processing according to some embodiments of the present application. It should be understood that this apparatus corresponds to the method embodiments above and can perform the steps involved in them; for its specific functions, refer to the description above. Detailed descriptions are omitted here to avoid redundancy.
The apparatus for file processing in FIG. 9 includes at least one software functional module that can be stored in a memory in the form of software or firmware, or embedded in the apparatus for file processing. The apparatus for file processing includes: an obtaining module 910, configured to obtain a compressed file, where the compressed file includes a plurality of compressed file blocks; a determining module 920, configured to determine the file directory to which each of the plurality of compressed file blocks belongs; a decompression module 930, configured to decompress each compressed file block with the data dictionary corresponding to its file directory to obtain each decompressed file block; and a merging module 940, configured to merge all the decompressed file blocks to obtain the target file.
In some embodiments of the present application, each data dictionary is obtained by decompressing the corresponding compressed data dictionary, and each compressed data dictionary is obtained through training.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working procedure of the apparatus described above may refer to the corresponding procedure in the foregoing method embodiments, and is not repeated here.
Some embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when executed by a processor, performs the operations of any of the methods provided by the above embodiments.
Some embodiments of the present application also provide a computer program product including a computer program which, when executed by a processor, implements the operations of any of the methods provided by the above embodiments.
As shown in fig. 10, some embodiments of the present application provide an electronic device 1000, the electronic device 1000 comprising: memory 1010, processor 1020, and a computer program stored on memory 1010 and executable on processor 1020, wherein processor 1020 reads the program from memory 1010 via bus 1030 and executes the program to implement the method of any of the embodiments described above.
The processor 1020 may process digital signals and may include various computing architectures, such as a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, or an architecture that implements a combination of instruction sets. In some examples, the processor 1020 may be a microprocessor.
Memory 1010 may be used to store instructions to be executed by processor 1020 or data related to the execution of those instructions. Such instructions and/or data may include code that implements some or all of the functions of one or more of the modules described in the embodiments of the present application. The processor 1020 of the disclosed embodiments may be configured to execute the instructions in the memory 1010 to implement the methods described above. Memory 1010 may include dynamic random access memory, static random access memory, flash memory, optical memory, or other memory known to those skilled in the art.
The above description is only an example of the present application and is not intended to limit its scope; various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that like reference numerals and letters denote like items in the figures; once an item is defined in one figure, it does not need to be further defined or explained in subsequent figures.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto; any variation or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed herein falls within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
Claims (10)
1. A file processing method, comprising:
partitioning the file to be compressed to obtain a plurality of file blocks to be compressed;
confirming the file directory to which each of the plurality of file blocks to be compressed belongs;
and compressing each file block to be compressed by using the data dictionary corresponding to its file directory, to obtain the compressed file blocks.
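Claim 1 can be illustrated with a short Python sketch. The claims do not name the block compression algorithm, so zlib's preset-dictionary mode (`zlib.compressobj(..., zdict=...)`) is used here as an assumption; the slicing value, `compress_file`, and the `dir_i` directory rule are likewise hypothetical.

```python
import zlib

# Hypothetical slicing value in bytes; the claims leave it unspecified.
SLICE_SIZE = 4096


def directory_of(block_index: int) -> str:
    """Hypothetical rule: block i of any file belongs to directory "dir_i"."""
    return f"dir_{block_index}"


def compress_file(data: bytes, dictionaries: dict[str, bytes],
                  slice_size: int = SLICE_SIZE) -> list[bytes]:
    """Partition data, confirm each block's directory, compress with its dictionary."""
    compressed = []
    for index, start in enumerate(range(0, len(data), slice_size)):
        block = data[start:start + slice_size]          # partitioning step
        dictionary = dictionaries[directory_of(index)]  # confirming step
        # Each block gets a fresh compressor, so it can be decompressed on its own.
        compressor = zlib.compressobj(level=9, zdict=dictionary)
        compressed.append(compressor.compress(block) + compressor.flush())
    return compressed
```

Compressing blocks independently gives up matches that would span block boundaries, but it lets each block be decompressed separately. For a short block whose content resembles its directory's dictionary, most of the block can be encoded as back-references into the dictionary, which recovers much of that loss.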
2. The method of claim 1, wherein the partitioning the file to be compressed to obtain a plurality of file blocks to be compressed comprises:
and partitioning the file to be compressed according to a set slicing value to obtain the plurality of file blocks to be compressed.
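Assuming the "set slicing value" is a block size in bytes (the application does not define its unit), the slicing step of claim 2 can be done as a stream, so the whole file never has to sit in memory. `slice_stream` is a hypothetical helper name.

```python
import io
from typing import BinaryIO, Iterator


def slice_stream(stream: BinaryIO, slicing_value: int) -> Iterator[bytes]:
    """Yield consecutive blocks of at most slicing_value bytes.

    read(n) on a buffered file or io.BytesIO returns n bytes until end of
    file; an unbuffered raw stream may return fewer and yield short blocks.
    """
    if slicing_value <= 0:
        raise ValueError("slicing value must be a positive number of bytes")
    while True:
        block = stream.read(slicing_value)
        if not block:  # empty bytes means end of file
            break
        yield block
```

For example, slicing the 10-byte input `b"abcdefghij"` with a value of 4 yields `b"abcd"`, `b"efgh"`, and the short final block `b"ij"`.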
3. The method of claim 1 or 2, wherein prior to compressing each file block to be compressed by using the data dictionary corresponding to its file directory, the method further comprises:
and decompressing each compressed data dictionary by using the zlib algorithm to obtain the corresponding data dictionary.
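Claim 3 only states that zlib is used to restore the dictionaries. The Python sketch below shows that step together with one related detail of the zlib format: per RFC 1950, a stream compressed with a preset dictionary carries the dictionary's Adler-32 checksum (DICTID) in its header, so a decoder can check which dictionary a block needs. Both function names are hypothetical.

```python
import zlib
from typing import Optional


def load_dictionaries(compressed_dicts: dict[str, bytes]) -> dict[str, bytes]:
    """Inflate each zlib-compressed data dictionary once, before any block is processed."""
    return {name: zlib.decompress(blob) for name, blob in compressed_dicts.items()}


def required_dictionary_id(stream: bytes) -> Optional[int]:
    """Return the DICTID stored in a zlib stream header, or None if there is none.

    RFC 1950: bit 0x20 (FDICT) of the FLG byte signals a preset dictionary,
    and the 4-byte big-endian Adler-32 of that dictionary follows the
    2-byte CMF/FLG header.
    """
    if len(stream) >= 6 and stream[1] & 0x20:
        return int.from_bytes(stream[2:6], "big")
    return None
```

A decoder could compare this value with `zlib.adler32(dictionary)` for each loaded dictionary before calling `zlib.decompressobj(zdict=dictionary)`.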
4. The method of claim 1 or 2, wherein prior to compressing each file block to be compressed by using the data dictionary corresponding to its file directory, the method further comprises:
splitting a plurality of original file samples to obtain a plurality of original file blocks corresponding to each original file sample in the plurality of original file samples;
storing each original file block in the plurality of original file blocks into the corresponding file directory;
training on the original file blocks in each file directory to obtain the data dictionary corresponding to that file directory;
and compressing each data dictionary to obtain each compressed data dictionary.
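The storing step of claim 4 can be pictured as an on-disk layout. This Python sketch assumes, hypothetically, that block i of each named sample is written to `root/dir_i/<sample>.blk`; the application does not specify how file directories are named or keyed, and `store_blocks` and `read_directory` are illustrative names.

```python
from pathlib import Path


def store_blocks(samples: dict[str, bytes], root: Path, slice_size: int) -> None:
    """Split each named sample and write block i into root/dir_i (hypothetical layout)."""
    for sample_name, data in samples.items():
        for index, start in enumerate(range(0, len(data), slice_size)):
            directory = root / f"dir_{index}"
            directory.mkdir(parents=True, exist_ok=True)
            block = data[start:start + slice_size]
            (directory / f"{sample_name}.blk").write_bytes(block)


def read_directory(directory: Path) -> list[bytes]:
    """Return the blocks stored in one file directory, ordered by file name.

    These blocks are the training input for that directory's data dictionary.
    """
    return [path.read_bytes() for path in sorted(directory.glob("*.blk"))]
```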
5. A file processing method, comprising:
obtaining a compressed file produced by the method of any one of claims 1 to 4, wherein the compressed file comprises a plurality of compressed file blocks;
determining the file directory to which each of the plurality of compressed file blocks belongs;
decompressing each compressed file block by using the data dictionary corresponding to its file directory, to obtain the decompressed file blocks;
and merging all the decompressed file blocks to obtain the target file.
6. The method of claim 5, wherein each data dictionary is obtained by decompressing a corresponding compressed data dictionary, and each compressed data dictionary is obtained through training.
7. An apparatus for file processing, comprising:
the partitioning module is configured to partition the file to be compressed to obtain a plurality of file blocks to be compressed;
the confirming module is configured to confirm the file directory to which each of the plurality of file blocks to be compressed belongs;
and the compression module is configured to compress each file block to be compressed by using the data dictionary corresponding to its file directory, to obtain the compressed file blocks.
8. An apparatus for file processing, comprising:
an acquisition module configured to acquire a compressed file obtained by the method of any one of claims 1 to 4, wherein the compressed file includes a plurality of compressed file blocks;
the determining module is configured to determine each file directory to which each compressed file block belongs in the plurality of compressed file blocks;
the decompression module is configured to decompress each compressed file block by using the data dictionary corresponding to its file directory, to obtain the decompressed file blocks;
and the merging module is configured to merge all the decompressed file blocks to obtain the target file.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when run by a processor, performs the method according to any one of claims 1-6.
10. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when run by the processor, performs the method of any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311316367.XA CN117076388A (en) | 2023-10-12 | 2023-10-12 | File processing method and device, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117076388A true CN117076388A (en) | 2023-11-17 |
Family ID: 88717267
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110285556A1 (en) * | 2010-05-19 | 2011-11-24 | Red Hat, Inc. | Parallel Compression for Dictionary-Based Sequential Coders |
CN102891999A (en) * | 2012-09-26 | 2013-01-23 | 南昌大学 | Combined image compression/encryption method based on compressed sensing |
CN110532235A (en) * | 2019-08-06 | 2019-12-03 | 苏州浪潮智能科技有限公司 | A kind of compressing file, decompression method and device |
CN110851409A (en) * | 2019-11-06 | 2020-02-28 | 南京星环智能科技有限公司 | Log compression and decompression method, device and storage medium |
CN111767258A (en) * | 2020-06-30 | 2020-10-13 | 深圳前海微众银行股份有限公司 | File compression method, device, equipment and storage medium applied to mass files |
CN114328400A (en) * | 2020-09-29 | 2022-04-12 | 华为技术有限公司 | Data processing method and related equipment |
CN114449579A (en) * | 2020-11-03 | 2022-05-06 | 大唐移动通信设备有限公司 | Method, device and equipment for data compression |
CN115208414A (en) * | 2022-09-15 | 2022-10-18 | 本原数据(北京)信息技术有限公司 | Data compression method, data compression device, computer device and storage medium |
CN115774699A (en) * | 2023-01-30 | 2023-03-10 | 本原数据(北京)信息技术有限公司 | Database shared dictionary compression method and device, electronic equipment and storage medium |
CN116566396A (en) * | 2022-01-28 | 2023-08-08 | 华为云计算技术有限公司 | Data compression method, device, storage medium, device cluster and program product |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |