CN113051238A - Processing method and device - Google Patents

Info

Publication number
CN113051238A
Authority
CN
China
Prior art keywords
file
type
signature
data part
files
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110346716.7A
Other languages
Chinese (zh)
Inventor
陶晓风
邝宇豪
吴伟洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN202110346716.7A
Publication of CN113051238A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1748 De-duplication implemented within the file system, e.g. based on file segments

Abstract

The application provides a processing method and a device. The method comprises: obtaining a first file set to be compressed, wherein the first file set comprises a plurality of signed first type files; performing data part deduplication and/or signature part deduplication on the signed first type files to obtain a deduplicated second file set; generating a configuration file, wherein the configuration file records configuration information of at least the data parts and the signature parts in the first file set, so that the second file set can be restored to the first file set; and compressing the configuration file and the second file set to obtain a compressed file of the first file set.

Description

Processing method and device
Technical Field
The present application relates to the field of information processing technologies, and in particular, to a processing method and apparatus.
Background
Because a file set includes a plurality of files and has a large amount of data, the file set is often compressed.
A file set may also contain many files with the same data content. When the files in the file set contain a large amount of duplicate content, the compressed file remains relatively large even after compression, and the file set is not compressed to the greatest possible extent.
Disclosure of Invention
The application provides a processing method and a processing device.
The processing method comprises the following steps:
obtaining a first file set to be compressed, wherein the first file set comprises a plurality of first type files with signatures;
performing data part deduplication and/or signature part deduplication on the signed first type files to obtain a deduplicated second file set;
generating a configuration file, wherein configuration information of at least the data part and the signature part in the first file set is recorded in the configuration file so as to enable the second file set to be restored into the first file set;
and compressing the configuration file and the second file set to obtain a compressed file of the first file set.
In another possible implementation manner, before performing data part deduplication and/or signature part deduplication on the signed first type file to obtain a deduplicated second file set, the method further includes:
and disassembling the data part and the signature part of the first type file.
In one possible implementation manner, the method further includes:
if the data part of the first type file is a compressed data part, decompressing the compressed data part to obtain a decompressed signed first type file, and returning to the operation of disassembling the data part and the signature part for the decompressed first type file;
and if the data part of the first type file is not a compressed data part, finishing the disassembly of the first type file.
In another possible implementation manner, the method further includes:
for a first type file whose disassembled data part is a compressed data part, determining a file association relation between that first type file and the first type file decompressed from it;
determining the decompression parameters used to obtain the decompressed first type file;
the configuration file further comprises: the file association relation and the decompression parameters corresponding to the decompressed first type file, so that the first type file containing the compressed data part that corresponds to the decompressed first type file can be restored.
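The role of the recorded association and parameters can be illustrated with a minimal sketch. It assumes the compressed data part was produced by zlib at a known level; the patent does not name a codec, and the file names and the `record` layout below are hypothetical.

```python
import zlib

# Assumption: the compressed data part was produced by zlib at a known level.
original_level = 9
inner_file = b"signed inner file bytes " * 50
compressed_part = zlib.compress(inner_file, original_level)

# During disassembly: decompress, and record what is needed to reverse the step.
decompressed = zlib.decompress(compressed_part)
record = {
    "parent": "outer.pkg",  # hypothetical name of the file holding the compressed data part
    "child": "inner.bin",   # hypothetical name of the file obtained by decompression
    "params": {"codec": "zlib", "level": original_level},
}

# During restoration: recompress the child with the recorded parameters
# and compare the result with the original compressed data part.
rebuilt_part = zlib.compress(decompressed, record["params"]["level"])
matches = rebuilt_part == compressed_part
print(matches)
```

Recording the parameters matters because a signature over the outer file would typically only remain valid if the recompressed data part is byte-identical to the original.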
In another possible implementation manner, the performing data part deduplication and/or signature part deduplication on the signed first type file to obtain a deduplicated second file set includes:
determining at least one first file group among the plurality of first type files, and performing file deduplication on the first file group, wherein the first file group comprises first type files with the same data part and the same signature part;
and/or,
determining at least one second file group among the plurality of first type files, and performing data part deduplication on the first type files in the second file group, wherein the second file group comprises first type files with the same data part and different signature parts;
and/or,
determining at least one third file group among the plurality of first type files, and performing signature part deduplication on the first type files in the third file group, wherein the third file group comprises first type files with different data parts and the same signature part.
In one possible implementation, the first set of files further includes: a plurality of files of a second type without signatures;
the method further comprises the following steps:
performing file deduplication on the plurality of second type files to remove duplicate second type files;
the configuration file further comprises: configuration information of the second type file.
In yet another possible implementation manner, the method further includes:
decompressing the compressed file to obtain the configuration file and the second file set;
and restoring the second file set into the first file set according to the configuration information of the data part and the signature part of the first type file in the configuration file.
In another possible implementation manner, the configuration file further records: the file association relation and decompression parameters corresponding to the decompressed first type files;
the restoring the second file set to the first file set according to the configuration information of the data part and the signature part of the first type file in the configuration file includes:
if the data part of the first type file to be restored in the first file set is determined to be a compressed data part according to the configuration file, acquiring the data part and the signature part of each level of the first type file associated with the first type file from the second file set according to the configuration information of the data part and the signature part of the first type file in the configuration file, and compressing and assembling the data part and the signature part of each level of the first type file associated with the first type file and the first type file respectively according to the file association relation and the decompression parameter to obtain the restored first type file;
if the data part of the first type file to be restored in the first file set is determined not to belong to the compressed data part according to the configuration file, obtaining the data part and the signature part which form the first type file from the second file set according to the configuration information of the data part and the signature part of the first type file in the configuration file, and assembling to obtain the restored first type file;
and forming the first file set by using the plurality of restored files of the first type.
Another processing method comprises the following steps:
obtaining a compressed file;
decompressing the compressed file to obtain a configuration file and an initial file set to be restored, wherein the configuration file records configuration information of a data part and a signature part in a first type file with a signature in a target file set, and the initial file set is obtained by at least carrying out data part duplication removal and/or signature part duplication removal on the first type file in the target file set;
and restoring the initial file set into the target file set according to the configuration information of the data part and the signature part of the first type file in the configuration file.
In another possible implementation manner, the configuration file further records: a file association relation between a first type file whose data part is a compressed data part and the first type file decompressed from it;
the restoring the initial file set to the target file set according to the configuration information of the data part and the signature part of the first type file in the configuration file includes:
for a first type file with a data part being a compressed data part, acquiring the data part and the signature part of each level of first type file related to the first type file from the initial file set according to configuration information of the data part and the signature part of the first type file in a configuration file, and compressing and assembling the first type file and the data part and the signature part of each level of first type file related to the first type file respectively according to the file association relationship to obtain the first type file;
for a first type file of which the data part does not belong to the compressed data part, acquiring the data part and the signature part which form the first type file from the initial file set according to configuration information of the data part and the signature part of the first type file in a configuration file, and assembling to obtain the first type file;
and forming the target file set from the plurality of restored first type files.
A processing apparatus comprises:
the file obtaining unit is used for obtaining a first file set to be compressed, and the first file set comprises a plurality of first type files with signatures;
the data deduplication unit is used for performing data part deduplication and/or signature part deduplication on the first type file with the signature to obtain a second file set after deduplication;
a configuration generating unit, configured to generate a configuration file, where configuration information of at least the data portion and the signature portion in the first file set is recorded in the configuration file, so as to enable the second file set to be restored to the first file set;
and the file compression unit is used for compressing the configuration file and the second file set to obtain a compressed file of the first file set.
Another processing apparatus comprises:
a file obtaining unit for obtaining a compressed file;
the file decompression unit is used for decompressing the compressed file to obtain a configuration file and an initial file set to be restored, the configuration file records configuration information of a data part and a signature part in a first type file with a signature in a target file set, and the initial file set is obtained by at least carrying out data part duplication removal and/or signature part duplication removal on the first type file in the target file set;
and the file restoration unit is used for restoring the initial file set into the target file set according to the configuration information of the data part and the signature part of the first type file in the configuration file.
In another aspect, the present application further provides an electronic device, including:
a processor and a memory;
wherein the processor is configured to execute the processing method as described in any one of the above;
the memory is used for storing programs needed by the processor to execute operations.
In yet another aspect, the present application further provides a computer-readable storage medium having at least one instruction, at least one program, a set of codes, or a set of instructions stored therein, which is loaded and executed by a processor to implement the processing method according to any one of the above embodiments.
According to the above scheme, after the first file set to be compressed is obtained, the data parts and/or signature parts of the signed first type files in the first file set are deduplicated, so that duplicate data parts and/or signature parts in the first file set are removed and the data amount of the deduplicated second file set is smaller than that of the first file set. Moreover, after the first file set is deduplicated, a configuration file for restoring the second file set to the first file set is generated. On this basis, compressing the second file set together with the configuration file reduces the data amount of the compressed file without affecting recovery of the first file set.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a processing method provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart of a processing method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of disassembling a signed file according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of a processing method according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of another example of disassembling the signed files in a file set according to an embodiment of the present application;
FIG. 6 is a schematic diagram of data deduplication of the files obtained after the disassembly of FIG. 5;
FIG. 7 shows the data portions and signature portions remaining after data portion deduplication of the file set of FIG. 5;
FIG. 8 is a schematic flow chart of another processing method provided in the embodiments of the present application;
FIG. 9 is a schematic structural diagram of a processing apparatus according to an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of another processing apparatus according to an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be practiced otherwise than as specifically illustrated.
Detailed Description
The scheme of the present application is suitable for compressing and recovering file sets that include at least signed files, such as program files, so that a file set can be compressed more effectively and the data amount of the compressed file set reduced, on the premise that the compressed file can be reliably restored to its pre-compression state.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without inventive step, are within the scope of the present disclosure.
Referring to fig. 1, fig. 1 is a schematic flow chart of a processing method according to an embodiment of the present disclosure, where the method of the present embodiment is applied to an electronic device, and the electronic device may be a mobile phone, a notebook computer, a desktop computer, a workstation, a server, or the like, without limitation.
The processing method of the embodiment may include:
S101, obtaining a first file set to be compressed.
Wherein the first set of files includes a plurality of signed first type files.
It is understood that a signed file refers to a file with a digital signature, and a signed file generally consists of two parts, a signature part and a data part. The digital signature carried in the file can attest to the validity of the data portion of the file, for example by proving that the file or the data in it has not been modified. For example, the signed file may be a signed executable file.
The first file set comprising a plurality of signed first type files may take many forms, which is not limited in this application. For example, the first file set may be the program files of an application program, which may include a plurality of signed files; the signed files may be plug-in files, library files, and the like, without limitation.
In the present application, for the purpose of distinguishing from subsequently appearing files without signature, the signed files are referred to as first type files and the subsequent files without signature are referred to as second type files.
S102, performing data part deduplication and/or signature part deduplication on the signed first type files to obtain a deduplicated second file set.
In this application, S102 may take either the data portion or the signature portion of the first type files as the deduplication object to be checked for duplicates, and deduplicate that portion, so as to remove the duplicate data portions or duplicate signature portions existing in the first file set.
In this application, S102 may alternatively deduplicate both the data portions of different first type files and the signature portions of different first type files in the first file set, thereby removing both the duplicate data portions and the duplicate signature portions existing in the first file set.
For example, to deduplicate the data portions of the first type files, the data portion contained in each of the plurality of first type files is determined, and duplicate data portions across those files are removed, so that the deduplicated second file set contains no data portions that belong to different first type files but are identical.
Similarly, signature part deduplication removes the duplicate signature parts existing across the plurality of first type files, so that the deduplicated second file set contains no signature parts that belong to different first type files but are identical.
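Part-level deduplication of this kind can be sketched as follows. This is a minimal illustration, not the method of the application: detecting duplicates by SHA-256 digest, the `dedup_parts` helper, and the sample file names are all assumptions, since the text does not specify how identical parts are detected.

```python
import hashlib
from typing import Dict, Tuple


def digest(blob: bytes) -> str:
    """Content hash used as the identity of a part (an assumed choice)."""
    return hashlib.sha256(blob).hexdigest()


def dedup_parts(files: Dict[str, Tuple[bytes, bytes]]):
    """files maps a file name to its (data_part, signature_part)."""
    store: Dict[str, bytes] = {}          # each unique part is kept once
    refs: Dict[str, Dict[str, str]] = {}  # which parts each file is made of
    for name, (data, sig) in files.items():
        d, s = digest(data), digest(sig)
        store.setdefault(d, data)
        store.setdefault(s, sig)
        refs[name] = {"data": d, "signature": s}
    return store, refs


files = {
    "a.bin": (b"payload-1", b"sig-X"),
    "b.bin": (b"payload-1", b"sig-Y"),  # same data part, different signature part
    "c.bin": (b"payload-2", b"sig-X"),  # different data part, same signature part
}
store, refs = dedup_parts(files)
print(len(store))  # the 6 original parts are reduced to 4 unique parts
```

Whole-file deduplication would remove nothing here, because no two of the three files are identical as a whole.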
It can be understood that one or both of the data portion and the signature portion in the first file set are used as the deduplication objects, so that the situation that the data portion or the signature portion cannot be deduplicated because the files are different but the data portion or the signature portion is the same can be reduced, and the existence of duplicated data in the first file set can be reduced more effectively.
It is understood that, in practical applications, the data portion and the signature portion may also be deduplicated as a whole for the first file set, so as to remove first type files whose data portion and signature portion are both the same. Of course, when the data portions and the signature portions of the first file set are deduplicated separately, first type files whose signature portion and data portion are both identical are in effect deduplicated as well.
S103, generating a configuration file.
The configuration file records configuration information of at least a data part and a signature part in the first file set so as to restore the second file set to the first file set.
The configuration file can determine each first type file contained in the first file set and the data part and the signature part contained in each first type file, so that each first type file contained in the first file set can be recovered by using the second file set after the deduplication.
For example, in one possible case, the configuration file records identification information of each of the first-type files included in the first file set, identification information of the data portion and identification information of the signature portion included in each of the first-type files.
The identification information of the first type file may be a file name of the first type file or other information capable of uniquely identifying the first type file. Similarly, the identification information of the data portion may be a name or a unique number of the data portion, and the like, and the identification information of the signature portion may be a name or a unique number of the signature portion, and the like.
In this case, according to the information recorded in the configuration file, each first type file included in the first file set may be determined, and for each first type file in the first file set, based on the identification information of the data portion and the identification information of the signature portion in the first type file, the data portion and the signature portion belonging to the first type file may be determined from the second file set, so as to reconstruct each first type file in the first file set, and thus restore the first file set.
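The restoration described above can be sketched with a small manifest. The JSON layout, the part identifiers such as `data_0`, and the file names are hypothetical; the application only requires that the configuration file identify each first type file and its data and signature parts.

```python
import json

# Deduplicated second file set: each unique part stored once under an identifier.
second_file_set = {
    "data_0": b"payload-1",
    "data_1": b"payload-2",
    "sig_0": b"sig-X",
    "sig_1": b"sig-Y",
}

# Configuration file: one entry per first type file in the first file set.
config_text = json.dumps({
    "files": [
        {"name": "a.bin", "data": "data_0", "signature": "sig_0"},
        {"name": "b.bin", "data": "data_0", "signature": "sig_1"},
        {"name": "c.bin", "data": "data_1", "signature": "sig_0"},
    ]
})


def restore(config_text: str, parts: dict) -> dict:
    """Rebuild every first type file as a (data_part, signature_part) pair."""
    config = json.loads(config_text)
    return {
        entry["name"]: (parts[entry["data"]], parts[entry["signature"]])
        for entry in config["files"]
    }


first_file_set = restore(config_text, second_file_set)
print(sorted(first_file_set))
```

How the two parts are joined back into one file on disk depends on the signed file format, which this sketch leaves out.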
It can be understood that each first type file in the first file set has its own file address interval, and in order to accurately restore the first file set based on the second file set, the configuration file may further include: the address interval of the data part and the address interval of the signature part in each first type file. On the basis, the address interval where each first type file in the first file set is located can be reflected through the configuration file, and therefore, the address interval of the restored first type file in the first file set is determined based on the address interval of the data part and the address interval of the signature part in the first type file recorded in the configuration file, so that the position where the first type file is located in the first file set is recovered.
In yet another possible implementation manner, the configuration file may further record identification information of at least one first type file in the first file set whose data portion and/or signature portion has been removed by deduplication, together with the signature portion and/or data portion removed from that file. Based on the configuration file, it can then be determined which first type files have had one or both of the data portion and the signature portion removed, so that the removed portions can be recovered.
S104, compressing the configuration file and the second file set to obtain a compressed file of the first file set.
The configuration file and the second file set are compressed together, so that the compressed file contains information capable of recovering the first file set, and the first file set can be recovered based on the second file set and the configuration file.
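As one possible way to carry out this step, the configuration file and the deduplicated parts can be written into a single archive. The sketch below assumes a ZIP container with DEFLATE compression and hypothetical entry names (`config.json`, `parts/...`); the application does not prescribe a compression format.

```python
import io
import json
import zipfile


def pack(config: dict, parts: dict) -> bytes:
    """Compress the configuration file and the second file set into one archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("config.json", json.dumps(config))
        for part_id, blob in parts.items():
            zf.writestr(f"parts/{part_id}", blob)
    return buf.getvalue()


def unpack(archive: bytes):
    """Recover the configuration file and the second file set from the archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        config = json.loads(zf.read("config.json"))
        parts = {
            name[len("parts/"):]: zf.read(name)
            for name in zf.namelist()
            if name.startswith("parts/")
        }
    return config, parts


config = {"files": [{"name": "a.bin", "data": "data_0", "signature": "sig_0"}]}
parts = {"data_0": b"payload-1" * 100, "sig_0": b"sig-X"}
archive = pack(config, parts)
restored_config, restored_parts = unpack(archive)
```

Because the configuration file travels inside the archive, whoever decompresses it has everything needed to rebuild the first file set.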
The inventor of the present application found through research that, in the process of compressing a file set, each file in the file set can be taken as a deduplication object for file deduplication, so that duplicate files are removed. However, this file-level deduplication regards files as duplicates only when they are completely identical as a whole. For signed files, it easily happens that the data parts are completely the same and the files differ only in their signature parts, so the duplicate data parts shared by different files cannot be removed. Similarly, duplicate signature parts shared by different files cannot be removed. Deduplicating whole files therefore cannot remove the duplicate data in the file set to the maximum extent.
In the scheme of the present application, the whole file is not taken as the deduplication object; instead, either or both of the data part and the signature part of the first type files are taken as deduplication objects, so that the duplicate data parts and signature parts existing in the first file set can be removed.
In the embodiment of the present application, there are many possible ways to compress the second file set and the configuration file, which are not limited in this application.
As can be seen from the above, after the first file set to be compressed is obtained, the data portions and/or signature portions of the signed first type files in the first file set are deduplicated, so that duplicate data portions and/or signature portions in the first file set are removed and the data amount of the deduplicated second file set is smaller than that of the first file set. Moreover, after the first file set is deduplicated, a configuration file for restoring the second file set to the first file set is generated. On this basis, compressing the second file set together with the configuration file reduces the data amount of the compressed file without affecting recovery of the first file set.
It is understood that, when deduplicating the data portions and/or signature portions of the first type files in the first file set, it may first be determined which first type files have the same data portion, the same signature portion, or both.
Specifically, the deduplication of the data part and/or the signature part of the first type file in the first file set may include:
determining at least one first file group among the plurality of first type files, and performing file deduplication on the first file group, wherein the first file group comprises first type files with the same data part and the same signature part;
and/or,
determining at least one second file group among the plurality of first type files, and performing data part deduplication on the first type files in the second file group, wherein the second file group comprises first type files with the same data part and different signature parts;
and/or,
determining at least one third file group among the plurality of first type files, and performing signature part deduplication on the first type files in the third file group, wherein the third file group comprises first type files with different data parts and the same signature part.
The first type files contained in file groups of different types, and in different file groups of the same type, are different. For example, different first file groups contain different first type files, and the first type files in a first file group differ from those in any second file group or third file group.
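The three kinds of file groups can be sketched as below. The `group_files` helper and the sample data are hypothetical. To keep the groups disjoint, the sketch assigns files in a fixed priority order (first groups, then second, then third); that ordering is an assumption of this illustration, not something the text specifies.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_files(files: Dict[str, Tuple[bytes, bytes]]):
    """files maps a name to (data_part, signature_part); returns three lists of groups."""
    assigned = set()

    def collect(key_fn) -> List[List[str]]:
        buckets = defaultdict(list)
        for name, parts in files.items():
            if name not in assigned:
                buckets[key_fn(parts)].append(name)
        groups = [sorted(names) for names in buckets.values() if len(names) > 1]
        for group in groups:
            assigned.update(group)
        return groups

    first = collect(lambda p: p)      # same data part and same signature part
    second = collect(lambda p: p[0])  # same data part; signatures differ among the files left
    third = collect(lambda p: p[1])   # same signature part; data differs among the files left
    return first, second, third


files = {
    "a": (b"D1", b"S1"),
    "b": (b"D1", b"S1"),
    "c": (b"D2", b"S2"),
    "d": (b"D2", b"S3"),
    "e": (b"D3", b"S4"),
    "f": (b"D4", b"S4"),
    "g": (b"D5", b"S5"),
}
first, second, third = group_files(files)
print(first, second, third)
```

File `g` shares nothing with the others and therefore lands in no group.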
In a possible implementation manner, in order to be more convenient for comparing the existence of repeated data portions or signature portions in the respective first-type files, the application may further split the data portions and signature portions of the first-type files for each first-type file before the first file set is deduplicated.
The step of disassembling the file refers to analyzing and determining each component of the file. For example, the data portion and the signature portion may be parsed from the first type file based on their respective constituent features.
It will be appreciated that some of the first type files in the first file set may be compressed files, in which case the first type file needs to be decompressed before the decompressed files are disassembled. When a first type file is a compressed file, its data portion is a compressed data portion, which therefore needs to be decompressed. Meanwhile, any signed file obtained by decompression still needs to be disassembled in turn, so that duplicate data can be removed to the maximum extent during deduplication.
The following description is made in connection with one possible implementation. As shown in fig. 2, which shows another schematic flow chart of a processing method provided in the embodiment of the present application, the embodiment may include:
S201, obtaining a first file set to be compressed.
Wherein the first set of files includes a plurality of signed first type files.
S202, for each first type file, a data part and a signature part of the first type file are disassembled.
S203, detecting whether the data part of the first type file is a compressed data part, if so, executing the step S204; if not, the parsing of the first type file is ended, and step S205 is executed.
Here, a compressed data part refers to a data part that has been compressed.
S204, decompressing the compressed data part of the first type file to obtain the signed first type file or files decompressed from the compressed data part, and returning to step S202 for each decompressed first type file.
It will be appreciated that if the data part of a first type file is a compressed data part, decompressing that compressed data part may yield at least one signed first type file. Each decompressed signed first type file still needs to be disassembled further, in order to analyze whether its data part duplicates that of other first type files.
It can be seen that, if the data portion of the first-type file is a compressed data portion, one or more iterations of decompression and parsing may be performed until the data portions of the first-type files at different levels included in the first-type file are parsed.
For ease of understanding, take a signed file, the signature file M, as an example, and assume that the data part of this signature file is a compressed data part. The example is described below with reference to fig. 3.
The process of layer-by-layer parsing and decompression of the signature file M is shown from left to right in fig. 3.
As shown in fig. 3, the signature file M is disassembled into a signature part of the signature file M and a compressed data part of the signature file M.
Wherein, the compressed data part of the signature file M can be decompressed to obtain two files with signatures, which are respectively: signature file M1 and signature file M2.
On the basis, the signature file M1 and the signature file M2 need to be disassembled respectively.
As shown in fig. 3, the signature file M1 is disassembled into the data part of the signature file M1 and the signature part of the signature file M1. Assuming that the data part of the signature file M1 is not a compressed data part, the disassembly of the signature file M1 is complete.
The signature file M2 is disassembled into the data part of the signature file M2 and the signature part of the signature file M2. Assuming that the data part of the signature file M2 is itself a compressed data part, the data part of the signature file M2 needs to be decompressed. As shown in fig. 3, the signature file M21 is decompressed from the data part of the signature file M2.
The signature file M21 still needs to be disassembled. As shown in fig. 3, the signature file M21 is disassembled into the data part of the signature file M21 and the signature part of the signature file M21. Assuming that the data part of the signature file M21 is not a compressed data part, the disassembly of the signature file M21 is complete. At this point every level of signature file under the signature file M has been disassembled, so the disassembly of the signature file M is complete.
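The layer-by-layer disassembly of steps S202 to S204 can be sketched as a recursive routine. The following is only a minimal illustration under stated assumptions: the container layout (a `PKG1` marker followed by zlib-compressed JSON), the `SignedFile` class, and the helper names `pack`, `unpack` and `disassemble` are hypothetical and are not defined by the present application.

```python
import json
import zlib
from dataclasses import dataclass

MAGIC = b"PKG1"  # hypothetical marker for a compressed data part


@dataclass
class SignedFile:
    name: str
    data: bytes       # data part (plain, or MAGIC + compressed children)
    signature: bytes  # signature part


def pack(children):
    """Build a compressed data part holding signed files (toy format)."""
    rows = [[c.name, c.data.hex(), c.signature.hex()] for c in children]
    return MAGIC + zlib.compress(json.dumps(rows).encode())


def unpack(data):
    """Decompress a compressed data part back into signed files."""
    rows = json.loads(zlib.decompress(data[len(MAGIC):]))
    return [SignedFile(n, bytes.fromhex(d), bytes.fromhex(s)) for n, d, s in rows]


def disassemble(f, parts=None):
    """S202-S204: split off the signature, then recurse into compressed data."""
    parts = [] if parts is None else parts
    parts.append((f.name, "signature", f.signature))
    if f.data.startswith(MAGIC):          # S203: is the data part compressed?
        for child in unpack(f.data):      # S204: decompress, return to S202
            disassemble(child, parts)
    else:
        parts.append((f.name, "data", f.data))
    return parts


# Signature file M of fig. 3: M contains M1 and M2, and M2 contains M21.
m21 = SignedFile("M21", b"body-21", b"sig-21")
m2 = SignedFile("M2", pack([m21]), b"sig-2")
m1 = SignedFile("M1", b"body-1", b"sig-1")
m = SignedFile("M", pack([m1, m2]), b"sig-M")
parts = disassemble(m)
```

In this sketch the recursion stops for a file as soon as its data part is not a compressed data part, which mirrors the end condition of step S203.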
S205, if all the first type files in the first file set are disassembled, carrying out data part duplication removal and/or signature part duplication removal on a plurality of first type files determined from the first file set to obtain a second file set after duplication removal.
It is understood that the disassembled first-type files and the first-type files originally existing in the first file set can be obtained through the previous steps, and therefore, one or both of the data portion and the signature portion need to be deduplicated in combination with the data portion and the signature portion included in each first-type file.
The detailed deduplication process is similar to the foregoing embodiment, and is not described again.
For ease of understanding, assume that the first file set includes a signature file A and the signature file M from the example of fig. 3, and that the signature file A is disassembled into a data part of the signature file A (not a compressed data part) and a signature part of the signature file A.
Through multi-layer decompression and disassembly, the signature file M finally yields: the signature part of the signature file M1, the data part of the signature file M1, the signature part of the signature file M2, and, below the signature file M2, the data part and the signature part of the signature file M21.
On this basis, when the data part is deduplicated, the data part of the signature file M1, the data part of the signature file M21, and the data part of the signature file A are compared to determine whether any of them are duplicates. If the data part of the signature file A duplicates the data part of the signature file M21, the data part of either the signature file A or the signature file M21 may be removed.
Similarly, when the signature part is deduplicated, the signature part of the signature file A, the signature part of the signature file M1, the signature part of the signature file M2, and the signature part of the signature file M21 may be compared, and any duplicate signature part is removed.
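The comparison described above can be illustrated with a content-addressed store. This is a sketch only: detecting duplicates by SHA-256 digest is an assumption made here for illustration (the application only requires that the parts be compared), and the function name `deduplicate` and the sample byte strings are hypothetical.

```python
import hashlib


def deduplicate(parts):
    """Keep one copy of each distinct part; record which file uses which copy."""
    store = {}       # digest -> bytes, the parts that remain after deduplication
    references = []  # (file name, part kind, digest), kept for later restoration
    for name, kind, blob in parts:
        digest = hashlib.sha256(blob).hexdigest()
        store.setdefault(digest, blob)  # a repeated part is stored only once
        references.append((name, kind, digest))
    return store, references


# Parts of signature file A and of the files disassembled from signature file M.
parts = [
    ("A", "data", b"same-body"), ("A", "signature", b"sig-A"),
    ("M1", "data", b"body-1"), ("M1", "signature", b"sig-1"),
    ("M2", "signature", b"sig-2"),
    ("M21", "data", b"same-body"), ("M21", "signature", b"sig-21"),
]
store, references = deduplicate(parts)
```

Here the data part of signature file A and the data part of signature file M21 share one stored copy, while `references` keeps enough information to tell which files used it.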
In one possible implementation, the signature part of any one of the first type files may include a signature certificate public key and a signature result part.
The signature result part is obtained by signing the data part in the first type file by using the public key of the signature certificate, so two signature result parts are identical only if both the data parts and the public keys of the signature certificates are identical. Two first type files having the same signature result part are therefore in fact the same first type file. Such identical first type files can be removed by file deduplication; of course, a duplicate first type file is also removed when its data part and signature part are both the same and the duplicate data part and signature part are removed.
To avoid repeating the file deduplication operation when the signature part is deduplicated, the present application may deduplicate only the public key of the signature certificate of the first type file when the signature part of the first type file is deduplicated separately.
S206, for each first type file whose disassembled data part is a compressed data part, determining the file association relationship between that first type file and the first type files decompressed from it, and the decompression parameters used to obtain the decompressed first type files.
The file association relationship records the relationship among the first type files of each level disassembled from a first type file.
Continuing with the example of fig. 3, the parsing process of the signature file M shows that the file association relationship between the signature files at least includes:
the compressed data portion of the signature file M includes: signature file M1 and signature file M2;
the compressed data portion of the signature file M2 includes: signature file M21.
Wherein the decompression parameters used for decompressing the first type file are decompression parameters used for decompressing the compressed data portion containing the first type file.
Continuing with the example of fig. 3, assume that the decompression parameter used for the compressed data part of the signature file M is decompression parameter 1; that is, the decompression parameter of both the signature file M1 and the signature file M2 is decompression parameter 1. The decompression parameter used for the compressed data part of the signature file M2 is decompression parameter 2.
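One way to hold the file association relationship and the decompression parameters of the fig. 3 example is a small mapping from each file with a compressed data part to its decompressed files. The layout and the names `association`, `parameter_for` and `descendants` are hypothetical and are given only as a sketch.

```python
# Hypothetical record of the relationships in fig. 3.
association = {
    "M": {"children": ["M1", "M2"], "decompression_parameter": "parameter 1"},
    "M2": {"children": ["M21"], "decompression_parameter": "parameter 2"},
}


def parameter_for(name):
    """Return the parameter of the compressed data part that contained `name`."""
    for entry in association.values():
        if name in entry["children"]:
            return entry["decompression_parameter"]
    return None  # top-level file: not decompressed from anything


def descendants(name):
    """List every file at every level below `name`."""
    found = []
    for child in association.get(name, {}).get("children", []):
        found.append(child)
        found.extend(descendants(child))
    return found
```

With this layout, the parameter of a decompressed file is looked up through the file that contained it, which matches the statement that signature files M1 and M2 share decompression parameter 1.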
It is understood that step S206 is optional: it need not be executed when only the individual data parts or signature parts in the file set need to be determined, or when the relationship between the files is of no concern.
And S207, generating a configuration file.
The configuration file records at least the configuration information of the data part and the signature part in the first file set, so that the second file set can be restored to the first file set.
Optionally, where the file association relationship and the decompression parameters have been determined in step S206, the configuration file further includes the file association relationship and the decompression parameters corresponding to each decompressed first type file, so that the first type file whose compressed data part contained the decompressed first type file can be restored.
S208, the configuration file and the second file set are compressed to obtain a compressed file of the first file set.
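Step S208 can be sketched with the Python standard library. The ZIP container, the entry names `config.json` and `parts/…`, and the function names `build_archive` and `read_archive` are assumptions made for illustration; the application does not prescribe a particular archive format.

```python
import io
import json
import zipfile


def build_archive(config, second_file_set):
    """Write the configuration file and the deduplicated parts into one archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("config.json", json.dumps(config))
        for part_id, blob in second_file_set.items():
            archive.writestr(f"parts/{part_id}", blob)
    return buffer.getvalue()


def read_archive(raw):
    """Inverse step: recover the configuration file and the parts."""
    with zipfile.ZipFile(io.BytesIO(raw)) as archive:
        config = json.loads(archive.read("config.json"))
        parts = {
            name[len("parts/"):]: archive.read(name)
            for name in archive.namelist()
            if name.startswith("parts/")
        }
    return config, parts


config = {"A": {"data": "d1", "signature": "s1"}}
second_file_set = {"d1": b"body" * 100, "s1": b"sig"}
raw = build_archive(config, second_file_set)
```

Keeping the configuration file inside the same archive means a single compressed file is enough to restore the first file set later.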
In a possible implementation manner, since each first type file in the first file set has been disassembled into a signature part and a data part, after the signature parts and/or data parts of the first type files in the first file set are deduplicated, the signature parts and data parts that remain for each first type file left in the first file set may be spliced together to obtain the second file set.
It can be understood that, in this embodiment, before the data part or signature part of the signed first type files in the first file set is deduplicated, each first type file is disassembled into its data part and signature part, and when a disassembled data part is a compressed data part, it is decompressed and then disassembled further. Each data part and signature part included in the first file set can thus be determined more comprehensively. This reduces the cases in which a data part or signature part cannot be deduplicated because it is in compressed form, makes deduplication of the first file set more thorough, and allows the deduplicated second file set to be compressed into a compressed file with a smaller data size.
It will be appreciated that, for signed files, the data part of some files may itself be divided into multiple parts. For example, in one possible scenario, the data part of the first type file may include: a data body, signature description information, and initial data verification information used for verifying the data body and the signature description information. In this case, since the signature description information is related to the signature part, for first type files with different signature parts the data parts as a whole are regarded as different content even when the data bodies are the same.
In order to remove more duplicate data, the data verification information of the data body may be calculated from the data body included in the data part of the first type file, and the data body together with the calculated data verification information is taken as the modified data part of the first type file.
On the basis, when the data part in the file set is deduplicated, the modified data part in each first type file can be deduplicated to remove the duplicated modified data part in the file set.
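The effect of the modified data part can be shown with a toy layout. Everything below is an assumption for illustration: CRC-32 stands in for the unspecified data verification information, the byte layout is invented, and the names `original_data_part` and `modified_data_part` are hypothetical.

```python
import zlib


def original_data_part(data_body, signature_description):
    """Toy original layout: verification covers body and signature description."""
    checksum = zlib.crc32(data_body + signature_description).to_bytes(4, "big")
    return data_body + signature_description + checksum


def modified_data_part(data_body):
    """Data body followed by verification information computed from it alone."""
    checksum = zlib.crc32(data_body).to_bytes(4, "big")
    return data_body + checksum


body = b"shared program code"
first = original_data_part(body, b"signed-by-key-1")
second = original_data_part(body, b"signed-by-key-2")
```

The two original data parts differ only because of the signature description information, so they cannot be deduplicated as a whole, whereas both files yield the same modified data part, of which only one copy needs to be kept.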
To facilitate understanding of the scheme of the present application, the following description takes as an example the case where data part deduplication is performed on the first type files in the first file set and the file set is the file set of an application program.
Referring to fig. 4, which shows another schematic flow chart of a document processing method according to the present application, the method of the present embodiment may include:
S401, a first file set of the application program to be compressed is obtained.
The first file set comprises a plurality of signed first type files in the application program.
As shown in FIG. 5, the first set of files for the application may include signed File A, File B, and File D.
S402, for each first type file, disassembling a data part, a signature public key part and a signature result part of the first type file.
In this embodiment, the description is made taking as an example that the signature part includes a public signature key part and a signature result part.
S403, detecting whether the data part of the first type file is a compressed data part, if so, executing the step S404; if not, the parsing of the first type file is ended, and step S405 is executed.
Here, a compressed data part refers to a data part that has been compressed.
S404, decompressing the compressed data part of the first type file to obtain the first type file with signature decompressed by the compressed data part, and returning to the step S402 for the decompressed first type file.
As shown in fig. 5, the signed signature file A in the first file set may be disassembled into a data part Aa of the signature file A, a signature certificate public key part Aa of the signature file A, and a signature result part Aa. If the data part Aa of the signature file A is not a compressed data part, the disassembling process of the signature file A is finished.
Similar to the disassembly of the signature file a, the signature file B is disassembled into a data part Bb, a signature certificate public key part Bb, and a signature result part Bb.
The signature file D is disassembled into a data part Dd, a signature certificate public key part Dd and a signature result part Dd. The data part Dd is a compressed data part, and is therefore decompressed to obtain the signature file DA and the signature file E. The signature file DA is in turn disassembled into the data part Aa, the signature certificate part DA and the signature result part DA.
The signature file E is disassembled into a data part Ee, a signature certificate public key part Aa, and a signature result part Ee.
The data part Ee is then decompressed to obtain a signature file B, and the signature file B is disassembled into a data part Bb, a signature certificate public key part Bb and a signature result part Bb.
S405, if all the first type files in the first file set are disassembled, performing data part deduplication on the determined first type files, and performing signature certificate public key part deduplication on the first type files.
S406, splicing the data parts and signature parts remaining in the first file set after deduplication into files to obtain a second file set.
As shown in fig. 5, if the data part Aa obtained from the signature file D through one or more layers of disassembly duplicates the data part Aa of the signature file A obtained directly from the first file set, one data part Aa can be deleted. As shown in fig. 6, the data part Aa of the signature file DA is removed.
Similarly, the signature file B obtained by disassembling and decompressing the signature file E has the same data part, signature result part and signature certificate public key part as the signature file B above, so one signature file B can be removed. In fig. 6, the data part and the signature part of the signature file B obtained by disassembling the signature file E are removed.
Meanwhile, the signature certificate public key part Aa disassembled from the signature file E duplicates the signature certificate public key part Aa of the signature file A above, and therefore one signature certificate public key part Aa needs to be removed. As shown in fig. 6, the signature certificate public key part Aa in the signature file E is removed.
After splicing the remaining parts of the signature files in fig. 6 obtained after the deduplication, the content contained in the obtained second file set is as shown in fig. 7.
S407, for each first type file whose disassembled data part is a compressed data part, determining the file association relationship between that first type file and the first type files decompressed from it, and the decompression parameters used to obtain the decompressed first type files.
It is understood that the file association and the decompression parameters may be obtained during the process of disassembling each first type file in the first set of files and decompressing the compressed data portion.
As shown in fig. 5, the file association relationship may include: the compressed data part disassembled from the signature file D includes the signature file DA and the signature file E; and the compressed data part of the signature file E includes the signature file B.
The decompression parameters may be the parameters used for decompressing the compressed data parts of the signature files. The decompression parameters of the compressed data parts of different signature files may be the same or different, which is not limited here.
S408, generating a configuration file.
The configuration file records: identification information of each first type file included in the first file set; the identification information and address interval of each of the data part, the signature certificate public key part and the signature result part contained in each first type file; the file association relationship; and the compression parameters.
For example, for fig. 5 and fig. 6, the configuration file needs to record the file names of the signature file A, the signature file B and the signature file D, the data part (or the components of the data part) that each includes, and the identifications of the signature certificate public key part and the signature result part, where the components of a data part may be expressed through the file association relationship.
Meanwhile, for each signature file of fig. 5 and fig. 6, the address interval of each component in the signature file may also be recorded. For example, the signature file A includes a data part Aa, a signature certificate public key part Aa and a signature result part Aa; on this basis, the address interval information for the components of the signature file A that needs to be recorded in the configuration file is as follows:
{
  "signature file A": {
    "data part of signature file A": "0-10000",
    "signature certificate public key part of signature file A": "10000-10500",
    "signature result part of signature file A": "10500-"
  }
  ·······
}
Of course, the configuration information corresponding to the other signature files may also be recorded in the above manner, which is not described herein again.
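How such address intervals could be recorded and later used to reassemble a file is sketched below. The half-open byte intervals, the function names `record_intervals` and `assemble`, and the 256-byte length chosen for the signature result part are assumptions; the end address of the signature result part is not given in the example above.

```python
def record_intervals(parts):
    """Map each component name to a half-open [start, end) byte interval."""
    intervals, offset = {}, 0
    for name, blob in parts:
        intervals[name] = (offset, offset + len(blob))
        offset += len(blob)
    return intervals


def assemble(intervals, blobs):
    """Place every component at its recorded interval to rebuild the file."""
    size = max(end for _, end in intervals.values())
    out = bytearray(size)
    for name, (start, end) in intervals.items():
        out[start:end] = blobs[name]
    return bytes(out)


components = [
    ("data part", b"d" * 10000),
    ("signature certificate public key part", b"k" * 500),
    ("signature result part", b"r" * 256),  # length assumed for illustration
]
intervals = record_intervals(components)
restored = assemble(intervals, dict(components))
```

Because each component carries its own interval, the components can be fetched from the deduplicated set in any order and still land at the right position in the restored file.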
S409, compressing the configuration file and the second file set to obtain a compressed file of the application program.
It will be appreciated that in the above embodiments of the present application, a plurality of files of the second type without signatures may also be included in the first set of files.
If the first file set comprises a plurality of second-type files without signatures, file deduplication can be performed on the plurality of second-type files to remove duplicate second-type files in the first file set, so that the obtained second file set comprises at least one second-type file subjected to deduplication.
On this basis, the configuration file can also record the configuration information of the second type file. And restoring each second type file contained in the first file set based on the configuration information of the second type file and the second file set.
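For the unsigned second type files, whole-file deduplication and its reversal can be sketched as follows. Using a SHA-256 digest as the identification recorded in the configuration information is an assumption, as are the function names and the sample file names.

```python
import hashlib


def deduplicate_files(files):
    """Keep one copy per distinct content; map every file name to its copy."""
    kept, config = {}, {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        kept.setdefault(digest, content)
        config[name] = digest
    return kept, config


def restore_files(kept, config):
    """Rebuild every second type file from the kept copies."""
    return {name: kept[digest] for name, digest in config.items()}


files = {"res/a.png": b"icon", "res/b.png": b"icon", "readme.txt": b"text"}
kept, config = deduplicate_files(files)
```

The two identical files share one kept copy, and the configuration information alone is enough to recreate both names on restoration.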
It will be appreciated that the above embodiments of the processing method are described in terms of de-duplicating and compressing the first set of files.
On the basis of the above embodiment, when the first file set is to be restored from the compressed file after the compressed file of the first file set has been obtained, the configuration file and the second file set may first be decompressed from the compressed file, and the second file set is then restored using the configuration file to obtain the first file set. Restoring the first file set from the second file set is the reverse of deduplicating the first file set to obtain the second file set.
For example, in one possible implementation, the compressed file may be decompressed to obtain the configuration file and the second set of files. Accordingly, the second set of files may be restored to the first set of files according to the configuration information of the data portion and the signature portion of the first type of files in the configuration file.
In a possible implementation manner, the configuration file further records: the file association relationship and the decompression parameters between a first type file whose data part is a compressed data part and the first type files decompressed from that first type file.
If it is determined according to the configuration file that the data part of the first type file to be restored in the first file set is not a compressed data part, the data part and the signature part that form the first type file are obtained from the second file set and assembled according to the configuration information of the data part and the signature part of the first type file in the configuration file, and the restored first type file is obtained. Reference may be made to the foregoing description for details, which are not repeated here.
If it is determined according to the configuration file that the data part of the first type file to be restored in the first file set is a compressed data part, the data parts and signature parts of the first type files at each level associated with the first type file are obtained from the second file set according to the configuration information of the data part and the signature part of the first type file in the configuration file, and the data parts and signature parts of the first type file and of the associated first type files at each level are compressed and assembled according to the file association relationship and the decompression parameters, so that the restored first type file is obtained.
The first file set is then formed from the plurality of restored first type files based on the configuration file.
The following describes a process of restoring the first file set from the compressed file obtained based on the foregoing embodiment with reference to a flowchart, as shown in fig. 8, which shows a flowchart of another processing method of the present application, where the method of this embodiment may include:
S801, obtaining a compressed file.
The compressed file may be a compressed file of the first set of files generated in the manner mentioned in any of the previous embodiments.
S802, decompressing the compressed file to obtain a configuration file and an initial file set to be restored.
The configuration file records configuration information of a data part and a signature part in a first type file with a signature in a target file set.
The initial file set is obtained by at least carrying out data part duplication removal and/or signature part duplication removal on the first type files in the target file set. The initial file set is equivalent to the second file set mentioned in the foregoing embodiment, and specific reference may be made to the related description of the foregoing embodiment, which is not described herein again.
And S803, restoring the initial file set into a target file set according to the configuration information of the data part and the signature part of the first type file in the configuration file.
It can be understood that, because the configuration file records the configuration information of the data portion and the signature portion in the first type file with the signature in the target file set, the data portion and the signature portion included in each first type file in the target file set can be determined according to the configuration file, so as to reconstruct each first type file in the target file set, and thus restore the target file set.
Wherein the target file set corresponds to the first file set mentioned in the previous embodiment.
For example, the configuration information includes at least identification information of the data part and the signature part in each first type file. Because the initial file set consists of the first type files after deduplication, it includes the data parts and signature parts that remain in the first type files after the data parts and/or signature parts of the target file set have been deduplicated. The data parts and signature parts that were removed from the target file set can be found among the data parts and signature parts remaining in the initial file set.
Therefore, given the first type files of the target file set indicated in the configuration file and the identification information of the data part and the signature part in each first type file, the data part and the signature part constituting each first type file can be obtained from the initial file set, so that the first type file can be restored. Accordingly, the target file set can be formed from the restored first type files.
It is understood that, if the configuration file records the address intervals of the respective first type files and of the data parts and signature parts in the first type files, then when each first type file is restored, the data parts and signature parts may be assembled according to their respective address intervals in the first type file to restore the first type file. Accordingly, after each first type file is restored, the target file set may be restored by combining the plurality of first type files according to the address interval of each first type file.
Of course, the above is only an example of one restoration method; in practical applications there may be other restoration methods, which are not limited here.
In a possible implementation manner, the configuration file further records: the file association relationship between a first type file whose data part is a compressed data part and the first type files decompressed from that first type file.
If it is determined according to the configuration file that the data part of the first type file to be restored in the target file set is not a compressed data part, the data part and the signature part that form the first type file are obtained from the initial file set according to the configuration information of the data part and the signature part of the first type file in the configuration file, and are assembled to obtain the first type file. Reference may be made to the foregoing description for details, which are not repeated here.
If it is determined according to the configuration file that the data part of the first type file to be restored in the target file set is a compressed data part, the data parts and signature parts of the first type files at each level associated with the first type file are obtained from the initial file set according to the configuration information of the data part and the signature part of the first type file in the configuration file, and the data parts and signature parts of the first type file and of the associated first type files at each level are compressed and assembled according to the file association relationship, so that the restored first type file is obtained.
The target file set is then formed from the plurality of restored first type files based on the configuration file.
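The restoration of a first type file whose data part is a compressed data part can be sketched recursively: the lower-level files are rebuilt first and then recompressed. This is a minimal sketch under assumptions: the configuration layout, the part identifiers, the length-prefix framing, and the use of a zlib compression level as the recorded parameter are all hypothetical.

```python
import zlib

# Hypothetical initial file set: the parts left after deduplication.
parts = {"data_Aa": b"payload-A", "sig_DA": b"SIG-DA", "sig_D": b"SIG-D"}

# Hypothetical configuration file: a file either names its data part directly,
# or lists the files decompressed from it plus the recorded parameter.
config = {
    "D": {"signature": "sig_D", "children": ["DA"], "level": 6},
    "DA": {"signature": "sig_DA", "data": "data_Aa"},
}


def frame(blob):
    """Length-prefix a file so several files can share one data part."""
    return len(blob).to_bytes(4, "big") + blob


def restore(name):
    """Rebuild one first type file, recursing through the association relation."""
    entry = config[name]
    if "children" in entry:
        # Compressed data part: restore each lower-level file, then recompress.
        inner = b"".join(frame(restore(child)) for child in entry["children"])
        data = zlib.compress(inner, entry["level"])
    else:
        # Plain data part: fetch it from the initial file set as-is.
        data = parts[entry["data"]]
    return data + parts[entry["signature"]]


restored_d = restore("D")
```

Whether recompression reproduces the original compressed data part byte for byte depends on the compressor and on the recorded parameters, which is one reason for keeping the decompression parameters in the configuration file.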
The application also provides a processing device corresponding to the processing method.
As shown in fig. 9, which shows a schematic structural diagram of a processing apparatus of the present application, the apparatus of the present embodiment may include:
a file obtaining unit 901, configured to obtain a first file set to be compressed, where the first file set includes a plurality of first type files with signatures;
a data deduplication unit 902, configured to perform data part deduplication and/or signature part deduplication on the signed first type file, to obtain a deduplicated second file set;
a configuration generating unit 903, configured to generate a configuration file, where configuration information of at least the data portion and the signature portion in the first file set is recorded in the configuration file, so as to enable the second file set to be restored to the first file set;
a file compressing unit 904, configured to compress the configuration file and the second file set to obtain a compressed file of the first file set.
In one possible implementation manner, the apparatus further includes:
a file disassembling unit, configured to disassemble the first type file into its data part and signature part before the data deduplication unit performs data part deduplication and/or signature part deduplication on the signed first type file.
In another possible implementation manner, the apparatus further includes:
a file decompressing unit, configured to, after the file disassembling unit disassembles the data part and if the data part is a compressed data part, decompress the compressed data part to obtain a decompressed signed first type file and return to the operation of the file disassembling unit for the decompressed first type file; and, if the data part of the first type file is not a compressed data part, end the disassembling of the first type file.
In another possible implementation manner, the apparatus further includes:
a relationship determining unit, configured to determine, for a first type file whose disassembled data part is a compressed data part, the file association relationship between that first type file and the first type files decompressed from it;
a parameter determining unit, configured to determine the decompression parameters used for the decompressed first type files;
the configuration file further includes: the file association relationship and the decompression parameters corresponding to the decompressed first type files, so that the first type file whose compressed data part contained the decompressed first type files can be restored.
In yet another possible approach, a data deduplication unit includes:
a first data deduplication unit, configured to determine at least one first file group in the multiple first-type files, and perform file deduplication on the first file group, where the first file group includes first-type files with the same data portion and signature portion;
and/or,
a second data deduplication unit, configured to determine at least one second file group in the multiple first-type files, perform deduplication on data portions of the first-type files in the second file group, where the second file group includes first-type files with the same data portion but different signature portions;
and/or,
a third data deduplication unit, configured to determine at least one third file group in the multiple first-type files, and perform deduplication of the signature part on the first-type files in the third file group, where the third file group includes first-type files with different data portions and the same signature portion.
In another possible implementation manner, the first set of files obtained by the file obtaining unit further includes: a plurality of files of a second type without signatures;
the device also includes:
the file deduplication unit is used for performing file deduplication on the plurality of second type files so as to remove duplicated second type files;
the configuration file generated by the configuration generating unit further comprises: configuration information of the second type file.
In yet another possible implementation manner, the apparatus further includes:
the file decompression unit is used for decompressing the compressed file to obtain a configuration file and a second file set to be restored;
and the file restoration unit is used for restoring the second file set into the first file set according to the configuration information of the data part and the signature part of the first type file in the configuration file.
In an alternative, where the configuration file further records the file association relationship and the decompression parameters between a first type file whose data part is a compressed data part and the first type file decompressed from it, the file restoring unit includes:
a first file restoration unit, configured to, if it is determined according to the configuration file that the data part of a first type file to be restored in the first file set is a compressed data part, obtain from the second file set, according to the configuration information of the data part and the signature part of the first type file in the configuration file, the data part and the signature part of each level of first type file associated with that first type file, and, according to the file association relationship and the decompression parameters, compress and assemble the obtained data parts and signature parts level by level to obtain the restored first type file;
the second file restoration unit is used for obtaining and assembling the data part and the signature part which form the first type file from the second file set according to the configuration information of the data part and the signature part of the first type file in the configuration file to obtain the restored first type file if the data part of the first type file to be restored in the first file set is determined not to belong to the compressed data part according to the configuration file;
and the file combination unit is used for forming a first file set by utilizing the restored files of the first type.
In another aspect, the present application further provides another processing apparatus, corresponding to the processing method for decompressing a compressed file in the present application. Fig. 10 shows a schematic structural diagram of this processing apparatus, which includes:
a file obtaining unit 1001 for obtaining a compressed file;
a file decompressing unit 1002, configured to decompress the compressed file to obtain a configuration file and an initial file set to be restored, where the configuration file records configuration information of a data portion and a signature portion in a first type file with a signature in a target file set, and the initial file set is obtained by at least performing data portion deduplication and/or signature portion deduplication on the first type file in the target file set;
a file restoring unit 1003, configured to restore the initial file set to the target file set according to configuration information of the data portion and the signature portion of the first type file in the configuration file.
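As a non-limiting illustration, the three units above can be pictured as a round trip through an archive. The sketch assumes that the compressed file is a ZIP archive holding a JSON configuration file named `config.json` and the deduplicated parts under `parts/`; these names, the JSON layout, and the functions `pack`, `unpack`, and `restore_all` are hypothetical. Only the case without nested compressed data parts is covered.

```python
import io
import json
import zipfile


def pack(config, store):
    """Compress the configuration file and the deduplicated parts together."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("config.json", json.dumps(config))
        for part_id, part in store.items():
            zf.writestr("parts/" + part_id, part)
    return buf.getvalue()


def unpack(archive):
    """Decompress the archive into the configuration and the initial file set."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        config = json.loads(zf.read("config.json"))
        store = {n[len("parts/"):]: zf.read(n)
                 for n in zf.namelist() if n.startswith("parts/")}
    return config, store


def restore_all(config, store):
    """Reassemble every signed file as its data part followed by its signature part."""
    return {name: store[e["data"]] + store[e["signature"]]
            for name, e in config.items()}
```

Because the configuration refers to parts by identifier, a part that was stored once can be reused when reassembling several target files.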
In a possible implementation manner, where the configuration file further records the file association relationship and the decompression parameters between a first type file whose data part is a compressed data part and the first type file decompressed from it, the file restoring unit includes:
a first file restoration unit, configured to, if it is determined according to the configuration file that the data part of a first type file to be restored in the target file set is a compressed data part, obtain from the initial file set, according to the configuration information of the data part and the signature part of the first type file in the configuration file, the data part and the signature part of each level of first type file associated with that first type file, and, according to the file association relationship and the decompression parameters, compress and assemble the obtained data parts and signature parts level by level to obtain the restored first type file;
a second file restoration unit, configured to, if it is determined, according to the configuration file, that the data portion of the first type file to be restored in the target file set does not belong to the compressed data portion, obtain, according to configuration information of the data portion and the signature portion of the first type file in the configuration file, and assemble the data portion and the signature portion that constitute the first type file from the initial file set, so as to obtain the restored first type file;
and the file combination unit is used for forming the target file set by utilizing the restored files of the first type.
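As a non-limiting illustration, the two restoration branches can be written as one recursive function. The sketch assumes that each configuration entry names its signature part, and either its data part (uncompressed case) or an associated inner file plus a zlib compression level (compressed case). The field names `child` and `params` and the use of zlib are hypothetical choices for this sketch only.

```python
import zlib


def restore(name, config, store):
    """Rebuild one signed file from the configuration and the stored parts."""
    entry = config[name]
    signature = store[entry["signature"]]
    if entry["child"] is None:
        # Second restoration branch: the data part is stored as-is.
        data = store[entry["data"]]
    else:
        # First restoration branch: rebuild the inner file first, then
        # recompress it with the recorded parameters to get the data part.
        inner = restore(entry["child"], config, store)
        data = zlib.compress(inner, entry["params"]["level"])
    return data + signature
```

Byte-identical restoration in the compressed case relies on the recorded parameters, together with the compressor implementation, reproducing the original compressed data part; the sketch takes this for granted.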
In another aspect, the present application further provides an electronic device. Fig. 11 shows a schematic structural diagram of the electronic device according to the present application.
The electronic device comprises at least a processor 1101 and a memory 1102.
Wherein the processor 1101 is configured to execute the processing method in any one of the above embodiments.
The memory 1102 is used to store programs needed for the processor to perform operations.
It will be appreciated that the electronic device may also include an input device 1103, an output device 1104, a communication bus 1105, and the like. Of course, the electronic device may have more or fewer components than those shown in fig. 11, which is not limited herein.
In another aspect, the present application further provides a computer-readable storage medium having at least one instruction, at least one program, a set of codes, or a set of instructions stored therein, which is loaded and executed by a processor to implement the processing method according to any one of the above embodiments.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to each other. The features described in the embodiments of this specification may also be replaced or combined with each other, so that those skilled in the art can implement or use the present application. Because the apparatus embodiments are basically similar to the method embodiments, they are described briefly; for relevant details, refer to the corresponding description of the method embodiments.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of processing, comprising:
obtaining a first file set to be compressed, wherein the first file set comprises a plurality of first type files with signatures;
carrying out data part duplication removal and/or signature part duplication removal on the first type file with the signature to obtain a second file set after duplication removal;
generating a configuration file, wherein configuration information of at least the data part and the signature part in the first file set is recorded in the configuration file so as to enable the second file set to be restored into the first file set;
and compressing the configuration file and the second file set to obtain a compressed file of the first file set.
2. The method according to claim 1, further comprising, before said performing data part deduplication and/or signature part deduplication on the signed first type file to obtain a deduplicated second set of files:
and disassembling the data part and the signature part of the first type file.
3. The method of claim 2, further comprising:
if the data part of the first type file is a compressed data part, decompressing the compressed data part to obtain a decompressed first type file with a signature, and returning to execute the operation of disassembling the data part and the signature part of the first type file aiming at the decompressed first type file;
and if the data part of the first type file does not belong to the compressed data part, finishing the disassembling of the first type file.
4. The method of claim 3, further comprising:
for a first type file whose disassembled data part is a compressed data part, determining a file association relationship between that first type file and the first type file decompressed from it;
determining the decompression parameters used to obtain the decompressed first type file;
the configuration file further includes: the file association relationship and the decompression parameters corresponding to the decompressed first type file, so that the first type file containing the compressed data part can be restored from the corresponding decompressed first type file.
5. The method according to claim 1, wherein the performing data part deduplication and/or signature part deduplication on the signed first type file to obtain a second deduplicated file set includes:
determining at least one first file group in the plurality of first type files, and performing file duplication removal on the first file group, wherein the first file group comprises first type files with the same data part and signature part;
and/or,
determining at least one second file group in the plurality of first type files, and performing data part duplication removal on the first type files in the second file group, wherein the second file group comprises the first type files with the same data part and different signature parts;
and/or,
determining at least one third file group in the plurality of first type files, and performing signature part deduplication on the first type files in the third file group, wherein the third file group comprises first type files with different data parts and the same signature part.
6. The method of claim 1, the first set of files further comprising: a plurality of files of a second type without signatures;
the method further comprises the following steps:
performing file deduplication on the plurality of second type files to remove duplicate second type files;
the configuration file further comprises: configuration information of the second type file.
7. The method of any of claims 1 to 6, further comprising:
decompressing the compressed file to obtain the configuration file and the second file set;
and restoring the second file set into the first file set according to the configuration information of the data part and the signature part of the first type file in the configuration file.
8. The method of claim 7, wherein the configuration file further records: the file association relation and decompression parameters corresponding to the decompressed first type files;
the restoring the second file set to the first file set according to the configuration information of the data part and the signature part of the first type file in the configuration file includes:
if it is determined according to the configuration file that the data part of a first type file to be restored in the first file set is a compressed data part, obtaining from the second file set, according to the configuration information of the data part and the signature part of the first type file in the configuration file, the data part and the signature part of each level of first type file associated with that first type file, and, according to the file association relationship and the decompression parameters, compressing and assembling the obtained data parts and signature parts level by level to obtain the restored first type file;
if the data part of the first type file to be restored in the first file set is determined not to belong to the compressed data part according to the configuration file, obtaining the data part and the signature part which form the first type file from the second file set according to the configuration information of the data part and the signature part of the first type file in the configuration file, and assembling to obtain the restored first type file;
and forming the first file set by using the plurality of restored files of the first type.
9. A processing apparatus, comprising:
the file obtaining unit is used for obtaining a first file set to be compressed, and the first file set comprises a plurality of first type files with signatures;
the data deduplication unit is used for performing data part deduplication and/or signature part deduplication on the first type file with the signature to obtain a second file set after deduplication;
a configuration generating unit, configured to generate a configuration file, where configuration information of at least the data portion and the signature portion in the first file set is recorded in the configuration file, so as to enable the second file set to be restored to the first file set;
and the file compression unit is used for compressing the configuration file and the second file set to obtain a compressed file of the first file set.
10. A method of processing, comprising:
obtaining a compressed file;
decompressing the compressed file to obtain a configuration file and an initial file set to be restored, wherein the configuration file records configuration information of a data part and a signature part in a first type file with a signature in a target file set, and the initial file set is obtained by at least carrying out data part duplication removal and/or signature part duplication removal on the first type file in the target file set;
and restoring the initial file set into the target file set according to the configuration information of the data part and the signature part of the first type file in the configuration file.
CN202110346716.7A 2021-03-31 2021-03-31 Processing method and device Pending CN113051238A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110346716.7A CN113051238A (en) 2021-03-31 2021-03-31 Processing method and device

Publications (1)

Publication Number Publication Date
CN113051238A true CN113051238A (en) 2021-06-29

Family

ID=76516633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110346716.7A Pending CN113051238A (en) 2021-03-31 2021-03-31 Processing method and device

Country Status (1)

Country Link
CN (1) CN113051238A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110125722A1 (en) * 2009-11-23 2011-05-26 Ocarina Networks Methods and apparatus for efficient compression and deduplication
CN103152430A (en) * 2013-03-21 2013-06-12 河海大学 Cloud storage method for reducing data-occupied space
CN107506153A (en) * 2017-09-26 2017-12-22 深信服科技股份有限公司 A kind of data compression method, data decompression method and related system
US10659076B1 (en) * 2019-04-30 2020-05-19 EMC IP Holding Company LLC Reducing the amount of data stored in a sequence of data blocks by combining deduplication and compression

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GAO X, YU J, SHEN W T, ET AL.: "Achieving low-entropy secure cloud data auditing with file and authenticator deduplication", INFORMATION SCIENCES, vol. 546, 13 August 2020 (2020-08-13), pages 177 - 191, XP086366146, DOI: 10.1016/j.ins.2020.08.021 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114116059A (en) * 2021-11-26 2022-03-01 北京江南天安科技有限公司 Implementation method of multi-stage chained decompression structure cipher machine and cipher computing equipment
CN114116059B (en) * 2021-11-26 2023-08-22 北京江南天安科技有限公司 Implementation method of multistage chained decompression structure cipher machine and cipher computing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination