CN116303297B - File compression processing method, device, equipment and medium - Google Patents


Info

Publication number
CN116303297B
Authority
CN
China
Prior art keywords
data
file
groups
splitting
compressed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310596740.5A
Other languages
Chinese (zh)
Other versions
CN116303297A (en)
Inventor
王恺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Donson Times Information Technology Co ltd
Original Assignee
Donson Times Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Donson Times Information Technology Co ltd
Priority to CN202310596740.5A
Publication of CN116303297A
Application granted
Publication of CN116303297B
Status: Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a file compression processing method, device, equipment and medium. The method comprises the following steps. When an initial file input by a user is received, the data contained in the initial file is split according to a preset segmentation rule to obtain multiple groups of split data. The file byte similarity between the groups of split data is calculated to obtain similarity information. A target index strategy that matches the similarity information is obtained from a preset configuration table. Each group of split data is compressed according to the target index strategy to obtain multiple corresponding groups of compressed data. Finally, the groups of compressed data are combined with identification information of the target index strategy to obtain the compressed file. Because the optimal index strategy is selected according to the calculated similarity information, the method can greatly improve the efficiency of file compression and shorten the time it takes.

Description

File compression processing method, device, equipment and medium
Technical Field
The present invention relates to the field of file compression technologies, and in particular, to a method, an apparatus, a device, and a medium for file compression processing.
Background
Files can be compressed to make transmission and storage more efficient, that is, to reduce the space needed to store a file or the time needed to transmit it. In the prior art, a file is generally compressed with one specific compression method, or with a compression method selected by the user. Existing approaches typically process every file with a particular, pre-selected compression mode, and as a result compression is slow. Prior-art methods therefore suffer from low compression efficiency.
Disclosure of Invention
The embodiments of the invention provide a file compression processing method, device, equipment and medium, which aim to solve the low compression efficiency of prior-art file compression methods.
In a first aspect, an embodiment of the present invention provides a method for compressing a file, where the method includes:
if an initial file input by a user is received, splitting data contained in the initial file according to a preset segmentation rule to obtain a plurality of groups of corresponding split data;
calculating the file byte similarity between each group of split data to obtain similarity information;
obtaining a target index strategy matched with the similarity information according to a preset configuration table;
respectively carrying out data compression on each group of split data according to the target index strategy to obtain a plurality of groups of corresponding compressed data;
and combining the multiple groups of compressed data with the identification information of the target index strategy to obtain a compressed file.
In a second aspect, an embodiment of the present invention provides a file compression processing apparatus, where the apparatus includes:
the splitting unit is used for splitting the data contained in the initial file according to a preset segmentation rule if the initial file input by the user is received, so as to obtain a plurality of groups of corresponding split data;
the similarity information acquisition unit is used for calculating the file byte similarity among the split data of each group to obtain similarity information;
the target index strategy acquisition unit is used for acquiring a target index strategy matched with the similarity information according to a preset configuration table;
the data compression processing unit is used for respectively carrying out data compression on each group of split data according to the target index strategy to obtain a plurality of groups of corresponding compressed data;
and the compressed file acquisition unit is used for combining the plurality of groups of compressed data with the identification information of the target index strategy to obtain a compressed file.
In a third aspect, an embodiment of the present invention further provides a computer device. The computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor. The processor implements the file compression processing method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores a computer program, where the computer program, when executed by a processor, implements the file compression processing method according to the first aspect.
The embodiments of the invention provide a file compression processing method, device, equipment and medium. The method comprises the following steps. When an initial file input by a user is received, the data contained in the initial file is split according to a preset segmentation rule to obtain multiple groups of split data. The file byte similarity between the groups of split data is calculated to obtain similarity information. A target index strategy that matches the similarity information is obtained from a preset configuration table. Each group of split data is compressed according to the target index strategy to obtain multiple corresponding groups of compressed data. Finally, the groups of compressed data are combined with identification information of the target index strategy to obtain the compressed file. Because the optimal index strategy is selected according to the calculated similarity information, the method can greatly improve the efficiency of file compression and shorten the time it takes.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a file compression processing method according to an embodiment of the present invention;
Fig. 2 is a schematic block diagram of a file compression processing apparatus according to an embodiment of the present invention;
Fig. 3 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to Fig. 1, Fig. 1 is a flow chart of a file compression processing method according to an embodiment of the invention. The method is applied to a terminal device and executed by application software installed on that device. The terminal device is the device that compresses files by executing the method. It may be, for example, a desktop computer, a notebook computer, a tablet computer or a mobile phone. It may also be a server deployed within an enterprise. As shown in Fig. 1, the method includes steps S110 to S150.
S110, if an initial file input by a user is received, splitting data contained in the initial file according to a preset segmentation rule to obtain a plurality of groups of corresponding split data.
The user may input an initial file, that is, a file that has not been compressed, to the terminal device. The initial file may be a video, image, text or other type of file. The data of the initial file is stored on the computer's storage medium in binary form, as 1s and 0s. The terminal device splits this data according to the segmentation rule to obtain multiple groups of split data.
In one embodiment, step S110 includes the following steps: encoding the data contained in the initial file according to the encoding mode in the segmentation rule to obtain encoded data; and splitting the encoded data according to the splitting length in the segmentation rule to obtain multiple groups of split data corresponding to the splitting length.
The data contained in the initial file is encoded according to the encoding mode in the segmentation rule. For example, if the encoding mode is hexadecimal, the binary data of the initial file is converted into hexadecimal encoded data.
The encoded data is then split according to the splitting length. Every group of split data except the last contains exactly as many characters as the splitting length.
For example, if the splitting length is 1000, every group of split data except the last contains 1000 characters.
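The encoding and fixed-length splitting just described can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation. Hexadecimal is used as the encoding mode, as in the example above, and the function names `encode_hex` and `split_fixed` are hypothetical.

```python
from __future__ import annotations


def encode_hex(data: bytes) -> str:
    """Encode raw file bytes as uppercase hexadecimal text (two characters per byte)."""
    return data.hex().upper()


def split_fixed(encoded: str, split_length: int) -> list[str]:
    """Split encoded text into groups of split_length characters.

    Every group except possibly the last has exactly split_length characters.
    """
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    return [encoded[i:i + split_length] for i in range(0, len(encoded), split_length)]


raw = bytes(range(256)) * 10                 # 2560 bytes -> 5120 hex characters
groups = split_fixed(encode_hex(raw), 1000)
print([len(g) for g in groups])              # [1000, 1000, 1000, 1000, 1000, 120]
```

Joining the groups back together restores the encoded text exactly. This is what makes the later ordered combination in step S150 possible.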
In another embodiment, step S110 includes the following steps: encoding the data contained in the initial file according to the encoding mode in the segmentation rule to obtain encoded data; and splitting the encoded data according to the splitting length and the overlap coefficient in the segmentation rule to obtain multiple groups of split data corresponding to the splitting length. The amount of overlapping data between two adjacent groups of split data is the product of the splitting length and the overlap coefficient.
In this embodiment, the encoded data is split according to both the splitting length and the overlap coefficient. Every group of split data except the last again contains exactly as many characters as the splitting length. The overlapping data amount is the number of characters shared by two adjacent groups of split data, calculated as the product of the splitting length and the overlap coefficient. For example, with a splitting length of 1000 and an overlap coefficient of 0.1, the overlapping data amount is 100. The last 100 characters of the former of two adjacent groups are then the same as the first 100 characters of the latter group.
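A minimal sketch of overlapping splitting is shown below. It assumes the overlap amount is rounded to a whole number of characters and that each new group starts `split_length - overlap` characters after the previous one. The patent does not specify rounding or stepping, and `split_with_overlap` is a hypothetical name.

```python
from __future__ import annotations


def split_with_overlap(encoded: str, split_length: int, overlap_coefficient: float) -> list[str]:
    """Split encoded text into groups of split_length characters in which
    adjacent groups share round(split_length * overlap_coefficient) characters."""
    overlap = round(split_length * overlap_coefficient)
    if split_length <= 0 or not 0 <= overlap < split_length:
        raise ValueError("need split_length > 0 and 0 <= overlap < split_length")
    if not encoded:
        return []
    step = split_length - overlap            # offset between the starts of adjacent groups
    groups = []
    start = 0
    while True:
        groups.append(encoded[start:start + split_length])
        if start + split_length >= len(encoded):   # this group already reaches the end
            break
        start += step
    return groups


encoded = "".join("0123456789ABCDEF"[i % 16] for i in range(2000))
groups = split_with_overlap(encoded, 1000, 0.1)    # 100 shared characters per boundary
print([len(g) for g in groups])                    # [1000, 1000, 200]
```

With an overlap coefficient of 0, this reduces to plain fixed-length splitting.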
In practice, the overlap coefficient can be calculated from the splitting length and the total number of characters in the encoded data, using formula (1):
(1);
where R is the calculated overlap coefficient, C is the total number of characters in the encoded data, F is the splitting length, and e is the base of the natural logarithm. For example, with C = 150k and F = 1k, the calculated overlap coefficient is R = 0.0664.
S120, calculating the file byte similarity among the split data of each group to obtain similarity information.
The terminal device calculates the file byte similarity between the groups of split data to obtain similarity information. The similarity information consists of statistics computed over the resulting similarity values.
In one embodiment, step S120 includes the following steps: calculating the file byte similarity between each pair of adjacent groups of split data to obtain corresponding similarity values; and computing statistics over the similarity values to obtain the similarity information.
Specifically, the similarity value between two groups of split data is calculated as follows. First, the identical character fragments shared by the two groups are counted. With a fragment length of 10, a group of 1000 characters yields 1000 - 10 + 1 = 991 fragments. Each fragment is compared with the fragments of the adjacent group to count how many are identical. This count is divided by the total number of fragments (991) to obtain the similarity value.
For two adjacent groups of split data that share overlapping characters, the calculation has one extra step. The overlap coefficient is subtracted from the quotient (the number of identical fragments divided by the total of 991), and the resulting difference is taken as the similarity value.
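The fragment-based similarity of step S120 can be sketched as below. Several details are assumptions not stated in the patent:

- A fragment of the first group counts as identical if it occurs anywhere in the adjacent group.
- The denominator is the fragment count of the first group.
- The helper names are hypothetical.

Subtracting the overlap coefficient mirrors the adjustment for overlapping groups described above.

```python
from __future__ import annotations


def fragment_similarity(a: str, b: str, fragment_length: int = 10,
                        overlap_coefficient: float = 0.0) -> float:
    """Share of a's fragments (substrings of fragment_length) that also occur in b,
    minus the overlap coefficient when the two groups overlap."""
    total = len(a) - fragment_length + 1          # e.g. 1000 - 10 + 1 = 991
    if total <= 0:
        return 0.0
    b_fragments = {b[i:i + fragment_length] for i in range(len(b) - fragment_length + 1)}
    same = sum(1 for i in range(total) if a[i:i + fragment_length] in b_fragments)
    return same / total - overlap_coefficient


def adjacent_similarities(groups: list[str], fragment_length: int = 10,
                          overlap_coefficient: float = 0.0) -> list[float]:
    """One similarity value per pair of adjacent groups of split data."""
    return [fragment_similarity(groups[i], groups[i + 1], fragment_length, overlap_coefficient)
            for i in range(len(groups) - 1)]


# Only "KLMNOPQRST" is shared: 1 identical fragment out of 11.
value = fragment_similarity("ABCDEFGHIJKLMNOPQRST", "KLMNOPQRSTUVWXYZ")
```

Storing the adjacent group's fragments in a set keeps each lookup constant-time on average. Comparing every fragment pair directly would cost quadratic time per group.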
The similarity values are then summarized according to preset statistical items to obtain the similarity information. The statistical items include the mean, variance, median, mean square deviation (standard deviation) and the like.
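The statistical summary could be computed with Python's standard `statistics` module. Treating the variance and mean square deviation as population statistics (`pvariance`, `pstdev`) is an assumption, and the dictionary keys are hypothetical names.

```python
from __future__ import annotations

import statistics


def similarity_info(values: list[float]) -> dict[str, float]:
    """Summarize adjacent-group similarity values into the statistics used for matching."""
    if not values:
        raise ValueError("need at least one similarity value")
    return {
        "mean": statistics.fmean(values),
        "variance": statistics.pvariance(values),
        "median": statistics.median(values),
        "std_dev": statistics.pstdev(values),   # mean square deviation (population)
    }


info = similarity_info([0.2, 0.4, 0.6])
```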
S130, acquiring a target index strategy matched with the similarity information according to a preset configuration table.
The terminal device is preconfigured with a configuration table containing multiple index strategies. Each index strategy specifies a numerical matching interval for each statistical item. The terminal device checks whether the similarity information falls within an index strategy's matching intervals; if it does, that index strategy is taken as the target index strategy. An index strategy is a scheme for index-matching characters: long character strings are replaced by shorter index strings, which compresses the file and reduces its storage size.
In one embodiment, step S130 includes the following steps: matching each statistic in the similarity information against the index strategies in the configuration table; and taking the index strategy that matches every statistic as the target index strategy.
Specifically, each statistic in the similarity information is checked against the numerical matching intervals of each index strategy in the configuration table. That is, the terminal device determines whether each statistic lies within the corresponding interval of the index strategy. If every statistic does, that index strategy is determined to be the matched target index strategy.
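Matching the similarity information against a configuration table might look like the following. Several elements are assumptions made for illustration:

- the table layout: one inclusive interval per statistic, plus the index dictionary used in step S140;
- the first-match rule when several strategies qualify;
- the `IndexPolicy` and `match_policy` names.

The interval values are invented placeholders, not values from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IndexPolicy:
    policy_id: int
    # statistic name -> inclusive (low, high) matching interval
    intervals: dict[str, tuple[float, float]]
    # index string -> shorter index information, used later in step S140
    index: dict[str, str] = field(default_factory=dict)


def match_policy(info: dict[str, float], table: list[IndexPolicy]) -> Optional[IndexPolicy]:
    """Return the first policy whose intervals contain every statistic in info."""
    for policy in table:
        if all(name in policy.intervals
               and policy.intervals[name][0] <= value <= policy.intervals[name][1]
               for name, value in info.items()):
            return policy
    return None


# Hypothetical configuration table: placeholder intervals only.
TABLE = [
    IndexPolicy(1, {"mean": (0.5, 1.0), "variance": (0.0, 0.05)}),
    IndexPolicy(2, {"mean": (0.0, 0.5), "variance": (0.0, 1.0)}),
]
chosen = match_policy({"mean": 0.7, "variance": 0.01}, TABLE)
```

Returning `None` when no strategy matches leaves the caller free to fall back to a default strategy. The patent does not say what happens in that case.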
And S140, respectively carrying out data compression on each group of split data according to the target index strategy to obtain a plurality of groups of corresponding compressed data.
Each group of split data is compressed according to the target index strategy. Character strings in the split data are matched against the index strings of the target index strategy, and each matched string is replaced by its index. This yields the groups of compressed data.
In one embodiment, step S140 includes the following steps: matching character strings in each group of split data according to the target index strategy; and replacing the matched character strings with their indexes to compress the data, thereby obtaining compressed data for each group of split data.
Specifically, fixed-length character strings are taken from the split data in sequence and matched against the index strings of the target index strategy. For example, with a fixed length of 20, each string taken from the split data contains 20 characters. If a string matches an index string, it is replaced in the split data by that index string's index information, and the next string is taken from the characters following the replaced string. If a string matches no index string in the target index strategy, it is left unchanged and the next string is taken from the characters following it. The matching and replacement step is then repeated on the newly taken string.
For example, suppose an index string is "3E402CB79C26DB9FAC4B" and its index information is FF3C7B. Any string in the split data that matches this index string is replaced with FF3C7B.
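Below is a sketch of the fixed-length index replacement and its reverse, using the example index string and index information given above. The patent does not say how replaced index information is distinguished from unreplaced characters in the output. This sketch therefore keeps a list of tagged tokens instead of a flat string. That representation, the non-overlapping 20-character stepping and the function names are assumptions.

```python
from __future__ import annotations


def compress_group(data: str, index: dict[str, str], window: int = 20) -> list[tuple[str, str]]:
    """Replace fixed-length strings found in index; return ("idx", code) / ("lit", text) tokens."""
    tokens: list[tuple[str, str]] = []
    pos = 0
    while pos < len(data):
        piece = data[pos:pos + window]
        if piece in index:
            tokens.append(("idx", index[piece]))          # matched: store the short index
        elif tokens and tokens[-1][0] == "lit":
            tokens[-1] = ("lit", tokens[-1][1] + piece)   # merge consecutive unmatched text
        else:
            tokens.append(("lit", piece))
        pos += window                                     # continue after this string
    return tokens


def decompress_group(tokens: list[tuple[str, str]], index: dict[str, str]) -> str:
    """Reverse index replacement: map each index back to its original string."""
    reverse = {code: string for string, code in index.items()}
    return "".join(reverse[value] if kind == "idx" else value for kind, value in tokens)


INDEX = {"3E402CB79C26DB9FAC4B": "FF3C7B"}
group = "00" * 10 + "3E402CB79C26DB9FAC4B" + "ABC"
tokens = compress_group(group, INDEX)
# [('lit', '00000000000000000000'), ('idx', 'FF3C7B'), ('lit', 'ABC')]
```

Because the windows do not overlap, a matching string that does not start on a 20-character boundary is not found. A sliding variant that advances one character on a miss would find more matches at a higher matching cost.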
In this way, the character strings in each group of split data are replaced by their indexes, yielding one group of compressed data for each group of split data.
And S150, combining the multiple groups of compressed data with the identification information of the target index strategy to obtain a compressed file.
Each group of split data corresponds to one group of compressed data. The groups of compressed data can therefore be concatenated in the order of their split data and then combined with the identification information of the target index strategy to obtain the compressed file. The identification information is used for decompression. The target index strategy it identifies is retrieved and used to reverse the index replacement on the characters of the compressed file, which recovers the decompressed file for use.
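One way to combine the groups of compressed data with the identification information is a small binary container. It holds a signature, the strategy identifier and the group count, followed by each group prefixed with its length so that the decompressor can separate the groups again. The layout, the `SPLZ` signature and the function names are hypothetical; the patent does not define a file format.

```python
from __future__ import annotations

import struct

MAGIC = b"SPLZ"          # hypothetical file signature
HEADER = ">HI"           # big-endian: 2-byte strategy id, 4-byte group count


def pack_compressed_file(policy_id: int, groups: list[bytes]) -> bytes:
    """Write strategy id, group count, then each group prefixed by its 4-byte length."""
    parts = [MAGIC, struct.pack(HEADER, policy_id, len(groups))]
    for g in groups:
        parts.append(struct.pack(">I", len(g)))
        parts.append(g)
    return b"".join(parts)


def unpack_compressed_file(blob: bytes) -> tuple[int, list[bytes]]:
    """Read back the strategy id (for reverse index replacement) and the ordered groups."""
    if blob[:4] != MAGIC:
        raise ValueError("not a compressed file")
    policy_id, count = struct.unpack_from(HEADER, blob, 4)
    offset = 4 + struct.calcsize(HEADER)
    groups = []
    for _ in range(count):
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        groups.append(blob[offset:offset + length])
        offset += length
    return policy_id, groups


blob = pack_compressed_file(7, [b"abc", b"", b"\x00\x01"])
```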
Compressed data from groups of split data that do not overlap can be combined directly in the order of the corresponding split data. Compressed data from groups of split data that overlap must be combined using the following steps.
In one embodiment, step S150 includes the following steps:
1. For each pair of adjacent groups of split data, compare the length of the overlapping data as it appears in the two corresponding groups of compressed data.
2. Delete whichever copy of the overlapping data is longer, to obtain de-duplicated compressed data for each group of compressed data.
3. Concatenate the de-duplicated compressed data in order and combine it with the identification information to obtain the corresponding compressed file.

Adjacent groups of split data share overlapping data, and each group's characters are compressed separately. The overlapping characters are therefore compressed twice, once in each of the two groups of compressed data. The character lengths of the two compressed copies are compared. The longer copy is deleted and the shorter one retained, which yields the de-duplicated compressed data for each group. The de-duplicated compressed data is then concatenated in the order of the corresponding split data and combined with the identification information to produce the final compressed file.
The file compression processing method provided by the embodiment of the invention comprises the following steps. When an initial file input by a user is received, the data contained in the initial file is split according to a preset segmentation rule to obtain multiple groups of split data. The file byte similarity between the groups of split data is calculated to obtain similarity information. A target index strategy that matches the similarity information is obtained from a preset configuration table. Each group of split data is compressed according to the target index strategy to obtain multiple corresponding groups of compressed data. Finally, the groups of compressed data are combined with identification information of the target index strategy to obtain the compressed file. Because the optimal index strategy is selected according to the calculated similarity information, the method can greatly improve the efficiency of file compression and shorten the time it takes.
The embodiment of the invention also provides a file compression processing device. The device can be configured in a terminal device and is used to execute any embodiment of the above file compression processing method. Referring to Fig. 2, Fig. 2 is a schematic block diagram of a file compression processing apparatus according to an embodiment of the present invention.
As shown in fig. 2, the file compression processing apparatus 100 includes a splitting unit 110, a similarity information acquisition unit 120, a target index policy acquisition unit 130, a data compression processing unit 140, and a compressed file acquisition unit 150.
The splitting unit 110 is configured to split the data contained in an initial file input by a user, when that file is received, according to a preset segmentation rule to obtain multiple corresponding groups of split data.
The similarity information acquisition unit 120 is configured to calculate the file byte similarity between the groups of split data to obtain similarity information.
The target index policy acquisition unit 130 is configured to obtain a target index policy matching the similarity information according to a preset configuration table.
The data compression processing unit 140 is configured to compress each group of split data according to the target index policy to obtain multiple corresponding groups of compressed data.
The compressed file acquisition unit 150 is configured to combine the groups of compressed data with the identification information of the target index policy to obtain a compressed file.
The file compression processing device provided by the embodiment of the invention applies the above file compression processing method. The method comprises the following steps. When an initial file input by a user is received, the data contained in the initial file is split according to a preset segmentation rule to obtain multiple groups of split data. The file byte similarity between the groups of split data is calculated to obtain similarity information. A target index strategy that matches the similarity information is obtained from a preset configuration table. Each group of split data is compressed according to the target index strategy to obtain multiple corresponding groups of compressed data. Finally, the groups of compressed data are combined with identification information of the target index strategy to obtain the compressed file. Because the optimal index strategy is selected according to the calculated similarity information, the method can greatly improve the efficiency of file compression and shorten the time it takes.
The above file compression processing method may be implemented as a computer program that runs on a computer device such as the one shown in Fig. 3. The file compression processing apparatus may likewise be implemented as such a computer device. The computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. The processor implements the file compression processing method described in the above embodiment when executing the computer program.
Referring to Fig. 3, Fig. 3 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device may be a terminal device that performs the file compression processing method to compress files.
With reference to FIG. 3, the computer device 500 includes a processor 502, a memory, and a network interface 505, connected by a system bus 501, where the memory may include a storage medium 503 and an internal memory 504.
The storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, may cause the processor 502 to perform a file compression method, wherein the storage medium 503 may be a volatile storage medium or a nonvolatile storage medium.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the execution of a computer program 5032 in the storage medium 503, which computer program 5032, when executed by the processor 502, causes the processor 502 to perform a file compression processing method.
The network interface 505 is used for network communication, for example to transmit data information. Those skilled in the art will appreciate that the structure shown in Fig. 3 is merely a block diagram of the part of the structure relevant to the solution of the present invention and does not limit the computer device 500 to which the solution is applied; a particular computer device 500 may include more or fewer components than shown, may combine some of the components, or may have a different arrangement of components.
The processor 502 is configured to execute a computer program 5032 stored in a memory, so as to implement the corresponding functions in the file compression processing method.
Those skilled in the art will appreciate that the embodiment of the computer device shown in Fig. 3 does not limit the specific construction of the computer device; in other embodiments, the computer device may include more or fewer components than shown, combine certain components, or arrange the components differently. For example, in some embodiments the computer device may include only a memory and a processor; in such embodiments, the structure and function of the memory and the processor are consistent with the embodiment shown in Fig. 3 and are not described again.
It should be appreciated that in an embodiment of the invention, the processor 502 may be a central processing unit (CPU); the processor 502 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or any conventional processor.
In another embodiment of the invention, a computer-readable storage medium is provided. The computer readable storage medium may be a volatile or nonvolatile computer readable storage medium. The computer readable storage medium stores a computer program which, when executed by a processor, implements the steps included in the file compression processing method described above.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus, device and unit described above may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein. Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. Units having the same function may be integrated into one unit, multiple units or components may be combined or integrated into another system, and some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices or units, and may be electrical, mechanical or in another form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment of the present invention.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a computer-readable storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned computer-readable storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various equivalent changes and substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (9)

1. A method of file compression processing, the method comprising:
if an initial file input by a user is received, splitting data contained in the initial file according to a preset segmentation rule to obtain a plurality of groups of corresponding split data;
calculating the file byte similarity between each group of split data to obtain similarity information;
obtaining a target index strategy matched with the similarity information according to a preset configuration table;
respectively carrying out data compression on each group of split data according to the target index strategy to obtain a plurality of groups of corresponding compressed data;
combining the multiple groups of compressed data with the identification information of the target index strategy to obtain a compressed file;
splitting the data contained in the initial file according to a preset segmentation rule to obtain a plurality of groups of corresponding split data, wherein the splitting comprises the following steps:
encoding the data contained in the initial file according to the coding mode in the segmentation rule to obtain encoded data, and calculating an overlap coefficient from the splitting length and the total number of characters in the encoded data, the overlap coefficient being calculated by the following formula: [formula not reproduced]; where R is the calculated overlap coefficient, C is the total number of characters in the encoded data, and F is the splitting length;
splitting the encoded data according to the splitting length and the overlap coefficient in the segmentation rule to obtain a plurality of groups of split data corresponding to the splitting length, wherein the amount of overlapping data between two adjacent groups of split data is the product of the splitting length and the overlap coefficient.
2. The method for compressing files according to claim 1, wherein splitting the data included in the initial file according to a preset segmentation rule to obtain a plurality of corresponding groups of split data includes:
coding the data contained in the initial file according to the coding mode in the segmentation rule to obtain coded data;
and splitting the coded data according to the splitting length in the segmentation rule to obtain a plurality of groups of splitting data corresponding to the splitting length.
3. The method of claim 1, wherein calculating the file byte similarity between the split data of each group to obtain similarity information comprises:
respectively calculating the file byte similarity between two adjacent groups of split data to obtain a corresponding similarity value;
and carrying out numerical statistics on the obtained similarity value to obtain similarity information.
4. The method for compressing files according to claim 1, wherein the obtaining a target index policy matched with the similarity information according to a preset configuration table includes:
matching each statistic value in the similarity information with an index strategy in the configuration table;
and obtaining an index strategy matched with each statistic value as a target index strategy.
5. The method of claim 1, wherein the performing data compression on each set of split data according to the target index policy to obtain a corresponding plurality of sets of compressed data includes:
matching the character strings in the split data according to the target index strategy;
and carrying out index replacement on the character strings in each split data according to the matching result so as to carry out data compression, thereby obtaining compressed data corresponding to each group of split data.
6. The method of claim 1, wherein combining the plurality of sets of compressed data with the identification information of the target index policy to obtain the compressed file comprises:
comparing, for the overlapping data between two adjacent groups of split data, its length values in the two corresponding groups of compressed data;
in each pair of adjacent groups of compressed data, deleting the data information corresponding to the overlapping data from the group in which that overlapping data has the longer length value, to obtain de-duplicated compressed data corresponding to each group of compressed data;
and concatenating the de-duplicated compressed data in sequence and combining the result with the identification information to obtain the corresponding compressed file.
7. A file compression processing apparatus for performing the file compression processing method according to any one of claims 1 to 6, the apparatus comprising:
the splitting unit is used for splitting the data contained in the initial file according to a preset segmentation rule if the initial file input by the user is received, so as to obtain a plurality of groups of corresponding split data;
the similarity information acquisition unit is used for calculating the file byte similarity among the split data of each group to obtain similarity information;
the target index strategy acquisition unit is used for acquiring a target index strategy matched with the similarity information according to a preset configuration table;
the data compression processing unit is used for respectively carrying out data compression on each group of split data according to the target index strategy to obtain a plurality of groups of corresponding compressed data;
and the compressed file acquisition unit is used for combining the plurality of groups of compressed data with the identification information of the target index strategy to obtain a compressed file.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer device implements the file compression processing method according to any of claims 1 to 6 when the computer program is executed by the computer device.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the file compression processing method according to any one of claims 1 to 6.
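Claim 1 fixes the overlap between adjacent groups at F × R, but the formula for R is not reproduced in this text, so the following Python sketch takes the overlap coefficient as an input. It assumes the coding step has already produced bytes (for example UTF-8, a hypothetical choice of coding mode) and shows how overlapping groups of length F can be produced and how the duplicated overlap can be dropped again when the groups are reassembled. Claim 6 performs that removal on the compressed data and chooses the side by comparing lengths; this sketch works on the raw groups and simply drops the leading overlap of each later group. The function names are hypothetical.

```python
import math


def overlapping_split(data: bytes, split_len: int, overlap_coef: float) -> tuple[list[bytes], int]:
    """Split data into groups of split_len bytes in which adjacent groups share
    floor(split_len * overlap_coef) bytes (claim 1: overlap = F * R).
    overlap_coef is supplied by the caller because the formula for R is not given here."""
    if split_len <= 0:
        raise ValueError("split_len must be positive")
    overlap = math.floor(split_len * overlap_coef)
    if not 0 <= overlap < split_len:
        raise ValueError("overlap must lie in [0, split_len)")
    step = split_len - overlap  # each new group starts this many bytes after the previous one
    groups, start = [], 0
    while True:
        groups.append(data[start:start + split_len])
        if start + split_len >= len(data):  # this group reaches the end of the data
            break
        start += step
    return groups, overlap


def merge_groups(groups: list[bytes], overlap: int) -> bytes:
    """Reassemble the data by dropping the leading overlap of every group after the first."""
    if not groups:
        return b""
    return groups[0] + b"".join(g[overlap:] for g in groups[1:])
```

With an overlap coefficient of 0 this reduces to plain fixed-length splitting, as in claim 2.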

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310596740.5A CN116303297B (en) 2023-05-25 2023-05-25 File compression processing method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310596740.5A CN116303297B (en) 2023-05-25 2023-05-25 File compression processing method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN116303297A CN116303297A (en) 2023-06-23
CN116303297B true CN116303297B (en) 2023-09-29

Family

ID=86834541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310596740.5A Active CN116303297B (en) 2023-05-25 2023-05-25 File compression processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116303297B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117909299B (en) * 2024-03-19 2024-05-10 电子科技大学 Dynamic hierarchical data splitting system

Citations (5)

Publication number Priority date Publication date Assignee Title
CN104868922A (en) * 2014-02-24 2015-08-26 华为技术有限公司 Data compression method and device
CN106294683A (en) * 2016-08-05 2017-01-04 中国银行股份有限公司 A kind of file declustering method and device
CN110781155A (en) * 2019-10-18 2020-02-11 赛尔网络有限公司 Data storage reading method, system, equipment and medium based on IPFS
CN110929518A (en) * 2019-12-09 2020-03-27 朱利 Text sequence labeling algorithm using overlapping splitting rule
CN113641643A (en) * 2021-07-02 2021-11-12 阿里巴巴新加坡控股有限公司 File writing method and device

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US7881544B2 (en) * 2006-08-24 2011-02-01 Dell Products L.P. Methods and apparatus for reducing storage size
US20220408127A1 (en) * 2021-06-16 2022-12-22 Meta Platforms, Inc. Systems and methods for selecting efficient encoders for streaming media

Also Published As

Publication number Publication date
CN116303297A (en) 2023-06-23

Similar Documents

Publication Publication Date Title
US10359939B2 (en) Data object processing method and apparatus
US9690802B2 (en) Stream locality delta compression
US8751462B2 (en) Delta compression after identity deduplication
CN107046812B (en) Data storage method and device
CN107682016B (en) Data compression method, data decompression method and related system
CN107395209B (en) Data compression method, data decompression method and equipment thereof
WO2012033498A1 (en) Systems and methods for data compression
CN116303297B (en) File compression processing method, device, equipment and medium
EP2509226A1 (en) Data segmentation method and device in data compression
US9843802B1 (en) Method and system for dynamic compression module selection
CN111125033B (en) Space recycling method and system based on full flash memory array
CN112165331A (en) Data compression method and device, data decompression method and device, storage medium and electronic equipment
US9088297B2 (en) High throughput decoding of variable length data symbols
US6748520B1 (en) System and method for compressing and decompressing a binary code image
CN111124939A (en) Data compression method and system based on full flash memory array
CN108880559B (en) Data compression method, data decompression method, compression equipment and decompression equipment
CN111061428B (en) Data compression method and device
CN111124259A (en) Data compression method and system based on full flash memory array
US9571698B1 (en) Method and system for dynamic compression module selection
GB2539239A (en) Encoders, decoders and methods
CN110958212B (en) Data compression method, data decompression method, device and equipment
US9843702B1 (en) Method and system for dynamic compression module selection
CN111198857A (en) Data compression method and system based on full flash memory array
CN107783990B (en) Data compression method and terminal
CN112054805B (en) Model data compression method, system and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant