CN112100143B - File compression storage method, device, equipment and storage medium - Google Patents


Info

Publication number
CN112100143B
Authority
CN
China
Prior art keywords
file
stored
coefficient
time
preset period
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011027617.4A
Other languages
Chinese (zh)
Other versions
CN112100143A (en)
Inventor
兰东平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202011027617.4A
Publication of CN112100143A
Application granted
Publication of CN112100143B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/174Redundancy elimination performed by the file system
    • G06F16/1744Redundancy elimination performed by the file system using compression, e.g. sparse files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a file compression storage method comprising the following steps: recording, according to a preset period, the number of times each file is operated on and the total time consumed by those operations; for each file type, calculating the average time consumption from the total operation time of the files belonging to that file type in the current preset period; determining a system busy degree coefficient from the average time consumption, and determining from that coefficient whether the compression strategy needs to be determined again; if so, computing the operation frequency from the number of operations recorded for the file in the current preset period, determining the compression strategy corresponding to that operation frequency, and, if that strategy differs from the one currently applied, recompressing and storing the file with it. Because the number of file operations is recorded per period, only one period's worth of counts is stored at any time. Because the file type and the system busy degree are both taken into account when the compression strategy is adjusted, the adjustment can satisfy the user's operation time requirements and improve access efficiency.

Description

File compression storage method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of big data, is applicable to the smart city field, and in particular relates to a file compression storage method, device, equipment and storage medium.
Background
With the spread of the internet and the electronic information era, and with the rapid growth of the big data and AI industries, the volume of unstructured data held by enterprises and organizations has increased sharply. The archiving and compressed storage approach commonly used in the industry applies one uniform compression tool to a mass of files and then stores the compressed files in a storage product.
However, if the chosen compression tool has high compression efficiency, the user's access efficiency is limited; if a tool with low compression efficiency is chosen to improve access efficiency, the storage cost is high. A single archive compression scheme can therefore favor only one of compression efficiency and access efficiency and cannot balance the two constraints.
Disclosure of Invention
The object of the present invention is to provide a file compression storage method, device, equipment and storage medium that overcome the above disadvantages of the prior art. This object is achieved by the following technical solutions.
The first aspect of the present invention provides a file compression storage method, where the method includes:
recording, according to a preset period, the number of times a user operates on each stored file and the total operation time, wherein a file type and a file size are recorded for each stored file;
for each file type, calculating the average time consumed per unit capacity operated on for the file type, according to the file sizes of the stored files belonging to the file type and the total operation time of those stored files recorded in the current preset period;
determining, according to the average time consumption, a system busy degree coefficient for operations on the file type, and determining, according to the system busy degree coefficient, whether a compression strategy needs to be determined for the stored files belonging to the file type;
if so, computing a first operation frequency from the number of operations recorded for the stored file in the current preset period, determining the compression strategy corresponding to the first operation frequency, and, if that compression strategy differs from the compression strategy currently applied to the stored file, recompressing the stored file with that compression strategy and overwriting the stored file with the recompressed file.
A second aspect of the present invention provides a file compression storage apparatus, including:
the recording module is used for recording the times of the user operating each stored file and the total operation time according to a preset period; wherein, each stored file is correspondingly recorded with the file type and the file size;
the average time consumption calculation module is used for calculating the average time consumption of unit capacity operation of each file type according to the file size of the stored file belonging to the file type and the total operation time consumption of the stored file belonging to the file type, which is recorded in the current preset period;
the judging module is used for determining a system busy degree coefficient for operating the file type according to the average consumed time and determining whether a compression strategy needs to be determined for the stored files belonging to the file type according to the system busy degree coefficient;
and the recompression module is used for, when a compression strategy needs to be determined, computing a first operation frequency from the number of operations recorded for the stored file in the current preset period, determining the compression strategy corresponding to the first operation frequency, and, if that compression strategy differs from the compression strategy currently applied to the stored file, recompressing the stored file with that compression strategy and overwriting the stored file with the recompressed file.
A third aspect of the invention proposes a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method according to the first aspect when executing the program.
A fourth aspect of the present invention proposes a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the file compression storage method according to the first aspect.
The file compression storage method and device based on the first aspect and the second aspect have the following beneficial effects:
The number of times a user operates on each file and the total operation time are recorded per period, and the operation frequency is computed over one period at a time. Only one period's worth of operation counts and operation times therefore needs to be stored at any time, so the extra storage occupied is small and no excessive additional cost is incurred. Because decompression time differs between file types, the compression strategy can be adjusted per file type. The average time consumed per unit capacity operated on is obtained for each file type from the recorded operation time, and a system busy degree coefficient for operations on that file type is derived from it. When the coefficient indicates that the compression strategy of the files of that type should be adjusted, the strategy of each file is adjusted promptly and automatically according to the computed operation frequency, which satisfies access requirements while keeping storage requirements in check. Taking both the file type and the system busy degree into account during the adjustment lets the adjustment satisfy the user's operation time requirements, which improves access efficiency and user experience. Furthermore, because the point in time at which each file needs its compression strategy adjusted is relatively fixed, the strategy is not adjusted frequently, so the number of files needing adjustment at each matching check is small and the additional computation cost is low.
This scheme can be applied to the smart city field and can thereby promote the construction of smart cities.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and do not limit the invention. In the drawings:
FIG. 1 is a flowchart illustrating an embodiment of a file compression storage method according to an exemplary embodiment of the present invention;
FIG. 2 is a diagram illustrating a hardware configuration of a computer device in accordance with an illustrative embodiment of the present invention;
FIG. 3 is a structural diagram illustrating an embodiment of a file compression storage device according to an exemplary embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "at ...", "when ...", or "in response to a determination", depending on the context.
Generally, different compression tools have completely different compression rates, compression speeds, and decompression speeds for the same file. Through experimental tests, referring to table 1, the same pafa.txt file is compressed and decompressed by using 7 compression tools shown in the first column in table 1, so that different compression ratios, compression speeds and decompression speeds are obtained respectively.
The inventor found that, because business scenarios are complex and diverse, the data a business generates has a typical characteristic: different data are accessed at markedly different frequencies. A single archive compression scheme must favor either compression efficiency or access efficiency and cannot balance the two. To solve this problem, the inventor conceived of dividing compression tools into different compression strategies according to their compression ratio, compression speed, and decompression speed, and applying different compression strategies to files with different access frequencies, so as to meet the business's storage and access requirements.
[Table 1 appears as an image (BDA0002702580930000071) in the original publication and is not reproduced here.]
TABLE 1
Based on the above, the present invention provides an improved file compression storage method. After a file has been compressed and stored for the first time, the number of times the stored file is operated on and the total operation time are recorded according to a preset period. The average time consumption of each file type is calculated, and a system busy degree coefficient is obtained from it. When the coefficient indicates that a compression strategy needs to be determined for the stored files belonging to the file type, the operation frequency is computed from the number of operations recorded for the stored file in the current preset period, and the compression strategy corresponding to that operation frequency is determined. If that compression strategy differs from the one currently applied to the stored file, the stored file is recompressed with it and overwritten with the recompressed file.
Based on the above description, the number of times a user operates on each file and the total operation time are recorded per period, and the operation frequency is computed over one period at a time. Only one period's worth of operation counts and operation times therefore needs to be stored at any time, so the extra storage occupied is small and no excessive additional cost is incurred. Because decompression time differs between file types, the compression strategy can be adjusted per file type. The average time consumed per unit capacity operated on is obtained for each file type from the recorded operation time, and a system busy degree coefficient for operations on that file type is derived from it. When the coefficient indicates that the compression strategy of the files of that type should be adjusted, the strategy of each file is adjusted promptly and automatically according to the computed operation frequency, which satisfies access requirements while keeping storage requirements in check. Taking both the file type and the system busy degree into account during the adjustment lets the adjustment satisfy the user's operation time requirements, which improves access efficiency and user experience. Furthermore, because the point in time at which each file needs its compression strategy adjusted is relatively fixed, the strategy is not adjusted frequently, so the number of files needing adjustment at each matching check is small and the additional computation cost is low.
This scheme can be applied to the smart city field and can thereby promote the construction of smart cities.
The following describes the file compression and storage method proposed by the present invention in detail with specific embodiments.
Fig. 1 is a flowchart illustrating an embodiment of a file compression storage method according to an exemplary embodiment of the present invention, where the file compression storage method may be applied to a computer device (e.g., a terminal, a server, etc.), as shown in fig. 1, and the method includes the following steps:
Step 101: Record, according to a preset period, the number of times a user operates on each stored file and the total operation time.
A stored file is the compressed file produced when a file is first written: the written file is compressed with a default compression strategy and then stored. The file type and file size are recorded for each stored file.
It should be noted that after a file is written to storage, users read and write it frequently as the business progresses, so the number of times the file is operated on and the total operation time can be recorded in real time. To make later prediction of the user's operation frequency more accurate, each preset period can be divided into a plurality of time intervals. The number of operations performed on the stored file in each time interval, and the time those operations consume, are recorded, and the sum of the times recorded over all the time intervals is taken as the total operation time.
In this embodiment, since the operations performed on a file include write operations and read operations, the number of write operations and the number of read operations may be recorded separately.
It will be understood by those skilled in the art that write operations and read operations are meant here in a broad sense: write operations include user operations such as modifying, overwriting, and deleting a file, and read operations include user operations such as accessing and downloading a file.
For example, take a preset period of 2 weeks divided into time intervals of 1 day. For a given stored file, the number of write operations CW and the number of read operations CR are recorded each day: each time a user writes to the file, CW is incremented by 1, and each time a user reads the file, CR is incremented by 1. At the same time, the time consumed by every operation (write or read) performed on the stored file that day is accumulated. After 2 weeks, this yields the write operation array CW[n], the read operation array CR[n], and the operation time array T[n] of the stored file for the current 2 weeks, where n ranges from 0 to 13. CW[0] is the number of times the stored file was written on the first day, CW[1] is the number of times it was written on the second day, and so on up to CW[13] for day 14. CR[0] is the number of times the stored file was read on the first day, CR[1] is the number of times it was read on the second day, and so on up to CR[13] for day 14. T[0] is the accumulated operation time of the stored file on the first day, T[1] is the accumulated operation time on the second day, and so on up to T[13] for day 14. Summing T[0] through T[13] gives the total operation time.
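The per-file record described above could be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `FileStats`, its field names, and the `record` method are hypothetical, and a 14-day period with 1-day intervals is assumed as in the example.

```python
from dataclasses import dataclass, field

PERIOD_DAYS = 14  # assumed preset period: 2 weeks, one slot per day


@dataclass
class FileStats:
    """Hypothetical per-file record for one preset period."""
    file_type: str
    size_mb: float
    cw: list = field(default_factory=lambda: [0] * PERIOD_DAYS)   # CW[n]: writes per day
    cr: list = field(default_factory=lambda: [0] * PERIOD_DAYS)   # CR[n]: reads per day
    t: list = field(default_factory=lambda: [0.0] * PERIOD_DAYS)  # T[n]: seconds per day

    def record(self, day: int, is_write: bool, seconds: float) -> None:
        """Count one operation on the given day and accumulate its duration."""
        if is_write:
            self.cw[day] += 1
        else:
            self.cr[day] += 1
        self.t[day] += seconds

    def total_time(self) -> float:
        """Total operation time: the sum of T[0] through T[13]."""
        return sum(self.t)


stats = FileStats(file_type="txt", size_mb=2.0)
stats.record(day=0, is_write=True, seconds=1.5)
stats.record(day=0, is_write=False, seconds=0.5)
stats.record(day=13, is_write=False, seconds=1.0)
print(stats.cw[0], stats.cr[0], stats.cr[13], stats.total_time())
```

Because the arrays have a fixed length of one period, the record can be reset at the start of each period, which matches the point that only one period's worth of counts is kept.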
Step 102: For each file type, calculate the average time consumed per unit capacity operated on, according to the file sizes of the stored files belonging to the file type and the total operation time of those stored files recorded in the current preset period.
When files of the same size are compressed with the same compression strategy, the time required for decompression differs between file types. Moreover, even for files of the same size, the same compression strategy, and the same file type, the time required for decompression varies with how busy the system is. The time consumed by file operations can therefore reflect the system busy degree.
Illustratively, file types include txt, doc, xml, and the like.
Assume that file sizes are measured in megabytes (M) and that a certain file type includes file 1 and file 2. File 1 has a size of 2 M, was operated on 5 times in the period, and has a total operation time of 6 seconds. File 2 has a size of 3 M, was operated on 3 times in the period, and has a total operation time of 3 seconds. The average time per unit capacity for this file type is then T = (6 seconds + 3 seconds) / (2 M × 5 times + 3 M × 3 times) = 9 seconds / 19 M ≈ 0.47 seconds/M.
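The arithmetic of this example can be checked with a short sketch. The function name `average_time_per_unit` and the tuple layout are illustrative assumptions; the formula is total operation time divided by the sum of size times operation count, as in the worked example.

```python
def average_time_per_unit(files):
    """Average seconds per megabyte operated on for one file type.

    `files` is a list of (size_mb, operation_count, total_seconds) tuples,
    one per stored file of that type (a hypothetical layout).
    """
    total_seconds = sum(seconds for _, _, seconds in files)
    total_volume = sum(size * count for size, count, _ in files)
    if total_volume == 0:
        return 0.0  # no operations recorded in this period
    return total_seconds / total_volume


# File 1: 2 M, 5 operations, 6 s; file 2: 3 M, 3 operations, 3 s.
avg = average_time_per_unit([(2, 5, 6.0), (3, 3, 3.0)])
print(round(avg, 2))  # 0.47
```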
Step 103: Determine, according to the average time consumption, a system busy degree coefficient for operations on the file type.
In some embodiments, two preset values divide the system busy degree into three levels, each with its own coefficient: a first coefficient indicating a low busy degree, a second coefficient indicating a medium busy degree, and a third coefficient indicating a high busy degree.
On this basis, the system busy degree coefficient may be determined from the average time consumption as follows: if the average time consumption is less than a first preset time, the coefficient is the first coefficient; if it lies between the first preset time and a second preset time, the coefficient is the second coefficient; and if it is greater than the second preset time, the coefficient is the third coefficient.
The average time consumption is positively correlated with the system busy degree: the larger the average time consumption, the busier the system and the longer it takes to operate on a file.
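A minimal sketch of this three-level mapping is given below, following the low, medium, and high levels defined for the first, second, and third coefficients. The enum `Busy`, the function name, and the threshold values used in the example call are assumptions; the text does not give concrete preset times, and treating both boundary values as "medium" is also an assumption.

```python
from enum import Enum


class Busy(Enum):
    LOW = 1     # first coefficient
    MEDIUM = 2  # second coefficient
    HIGH = 3    # third coefficient


def busy_coefficient(avg_time, first_preset, second_preset):
    """Map average seconds per megabyte to a busy level.

    Values equal to either preset are treated as MEDIUM (an assumption).
    """
    if avg_time < first_preset:
        return Busy.LOW
    if avg_time <= second_preset:
        return Busy.MEDIUM
    return Busy.HIGH


# Hypothetical presets of 0.2 s/M and 1.0 s/M.
print(busy_coefficient(0.47, 0.2, 1.0))  # Busy.MEDIUM
```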
Step 104: Determine, according to the system busy degree coefficient, whether a compression strategy needs to be determined for the stored files belonging to the file type. If so, execute step 105; otherwise, continue to execute step 102.
In some embodiments, if the system busy degree coefficient is the first coefficient, the system is at a low busy degree when operating on the file type. To meet access requirements while keeping storage requirements in check, the compression strategy of the files of that type may be adjusted; that is, it is determined that a compression strategy needs to be determined for all stored files belonging to the file type. If the coefficient is the second coefficient, the system is at a medium busy degree, and the compression strategy of the files of that type may be adjusted on a random basis: a random probability is generated for each stored file belonging to the file type, and if it is smaller than a preset probability, it is determined that a compression strategy needs to be determined for that stored file. If the coefficient is the third coefficient, the system is at a high busy degree, and, to meet the user's operation time requirements and improve access efficiency, it is determined that no compression strategy needs to be determined for the stored files belonging to the file type.
The file type and the system busy degree are thus both taken into account when the file compression strategy is adjusted, so that the adjustment can satisfy the user's operation time requirements, which improves access efficiency and user experience.
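The decision in step 104 might be sketched as follows. The integer encoding of the busy level (1 low, 2 medium, 3 high), the function name, and the default preset probability of 0.5 are assumptions made for illustration; the text does not specify a value for the preset probability.

```python
import random


def needs_new_policy(busy_level, preset_probability=0.5, rng=random):
    """Decide per stored file whether its compression strategy is re-determined.

    busy_level: 1 = low, 2 = medium, 3 = high (hypothetical encoding).
    rng: any object with a random() method returning a float in [0, 1).
    """
    if busy_level == 1:
        return True  # low busy degree: re-determine for every file of the type
    if busy_level == 2:
        # medium busy degree: re-determine only for a random share of files
        return rng.random() < preset_probability
    return False  # high busy degree: leave files as they are


rng = random.Random(42)  # seeded only to make a demonstration repeatable
selected = sum(needs_new_policy(2, 0.5, rng) for _ in range(1000))
print(selected)  # roughly half of the 1000 files
```

Sampling only a share of files at medium load spreads the recompression work over several checks instead of concentrating it in one.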
Step 105: Compute a first operation frequency from the number of operations recorded for the stored file in the current preset period.
In step 105, when the user's operation frequency on the stored file in the current preset period is computed, occasional business activities, holidays, and the like may make the number of read operations or write operations unusually high or low. To avoid this influence, abnormal counts can be removed from the read operation counts and write operation counts recorded in the current preset period, which improves the accuracy of the computed operation frequency.
It should be noted that, if the recorded counts include the number of write operations and the number of read operations, the first operation frequency likewise includes a first write operation frequency and a first read operation frequency.
The first write operation frequency is computed as follows: abnormal counts are removed from the write operation counts of the time intervals in the current preset period; the second write operation frequency of the stored file in the current preset period is calculated from the remaining write operation counts; and the future first write operation frequency of the stored file is estimated from the second write operation frequency and the first write operation frequency that was estimated for the stored file in the preset period preceding the current one. This improves the accuracy of the future first write operation frequency.
The abnormal counts removed are special values caused by occasional business activities, holidays, and the like, and refer to the maximum write count and the minimum write count.
The future first write operation frequency can be estimated from the second write operation frequency and the previously estimated first write operation frequency with a single exponential smoothing formula. That is, the first write operation frequency is calculated as:
YC_CW(t)=a*Y_CW(t)+(1-a)*YC_CW(t-1) (1)
where YC_CW(t) is the future first write operation frequency, Y_CW(t) is the second write operation frequency of the current preset period, and YC_CW(t-1) is the first write operation frequency estimated in the preset period preceding the current one; a is a weighting coefficient with value range (0, 1) and a default value of 0.5, and can be configured according to the actual business scenario. The smaller the value of a, the less the predicted value changes; the larger the value of a, the more it changes.
Continuing the example from step 101, with a preset period of 2 weeks and a time interval of 1 day, the write operation counts recorded in the current 2 weeks are CW[0] to CW[13], that is, 14 counts. After the maximum and minimum counts are removed, 12 counts remain. The second write operation frequency is calculated from these 12 counts, and the future first write operation frequency YC_CW(t) is estimated with formula (1).
If the time interval is 1 day, the second write operation frequency is expressed in times/day.
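The two stages described for step 105 can be sketched as below. The text does not say how the second write operation frequency is derived from the 12 remaining counts; this sketch assumes it is their arithmetic mean. The function names and the sample data are hypothetical.

```python
def period_frequency(counts):
    """Second operation frequency: mean count per interval after removing
    one maximum and one minimum (taking the mean is an assumption)."""
    if len(counts) <= 2:
        raise ValueError("need more than two intervals to trim max and min")
    trimmed = sorted(counts)[1:-1]  # drop the smallest and the largest count
    return sum(trimmed) / len(trimmed)


def smoothed_frequency(current, previous_estimate, weight=0.5):
    """Formula (1): YC(t) = a * Y(t) + (1 - a) * YC(t-1), with a in (0, 1)."""
    if not 0 < weight < 1:
        raise ValueError("weight must lie strictly between 0 and 1")
    return weight * current + (1 - weight) * previous_estimate


# Hypothetical CW[0..13]; day 4 (40 writes) and day 8 (0 writes) are outliers.
cw = [3, 4, 2, 40, 3, 5, 4, 0, 3, 4, 2, 3, 5, 4]
y_cw = period_frequency(cw)                               # 42 / 12 = 3.5 times/day
yc_cw = smoothed_frequency(y_cw, previous_estimate=5.0)   # 0.5*3.5 + 0.5*5.0
print(y_cw, yc_cw)  # 3.5 4.25
```

The same two functions apply unchanged to the read operation counts CR[0] to CR[13] with weighting coefficient b.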
The statistical procedure for the first read operation frequency is: the abnormal times are removed from the times of the read operation corresponding to each time interval in the current preset period, then the second read operation frequency of the stored file in the current preset period is calculated by using the remaining times of the read operation, and the first read operation frequency of the stored file in the future is estimated according to the second read operation frequency and the first read operation frequency estimated by the stored file in the previous preset period of the current preset period, so that the accuracy of the first read operation frequency in the future is improved.
The frequent times of the elimination are special values influenced by some accidental business activities, holidays and the like, and the abnormal times refer to the maximum reading times and the minimum reading times.
The process of predicting the first read operation frequency of the stored file in the future according to the second read operation frequency and the first read operation frequency predicted by the stored file in the last preset period of the current preset period can also be calculated by adopting a one-time exponential smoothing prediction formula, namely the calculation formula of the first read operation frequency is as follows:
YC_CR(t)=b*Y_CR(t)+(1-b)*YC_CR(t-1) (2)
where YC_CR(t) is the future first read operation frequency, Y_CR(t) is the second read operation frequency of the current preset period, and YC_CR(t-1) is the first read operation frequency estimated in the previous preset period; b is a weighting coefficient in the range (0, 1), 0.5 by default, and can be configured according to the actual service scenario. The smaller the value of b, the less the estimate changes from period to period; the larger the value of b, the more it changes.
Still taking the preset period of 2 weeks with a time interval of 1 day described in step 101 as an example, the read-operation counts recorded in the current 2 weeks form the array CR[0] to CR[13], i.e. 14 counts. After the maximum and the minimum are removed, the second read operation frequency is calculated from the remaining 12 counts, and the future first read operation frequency YC_CR(t) is estimated using formula (2).
If the time interval is 1 day, the unit of the second read operation frequency is times/day.
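The read-frequency estimate described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `period_frequency` and `smooth` are hypothetical, the second read operation frequency is assumed to be the mean of the remaining counts (the text only says it is calculated from them), seeding the first period with the current value is an assumption, and the sample counts are invented for the example.

```python
from typing import Optional, Sequence


def period_frequency(counts: Sequence[int]) -> float:
    """Second read operation frequency for one preset period.

    Assumption: the frequency is the mean of the per-interval counts
    after one maximum and one minimum (the abnormal counts) are removed.
    """
    if len(counts) < 3:
        raise ValueError("need at least three intervals to drop max and min")
    trimmed = sorted(counts)[1:-1]  # drop the smallest and the largest count
    return sum(trimmed) / len(trimmed)


def smooth(current: float, previous_estimate: Optional[float], b: float = 0.5) -> float:
    """Formula (2): YC_CR(t) = b * Y_CR(t) + (1 - b) * YC_CR(t-1)."""
    if not 0.0 < b < 1.0:
        raise ValueError("b must lie in the open interval (0, 1)")
    if previous_estimate is None:
        # Assumption: with no earlier estimate, start from the current value.
        return current
    return b * current + (1.0 - b) * previous_estimate


# Hypothetical daily read counts CR[0]..CR[13] for a 2-week period.
cr = [12, 9, 11, 10, 250, 8, 0, 10, 12, 9, 11, 10, 9, 9]
y_cr = period_frequency(cr)                    # 250 and 0 are discarded
yc_cr = smooth(y_cr, previous_estimate=14.0)   # default b = 0.5
print(y_cr, yc_cr)
```

With these sample values the holiday-like spike of 250 and the idle day of 0 do not influence the result, and the default weight b = 0.5 places the new estimate midway between the current period's frequency and the previous estimate.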
As the above description of steps 101 and 105 shows, the number of file operations is recorded per period (e.g. 2 weeks) and the operation frequency is computed once per period, so only one period's worth of operation counts is stored at a time; the additional storage space occupied is small and no significant extra cost is incurred.
Step 106: determining a compression strategy corresponding to the first operation frequency.
In this embodiment, a newly written file is read and written frequently after it is first stored, as the business develops; over time, however, its read operation frequency and write operation frequency gradually decrease, eventually to almost no access. For example, in a typical business process a file stops being written after some time but is still often read (reviewed, viewed), and after a further period it is rarely read or written and becomes an extremely cold file. This pattern of read and write operation frequencies declining over time applies to essentially all files. The number of operations on each file therefore needs to be recorded per period and the operation frequency computed periodically, so that the hot or cold degree of the file can be judged and a suitable compression strategy matched, thereby saving storage cost.
Therefore, based on the compression rate, compression speed and decompression speed of each compression tool shown in Table 1, three compression strategies are defined by setting thresholds: a low-frequency compression strategy, a medium-frequency compression strategy and a high-frequency compression strategy.
The low-frequency compression strategy suits files with a low operation frequency that are hardly accessed; the medium-frequency compression strategy suits files with a medium operation frequency, typically written once and read many times; the high-frequency compression strategy suits files with a high operation frequency.
In one example, the low-frequency compression strategy may use the Bzip2Method compression tool, which has a high compression rate but low compression and decompression speeds, so that storage cost is reduced; the medium-frequency compression strategy may use the SnappyMethod compression tool, which has a moderate compression rate and high decompression efficiency and can meet the performance requirements of the service; the high-frequency compression strategy may use the Lz4Method compression tool, whose compression and decompression speeds are high, so that the requirement of fast reading and writing is met.
It will be understood by those skilled in the art that the three compression strategies are merely illustrative, and that more levels of compression strategies may be subdivided to further reduce the storage cost.
Based on this, in step 106, the compression strategy may be determined as follows: if the first write operation frequency is lower than a first preset write operation frequency, or the first read operation frequency is lower than a first preset read operation frequency, a low-frequency compression strategy is selected for the stored file; if the first write operation frequency is between the first preset write operation frequency and a second preset write operation frequency, and the first read operation frequency is not lower than the first preset read operation frequency, a medium-frequency compression strategy is selected for the stored file; and if the first write operation frequency is higher than the second preset write operation frequency and the first read operation frequency is not lower than the first preset read operation frequency, a high-frequency compression strategy is selected for the stored file.
For example, let the first write operation frequency be Fw, the first read operation frequency be Fr, the first and second preset write operation frequencies be W1 and W2 respectively, with W1 < W2, and the first preset read operation frequency be R1.
If Fw < W1 or Fr < R1, the low-frequency compression strategy is selected; if W1 ≤ Fw ≤ W2 and Fr ≥ R1, the medium-frequency compression strategy is selected; if Fw > W2 and Fr ≥ R1, the high-frequency compression strategy is selected.
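The threshold rules above can be written as a small decision function. The sketch below is illustrative only: the names `Policy` and `select_policy` are hypothetical, and the threshold values used in the usage lines are invented for the example.

```python
from enum import Enum


class Policy(Enum):
    LOW = "low-frequency"
    MEDIUM = "medium-frequency"
    HIGH = "high-frequency"


def select_policy(fw: float, fr: float, w1: float, w2: float, r1: float) -> Policy:
    """Map estimated write (Fw) and read (Fr) frequencies to a strategy.

    w1 < w2 are the preset write thresholds W1 and W2; r1 is the preset
    read threshold R1.
    """
    if not w1 < w2:
        raise ValueError("W1 must be smaller than W2")
    if fw < w1 or fr < r1:
        return Policy.LOW
    # From here on Fw >= W1 and Fr >= R1 both hold.
    if fw <= w2:
        return Policy.MEDIUM
    return Policy.HIGH


# Hypothetical thresholds: W1 = 1, W2 = 5 writes/day, R1 = 2 reads/day.
print(select_policy(fw=0.5, fr=10, w1=1, w2=5, r1=2))  # Policy.LOW
print(select_policy(fw=3, fr=10, w1=1, w2=5, r1=2))    # Policy.MEDIUM
print(select_policy(fw=6, fr=2, w1=1, w2=5, r1=2))     # Policy.HIGH
```

Because the low-frequency test runs first, the remaining two branches only need to compare Fw with W2, which keeps the three cases mutually exclusive.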
It should be noted that a newly written file is read and written frequently after it is first stored, as the business develops, so the default compression strategy applied when a file is first written and stored may be the high-frequency compression strategy, i.e. the Lz4Method compression tool, in order to meet the requirement of subsequent fast reading and writing.
Step 107: if the compression strategy is different from the currently applied compression strategy of the stored file, the compression strategy is used for compressing the stored file again, and the compressed file is used for covering the stored file.
In step 107, since the stored file is a compressed file, when the stored file is compressed again by using a new compression policy, the currently applied compression policy may be used to decompress the stored file first, and then the new compression policy is used to compress the decompressed file, so as to obtain the compressed file.
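The decompress-then-recompress step can be sketched as below. To keep the example runnable with the Python standard library only, `bz2` stands in for the low-frequency tool and two `zlib` levels stand in for the medium- and high-frequency tools; these stand-ins are assumptions made for illustration and are not Snappy or LZ4. The names `CODECS` and `recompress` are hypothetical, and writing the result back over the stored file is left out.

```python
import bz2
import zlib
from typing import Callable, Dict, Tuple

# (compress, decompress) pair per strategy name.
Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]

CODECS: Dict[str, Codec] = {
    "low": (bz2.compress, bz2.decompress),                       # high ratio, slow
    "medium": (lambda d: zlib.compress(d, 6), zlib.decompress),  # stand-in only
    "high": (lambda d: zlib.compress(d, 1), zlib.decompress),    # stand-in only
}


def recompress(blob: bytes, current: str, target: str) -> bytes:
    """Decompress with the currently applied strategy, then compress with the new one."""
    if current == target:
        return blob  # same strategy: nothing to adjust
    _, decompress = CODECS[current]
    compress, _ = CODECS[target]
    return compress(decompress(blob))


original = b"stored file contents " * 1000
stored = CODECS["high"][0](original)           # file as first written
colder = recompress(stored, "high", "low")     # file has cooled down
print(len(stored), len(colder))
```

In a real deployment the returned bytes would replace the stored file, and the strategy name recorded for the file would be updated at the same time so that the next adjustment decompresses with the right tool.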
It should be noted that, if the type of compression policy is the same as the compression policy currently applied to the stored file, the compression policy does not need to be adjusted, and the current process is ended.
As can be seen from the above description of steps 106 and 107, throughout the life cycle of a file the compression strategy is adjusted continuously and automatically according to the periodically computed operation frequency, so that storage cost is reduced transparently while the service requirements are still met.
This completes the flow shown in fig. 1. The number of times the user operates each file and the total operation time are recorded per period, and the operation frequency is computed once per period, so only one period's worth of operation counts and total operation time is stored at a time; the additional storage space occupied is small and no significant extra cost is incurred. Because different file types take different amounts of time to decompress, the compression strategy is adjusted per file type: the average consumed time of a unit-capacity operation for each file type is obtained from the recorded operation times, a system busy degree coefficient for operating that file type is obtained from the average consumed time, and the coefficient determines whether the compression strategies of files of that type are to be adjusted. When they are, the compression strategy of each file is adjusted in time according to the computed operation frequency, which meets the access requirements of the service while keeping storage requirements in check. Because the adjustment takes both the file type and the system busy degree into account, it respects the time users spend on operations, which improves access efficiency and user experience. In addition, because the time at which each file needs its compression strategy adjusted is relatively fixed, adjustments are not frequent, so the number of files that need adjustment in each matching check is small and the extra computing cost is low.
This scheme can be applied in the smart city field and can thereby promote the construction of smart cities.
Fig. 2 is a schematic diagram illustrating a hardware structure of a computer device according to an exemplary embodiment of the present invention. As shown in fig. 2, the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer readable instructions, the database can store control information sequences, and the computer readable instructions can make the processor realize the file compression storage method described above when being executed by the processor. The processor of the computer device is used for providing calculation and control capability and supporting the operation of the whole computer device. The memory of the computer device may have stored therein computer readable instructions that, when executed by the processor, may cause the processor to perform a method of file compression storage. The network interface of the computer device is used for connecting and communicating with the terminal.
Those skilled in the art will appreciate that the configuration shown in fig. 2 is a block diagram of only a portion of the configuration associated with aspects of the present invention and is not intended to limit the computing devices to which aspects of the present invention may be applied, and that a particular computing device may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
Corresponding to the embodiment of the file compression storage method, the invention also provides an embodiment of a file compression storage device.
Fig. 3 is a structural diagram of an embodiment of a file compression storage apparatus according to an exemplary embodiment of the present invention. The file compression storage apparatus may be applied to a computer device and, as shown in fig. 3, includes:
the recording module 310 is configured to record, according to a preset period, the number of times that a user operates each stored file and the total operation time; wherein, each stored file is correspondingly recorded with the file type and the file size;
an average time consumption calculating module 320, configured to calculate, for each file type, an average time consumption of unit capacity operation of the file type according to the file size of the stored file belonging to the file type and a total time consumption of operations of the stored file belonging to the file type recorded in a current preset period;
a determining module 330, configured to determine a system busy degree coefficient for operating the file type according to the average consumed time, and determine whether a compression policy needs to be determined for the stored file belonging to the file type according to the system busy degree coefficient;
the recompression module 340 is configured to, when a compression policy needs to be determined, count a first operation frequency according to the number of times of the stored file recorded in a current preset period, determine a compression policy corresponding to the first operation frequency, and if the compression policy is different from a compression policy currently applied to the stored file, re-compress the stored file using the compression policy, and cover the stored file with the compressed file.
In an optional implementation manner, in the process of determining a system busy degree coefficient for operating the file type according to the average consumed time, the determining module 330 is specifically configured to: determine the system busy degree coefficient as a first coefficient if the average consumed time is less than a first preset consumed time; determine the system busy degree coefficient as a second coefficient if the average consumed time is between the first preset consumed time and a second preset consumed time; and determine the system busy degree coefficient as a third coefficient if the average consumed time is greater than the second preset consumed time; wherein the average consumed time is inversely related to the system busy degree coefficient.
In an optional implementation manner, the determining module 330 is specifically configured to determine that a compression policy needs to be determined for all stored files belonging to the file type if the system busy degree coefficient is a first coefficient in a process of determining whether a compression policy needs to be determined for the stored files belonging to the file type according to the system busy degree coefficient; if the system busyness coefficient is a second coefficient, generating a random probability for each stored file belonging to the file type, and if the random probability is smaller than a preset probability, determining that a compression strategy needs to be determined for the stored files; and if the system busy degree coefficient is a third coefficient, determining that the compression strategy does not need to be determined for the stored file belonging to the file type.
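The two decisions made by the determining module can be sketched as follows. The names `Busy`, `busy_coefficient` and `needs_policy_check` are hypothetical; treating the boundaries of the middle band as inclusive and using 0.5 as the preset probability are assumptions, since the text does not fix either.

```python
import random
from enum import Enum
from typing import Callable


class Busy(Enum):
    FIRST = 1   # short average time: system is idle enough
    SECOND = 2  # moderate average time
    THIRD = 3   # long average time: system is busy


def busy_coefficient(avg_time: float, t1: float, t2: float) -> Busy:
    """Classify the average consumed time against the two preset times t1 < t2."""
    if avg_time < t1:
        return Busy.FIRST
    if avg_time <= t2:  # assumption: "between" includes both boundaries
        return Busy.SECOND
    return Busy.THIRD


def needs_policy_check(
    coef: Busy,
    preset_probability: float = 0.5,  # assumed value
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide for one stored file whether its compression strategy is evaluated."""
    if coef is Busy.FIRST:
        return True                      # every file of this type is checked
    if coef is Busy.SECOND:
        return rng() < preset_probability  # only a random subset is checked
    return False                         # busy system: skip this period


coef = busy_coefficient(avg_time=0.8, t1=0.5, t2=1.0)
print(coef, needs_policy_check(coef))
```

Sampling files at random under the second coefficient spreads recompression work across periods instead of processing every file of a type while the system is moderately loaded.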
In an optional implementation manner, the recording module 310 is specifically configured to divide the preset period into a plurality of time intervals in each preset period; recording the times of operating the stored files by the user in each time interval and the time consumed for operating the times; and recording the sum of the time consumption recorded in the plurality of time intervals as the total operation time consumption.
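The per-interval recording and the per-type average can be sketched as below. The class `FileRecord` and the function `average_time_per_unit` are hypothetical names, sizes are taken in megabytes and times in seconds for the example, and the average consumed time of a unit-capacity operation is assumed to be the total operation time of a file type divided by the total size of its files, which the text does not spell out.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class FileRecord:
    file_type: str
    size_mb: float
    counts: List[int] = field(default_factory=list)     # operations per interval
    elapsed: List[float] = field(default_factory=list)  # seconds per interval

    def record_interval(self, count: int, seconds: float) -> None:
        """Store the operation count and consumed time of one time interval."""
        self.counts.append(count)
        self.elapsed.append(seconds)

    @property
    def total_elapsed(self) -> float:
        """Total operation time: sum of the times recorded in all intervals."""
        return sum(self.elapsed)


def average_time_per_unit(records: Iterable[FileRecord]) -> Dict[str, float]:
    """Seconds per megabyte for each file type (assumed definition)."""
    total_time: Dict[str, float] = defaultdict(float)
    total_size: Dict[str, float] = defaultdict(float)
    for rec in records:
        total_time[rec.file_type] += rec.total_elapsed
        total_size[rec.file_type] += rec.size_mb
    return {t: total_time[t] / total_size[t] for t in total_time if total_size[t] > 0}


a = FileRecord("log", size_mb=10.0)
a.record_interval(count=3, seconds=1.5)
a.record_interval(count=2, seconds=0.5)
b = FileRecord("log", size_mb=30.0)
b.record_interval(count=1, seconds=2.0)
print(average_time_per_unit([a, b]))
```

Only the lists for the current period need to be kept per file, which matches the point made in the description that the additional storage footprint is small.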
In an optional implementation manner, the operations include write operations and read operations, and the recompression module 340 is specifically configured to, in the process of counting the first operation frequency according to the number of times of the stored file recorded in the current preset period, remove abnormal times from the number of times of write operations corresponding to each time interval in the current preset period; calculating a second write operation frequency of the stored file in the current preset period by using the number of the remaining write operations; and predicting the future first write operation frequency of the stored file according to the second write operation frequency and the first write operation frequency predicted by the stored file in the last preset period of the current preset period.
In an optional implementation manner, the recompression module 340 is specifically configured to, in the process of counting the first operation frequency according to the number of times of the stored file recorded in the current preset period, remove abnormal number of times from the number of times of read operations corresponding to each time interval in the current preset period; calculating a second reading operation frequency of the stored file in the current preset period by using the number of the remaining reading operations; and estimating a first reading operation frequency of the stored file in the future according to the second reading operation frequency and a first reading operation frequency estimated by the stored file in the previous preset period of the current preset period.
In an optional implementation manner, the formula for estimating the future first write operation frequency of the stored file according to the second write operation frequency and the first write operation frequency estimated for the stored file in the previous preset period of the current preset period is: YC_CW(t)=a*Y_CW(t)+(1-a)*YC_CW(t-1); where YC_CW(t) is the future first write operation frequency, Y_CW(t) is the second write operation frequency, YC_CW(t-1) is the first write operation frequency estimated in the previous preset period, and a is a first preset weight;
the formula for estimating the future first read operation frequency of the stored file according to the second read operation frequency and the first read operation frequency estimated for the stored file in the previous preset period of the current preset period is: YC_CR(t)=b*Y_CR(t)+(1-b)*YC_CR(t-1); where YC_CR(t) is the future first read operation frequency, Y_CR(t) is the second read operation frequency, YC_CR(t-1) is the first read operation frequency estimated in the previous preset period, and b is a second preset weight.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
The present invention also provides another embodiment, which is to provide a computer-readable storage medium, having a computer program stored thereon, where the computer program is executable by at least one processor to cause the at least one processor to perform the steps of any one of the file compression storage methods described above.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus comprising the element.
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent substitutions, improvements, etc. within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A file compression storage method is characterized by comprising the following steps:
recording the times of operating each stored file by a user and the total operation time according to a preset period; wherein, each stored file is correspondingly recorded with the file type and the file size;
for each file type, calculating the average consumed time of unit capacity operation of the file type according to the file size of the stored file belonging to the file type and the total consumed time of the operation of the stored file belonging to the file type, which is recorded in the current preset period; in each preset period, dividing the preset period into a plurality of time intervals, recording the times of operating the stored files in each time interval by a user and the consumed time of operating the times, and recording the sum of the consumed time recorded in the time intervals as the total consumed time of the operation;
determining a system busy degree coefficient for operating the file type according to the average consumed time, and determining whether a compression strategy needs to be determined for the stored files belonging to the file type according to the system busy degree coefficient; the larger the average consumed time is, the more busy the system is, and the longer the time required for operating the file is, wherein the system busy degree coefficient includes a first coefficient, a second coefficient, and a third coefficient, and if the system busy degree coefficient is the first coefficient, it is determined that a compression policy needs to be determined for all stored files belonging to the file type; if the system busyness coefficient is a second coefficient, generating a random probability for each stored file belonging to the file type, and if the random probability is smaller than a preset probability, determining that a compression strategy needs to be determined for the stored files; if the system busy degree coefficient is a third coefficient, determining that a compression strategy does not need to be determined for the stored file belonging to the file type;
if so, counting a first operating frequency according to the number of times of the stored file recorded in the current preset period, determining a compression strategy corresponding to the first operating frequency, if the compression strategy is different from the currently applied compression strategy of the stored file, re-compressing the stored file by using the compression strategy, and covering the stored file by using the compressed file.
2. The method of claim 1, wherein determining a system busy level coefficient for operating on the file type according to the average elapsed time comprises:
if the average consumed time is less than a first preset consumed time, determining a system busy degree coefficient as a first coefficient;
if the average consumed time is between a first preset consumed time and a second preset consumed time, determining a system busy degree coefficient as a second coefficient;
if the average consumed time is larger than a second preset consumed time, determining a system busy degree coefficient as a third coefficient;
wherein the average elapsed time is inversely related to the system busy level coefficient.
3. The method of claim 1, wherein the operations comprise a write operation and a read operation, and the counting the first operating frequency according to the number of times of the stored file recorded in the current preset period comprises:
eliminating abnormal times from the times of write operation corresponding to each time interval in the current preset period;
calculating a second write operation frequency of the stored file in the current preset period by using the number of the remaining write operations;
and predicting the future first write operation frequency of the stored file according to the second write operation frequency and the first write operation frequency predicted by the stored file in the last preset period of the current preset period.
4. The method of claim 3, wherein counting the first operating frequency according to the number of times of the stored file recorded in the current predetermined period comprises:
eliminating abnormal times from the times of the reading operation corresponding to each time interval in the current preset period;
calculating a second reading operation frequency of the stored file in the current preset period by using the number of the remaining reading operations;
and estimating a first reading operation frequency of the stored file in the future according to the second reading operation frequency and a first reading operation frequency estimated by the stored file in the previous preset period of the current preset period.
5. The method of claim 4, wherein the formula for estimating the future first write operation frequency of the stored file according to the second write operation frequency and the first write operation frequency estimated for the stored file in the previous preset period of the current preset period is: YC_CW(t)=a*Y_CW(t)+(1-a)*YC_CW(t-1); wherein YC_CW(t) is the future first write operation frequency, Y_CW(t) is the second write operation frequency, YC_CW(t-1) is the first write operation frequency estimated in the previous preset period, and a is a first preset weight;
the formula for estimating the future first read operation frequency of the stored file according to the second read operation frequency and the first read operation frequency estimated for the stored file in the previous preset period of the current preset period is: YC_CR(t)=b*Y_CR(t)+(1-b)*YC_CR(t-1); wherein YC_CR(t) is the future first read operation frequency, Y_CR(t) is the second read operation frequency, YC_CR(t-1) is the first read operation frequency estimated in the previous preset period, and b is a second preset weight.
6. A file compression storage apparatus, the apparatus comprising:
the recording module is used for recording the times of the user operating each stored file and the total operation time according to a preset period; wherein, each stored file is correspondingly recorded with the file type and the file size;
the average time consumption calculation module is used for calculating the average time consumption of unit capacity operation of each file type according to the file size of the stored file belonging to the file type and the total operation time consumption of the stored file belonging to the file type, which is recorded in the current preset period; in each preset period, dividing the preset period into a plurality of time intervals, recording the times of operating the stored files in each time interval by a user and the consumed time of operating the times, and recording the sum of the consumed time recorded in the time intervals as the total consumed time of the operation;
the judging module is used for determining a system busy degree coefficient for operating the file type according to the average consumed time and determining whether a compression strategy needs to be determined for the stored files belonging to the file type according to the system busy degree coefficient; the larger the average consumed time is, the more busy the system is, the longer the time required for operating the file is, the system busy degree coefficient comprises a first coefficient, a second coefficient and a third coefficient, and if the system busy degree coefficient is the first coefficient, it is determined that a compression strategy needs to be determined for all the stored files belonging to the file type; if the system busyness coefficient is a second coefficient, generating a random probability for each stored file belonging to the file type, and if the random probability is smaller than a preset probability, determining that a compression strategy needs to be determined for the stored files; if the system busy degree coefficient is a third coefficient, determining that a compression strategy does not need to be determined for the stored file belonging to the file type;
and the recompression module is used for counting a first operation frequency according to the number of times of the stored file recorded in the current preset period when the compression strategy is required to be determined, determining the compression strategy corresponding to the first operation frequency, if the compression strategy is different from the currently applied compression strategy of the stored file, recompressing the stored file by using the compression strategy again, and covering the stored file by using the compressed file.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method according to any one of claims 1 to 5 when executing the program.
8. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when being executed by a processor, the computer program implements the steps of the method for compressing and storing a file according to any one of claims 1 to 5.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011027617.4A CN112100143B (en) 2020-09-25 2020-09-25 File compression storage method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112100143A CN112100143A (en) 2020-12-18
CN112100143B true CN112100143B (en) 2023-03-21

Family

ID=73755610

Country Status (1)

Country Link
CN (1) CN112100143B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114461589B (en) * 2021-08-24 2023-04-11 荣耀终端有限公司 Method for reading compressed file, file system and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5675789A (en) * 1992-10-22 1997-10-07 Nec Corporation File compression processor monitoring current available capacity and threshold value
WO2016058333A1 (en) * 2014-10-15 2016-04-21 中兴通讯股份有限公司 Data recovery method and device for database, and computer storage medium
CN107589910A (en) * 2017-09-01 2018-01-16 厦门集微科技有限公司 The method and system of the high in the clouds data management of user's custom strategies
US9984090B1 (en) * 2014-03-13 2018-05-29 EMC IP Holding Company LLC Method and system for compressing file system namespace of a storage system
CN109800182A (en) * 2019-01-18 2019-05-24 深圳忆联信息系统有限公司 It is a kind of to reduce the data storage handling method and its system for writing amplification

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9952937B2 (en) * 2013-10-28 2018-04-24 Openet Telecom Ltd. Method and system for reducing journaling log activity in databases

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5675789A (en) * 1992-10-22 1997-10-07 Nec Corporation File compression processor monitoring current available capacity and threshold value
US9984090B1 (en) * 2014-03-13 2018-05-29 EMC IP Holding Company LLC Method and system for compressing file system namespace of a storage system
WO2016058333A1 (en) * 2014-10-15 2016-04-21 中兴通讯股份有限公司 Data recovery method and device for database, and computer storage medium
CN105573859A (en) * 2014-10-15 2016-05-11 中兴通讯股份有限公司 Data recovery method and device of database
CN107589910A (en) * 2017-09-01 2018-01-16 厦门集微科技有限公司 Method and system for cloud data management with user-defined policies
CN109800182A (en) * 2019-01-18 2019-05-24 深圳忆联信息系统有限公司 Data storage processing method and system for reducing write amplification

Also Published As

Publication number Publication date
CN112100143A (en) 2020-12-18

Similar Documents

Publication Publication Date Title
US10620839B2 (en) Storage pool capacity management
US9116936B2 (en) Inline learning-based selective deduplication for primary storage systems
JP5699715B2 (en) Data storage device and data storage method
CN102511043A (en) Method for replacing cache files, device and system thereof
CN106649145A (en) Self-adaptive cache strategy updating method and system
CN109144791A (en) Data conversion storage method, apparatus and data management server
US10248618B1 (en) Scheduling snapshots
WO2019225652A1 (en) Model generation device for lifespan prediction, model generation method for lifespan prediction, and storage medium storing model generation program for lifespan prediction
US20140258672A1 (en) Demand determination for data blocks
CN112100143B (en) File compression storage method, device, equipment and storage medium
CN112734982A (en) Storage method and system for unmanned vehicle driving behavior data
CN109597800A (en) Log distribution method and device
CN107168643A (en) Data storage method and device
US10983888B1 (en) System and method for generating dynamic sparse exponential histograms
US20160253591A1 (en) Method and apparatus for managing performance of database
US10802943B2 (en) Performance management system, management device, and performance management method
CN110187840A (en) Data migration method, device, server and storage medium
KR102212108B1 (en) Storage Orchestration Learning Optimization Target Volume Selection Method
CN115499513A (en) Data request processing method and device, computer equipment and storage medium
CN109634525B (en) Method and system for estimating effective capacity of storage system and related components
CN112669091A (en) Data processing method, device and storage medium
CN112306824B (en) Disk performance evaluation method, system, device and computer readable storage medium
CN116661683B (en) Wear balance management method, system, equipment and medium for flash memory
CN112306823B (en) Disk management method, system, device and computer readable storage medium
CN113641681B (en) Space self-adaptive mass data query method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant