CN113641308B - Compressed file index increment updating method and device and electronic equipment

Info

Publication number: CN113641308B
Application number: CN202110926907.0A
Authority: CN
Inventors: 顾凌云; 郭志攀; 王伟; 李军军
Original assignee: Nanjing Bingjian Information Technology Co ltd
Current assignee: Nanjing Bingjian Information Technology Co ltd
Priority date: 2021-08-12
Filing date: 2021-08-12
Publication date: 2024-04-23
Anticipated expiration: 2041-08-12
Also published as: CN113641308A

Abstract

The application provides a compressed file index increment updating method, a device and electronic equipment. The method comprises the following steps: obtaining a compressed index file; determining the starting position of a target data block, the target data block comprising a header file and a data file; reading the header file of the target data block according to the starting position of the target data block; determining whether the target data block is compressed according to the information recorded in the header file of the target data block; determining the length of the whole target data block according to whether the target data block is compressed; determining the starting position of the next data block according to the length of the whole target data block and the starting position of the target data block, and writing the starting position of the next data block into the compressed index file; and taking the next data block as a new target data block. By repeating these steps, the compressed index file is updated incrementally, which avoids a full update of the whole compressed index file each time and saves processing time and computing resources.

Description

Compressed file index increment updating method and device and electronic equipment

Technical Field

The application relates to the technical field of big data processing, in particular to a compressed file index increment updating method and device and electronic equipment.

Background

In the big data processing process, the acquired data is often required to be compressed and stored regularly, so as to save the storage space. The data blocks are formed after the file is compressed, and data block indexes are built for facilitating subsequent distributed computation. Currently, after a new data block is generated by updating the data to be processed, the index file is usually regenerated for all the data blocks. In a big data scenario, the file size is often large, and the time required for regenerating the index is long, which wastes a lot of computation resources and time.

Disclosure of Invention

In order to overcome the above-mentioned shortcomings in the prior art, an object of the present application is to provide a method for updating compressed file index increment, which includes:

Obtaining a compressed index file, wherein the compressed index file is used for indicating the position of each data block in a compressed data file;

According to the compressed index file, determining the starting position of the last data block with the established index recorded in the compressed data file as the starting position of the target data block; the target data block comprises a header file and a data file;

reading a header file of the target data block according to the starting position of the target data block;

Determining whether the target data block is compressed according to the information recorded in the header file of the target data block;

Determining the length of the whole target data block according to whether the target data block is compressed or not;

Determining the starting position of the next data block according to the length of the whole target data block and the starting position of the target data block, and writing the starting position of the next data block into the compressed index file;

Taking the next data block as a new target data block, and repeating the steps starting from the step of reading the header file of the target data block according to the starting position of the target data block, until all the data blocks are processed.

Optionally, the header file includes a pre-compression file length field, a post-compression file length field, and a check code field;

The step of determining whether the target data block is compressed according to the information recorded in the header file of the target data block includes:

Acquiring the file length before compression and the file length after compression of the target data block;

Detecting whether the file length before compression is equal to the file length after compression;

If the file length before compression is equal to the file length after compression, determining that the target data block is compressed;

If the file length before compression and the file length after compression are not equal, determining that the target data block is not compressed;

The step of determining the length of the whole target data block according to whether the target data block is compressed or not includes:

If the target data block is compressed, determining that the length of the check code field is the length of the check code of the data before compression plus the length of the check code of the data after compression;

if the target data block is not compressed, the length of the check code field is the length of the check code of the data before compression;

And determining the length of the whole target data block according to the length of the file length field before compression, the length of the file length field after compression, the length of the check code field and the length of the file after compression.

Optionally, the compressed data file is an LZO file; the step of determining the starting position of the last data block of the established index recorded in the compressed data file as the starting position of the target data block according to the compressed index file comprises the following steps:

acquiring data of the last 8 bytes of the compressed index file;

and converting the acquired data into a long integer (long) value and taking that value as the starting position of the last data block of the established index.

Optionally, the step of obtaining the file length before compression and the file length after compression of the target data block includes:

Reading 4 bytes of data from the header of the target data block as the file length before compression, and then reading the following 4 bytes of data as the file length after compression.

Optionally, the method further comprises:

obtaining data to be processed according to a first set period, and generating a data block according to the data to be processed;

executing the step of acquiring the compressed index file and the subsequent steps according to a second set period;

Wherein the second set period is greater than the first set period.

The application also provides a compressed file index increment updating device, which comprises:

the index acquisition module is used for acquiring a compressed index file, wherein the compressed index file is used for indicating the position of each data block in the compressed data file;

The data block determining module is used for determining the starting position of the last data block of the established index recorded in the compressed data file as the starting position of the target data block according to the compressed index file; the target data block comprises a header file and a data file;

The data block reading module is used for reading the header file of the target data block according to the starting position of the target data block;

The data block detection module is used for determining whether the target data block is compressed or not according to the information recorded in the header file of the target data block; determining the length of the whole target data block according to whether the target data block is compressed or not;

The index updating module is used for determining the starting position of the next data block according to the length of the whole target data block and the starting position of the target data block and writing the starting position of the next data block into the compressed index file;

The loop processing module is used for taking the next data block as a new target data block, and repeating the steps starting from the step of reading the header file of the target data block according to the starting position of the target data block, until all the data blocks are processed.

The data block detection module is specifically configured to obtain a file length before compression and a file length after compression of the target data block; detecting whether the file length before compression is equal to the file length after compression; if the file length before compression is equal to the file length after compression, determining that the target data block is compressed; if the file length before compression and the file length after compression are not equal, determining that the target data block is not compressed; if the target data block is compressed, determining that the length of the check code field is the length of the data check code before compression plus the length of the data check code after compression; if the target data block is not compressed, the length of the check code field is the length of the data check code before compression; and determining the length of the whole target data block according to the length of the file length field before compression, the length of the file length field after compression, the length of the check code field and the length of the file after compression.

Optionally, the compressed file is an LZO file;

The data block determining module is specifically configured to obtain the last 8 bytes of data of the compressed index file, convert the acquired data into a long integer (long) value, and take that value as the starting position of the last data block of the established index;

The data block reading module is specifically configured to read 4 bytes of data from the header of the target data block as the file length before compression, and then read the following 4 bytes of data as the file length after compression.

The application also provides an electronic device comprising a processor and a machine-readable storage medium storing machine-executable instructions which, when executed by the processor, implement the compressed file index increment updating method provided by the application.

The present application also provides a machine-readable storage medium storing machine-executable instructions that, when executed by one or more processors, implement the compressed file index increment updating method provided by the present application.

Compared with the prior art, the application has the following beneficial effects:

According to the method, the device and the electronic equipment for updating the increment of the compressed file index, after the compressed data file grows, the starting position of the last data block of the established index recorded in the compressed index file is determined as the starting position of the target data block, and the header file of the target data block is read. The starting address of the next data block is then determined according to whether the target data block is compressed, and is written into the compressed index file. By repeating this process, the compressed index file is updated incrementally, which avoids a full update of the whole compressed index file each time and saves processing time and computing resources.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic diagram of a method for updating compressed file index increment according to an embodiment of the present application;

Fig. 2 is a schematic diagram of an electronic device according to an embodiment of the present application;

Fig. 3 is a schematic functional block diagram of a compressed file index increment updating apparatus according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.

In the description of the present application, it should be noted that the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.

In the description of the present application, it should also be noted that, unless explicitly specified and limited otherwise, the terms "disposed," "mounted," "coupled," and "connected" are to be construed broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be communication between two elements. The specific meaning of the above terms in the present application will be understood in specific cases by those of ordinary skill in the art.

Referring to fig. 1, fig. 1 is a schematic diagram of a method for updating an increment of a compressed file index according to the present embodiment, and the method may include the following steps.

Step S110, a compressed index file is obtained, wherein the compressed index file is used for indicating the position of each data block in the compressed data file.

In this embodiment, the start address of each data block may be recorded in the compressed index file. Optionally, the compressed data file is an LZO file, and the compressed index file may be an index of the LZO file.

Step S120, determining, according to the compressed index file, a start position of a last data block of the established index recorded in the compressed data file as a start position of a target data block. The target data block includes a header file and a data file.

In this embodiment, the address of the last data block recorded in the compressed index file, that is, of the last data block for which an index has already been established, is taken as the start address of the target data block.

Optionally, in this embodiment, when the compressed data file is an LZO file, the step of determining, according to the compressed index file, the start position of the last data block of the established index recorded in the compressed data file as the start position of the target data block may include: acquiring the last 8 bytes of data of the compressed index file; and converting the acquired data into a long integer (long) value and taking that value as the starting position of the last data block of the established index.
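The lookup described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `read_last_indexed_offset` is hypothetical, and the byte order (big-endian, signed 64-bit, as a Java `long` would be serialized) is an assumption that the text does not state.

```python
import os
import struct


def read_last_indexed_offset(index_path):
    """Return the start position stored in the last 8 bytes of the index file."""
    with open(index_path, "rb") as index_file:
        index_file.seek(0, os.SEEK_END)
        size = index_file.tell()
        if size < 8:
            raise ValueError("index file holds no complete 8-byte entry")
        index_file.seek(size - 8)
        # Assumption: entries are big-endian signed 64-bit integers.
        (offset,) = struct.unpack(">q", index_file.read(8))
    return offset
```

Because only the tail of the index file is read, the cost of this step does not depend on how many data blocks have already been indexed.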

Step S130, reading the header file of the target data block according to the starting position of the target data block.

In this embodiment, after determining the start position of the target data block, the header file of the target data block may be read from the start position.

Step S140, determining whether the target data block is compressed according to the information recorded in the header file of the target data block.

Specifically, in this embodiment, the header file includes a pre-compression file length field, a post-compression file length field, and a check code field. Optionally, in this embodiment, when the compressed data file is an LZO file, 4 bytes of data may be read from the header of the target data block as the pre-compression file length, and the following 4 bytes of data may then be read as the post-compression file length.
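Reading the two length fields can be sketched as below. The sketch assumes that each length field is a 4-byte big-endian unsigned integer and that the two fields sit back to back at the start of the block; the function name `read_length_fields` and the constant `LENGTH_FIELD_SIZE` are hypothetical.

```python
import struct

LENGTH_FIELD_SIZE = 4  # assumed width, in bytes, of each length field


def read_length_fields(data_file, block_start):
    """Read (pre_compression_length, post_compression_length) at block_start."""
    data_file.seek(block_start)
    raw = data_file.read(2 * LENGTH_FIELD_SIZE)
    if len(raw) < 2 * LENGTH_FIELD_SIZE:
        raise EOFError("incomplete block header")
    # Assumption: both fields are big-endian unsigned 32-bit integers.
    pre_len, post_len = struct.unpack(">II", raw)
    return pre_len, post_len
```

`data_file` can be any seekable binary stream, for example a file opened with `open(path, "rb")`.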

When a data block is generated, if the data to be processed is too small, compressing it may produce output that is larger than the original data, so the data in some data blocks may not be compressed. The header file sizes of compressed data blocks and uncompressed data blocks are different. In this embodiment, therefore, the overall size of the target data block needs to be determined according to whether the data block is compressed.

Specifically, the pre-compression file length and the post-compression file length of the target data block may be acquired, and then it is detected whether the pre-compression file length and the post-compression file length are equal.

If the file length before compression and the file length after compression are not equal, it is determined that the target data block is not compressed.

Step S150, determining the length of the whole target data block according to whether the target data block is compressed.

Specifically, if the target data block is compressed, it is determined that the length of the check code field is the length of the check code of the data before compression plus the length of the check code of the data after compression. That is, the check code field contains both the check code of the data before compression and the check code of the data after compression.

If the target data block is not compressed, the length of the check code field is the length of the check code of the data before compression. That is, the check code field contains only the check code of the data before compression and no check code of the data after compression.

And determining the length of the whole target data block according to the length of the file length field before compression, the length of the file length field after compression, the length of the check code field and the length of the file after compression. In this embodiment, the length of the entire target data block is the sum of the length of the file length field before compression, the length of the file length field after compression, the length of the check code field, and the length of the file after compression.
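The sum described above can be written out as a short sketch. The field widths are assumptions (4 bytes per length field and 4 bytes per check code, one check code of each kind), and the names `is_compressed` and `block_length` are hypothetical. The comparison in `is_compressed` follows the rule exactly as it is worded in this document; it is isolated in one function so that a different convention can be substituted without touching the length arithmetic.

```python
LENGTH_FIELD_SIZE = 4  # assumed size, in bytes, of each length field
CHECKSUM_SIZE = 4      # assumed size, in bytes, of one check code


def is_compressed(pre_len, post_len):
    """Compression test as worded in this document: equal lengths mean compressed."""
    return pre_len == post_len


def block_length(post_len, compressed):
    """Total block length: two length fields + check code field + stored data."""
    # Compressed blocks carry two check codes, uncompressed blocks carry one.
    checksum_field = 2 * CHECKSUM_SIZE if compressed else CHECKSUM_SIZE
    return 2 * LENGTH_FIELD_SIZE + checksum_field + post_len
```

Under these assumptions, a block with a post-compression length of 10 that is treated as compressed occupies 4 + 4 + 8 + 10 = 26 bytes, while a block with a post-compression length of 12 that is treated as uncompressed occupies 4 + 4 + 4 + 12 = 24 bytes.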

Step S160, determining a starting position of a next data block according to the length of the entire target data block and the starting position of the target data block, and writing the starting position into the compressed index file.

In this embodiment, the length of the entire target data block may be added to the starting position of the target data block to obtain the starting position of the next data block, and the starting position of the next data block may then be appended to the compressed index file.
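A minimal sketch of this append step is shown below. The function name `append_next_start` is hypothetical, and the entry encoding (big-endian signed 64-bit, matching the 8-byte entries read from the end of the index file) is an assumption.

```python
import struct


def append_next_start(index_path, block_start, block_len):
    """Append the start position of the next block to the index and return it."""
    next_start = block_start + block_len
    # Opening in append mode leaves the existing index entries untouched.
    with open(index_path, "ab") as index_file:
        index_file.write(struct.pack(">q", next_start))
    return next_start
```

Appending rather than rewriting is what keeps the update incremental: entries written by earlier runs are never recomputed.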

Step S170, taking the next data block as a new target data block, and repeating the steps starting from the step of reading the header file of the target data block according to the starting position of the target data block, until all the data blocks are processed.

In other words, after the starting position of the new target data block is determined, the subsequent steps may be repeated from step S130 until the indexes of all the data blocks are written to the compressed index file.
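Steps S120 to S170 can be combined into one loop, sketched below under several assumptions that the text does not fix: index entries and length fields are big-endian, each length field and each check code is 4 bytes, the index file already holds at least one entry, and a start position is appended only when a complete pair of length fields is present at that position. The names `update_index` and `is_compressed` are hypothetical, and `is_compressed` follows the comparison rule as worded in this document.

```python
import os
import struct

LENGTH_FIELDS_SIZE = 8  # assumed: two 4-byte length fields
CHECKSUM_SIZE = 4       # assumed size, in bytes, of one check code


def is_compressed(pre_len, post_len):
    """Compression test as worded in this document: equal lengths mean compressed."""
    return pre_len == post_len


def update_index(data_path, index_path):
    """Append start positions of not-yet-indexed blocks; return how many were added."""
    data_size = os.path.getsize(data_path)
    added = 0
    with open(data_path, "rb") as data, open(index_path, "r+b") as index:
        # S120: the last 8 bytes of the index give the last indexed block.
        index.seek(-8, os.SEEK_END)
        (pos,) = struct.unpack(">q", index.read(8))
        index.seek(0, os.SEEK_END)  # new entries are appended at the end
        while pos + LENGTH_FIELDS_SIZE <= data_size:
            # S130: read the length fields of the target block.
            data.seek(pos)
            pre_len, post_len = struct.unpack(">II", data.read(LENGTH_FIELDS_SIZE))
            # S140 and S150: the check code field depends on compression.
            if is_compressed(pre_len, post_len):
                checksum_field = 2 * CHECKSUM_SIZE
            else:
                checksum_field = CHECKSUM_SIZE
            next_pos = pos + LENGTH_FIELDS_SIZE + checksum_field + post_len
            if next_pos + LENGTH_FIELDS_SIZE > data_size:
                break  # assumed stop condition: no further block follows
            # S160: append the start of the next block; S170: move on to it.
            index.write(struct.pack(">q", next_pos))
            added += 1
            pos = next_pos
    return added
```

Calling `update_index` again after more blocks have been appended to the data file resumes from the last recorded entry, so blocks indexed by an earlier call are not read again.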

Optionally, in this embodiment, the method further comprises:

Acquiring data to be processed according to a first set period, and generating a data block according to the data to be processed;

Executing the step of acquiring the compressed index file and the subsequent steps according to a second set period;

Wherein the second set period is greater than the first set period.

For example, the data to be processed is acquired every 5 minutes, and whether the data is compressed when the data block is generated is decided according to the size of the acquired data. The compressed index file is then incrementally updated every 1 hour based on the generated data blocks.
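The relationship between the two periods can be made concrete with a small calculation. The sketch assumes that every acquisition cycle produces exactly one data block, which the text does not state; the function name `blocks_per_index_update` is hypothetical.

```python
def blocks_per_index_update(first_period_s, second_period_s):
    """Number of new data blocks expected between two index updates."""
    if second_period_s <= first_period_s:
        raise ValueError("the second set period must be greater than the first")
    # Assumption: one data block is generated per acquisition cycle.
    return second_period_s // first_period_s


print(blocks_per_index_update(5 * 60, 60 * 60))  # prints 12
```

With the 5-minute and 1-hour periods of the example, each incremental update therefore walks about 12 new blocks instead of every block in the file.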

Referring to fig. 2, fig. 2 shows an electronic device 100 according to an embodiment of the present application, where the electronic device 100 may be, but is not limited to, an electronic device with digital processing capability such as a server or a personal computer. The electronic device 100 comprises a compressed file index increment updating apparatus 110, a machine-readable storage medium 120, a processor 130, and a communication unit 140.

The machine-readable storage medium 120, the processor 130, and the communication unit 140 are electrically connected directly or indirectly to each other to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The compressed file index increment updating apparatus 110 comprises at least one software function module which may be stored in the machine-readable storage medium 120 in the form of software or firmware, or built into an operating system (OS) of the electronic device 100. The processor 130 is configured to execute executable modules stored in the machine-readable storage medium 120, such as software functional modules and computer programs included in the compressed file index increment updating apparatus 110.

The machine-readable storage medium 120 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), etc. The machine-readable storage medium 120 is used to store a program, and the processor 130 executes the program after receiving an execution instruction.

The processor 130 may be an integrated circuit chip with signal processing capabilities. The processor may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

Referring to fig. 3, the present embodiment further provides a compressed file index increment updating apparatus 110, where the compressed file index increment updating apparatus 110 includes at least one functional module that can be stored in a machine-readable storage medium 120 in a software form. Functionally divided, the compressed file index increment updating apparatus 110 may include an index acquisition module 111, a data block determination module 112, a data block reading module 113, a data block detection module 114, an index update module 115, and a loop processing module 116.

The index obtaining module 111 is configured to obtain a compressed index file, where the compressed index file is used to indicate a location of each data block in the compressed data file.

In this embodiment, the index obtaining module 111 may be used to perform step S110 shown in fig. 1, and a specific description of the index obtaining module 111 may refer to a description of the step S110.

The data block determining module 112 is configured to determine, according to the compressed index file, a start position of a last data block of the established index recorded in the compressed data file as a start position of a target data block. The target data block includes a header file and a data file.

In this embodiment, the data block determining module 112 may be configured to perform step S120 shown in fig. 1, and a specific description of the data block determining module 112 may refer to a description of the step S120.

The data block reading module 113 is configured to read a header file of the target data block according to the starting position of the target data block.

In this embodiment, the data block reading module 113 may be used to perform step S130 shown in fig. 1, and a specific description of the data block reading module 113 may refer to a description of the step S130.

The data block detection module 114 is configured to determine whether the target data block is compressed according to the information recorded in the header file of the target data block, and to determine the length of the whole target data block according to whether the target data block is compressed.

In this embodiment, the data block detecting module 114 may be used to perform steps S140 and S150 shown in fig. 1, and a specific description of the data block detecting module 114 may refer to descriptions of the steps S140 and S150.

The index updating module 115 is configured to determine the start position of the next data block according to the length of the entire target data block and the start position of the target data block, write the start position of the next data block into the compressed index file, and take the next data block as a new target data block.

In this embodiment, the index updating module 115 may be used to perform step S160 shown in fig. 1, and a specific description of the index updating module 115 may refer to a description of step S160.

The loop processing module 116 is configured to repeat, according to the starting position of the new target data block, the steps starting from the step of reading the header file of the target data block according to the starting position of the target data block, until all the data blocks are processed.

In this embodiment, the loop processing module 116 may be used to execute step S170 shown in fig. 1, and a specific description of the loop processing module 116 may refer to a description of step S170.

Optionally, the header file includes a pre-compression file length field, a post-compression file length field, and a check code field.

The data block detection module 114 is specifically configured to obtain a pre-compression file length and a post-compression file length of the target data block. And detecting whether the file length before compression and the file length after compression are equal. And if the file length before compression is equal to the file length after compression, determining that the target data block is compressed. And if the file length before compression and the file length after compression are not equal, determining that the target data block is not compressed. And if the target data block is compressed, determining that the length of the check code field is the length of the data check code before compression plus the length of the data check code after compression. And if the target data block is not compressed, the length of the check code field is the length of the data check code before compression. And determining the length of the whole target data block according to the length of the file length field before compression, the length of the file length field after compression, the length of the check code field and the length of the file after compression.

Optionally, the compressed file is an LZO file.

The data block determining module 112 is specifically configured to obtain the last 8 bytes of data of the compressed index file, convert the acquired data into a long integer (long) value, and take that value as the starting position of the last data block of the established index.

The data block reading module 113 is specifically configured to read 4 bytes of data from the header of the target data block as the file length before compression, and then read the following 4 bytes of data as the file length after compression.

In summary, according to the method, the device and the electronic equipment for updating the increment of the compressed file index provided by the application, after the compressed data file grows, the starting position of the last data block of the established index recorded in the compressed index file is determined as the starting position of the target data block, and the header file of the target data block is read. The starting address of the next data block is then determined according to whether the target data block is compressed, and is written into the compressed index file. By repeating this process, the compressed index file is updated incrementally, which avoids a full update of the whole compressed index file each time and saves processing time and computing resources.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The above description merely illustrates various embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any variation or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims

1. A method for updating compressed file index delta, the method comprising:

2. The method of claim 1, wherein the header file includes a pre-compression file length field, a post-compression file length field, and a check code field;

if the file length before compression and the file length after compression are not equal, determining that the target data block is not compressed;

3. The method of claim 2, wherein the compressed data file is an LZO file; the step of determining the starting position of the last data block of the established index recorded in the compressed data file as the starting position of the target data block according to the compressed index file comprises the following steps:

acquiring data of the last 8 bytes of the compressed index file;

4. The method of claim 2, wherein the step of obtaining the pre-compression file length and the post-compression file length of the target data block comprises:

5. The method according to claim 1, wherein the method further comprises:

Wherein the second set period is greater than the first set period.

6. A compressed file index delta update apparatus, the apparatus comprising:

7. The apparatus of claim 6, wherein the header file includes a pre-compression file length field, a post-compression file length field, and a check code field;

8. The apparatus of claim 7, wherein the compressed file is an LZO file;

9. An electronic device comprising a processor and a machine-readable storage medium storing machine-executable instructions which, when executed by the processor, implement the method of any one of claims 1-7.

10. A machine-readable storage medium storing machine-executable instructions which, when executed by one or more processors, implement the method of any one of claims 1-7.