CN113641308A

CN113641308A - Compressed file index increment updating method and device and electronic equipment

Info

Publication number: CN113641308A
Application number: CN202110926907.0A
Authority: CN
Inventors: 顾凌云; 郭志攀; 王伟; 李军军
Original assignee: Nanjing Bingjian Information Technology Co ltd
Current assignee: Nanjing Bingjian Information Technology Co ltd
Priority date: 2021-08-12
Filing date: 2021-08-12
Publication date: 2021-11-12
Anticipated expiration: 2041-08-12
Also published as: CN113641308B

Abstract

The application provides a method and a device for incrementally updating the index of a compressed file, and electronic equipment. The method comprises the following steps: acquiring a compressed index file; determining the starting position of a target data block, wherein the target data block comprises a header file and a data file; reading the header file of the target data block according to the starting position of the target data block; determining whether the target data block is compressed according to information recorded in the header file of the target data block; determining the length of the whole target data block according to whether the target data block is compressed; determining the starting position of the next data block according to the length of the whole target data block and the starting position of the target data block, and writing that starting position into the compressed index file; and taking the next data block as a new target data block and repeating the above steps. In this way the compressed index file is updated incrementally, which avoids regenerating the whole compressed index file on every update and saves processing time and computing resources.

Description

Compressed file index increment updating method and device and electronic equipment

Technical Field

The application relates to the technical field of big data processing, in particular to a method and a device for updating index increment of a compressed file and electronic equipment.

Background

In big data processing, acquired data often needs to be compressed and stored at regular intervals to save storage space. Compressing a file produces data blocks, and a data block index is built to facilitate subsequent distributed computation. Currently, whenever the data to be processed is updated and a new data block is generated, the index file is regenerated for all data blocks. In big data scenarios the files are often large, so regenerating the index takes a long time and wastes a large amount of computing resources.

Disclosure of Invention

In order to overcome the above-mentioned deficiencies in the prior art, the present application aims to provide a compressed file index increment updating method, which includes:

acquiring a compressed index file, wherein the compressed index file is used for indicating the position of each data block in a compressed data file;

according to the compressed index file, determining the initial position of the last data block of the established index recorded in the compressed data file as the initial position of the target data block; the target data block comprises a header file and a data file;

reading a header file of the target data block according to the initial position of the target data block;

determining whether the target data block is compressed according to information recorded in a header file of the target data block;

determining the length of the whole target data block according to whether the target data block is compressed or not;

determining the initial position of the next data block according to the length of the whole target data block and the initial position of the target data block and writing the initial position into the compressed index file;

and taking the next data block as a new target data block, and repeating the steps from reading the header file of the target data block according to the initial position of the target data block onward, until all data blocks are processed.

Optionally, the header file includes a file length field before compression, a file length field after compression, and a check code field;

the step of determining whether the target data block is compressed according to information recorded in a header file of the target data block includes:

acquiring the file length of the target data block before compression and the file length of the target data block after compression;

detecting whether the length of the file before compression is equal to that of the file after compression;

if the length of the file before compression is equal to that of the file after compression, determining that the target data block is not compressed;

if the file length before the compression is not equal to the file length after the compression, determining that the target data block is compressed;

the step of determining the length of the entire target data block according to whether the target data block is compressed includes:

if the target data block is compressed, determining the length of the check code field as the length of the check code of the data before compression plus the length of the check code of the data after compression;

if the target data block is not compressed, the length of the check code field is the length of the check code of the data before compression;

and determining the length of the whole target data block according to the length of the file length field before compression, the length of the file length field after compression, the length of the check code field and the length of the file after compression.

Optionally, the compressed data file is an LZO file; the step of determining the start position of the last data block of the established index recorded in the compressed data file as the start position of the target data block according to the compressed index file includes:

acquiring the data of the last 8 bytes of the compressed index file;

and converting the acquired data into a long integer and taking the long integer as the initial position of the last data block of the established index.

Optionally, the step of obtaining the file length of the target data block before compression and the file length of the target data block after compression includes:

and reading 4 bytes of data from the file header of the target data block as the file length before compression, and reading the following 4 bytes of data as the file length after compression.

Optionally, the method further comprises:

acquiring data to be processed according to a first set period, and generating a data block according to the data to be processed;

executing the step of obtaining the compressed index file and the subsequent steps according to a second set period;

wherein the second set period is greater than the first set period.

The present application further provides a compressed file index increment updating apparatus, the apparatus comprising:

the index acquisition module is used for acquiring a compressed index file, and the compressed index file is used for indicating the position of each data block in a compressed data file;

a data block determining module, configured to determine, according to the compressed index file, a start position of a last data block of an established index recorded in the compressed data file as a start position of a target data block; the target data block comprises a header file and a data file;

the data block reading module is used for reading a header file of the target data block according to the initial position of the target data block;

the data block detection module is used for determining whether the target data block is compressed according to the information recorded in the header file of the target data block; determining the length of the whole target data block according to whether the target data block is compressed or not;

the index updating module is used for determining the initial position of the next data block according to the length of the whole target data block and the initial position of the target data block and writing the initial position into the compressed index file;

and the cyclic processing module is used for taking the next data block as a new target data block, and repeating the processing from reading the header file of the target data block according to the initial position of the target data block onward, until all the data blocks are processed.

Optionally, the header file includes a file length field before compression, a file length field after compression, and a check code field; the data block detection module is specifically used for acquiring the file length before compression and the file length after compression of the target data block; detecting whether the length of the file before compression is equal to that of the file after compression; if the length of the file before compression is equal to that of the file after compression, determining that the target data block is not compressed; if the length of the file before compression is not equal to that of the file after compression, determining that the target data block is compressed; if the target data block is compressed, determining the length of the check code field as the length of the data check code before compression plus the length of the data check code after compression; if the target data block is not compressed, determining the length of the check code field as the length of the data check code before compression; and determining the length of the whole target data block according to the length of the file length field before compression, the length of the file length field after compression, the length of the check code field and the length of the file after compression.

Optionally, the compressed file is an LZO file;

the data block determining module is specifically configured to obtain the data of the last 8 bytes of the compressed index file, and to convert the acquired data into a long integer as the initial position of the last data block of the established index;

the data block reading module is specifically configured to read 4 bytes of data from the file header of the target data block as the file length before compression, and to read the following 4 bytes of data as the file length after compression.

The application also provides an electronic device, which comprises a processor and a machine-readable storage medium, wherein the machine-readable storage medium stores machine-executable instructions, and when the machine-executable instructions are executed by the processor, the method for updating the index increment of the compressed file provided by the application is realized.

The present application further provides a machine-readable storage medium having stored thereon machine-executable instructions that, when executed by one or more processors, implement the compressed file index delta update method provided herein.

Compared with the prior art, the method has the following beneficial effects:

according to the method, the device and the electronic equipment for updating the compressed file index increment, after the compressed data file grows, the initial position of the last data block of the established index recorded in the compressed index file is determined as the initial position of the target data block, the header file of the target data block is read, and the initial position of the next data block is then determined according to whether the target data block is compressed and written into the compressed index file. The compressed index file is thereby updated incrementally, which avoids regenerating the whole compressed index file on every update and saves processing time and computing resources.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.

FIG. 1 is a diagram illustrating a compressed file index increment updating method according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of an electronic device provided in an embodiment of the present application;

FIG. 3 is a schematic diagram of functional modules of a compressed file index increment updating apparatus according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the present application, as presented in the figures, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art without any inventive work based on the embodiments in the present application are within the scope of protection of the present application.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

In the description of the present application, it is noted that the terms "first", "second", "third", and the like are used merely for distinguishing between descriptions and are not intended to indicate or imply relative importance.

In the description of the present application, it is further noted that, unless expressly stated or limited otherwise, the terms "disposed," "mounted," "connected," and "coupled" are to be construed broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intervening medium, or an internal connection between two elements. The meaning of the above terms in this application can be understood in a specific context by those skilled in the art.

Referring to fig. 1, fig. 1 is a schematic diagram of a compressed file index increment updating method according to this embodiment, where the method includes the following steps.

Step S110, a compressed index file is obtained, where the compressed index file is used to indicate the position of each data block in the compressed data file.

In this embodiment, the compressed index file may record the start address of each data block. Optionally, the compressed data file may be an LZO file, and the compressed index file may be an index of the LZO file.

Step S120, according to the compressed index file, determining a start position of a last data block of the established index recorded in the compressed data file as a start position of a target data block. The target data block includes a header file and a data file.

In this embodiment, the address of the last data block recorded in the compressed index file is taken as the start address of the target data block; that is, the target data block is the last indexed data block.

Optionally, in this embodiment, when the compressed data file is an LZO file, the act of determining, according to the compressed index file, a start position of a last data block of an established index recorded in the compressed data file as a start position of a target data block may include: acquiring the data of the last 8 bytes of the compressed index file; and converting the acquired data into a long integer and taking the long integer as the start position of the last data block of the established index.
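As an illustration of this step, the following is a minimal Python sketch of reading the last 8 bytes of an index and interpreting them as a long integer. The function name `last_indexed_offset` is hypothetical, and the byte order is an assumption: the text does not state it, so the sketch assumes big-endian signed 64-bit entries.

```python
import io
import struct


def last_indexed_offset(index_file):
    """Return the start position stored in the final 8 bytes of the index."""
    size = index_file.seek(0, io.SEEK_END)
    if size < 8:
        raise ValueError("index holds no complete 8-byte entry")
    index_file.seek(size - 8)
    # Assumption: each entry is a big-endian signed 64-bit integer (">q").
    (offset,) = struct.unpack(">q", index_file.read(8))
    return offset


# Usage with an in-memory index holding the made-up offsets 0 and 1024.
index = io.BytesIO(struct.pack(">qq", 0, 1024))
print(last_indexed_offset(index))  # prints 1024
```

Only the tail of the index is read, so the cost of locating the target data block does not depend on how many entries the index already holds.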

Step S130, reading a header file of the target data block according to the start position of the target data block.

In this embodiment, after the start position of the target data block is determined, the header file of the target data block can be read from the start position.

Step S140, determining whether the target data block is compressed according to the information recorded in the header file of the target data block.

Specifically, in this embodiment, the header file includes a file length field before compression, a file length field after compression, and a check code field. Optionally, in this embodiment, when the compressed data file is an LZO file, 4 bytes of data may be read from the file header of the target data block as the pre-compression file length, and the following 4 bytes of data may be read as the post-compression file length.

When a data block is generated, if the data to be processed is too small, compression may produce output that is larger than the original data, so the data in some data blocks may be left uncompressed. The header files of compressed and uncompressed data blocks differ in size. Therefore, in this embodiment, the overall size of the target data block needs to be determined according to whether the data block is compressed.

Specifically, the file length before compression and the file length after compression of the target data block may be obtained, and then it is detected whether the file length before compression and the file length after compression are equal.

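If the file length before compression is equal to the file length after compression, the data was stored as is, and it is determined that the target data block is not compressed. If the file length before compression is not equal to the file length after compression, it is determined that the target data block is compressed.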
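The check in step S140 can be sketched as follows. This is an illustrative Python fragment, not the applicant's implementation. It assumes that each length field is a 4-byte big-endian unsigned integer and that a block stored without compression carries two equal length values, since its payload is written unchanged. The names `read_block_lengths` and `is_compressed` are hypothetical.

```python
import struct


def read_block_lengths(data, start):
    """Read the two length fields at the head of the block at `start`."""
    # Assumption: two consecutive big-endian unsigned 32-bit integers.
    uncompressed_len, stored_len = struct.unpack_from(">II", data, start)
    return uncompressed_len, stored_len


def is_compressed(uncompressed_len, stored_len):
    """Equal lengths mean the payload was stored as is (not compressed)."""
    return uncompressed_len != stored_len


# Usage: a made-up block header placed 3 bytes into a buffer.
blob = b"\x00" * 3 + struct.pack(">II", 100, 60)
lengths = read_block_lengths(blob, 3)
print(lengths, is_compressed(*lengths))  # prints (100, 60) True
```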

Step S150, determining the length of the whole target data block according to whether the target data block is compressed.

Specifically, if the target data block is compressed, it is determined that the length of the check code field is the length of the check code of the data before compression plus the length of the check code of the data after compression. That is, the check code field contains the check code of the data before compression and the check code of the data after compression.

And if the target data block is not compressed, the length of the check code field is the length of the check code of the data before compression. That is, the check code field has no check code of the compressed data, and only has the check code of the data before compression.

And determining the length of the whole target data block according to the length of the file length field before compression, the length of the file length field after compression, the length of the check code field and the length of the file after compression. In this embodiment, the length of the entire target data block is the sum of the length of the file length field before compression, the length of the file length field after compression, the length of the check code field, and the length of the file after compression.
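The sum described above can be written as a short Python sketch. The field sizes are assumptions made for illustration: 4 bytes per length field and 4 bytes per check code. The number of check codes kept for the data before and after compression is exposed as the hypothetical parameters `n_pre_checks` and `n_post_checks`, because the text does not fix it.

```python
LENGTH_FIELD_SIZE = 4  # assumption: bytes per length field
CHECK_CODE_SIZE = 4    # assumption: bytes per check code


def block_length(uncompressed_len, stored_len, n_pre_checks=1, n_post_checks=1):
    """Total size of one data block: two length fields, check codes, payload."""
    compressed = uncompressed_len != stored_len
    # An uncompressed block carries only the pre-compression check code(s).
    n_checks = n_pre_checks + (n_post_checks if compressed else 0)
    return 2 * LENGTH_FIELD_SIZE + n_checks * CHECK_CODE_SIZE + stored_len


print(block_length(100, 60))  # compressed block: 8 + 8 + 60, prints 76
print(block_length(50, 50))   # uncompressed block: 8 + 4 + 50, prints 62
```

The payload term is always the post-compression length, because that is the number of bytes actually stored in the data file.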

Step S160, determining the starting position of the next data block according to the length of the whole target data block and the starting position of the target data block, and writing the starting position into the compressed index file.

In this embodiment, the starting position of the next data block may be obtained by adding the length of the entire target data block to the starting position of the target data block, and then the starting position of the next data block may be additionally written into the compressed index file.
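A minimal Python sketch of this append step is given below. The function name `append_next_start` is hypothetical, and the 8-byte big-endian entry format is an assumption chosen to match reading the last 8 bytes of the index as a long integer.

```python
import io
import struct


def append_next_start(index_file, block_start, block_len):
    """Append the start position of the next data block to the index."""
    next_start = block_start + block_len
    index_file.seek(0, io.SEEK_END)
    # Assumption: entries are big-endian signed 64-bit integers.
    index_file.write(struct.pack(">q", next_start))
    return next_start


# Usage: the index already lists a block at offset 0 that is 76 bytes long.
index = io.BytesIO(struct.pack(">q", 0))
append_next_start(index, 0, 76)
print(struct.unpack(">qq", index.getvalue()))  # prints (0, 76)
```

Existing entries are never rewritten; the index only grows at its end.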

Step S170, taking the next data block as a new target data block, and repeating the steps from reading the header file of the target data block according to the starting position of the target data block onward, until all data blocks are processed.

In other words, after the start position of the new target data block is determined, step S130 and the subsequent steps are repeated until the start positions of all data blocks have been written into the compressed index file.
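Steps S120 to S170 can be put together in one loop. The Python sketch below runs on synthetic in-memory data and rests on several assumptions that are not stated in the text: big-endian fields, 4-byte check codes, one check code each for the data before and after compression, a data region with no file-level header, and a loop that stops once the computed next start position reaches the end of the data. The names `update_index` and `make_block` are hypothetical.

```python
import io
import struct

LENGTH_FIELD_SIZE = 4  # assumption: bytes per length field
CHECK_CODE_SIZE = 4    # assumption: bytes per check code


def update_index(data, index_file, n_pre_checks=1, n_post_checks=1):
    """Append start positions of blocks that follow the last indexed block."""
    size = index_file.seek(0, io.SEEK_END)
    index_file.seek(size - 8)
    (start,) = struct.unpack(">q", index_file.read(8))       # step S120
    added = []
    while start + 2 * LENGTH_FIELD_SIZE <= len(data):
        unc_len, stored_len = struct.unpack_from(">II", data, start)  # S130
        compressed = unc_len != stored_len                    # step S140
        n_checks = n_pre_checks + (n_post_checks if compressed else 0)
        block_len = (2 * LENGTH_FIELD_SIZE
                     + n_checks * CHECK_CODE_SIZE + stored_len)  # step S150
        next_start = start + block_len
        if next_start >= len(data):
            break  # assumption: no further block begins at or past the end
        index_file.seek(0, io.SEEK_END)
        index_file.write(struct.pack(">q", next_start))       # step S160
        added.append(next_start)
        start = next_start                                    # step S170
    return added


def make_block(uncompressed_len, payload):
    """Build a synthetic block for the demo (the payload is not real LZO)."""
    compressed = uncompressed_len != len(payload)
    checks = b"\x00" * (CHECK_CODE_SIZE * (2 if compressed else 1))
    return struct.pack(">II", uncompressed_len, len(payload)) + checks + payload


# Three blocks of 76, 62 and 136 bytes; only the first is indexed so far.
data = (make_block(100, b"a" * 60) + make_block(50, b"b" * 50)
        + make_block(200, b"c" * 120))
index = io.BytesIO(struct.pack(">q", 0))
added = update_index(data, index)
print(added)  # prints [76, 138]
```

Because the walk starts from the last indexed block, a second call on unchanged data appends nothing, which is the property that makes the update incremental.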

Optionally, in this embodiment, the method may further include:

acquiring data to be processed according to a first set period, and generating a data block according to the data to be processed;

executing the step of obtaining the compressed index file and the subsequent steps according to a second set period;

wherein the second set period is greater than the first set period.

For example, the data to be processed is acquired every 5 minutes, and whether the data is compressed when the data block is generated is decided according to the size of the acquired data. The compressed index file is then incrementally updated once every hour according to the generated data blocks.
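The relationship between the two periods can be illustrated with a small Python simulation. It only counts minute marks and does not touch any files. The function name `simulate` and the two-hour horizon are hypothetical choices for the example.

```python
def simulate(first_period_min=5, second_period_min=60, total_min=120):
    """Return the minute marks of block generation and of index updates."""
    if second_period_min <= first_period_min:
        raise ValueError("second set period must exceed the first set period")
    blocks, updates = [], []
    for minute in range(1, total_min + 1):
        if minute % first_period_min == 0:
            blocks.append(minute)    # a data block is generated
        if minute % second_period_min == 0:
            updates.append(minute)   # the index is incrementally updated
    return blocks, updates


blocks, updates = simulate()
print(len(blocks), updates)  # prints 24 [60, 120]
```

With these values each index update covers the 12 data blocks generated since the previous update.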

Referring to fig. 2, fig. 2 shows an electronic device 100 according to an embodiment of the present disclosure. The electronic device 100 may be, but is not limited to, an electronic device with digital processing capability, such as a server or a personal computer. The electronic device 100 includes a compressed file index increment updating apparatus 110, a machine-readable storage medium 120, a processor 130, and a communication unit 140.

The machine-readable storage medium 120, the processor 130, and the communication unit 140 are electrically connected to each other, directly or indirectly, to enable data transmission or interaction. For example, these components may be electrically connected to each other via one or more communication buses or signal lines. The compressed file index increment updating apparatus 110 includes at least one software function module which can be stored in the machine-readable storage medium 120 in the form of software or firmware, or solidified in an Operating System (OS) of the electronic device 100. The processor 130 is configured to execute executable modules stored in the machine-readable storage medium 120, such as the software functional modules and computer programs included in the compressed file index increment updating apparatus 110.

The machine-readable storage medium 120 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The machine-readable storage medium 120 is used for storing a program, and the processor 130 executes the program after receiving an execution instruction.

The processor 130 may be an integrated circuit chip having signal processing capabilities. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor.

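Referring to fig. 3, the present embodiment further provides a compressed file index increment updating apparatus 110, which includes an index obtaining module 111, a data block determining module 112, a data block reading module 113, a data block detection module 114, an index updating module 115, and a loop processing module 116.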

The index obtaining module 111 is configured to obtain a compressed index file, where the compressed index file is used to indicate a position of each data block in a compressed data file.

In this embodiment, the index obtaining module 111 may be configured to execute step S110 shown in fig. 1, and for a detailed description of the index obtaining module 111, reference may be made to the description of step S110.

The data block determining module 112 is configured to determine, according to the compressed index file, a start position of a last data block of the established index recorded in the compressed data file as a start position of a target data block. The target data block includes a header file and a data file.

In this embodiment, the data block determination module 112 may be configured to execute step S120 shown in fig. 1, and the detailed description about the data block determination module 112 may refer to the description about step S120.

The data block reading module 113 is configured to read a header file of the target data block according to the start position of the target data block.

In this embodiment, the data block reading module 113 may be configured to execute step S130 shown in fig. 1, and reference may be made to the description of step S130 for a detailed description of the data block reading module 113.

The data block detection module 114 is configured to determine whether the target data block is compressed according to the information recorded in the header file of the target data block, and to determine the length of the whole target data block according to whether the target data block is compressed.

In this embodiment, the data block detection module 114 may be configured to execute steps S140 and S150 shown in fig. 1, and the detailed description about the data block detection module 114 may refer to the description about the steps S140 and S150.

The index updating module 115 is configured to determine the start position of the next data block according to the length of the entire target data block and the start position of the target data block, to write that start position into the compressed index file, and to take the next data block as a new target data block.

In this embodiment, the index updating module 115 may be configured to execute step S160 shown in fig. 1, and the detailed description about the index updating module 115 may refer to the description about step S160.

The loop processing module 116 is configured to repeat, for the new target data block, the processing from reading the header file of the target data block according to its start position onward, until all data blocks are processed.

In this embodiment, the loop processing module 116 may be configured to execute step S170 shown in fig. 1, and reference may be made to the description of step S170 for a detailed description of the loop processing module 116.

Optionally, the header file includes a file length field before compression, a file length field after compression, and a check code field.

The data block detection module 114 is specifically configured to obtain the file length before compression and the file length after compression of the target data block, and to detect whether the two lengths are equal. If the file length before compression is equal to the file length after compression, it is determined that the target data block is not compressed. If the file length before compression is not equal to the file length after compression, it is determined that the target data block is compressed. If the target data block is compressed, the length of the check code field is determined as the length of the data check code before compression plus the length of the data check code after compression. If the target data block is not compressed, the length of the check code field is the length of the data check code before compression. The length of the whole target data block is then determined according to the length of the file length field before compression, the length of the file length field after compression, the length of the check code field and the length of the file after compression.

Optionally, the compressed file is an LZO file.

The data block determining module 112 is specifically configured to obtain the last 8 bytes of data of the compressed index file, and to convert the acquired data into a long integer as the starting position of the last data block of the established index.

The data block reading module 113 is specifically configured to read 4 bytes of data from the file header of the target data block as the file length before compression, and to read the following 4 bytes of data as the file length after compression.

In summary, according to the method, the device and the electronic device for updating the compressed file index increment provided by the application, after the compressed data file grows, the start position of the last data block of the established index recorded in the compressed index file is determined as the start position of the target data block, the header file of the target data block is read, and the start position of the next data block is then determined according to whether the target data block is compressed and written into the compressed index file. The compressed index file is thereby updated incrementally, which avoids regenerating the whole compressed index file on every update and saves processing time and computing resources.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

If the functions are implemented in the form of software functional modules and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The above description is only for various embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and all such changes or substitutions are included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method for updating compressed file index increments, the method comprising:

2. The method of claim 1, wherein the header file comprises a pre-compression file length field, a post-compression file length field, and a check code field;

if the length of the file before compression is not equal to that of the file after compression, determining that the target data block is compressed;

3. The method of claim 2, wherein the compressed data file is an LZO file; the step of determining the start position of the last data block of the established index recorded in the compressed data file as the start position of the target data block according to the compressed index file includes:

acquiring the data of the last 8 bytes of the compressed index file;

4. The method of claim 2, wherein the step of obtaining the pre-compression file length and the post-compression file length of the target data block comprises:

5. The method of claim 1, further comprising:

wherein the second setting period is greater than the first setting period.

6. An apparatus for incremental updating of compressed file indices, the apparatus comprising:

7. The apparatus of claim 6, wherein the header file comprises a pre-compression file length field, a post-compression file length field, and a check code field;

8. The apparatus of claim 7, wherein the compressed file is an LZO file;

9. An electronic device comprising a processor and a machine-readable storage medium having stored thereon machine-executable instructions that, when executed by the processor, implement the method of any of claims 1-7.

10. A machine-readable storage medium having stored thereon machine-executable instructions which, when executed by one or more processors, perform the method of any one of claims 1-7.