CN115061986B - Data writing method, data compression method and data decompression method - Google Patents


Info

Publication number: CN115061986B
Application number: CN202210984207.1A
Authority: CN (China)
Prior art keywords: data, cache page, target file, thread, writing
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN115061986A
Inventors: 周鹏, 余昇锦, 胡翔, 陈毅翀
Current assignee: Uniontech Software Technology Co Ltd
Original assignee: Uniontech Software Technology Co Ltd
Application filed by Uniontech Software Technology Co Ltd
Priority to CN202210984207.1A
Publication of CN115061986A, followed by grant and publication of CN115061986B
Active legal status; anticipated expiration

Classifications

    • G06F16/172 Caching, prefetching or hoarding of files (G06F16/10 File systems; file servers; G06F16/17 Details of further file system functions)
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G06F16/1774 Locking methods, e.g. locking methods for file systems allowing shared and concurrent access to files (G06F16/176 Support for shared access to files; G06F16/1767 Concurrency control)
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F9/526 Mutual exclusion algorithms (G06F9/52 Program synchronisation; mutual exclusion, e.g. by means of semaphores)
    • G06F2209/5018 Thread allocation (indexing scheme relating to G06F9/50)


Abstract

The invention relates to the field of computer technology and discloses a data writing method, a data compression method, and a data decompression method. The data writing method comprises the following steps: before a thread writes to a target file, attempting to acquire the file lock of the target file; if the thread fails to acquire the file lock, applying for a first cache page and writing first data into it; storing the first cache page in a first data structure corresponding to the target file, and judging whether all of the thread's data to be written to the target file has been written; if not, repeating the steps from applying for a first cache page through the judgment; and if so, storing all first cache pages into a second data structure corresponding to the target file. The technical scheme of the invention greatly shortens the processing time for writing data into a file.

Description

Data writing method, data compression method and data decompression method
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data writing method and apparatus, a data compression method, a data decompression method, a computing device, and a readable storage medium.
Background
In every operating system, the data packet produced by compressing original data with a data compression tool is called a compressed packet, and its volume can be reduced to a fraction of the original or even less. Compression is a mechanism for reducing the size of a file through a specific algorithm: it reduces the total number of bytes in the file and the disk space the file occupies, and allows the file to be transmitted faster even over a slow network connection.
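As an illustration of how a specific algorithm reduces the total number of bytes, the following is a minimal sketch of a toy run-length compressor in C. This is not from the patent; the scheme and the name rle_compress are illustrative only, and real tools use far stronger algorithms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy run-length encoder, illustrating how an algorithm can shrink the
 * total number of bytes in data that contains repetition. Output
 * format: (count, byte) pairs; counts are capped at 255. */
size_t rle_compress(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        unsigned char c = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == c && run < 255)
            run++;
        out[o++] = (unsigned char)run;  /* run length */
        out[o++] = c;                   /* repeated byte */
        i += run;
    }
    return o;  /* compressed size in bytes */
}
```

For 100 identical bytes the output is just 2 bytes, showing the byte-count reduction the text describes.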
The existing compression approach compresses an original file, in data order, into one or more sub-compressed packets and then merges them into a single overall compressed packet using one thread. The existing decompression approach decompresses a compressed packet into multiple sub-packets and then merges them in order into the original file, again using one thread. In both cases, the single-threaded merging step concatenates the sub-packets serially, which is especially time-consuming for large files or large numbers of files. Even when the original file is compressed/decompressed by multiple threads (during compression the file is split into N parts that are compressed in parallel and then merged in order into one compressed packet; during decompression the compressed packet is restored to N parts in parallel and then merged in order into the original file), multithreading only reduces the time spent in the compression/decompression algorithm itself. The final merging and restoring step is still executed serially by a single thread, so full multithreading is not achieved and a performance loss remains.
Therefore, the present invention provides a data writing scheme to solve the problems in the prior art.
Disclosure of Invention
To this end, the present invention provides a data writing method and apparatus, a data compression method, a data decompression method, a computing device, and a readable storage medium to solve, or at least alleviate, the problems identified above.
According to a first aspect of the present invention, there is provided a data writing method comprising: before a thread writes to a target file, attempting to acquire the file lock of the target file; if the thread fails to acquire the file lock: applying for a first cache page and writing first data into it; storing the first cache page in a first data structure corresponding to the target file, and judging whether all of the thread's data to be written to the target file has been written; if not, repeating the steps from applying for a first cache page through the judgment; and if so, storing all first cache pages into a second data structure corresponding to the target file.
Optionally, in the data writing method according to the present invention, the method further comprises: if the thread successfully acquires the file lock, obtaining a writable cache page and writing the first data into it; if obtaining a writable cache page fails, applying for a second cache page, writing the first data into it, and storing the second cache page into the second data structure corresponding to the target file; judging whether all of the thread's data to be written to the target file has been written; and if not, repeating from the step of obtaining a writable cache page.
Optionally, in the data writing method according to the present invention, the method further comprises: when the first cache page or the second cache page is stored into the second data structure corresponding to the target file, if a cache page already exists at that position in the second data structure, merging the existing cache page with the first or second cache page and removing the duplicated data.
Optionally, in the data writing method according to the present invention, the first data structure includes a linked list, and the second data structure includes a radix tree.
According to a second aspect of the present invention, there is provided a data compression method comprising: when first to-be-processed data is compressed, compressing it into a plurality of compressed data packets through a plurality of threads; and each of the plurality of threads executing the method described above to write its compressed data packet into the second data structure corresponding to the target file, thereby generating a compressed file of the first to-be-processed data.
According to a third aspect of the present invention, there is provided a data decompression method comprising: when second to-be-processed data is decompressed, decompressing it into a plurality of decompressed data packets through a plurality of threads; and each of the plurality of threads executing the method described above to write its decompressed data packet into the second data structure corresponding to the target file, thereby generating the decompressed data of the second to-be-processed data.
According to a fourth aspect of the present invention, there is provided a data writing apparatus comprising: a file lock acquisition unit adapted to acquire the file lock of a target file before a thread writes to the target file; a data writing unit adapted to, when the thread fails to acquire the file lock, apply for a first cache page, write first data into it, store the first cache page in a first data structure corresponding to the target file, judge whether all of the thread's data to be written to the target file has been written, and, if not, repeat the steps from applying for a first cache page through the judgment; and a cache page storage unit adapted to store all first cache pages into a second data structure corresponding to the target file once all of the thread's data to be written to the target file has been written.
Optionally, in the data writing apparatus according to the present invention, the data writing unit is further adapted to, when the thread successfully acquires the file lock, obtain a writable cache page and write the first data into it; when obtaining a writable cache page fails, apply for a second cache page, write the first data into it, and store the second cache page into the second data structure corresponding to the target file; judge whether all of the thread's data to be written to the target file has been written; and, if not, repeat from the step of obtaining a writable cache page.
According to a fifth aspect of the present invention, there is provided a computing device comprising: at least one processor; a memory storing program instructions configured to be suitable for execution by the at least one processor, the program instructions comprising instructions for performing the method as described above.
According to a sixth aspect of the present invention, there is provided a readable storage medium storing program instructions which, when read and executed by a computing device, cause the computing device to perform the method as described above.
According to the technical scheme of the present invention, the file lock of the target file is requested before a thread writes to the target file. If the thread fails to acquire the file lock, it applies for a first cache page, writes first data into it, and stores the first cache page in the first data structure corresponding to the target file; this is repeated until all of the thread's data to be written to the target file has been written into first cache pages, and finally all first cache pages are stored into the second data structure corresponding to the target file. The waiting time otherwise incurred when file-lock acquisition fails is thereby avoided: multiple threads write data into the second data structure corresponding to the target file in parallel without waiting for the file lock, greatly shortening the processing time for writing data into a file.
Furthermore, when the first to-be-processed data is compressed or the second to-be-processed data is decompressed, each of the multiple threads writes its compressed or decompressed data packet into the second data structure corresponding to the target file, so the data is compressed/decompressed fully in parallel. This greatly shortens the time spent on data compression/decompression and improves its efficiency.
The foregoing is only an overview of the technical solutions of the present invention. To make the technical means of the present invention more clearly understood, and to make the above and other objects, features, and advantages of the present invention more comprehensible, embodiments of the invention are described below.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
FIG. 1 illustrates a block diagram of the physical components (i.e., hardware) of a computing device 100;
FIG. 2 shows a flow diagram of a data writing method 200 according to one embodiment of the invention;
FIG. 3 illustrates a schematic diagram of writing data by multiple threads according to one embodiment of the invention;
FIG. 4 illustrates a flow diagram of a data writing method 400 according to another embodiment of the invention;
FIG. 5 shows a schematic diagram of a data writing apparatus 500 according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 illustrates a block diagram of the physical components (i.e., hardware) of a computing device 100. In a basic configuration, computing device 100 includes at least one processing unit 102 and system memory 104. According to one aspect, the processing unit 102 may be implemented as a processor depending on the configuration and type of computing device. The system memory 104 includes, but is not limited to, volatile storage (e.g., random access memory), non-volatile storage (e.g., read-only memory), flash memory, or any combination of such memories. According to one aspect, operating system 105 and program modules 106 are included in system memory 104, and data writing means 500 is included in program modules 106 for performing the data writing method of the present invention.
According to one aspect, the operating system 105 is, for example, suitable for controlling the operation of the computing device 100. Further, the examples are practiced in conjunction with a graphics library, other operating systems, or any other application program, and are not limited to any particular application or system. This basic configuration is illustrated in fig. 1 by those components within dashed line 108. According to one aspect, computing device 100 has additional features or functionality. For example, according to one aspect, computing device 100 includes additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Such additional storage is illustrated in FIG. 1 by removable storage device 109 and non-removable storage device 110.
As stated hereinabove, according to one aspect, program module 106 is stored in system memory 104. According to one aspect, program modules 106 may include one or more applications, the invention is not limited to the type of application, for example, applications may include: email and contacts applications, word processing applications, spreadsheet applications, database applications, slide show applications, drawing or computer-aided applications, web browser applications, and the like.
According to one aspect, examples may be practiced in a circuit comprising discrete electronic elements, a packaged or integrated electronic chip containing logic gates, a circuit utilizing a microprocessor, or on a single chip containing electronic elements or microprocessors. For example, an example may be practiced via a system on a chip (SOC) in which each or many of the components shown in fig. 1 may be integrated on a single integrated circuit. According to one aspect, such SOC devices may include one or more processing units, graphics units, communication units, system virtualization units, and various application functions, all integrated (or "burned") onto a chip substrate as a single integrated circuit. When operating via an SOC, the functions described herein may be operated via application-specific logic integrated with other components of the computing device 100 on a single integrated circuit (chip). Embodiments of the invention may also be practiced using other technologies capable of performing logical operations (e.g., AND, OR, AND NOT), including but NOT limited to mechanical, optical, fluidic, AND quantum technologies. In addition, embodiments of the invention may be practiced within a general purpose computer or in any other circuit or system.
According to one aspect, computing device 100 may also have one or more input devices 112, such as a keyboard, mouse, pen, voice input device, touch input device, or the like. Output device(s) 114 such as a display, speakers, printer, etc. may also be included. The foregoing devices are examples and other devices may also be used. Computing device 100 may include one or more communication connections 116 that allow communication with other computing devices 118. Examples of suitable communication connections 116 include, but are not limited to: RF transmitter, receiver and/or transceiver circuitry; universal Serial Bus (USB), parallel, and/or serial ports.
The term computer readable media as used herein includes computer storage media. Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, or program modules. System memory 104, removable storage 109, and non-removable storage 110 are all examples of computer storage media (i.e., memory storage). Computer storage media may include Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other article of manufacture that can be used to store information and that can be accessed by the computing device 100. According to one aspect, any such computer storage media may be part of computing device 100. Computer storage media does not include a carrier wave or other propagated data signal.
According to one aspect, communication media is embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal (e.g., a carrier wave or other transport mechanism) and includes any information delivery media. According to one aspect, the term "modulated data signal" describes a signal that has one or more feature sets or that has been altered in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio Frequency (RF), infrared, and other wireless media.
In one embodiment of the invention, computing device 100 includes one or more processors and one or more readable storage media storing program instructions. The program instructions, when configured to be executed by one or more processors, cause a computing device to perform a data writing method in an embodiment of the present invention.
From the user layer's perspective, a file is a segment of accessible space, and a user-layer program can write data to any position in the file. When a file is compressed/decompressed with single-threaded merging, the file is divided into several sub-files, the sub-files are compressed/decompressed, and the compressed sub-files are then written into the final file. Because a single thread must compute first and then write the data serially, execution is slow and a large performance loss results. If a file is compressed/decompressed by multithreading, the compression/decompression algorithm and the data writing can proceed simultaneously, but writing data from multiple threads raises the problem of concurrent access to the file. A file is protected by a file lock: only the thread holding the lock may write, and every other thread must wait until the file is no longer being written, so the file lock becomes the performance bottleneck during multithreaded compression/decompression. The present invention therefore provides a new data writing method that realizes multithreaded parallel data writing without waiting when the file lock is occupied.
FIG. 2 shows a flow diagram of a data writing method 200 according to one embodiment of the invention. The method 200 is suitable for execution in a computing device, such as the computing device 100 described above. As shown in fig. 2, method 200 begins at 210.
210. Before the thread writes to the target file, attempt to acquire the file lock of the target file.
According to the embodiment of the invention, the target file corresponds to an index node (inode), and the index node of the target file stores file meta information, file size, creator, creation time and other information of the target file. The index node for the target file corresponds to the file lock for the target file.
220. If the thread fails to acquire the file lock: apply for a first cache page, write the first data into it, store the first cache page in a first data structure corresponding to the target file, and judge whether all of the thread's data to be written to the target file has been written.
According to the embodiment of the present invention, the file system is improved so that a thread can still write data into a first cache page even if it fails to acquire the file lock. The first cache page is a free cache page. The first data is written to the corresponding position within the first cache page according to its position among all data to be written to the target file, and the first cache page is written to the corresponding position in the first data structure according to its position in that structure. Optionally, the first data structure comprises a linked list. Optionally, writing the first cache page to its position in the first data structure is implemented by defining an interface for writing to the first data structure; an exemplary interface putpg_to_inode is provided below:
int putpg_to_inode(struct page *page, struct inode *inode, int offset)
The execution logic of this interface is to store the written first cache page into the i_pgcache_list corresponding to the inode, where i_pgcache_list denotes the linked list of the target file; the corresponding position for the cache page is found through the offset and the page is stored there. Different cache pages occupy different positions, and the position at which each cache page should be stored is determined by the offset between those positions.
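The behavior of this interface can be sketched as a simplified user-space model. The structures and the name putpg_to_inode_model below are illustrative assumptions; the real implementation operates on kernel struct page and struct inode objects.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096

/* Simplified user-space stand-ins for struct page / struct inode. */
struct page_node {
    long index;                       /* page position = offset / PAGE_SIZE */
    char data[PAGE_SIZE];
    struct page_node *next;
};

struct inode_model {
    struct page_node *i_pgcache_list; /* per-file private page list */
};

/* Insert a written page into the inode's private list, ordered by the
 * page index computed from the byte offset, mirroring how the interface
 * finds the page's position through the offset. */
int putpg_to_inode_model(struct page_node *page, struct inode_model *inode,
                         long offset)
{
    page->index = offset / PAGE_SIZE;
    struct page_node **pp = &inode->i_pgcache_list;
    while (*pp && (*pp)->index < page->index)
        pp = &(*pp)->next;
    page->next = *pp;
    *pp = page;
    return 0;
}
```

Pages inserted out of order still end up sorted by their offset-derived index, so the list reflects the pages' positions in the file.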
230. If the writing is not finished, repeat the steps from applying for a first cache page through judging whether all of the thread's data to be written to the target file has been written. Specifically, if the writing is not finished, step 220 is executed again: apply for a first cache page, write the first data into it, store the first cache page in the first data structure corresponding to the target file, and judge whether all of the thread's data to be written to the target file has been written; this is repeated until all of that data has been written.
240. If the writing is finished, store all first cache pages into the second data structure corresponding to the target file.
Optionally, the second data structure comprises a radix tree.
In addition, according to the embodiment of the present invention, after 210, if the thread succeeds in acquiring the file lock, it obtains a writable cache page and writes the first data into it. If obtaining a writable cache page fails, it applies for a second cache page, writes the first data into it, and stores the second cache page into the second data structure corresponding to the target file. It then judges whether all of its data to be written to the target file has been written; if not, it returns to the step of obtaining a writable cache page and executes the subsequent steps again, until all of the thread's data to be written to the target file has been written.
A writable cache page is a cache page that is not yet full, and the second cache page is a free cache page. Optionally, the second data structure comprises a radix tree.
Optionally, when the first or second cache page is stored into the second data structure corresponding to the target file, if a cache page already exists there, the existing cache page is merged with the first or second cache page and the duplicated data is removed. Optionally, saving a cache page into the second data structure and removing redundant data are implemented by defining an interface for saving and deduplication; an exemplary interface putpgs_to_radix_tree is provided below:
int putpgs_to_radix_tree(struct file *f, struct page *page, int region)
The execution logic of this interface is to store cache pages into the radix_tree and handle redundant data. The file's cache pages are managed through a radix tree; when a cache page is stored into the radix_tree, if a cache page already exists at that position, the two pages are merged, the redundant part is removed, and the previously existing cache page is then released.
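The merge-and-release behavior can be modeled in user space as follows. A direct-indexed slot array stands in for the radix tree, and the valid-range bookkeeping and all names are illustrative assumptions rather than the patent's kernel code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NSLOTS 1024

struct cpage {
    long index;        /* position in the file, in page units */
    int start, end;    /* valid byte range [start, end) within the page */
    char data[PAGE_SIZE];
};

/* Toy stand-in for the per-file radix tree: direct-indexed slot array. */
struct page_tree {
    struct cpage *slots[NSLOTS];
};

/* Store a page into the tree; on collision, merge the two pages (newer
 * bytes win where the valid ranges overlap), then release the old page,
 * mirroring the merge-and-release logic described in the text. */
int putpgs_to_tree_model(struct page_tree *t, struct cpage *page)
{
    struct cpage *old = t->slots[page->index];
    if (old) {
        /* copy bytes that only the old page holds into the new page */
        if (old->start < page->start)
            memcpy(page->data + old->start, old->data + old->start,
                   (size_t)(page->start - old->start));
        if (old->end > page->end)
            memcpy(page->data + page->end, old->data + page->end,
                   (size_t)(old->end - page->end));
        if (old->start < page->start) page->start = old->start;
        if (old->end > page->end)     page->end = old->end;
        free(old);     /* release the previously existing page */
    }
    t->slots[page->index] = page;
    return 0;
}
```

After a collision the surviving page carries the union of both valid ranges with duplicates removed, which is the deduplication the interface performs.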
With the above data writing method, multiple threads can write data into the second data structure corresponding to the target file concurrently. FIG. 3 shows a schematic diagram of writing data by multiple threads according to one embodiment of the invention. When thread A executes the write process, it first acquires the file lock; if acquisition succeeds, it executes the steps for the case where the thread holds the file lock. If thread B (or thread C, or thread D) also needs to execute the write process at this moment, the file lock is already held, so thread B (or C, or D) first applies for a cache page and writes its data into it; after all of its data has been written, it stores the cache pages into the second data structure corresponding to the target file (the radix_tree corresponding to the inode) and resolves any duplicated parts. The scheme of the invention writes data into the target file in parallel, and the parallel threads need not wait for the file lock, greatly shortening the processing time for writing data into the file.
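The per-thread decision in FIG. 3 can be sketched as follows, with a pthread mutex standing in for the file lock. The helper buffers and all names are illustrative assumptions; this is a user-space model, not the patent's kernel implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

static pthread_mutex_t file_lock = PTHREAD_MUTEX_INITIALIZER;
static char shared_cache[4096];   /* models the normal page-cache path */
static char private_pages[4096];  /* models the thread's private cache pages */
static size_t shared_len, private_len;

/* Merge the private pages into the file's second data structure;
 * modeled here as an append into the shared buffer. */
static void publish_private_pages(void)
{
    memcpy(shared_cache + shared_len, private_pages, private_len);
    shared_len += private_len;
    private_len = 0;
}

/* Try the file lock; on failure, do not wait: buffer into private
 * cache pages and publish them once this thread's data is written. */
void thread_write(const char *buf, size_t len)
{
    if (pthread_mutex_trylock(&file_lock) == 0) {
        memcpy(shared_cache + shared_len, buf, len);  /* normal path */
        shared_len += len;
        pthread_mutex_unlock(&file_lock);
    } else {
        memcpy(private_pages + private_len, buf, len); /* lock busy */
        private_len += len;
        publish_private_pages();
    }
}
```

The key property is that the else-branch never blocks on the lock, which is what removes the file-lock bottleneck in the scheme.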
To further verify the efficiency gain of the data writing scheme of the present invention, the data writing method was tested. The following is exemplary test code:
fd = open(a, O_RDWR|O_CREAT);
thread1: pwrite(fd, buff, 4096*1024, 0);
thread2: pwrite(fd, buff, 4096*1024, 4096*1024);
In the test, the data-writing times of thread1 and thread2 with the conventional method were 2684us and 4333us respectively, while with the method of the present invention they were 2172us and 2653us respectively. Since the longest per-thread time represents the total time spent on data writing, the conventional method took 4333us in total and the method of the present invention took 2653us; the technical scheme of the present invention therefore significantly reduces the time spent writing data into a file.
In order to more clearly describe the technical solution of the present invention, the data writing method of the present invention is described below with reference to a specific embodiment. Fig. 4 shows a flow diagram of a data writing method 400 according to another embodiment of the invention. Method 400 illustrates the execution logic of any of a plurality of threads that execute the data writing method of the present invention. As shown in fig. 4, method 400 begins at 401.
401. Acquire the file lock of the target file. If acquisition succeeds, proceed to 402; if acquisition fails, proceed to 408.
402. Acquire a writable cache page. If a writable cache page exists, proceed to 403; otherwise proceed to 404.
403. Write the first data into the writable cache page. Then proceed to 407.
404. Apply for a second cache page.
405. Write the first data into the second cache page.
406. Store the second cache page into the second data structure corresponding to the target file.
407. Determine whether all data to be written into the target file by this thread has been written. If so, execution ends; if not, return to 402.
408. Apply for a first cache page.
409. Write the first data into the first cache page.
410. Store the first cache page in the first data structure corresponding to the target file.
411. Determine whether all data to be written into the target file by this thread has been written. If not, return to 408; if so, proceed to 412.
412. Store all the first cache pages into the second data structure corresponding to the target file.
It should be noted that the working principle and the flow of the method 400 provided in this embodiment are similar to those of the method 200, and reference may be made to the description of the method 200 for relevant points, which are not repeated herein.
According to an embodiment of the present invention, a data compression method is provided, which is suitable for increasing the compression speed when data is compressed. Specifically, when the first data to be processed is compressed, it is compressed into a plurality of compressed data packets by a plurality of threads. Each of the plurality of threads writes one of the plurality of compressed data packets, as all the data it is to write into the target file, into the second data structure corresponding to the target file by executing the foregoing method 200 or 400, thereby generating a compressed file of the first data to be processed.
According to another embodiment of the invention, a data decompression method is provided, which is suitable for increasing the decompression speed when data is decompressed. Specifically, when the second data to be processed is decompressed, it is decompressed into a plurality of decompressed data packets by the plurality of threads. Each of the plurality of threads writes one of the plurality of decompressed data packets, as all the data it is to write into the target file, into the second data structure corresponding to the target file by executing the foregoing method 200 or 400, thereby generating decompressed data of the second data to be processed.
The invention also provides a data writing device. Fig. 5 shows a schematic diagram of a data writing apparatus 500 according to an embodiment of the present invention. As shown in fig. 5, the data writing apparatus 500 includes a file lock acquisition unit 510, a data writing unit 520, and a cache page storage unit 530. The data writing apparatus 500 may further include a cache page merge unit 540.
The file lock acquiring unit 510 is adapted to acquire a file lock of the target file before the thread writes the target file.
The data writing unit 520 is adapted to apply for a first cache page when the thread fails to acquire the file lock, write the first data into the first cache page, store the first cache page in the first data structure corresponding to the target file, and determine whether all data of the target file to be written in the thread are completely written, and if the data are not completely written, repeat the steps from applying for the first cache page to determining whether all data of the target file to be written in the thread are completely written.
The data writing unit 520 is further adapted to, when the thread successfully acquires the file lock, acquire a writable cache page and write the first data into it; when acquisition of the writable cache page fails, apply for a second cache page, write the first data into the second cache page, and store the second cache page into the second data structure corresponding to the target file; determine whether all data of the target file to be written in the thread has been written; and, if writing is not complete, repeat from the step of acquiring a writable cache page.
The cache page storage unit 530 is adapted to store all first cache pages into the second data structure corresponding to the target file when all data to be written into the target file by the thread has been written.
The cache page merging unit 540 is adapted to, when the first cache page or the second cache page is stored in the second data structure corresponding to the target file, merge the existing cache page with the first cache page or the second cache page and remove duplicate data if the cache page already exists in the second data structure corresponding to the target file.
It should be noted that the working principle and the flow of the apparatus 500 provided in the present embodiment are similar to those of the method 200 or 400, and reference may be made to the description of the method 200 or 400 for relevant points, which are not repeated herein.
According to the technical scheme of the present invention, the file lock of the target file is acquired before a thread writes the target file. If the thread fails to acquire the file lock, it applies for a first cache page, writes first data into the first cache page, and stores the first cache page in the first data structure corresponding to the target file; this process repeats until all data to be written into the target file by the thread has been written, and finally all the first cache pages are stored into the second data structure corresponding to the target file. The waiting time otherwise consumed by failure to acquire the file lock is avoided: data is written into the second data structure corresponding to the target file in parallel by multiple threads without waiting for the file lock, greatly shortening the processing time of writing data into the file.
Furthermore, when the first data to be processed is compressed or the second data to be processed is decompressed, each of the plurality of compressed or decompressed data packets is written by its own thread into the second data structure corresponding to the target file, so that the data is compressed/decompressed through multiple threads, greatly shortening the time spent on data compression/decompression and improving its efficiency.
The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as removable hard drives, USB flash drives, floppy disks, CD-ROMs, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
In the case of program code execution on programmable computers, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store program code; the processor is configured to execute the data writing method of the present invention according to instructions in the program code stored in the memory.
By way of example, and not limitation, readable media may comprise readable storage media and communication media. Readable storage media store information such as computer readable instructions, data structures, program modules or other data. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. Combinations of any of the above are also included within the scope of readable media.
In the description provided herein, algorithms and displays are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with examples of this invention. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment, or alternatively may be located in one or more devices different from the device in this example. The modules in the foregoing examples may be combined into one module or may additionally be divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification, and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except that at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification may be replaced by an alternative feature serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Moreover, those skilled in the art will appreciate that although some embodiments described herein include some features included in other embodiments, not others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments.
Furthermore, some of the described embodiments are described herein as a method or combination of method elements that can be performed by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purpose of carrying out the invention.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter.

Claims (5)

1. A method of writing data, the method comprising:
when compressing first data to be processed, compressing the first data to be processed into a plurality of compressed data packets through a plurality of threads;
each thread in the plurality of threads writes each compressed data packet in the plurality of compressed data packets into a second data structure corresponding to a target file by executing the following steps to generate a compressed file of first data to be processed:
before each thread writes a target file, acquiring a file lock of the target file;
if the thread fails to acquire the file lock, then:
applying for a first cache page, and writing first data into the first cache page;
storing the first cache page in a first data structure corresponding to the target file, and judging whether each compressed data packet in the plurality of compressed data packets in the thread is written in;
if the writing is not finished, the steps from the application of the first cache page to the judgment of whether each compressed data packet in the plurality of compressed data packets in the thread is completely written are repeatedly executed;
if the writing is finished, storing all the first cache pages into a second data structure corresponding to the target file;
if the thread successfully acquires the file lock, acquiring a writable cache page, and writing the first data into the writable cache page;
if acquisition of the writable cache page fails, applying for a second cache page, writing the first data into the second cache page, and storing the second cache page into a second data structure corresponding to the target file;
if the cache page exists in the second data structure corresponding to the target file, combining the existing cache page with the first cache page or the second cache page, and removing repeated data;
judging whether each compressed data packet in the plurality of compressed data packets in the thread is written in;
if the writing is not finished, the step of obtaining the writable cache page is repeatedly executed;
wherein the first data structure comprises a linked list and the second data structure comprises a radix tree.
2. A method of data decompression, the method comprising:
when the second data to be processed is decompressed, decompressing the second data to be processed into a plurality of decompressed data packets through a plurality of threads;
each thread in the plurality of threads writes each decompressed data packet in the plurality of decompressed data packets into a second data structure corresponding to a target file by executing the following steps to generate decompressed data of second data to be processed:
before a thread writes a target file, acquiring a file lock of the target file;
if the thread fails to acquire the file lock, then:
applying for a first cache page, and writing first data into the first cache page;
storing the first cache page in a first data structure corresponding to the target file, and judging whether each of the plurality of decompressed data packets in the thread is written in;
if the writing is not finished, the steps from the application of the first cache page to the judgment of whether each decompressed data packet in the plurality of decompressed data packets in the thread is completely written are repeatedly executed;
if the writing is finished, all the first cache pages are stored in a second data structure corresponding to the target file;
if the thread successfully acquires the file lock, acquiring a writable cache page, and writing the first data into the writable cache page;
if acquisition of the writable cache page fails, applying for a second cache page, writing the first data into the second cache page, and storing the second cache page into a second data structure corresponding to the target file;
judging whether each decompressed data packet in the plurality of decompressed data packets in the thread is written in;
if the writing is not finished, the step of obtaining the writable cache page is repeatedly executed;
when the first cache page or the second cache page is stored in the second data structure corresponding to the target file, if the cache page already exists in the second data structure corresponding to the target file, merging the existing cache page with the first cache page or the second cache page, and removing repeated data;
wherein the first data structure comprises a linked list and the second data structure comprises a radix tree.
3. A data writing apparatus comprising:
the file lock acquisition unit is suitable for acquiring a file lock of a target file before each thread writes the target file;
a data writing unit adapted to apply for a first cache page when the thread fails to acquire the file lock, write first data into the first cache page, store the first cache page in a first data structure corresponding to the target file, and determine whether each of the plurality of compressed data packets to be written in the thread completes writing, if not, repeat the steps from applying for the first cache page to determining whether each of the plurality of compressed data packets in the thread completes writing, and further adapted to compress the first data to be processed into a plurality of compressed data packets by a plurality of threads when performing compression processing on the first data to be processed, each of the plurality of threads writes each of the plurality of compressed data packets into a second data structure corresponding to the target file, generating a compressed file of first data to be processed, further being adapted to, when the thread successfully acquires the file lock, acquire a writable cache page, write the first data into the writable cache page, when the acquisition of the writable cache page fails, apply for a second cache page, write the first data into the second cache page, and store the second cache page into a second data structure corresponding to the target file, if a cache page already exists in the second data structure corresponding to the target file, merge the existing cache page with the first cache page or the second cache page, and remove duplicated data, determine whether each of the plurality of compressed data packets in the thread completes writing, if the writing is not completed, repeat the step of acquiring the writable cache page, the first data structure comprises a linked list and the second data structure comprises a radix tree;
and the cache page storage unit is suitable for storing all the first cache pages into a second data structure corresponding to the target file when the target file to be written in the thread is completely written.
4. A computing device, comprising:
at least one processor; and
a memory storing program instructions, wherein the program instructions are configured to be executed by the at least one processor, the program instructions comprising instructions for performing the method of claim 1 or 2.
5. A readable storage medium storing program instructions which, when read and executed by a computing device, cause the computing device to perform the method of claim 1 or 2.
CN202210984207.1A 2022-08-17 2022-08-17 Data writing method, data compression method and data decompression method Active CN115061986B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210984207.1A CN115061986B (en) 2022-08-17 2022-08-17 Data writing method, data compression method and data decompression method


Publications (2)

Publication Number Publication Date
CN115061986A CN115061986A (en) 2022-09-16
CN115061986B true CN115061986B (en) 2022-12-02

Family

ID=83207832

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210984207.1A Active CN115061986B (en) 2022-08-17 2022-08-17 Data writing method, data compression method and data decompression method

Country Status (1)

Country Link
CN (1) CN115061986B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115543970B (en) * 2022-11-29 2023-03-03 本原数据(北京)信息技术有限公司 Data page processing method, data page processing device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033359A (en) * 2018-07-26 2018-12-18 北京天地和兴科技有限公司 A kind of method of multi-process secure access sqlite
CN110399227A (en) * 2018-08-24 2019-11-01 腾讯科技(深圳)有限公司 A kind of data access method, device and storage medium
CN114691549A (en) * 2022-03-24 2022-07-01 统信软件技术有限公司 File writing method and device and computing equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109725840B (en) * 2017-10-30 2022-04-05 伊姆西Ip控股有限责任公司 Throttling writes with asynchronous flushing


Also Published As

Publication number Publication date
CN115061986A (en) 2022-09-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant