CN108228673B - Method and system for rapidly merging files - Google Patents
- Publication number
- CN108228673B
- Authority
- CN
- China
- Prior art keywords
- file
- newly
- merging
- files
- extended
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
Abstract
The invention discloses a method and a system for rapidly merging files, so that merging consumes only a small part of the disk's read-write capacity and can be carried out quickly. The technical scheme is as follows: a new extent structure is created in the extent header of the index node of a first file; the information in the extent header of the index node of a second file is assigned to the newly created extent structure of the first file, so that the content of the second file is concatenated to the first file; and the second file is deleted through a file system command, completing the merge of the first file and the second file.
Description
Technical Field
The invention relates to a data storage technology in the field of computers, in particular to a method and a system for quickly merging different files.
Background
With the development of cloud computing, one application scenario has become increasingly common: a file is split into slices, the slices are processed concurrently, and the results are spliced back together once processing is finished.
The purpose of splitting the file is to speed up processing by handling the slices concurrently. However, splicing requires reading each slice from disk and writing it back into a single file, and this step can only be performed serially, so the overall efficiency is not noticeably improved.
Specifically, the file is sliced, the slices are processed concurrently, and a number of slice files result. The common practice today is to open the first slice file for appending; open the second slice file, read its content, and append that content to the first slice file; repeat this through the last slice file; and finally delete every slice file except the first.
As can be seen from the above flow, every slice file other than the first must be read once and its data written into the first slice file. This has two disadvantages: it consumes a large share of the disk's read-write capacity, and the reads and writes take time, so the merge is slow and the performance of the overall job suffers.
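The conventional flow described above can be sketched in a few lines of Python. This is only an illustration of the copy-based baseline, not code from the patent; the function name `merge_by_copy` and the demo file names are hypothetical. The returned byte count makes the cost visible: every byte of every slice after the first is read and written again.

```python
import os
import shutil
import tempfile


def merge_by_copy(slice_paths):
    """Append every slice after the first to the first slice, then delete them."""
    first, rest = slice_paths[0], slice_paths[1:]
    bytes_copied = 0
    with open(first, "ab") as dst:
        for path in rest:
            with open(path, "rb") as src:
                # Every byte of this slice is read from disk and written back.
                shutil.copyfileobj(src, dst)
            bytes_copied += os.path.getsize(path)
    # Only the first slice file survives the merge.
    for path in rest:
        os.remove(path)
    return bytes_copied


def demo():
    """Merge three small slice files in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i, payload in enumerate([b"AAAA", b"BBBB", b"CC"]):
            p = os.path.join(tmp, f"slice{i}.bin")
            with open(p, "wb") as f:
                f.write(payload)
            paths.append(p)
        copied = merge_by_copy(paths)
        with open(paths[0], "rb") as f:
            merged = f.read()
        remaining = sorted(os.listdir(tmp))
    return copied, merged, remaining
```

The amount of I/O grows with the total size of the slices being appended, which is the cost the invention sets out to avoid.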
Disclosure of Invention
The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
The present invention aims to solve the above problems by providing a method and a system for rapidly merging files, so that merging consumes only a small part of the disk's read-write capacity and can be carried out quickly.
The technical scheme of the invention is as follows. The invention discloses a method for rapidly merging files, comprising:
creating a new extent structure in the extent header of the index node of a first file;
assigning the information in the extent header of the index node of a second file to the newly created extent structure of the first file, so that the content of the second file is concatenated to the first file;
and deleting the second file through a file system command to complete the merge of the first file and the second file.
According to an embodiment of the method for rapidly merging files of the present invention, the newly created extent structure indicates the logical blocks and the blocks mapped onto the physical device.
According to an embodiment of the method for rapidly merging files of the present invention, the assignment process includes: assigning the logical blocks of the second file to the newly created extent structure, and assigning the blocks of the second file that are mapped onto the physical device to the newly created extent structure.
According to an embodiment of the method for rapidly merging files of the present invention, the first file and the second file are 4K byte aligned before merging.
The invention also discloses a system for rapidly merging files, comprising:
an extent structure creation module, configured to create a new extent structure in the extent header of the index node of a first file;
an extent structure update module, configured to assign the information in the extent header of the index node of a second file to the newly created extent structure of the first file, so that the content of the second file is concatenated to the first file;
and a file deletion module, configured to delete the second file through a file system command to complete the merge of the first file and the second file.
According to an embodiment of the system for rapidly merging files of the present invention, the extent structure newly created by the extent structure creation module indicates the logical blocks and the blocks mapped onto the physical device.
According to an embodiment of the system for rapidly merging files of the present invention, the extent structure update module assigns the logical blocks of the second file to the newly created extent structure, and assigns the blocks of the second file that are mapped onto the physical device to the newly created extent structure.
According to an embodiment of the system for rapidly merging files of the present invention, the system further includes:
a byte alignment module, configured to align the first file and the second file to 4K bytes before merging.
Compared with the prior art, the invention has the following beneficial effects: it does not involve the file reading and writing of the conventional technique, and only needs to construct an extent and set a few parameters, which greatly increases the merging speed. In addition, because the contents of the two files need not be read or written, no disk throughput is consumed by the merge, and the disk can offer its full read-write capacity to other applications. For example, in non-linear video editing, the next editing task can proceed at the same time.
Drawings
FIG. 1 is a flow chart illustrating an embodiment of a method for fast merging files of the present invention.
FIG. 2 illustrates a schematic diagram of an embodiment of a system for fast file merging of the present invention.
Fig. 3 and 4 show example diagrams of file rapid merging.
Detailed Description
The above features and advantages of the present disclosure will be better understood upon reading the detailed description of embodiments of the disclosure in conjunction with the following drawings. In the drawings, components are not necessarily drawn to scale, and components having similar relative characteristics or features may have the same or similar reference numerals.
Embodiment of method for quickly merging files
FIG. 1 shows a flow chart of an embodiment of the method for rapidly merging files of the present invention. Referring to fig. 1, the implementation steps of the method of this embodiment are described in detail below. This embodiment takes the merging of two files as an example; merging more than two files proceeds similarly. The ext4 file system, which is currently mainstream, is used as the example file system.
The ext4 file system stores a file as an inode (index node) together with its data. The inode stores metadata such as file size, file owner, and creation time, while the data is stored in data blocks. In ext4, the inode and the data blocks are associated through extents, where an extent is a run of data blocks with consecutive addresses. For example, a 10M file may be stored in consecutive data blocks, in which case a single extent is enough to record this information. If no contiguous region is large enough, several extent structures may be needed to describe the data. When more extents are needed (more than 3), they can be stored by cascading extents. Taking two files A and B to be merged as an example, the way files A and B are stored on disk is shown in fig. 1.
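To make the notion of an extent concrete, the following Python sketch groups a file's physical block numbers (listed in logical order) into runs of consecutive addresses. It is a simplified model rather than ext4 code: the `Extent` tuple and `blocks_to_extents` are hypothetical names, and the 4 KiB block size used in the example is an assumption.

```python
from typing import List, NamedTuple


class Extent(NamedTuple):
    logical_start: int   # first logical block covered by the extent
    length: int          # number of contiguous blocks
    physical_start: int  # first physical block on the device


def blocks_to_extents(physical_blocks: List[int]) -> List[Extent]:
    """Group a file's physical block list (in logical order) into extents."""
    extents: List[Extent] = []
    for logical, phys in enumerate(physical_blocks):
        last = extents[-1] if extents else None
        if last is not None and phys == last.physical_start + last.length:
            # The block continues the current run: grow the last extent.
            extents[-1] = last._replace(length=last.length + 1)
        else:
            # A gap in physical addresses starts a new extent.
            extents.append(Extent(logical, 1, phys))
    return extents
```

With 4 KiB blocks, a 10 MiB file occupies 2560 blocks; if they are contiguous, `blocks_to_extents(list(range(1000, 3560)))` yields a single extent of length 2560, while a fragmented layout yields one extent per contiguous run.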
Step S1: create a new extent structure in the extent header of the index node of the first file.
The newly created extent structure indicates the logical blocks and the blocks mapped onto the physical device. For example, under the second-level ext4_extent_header of the inode of file A, a new ext4_extent structure is created.
In this structure, ee_block and ee_len together describe the logical blocks, and ee_start_hi and ee_start_lo together give the block mapped onto the physical device.
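The four fields named above can be illustrated with Python's `struct` module. The patent's own listing of the structure is not reproduced in this text, so the field widths and order below follow the Linux kernel's definition of `struct ext4_extent` (a 32-bit `ee_block`, a 16-bit `ee_len`, a 16-bit `ee_start_hi`, and a 32-bit `ee_start_lo`, all little-endian) and should be read as an assumption relative to the source. The helper names `pack_extent` and `unpack_extent` are hypothetical.

```python
import struct

# Assumed on-disk layout: __le32 ee_block, __le16 ee_len,
# __le16 ee_start_hi, __le32 ee_start_lo (little-endian, 12 bytes).
EXT4_EXTENT = struct.Struct("<IHHI")


def pack_extent(ee_block: int, ee_len: int, physical_block: int) -> bytes:
    """Serialize one extent, splitting the 48-bit physical block number."""
    ee_start_hi = (physical_block >> 32) & 0xFFFF
    ee_start_lo = physical_block & 0xFFFFFFFF
    return EXT4_EXTENT.pack(ee_block, ee_len, ee_start_hi, ee_start_lo)


def unpack_extent(raw: bytes):
    """Return (ee_block, ee_len, physical_block) from 12 raw bytes."""
    ee_block, ee_len, hi, lo = EXT4_EXTENT.unpack(raw)
    return ee_block, ee_len, (hi << 32) | lo
```

The split into `ee_start_hi` and `ee_start_lo` is why the text treats the two fields as one value: together they form a single physical block number.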
Step S2: assign the information in the extent header of the index node of the second file to the newly created extent structure of the first file, so that the content of the second file is concatenated to the first file.
The information in the second-level ext4_extent_header of file B is assigned to the ext4_extent newly created in step S1, so that the content of file B is concatenated to file A. Accordingly, ee_block in the newly created ext4_extent is the ee_block + ee_len + 1 of the last ext4_extent of file A, where "+1" marks the start of the part that follows;
ee_len in the newly created ext4_extent is the ee_len in the extent of file B;
the last two fields, ee_start_hi and ee_start_lo, in the newly created ext4_extent can be copied from the extent structure of file B.
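The field assignments of step S2 can be sketched as follows. This is a minimal model under stated assumptions, not the patent's implementation: `Ext4Extent` and `build_appended_extent` are hypothetical names, and file B is assumed to have a single extent. One point deserves care: the text above gives the new logical start as ee_block + ee_len + 1, whereas this sketch assumes the convention in which ee_len counts the blocks starting at ee_block, so the first free logical block is ee_block + ee_len. Which form is correct depends on how the length field is interpreted, so the difference is flagged rather than resolved here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ext4Extent:
    ee_block: int      # first logical block covered by the extent
    ee_len: int        # number of blocks covered
    ee_start_hi: int   # high 16 bits of the physical block number
    ee_start_lo: int   # low 32 bits of the physical block number


def build_appended_extent(a_extents: List[Ext4Extent],
                          b_extent: Ext4Extent) -> Ext4Extent:
    """Build the extent that maps file B's blocks at the end of file A."""
    last = a_extents[-1]
    # Assumption: ee_len counts blocks starting at ee_block, so the next
    # logical block is ee_block + ee_len (the source text writes "+1").
    next_logical = last.ee_block + last.ee_len
    # Length and physical location are copied unchanged from file B.
    return Ext4Extent(next_logical, b_extent.ee_len,
                      b_extent.ee_start_hi, b_extent.ee_start_lo)
```

For a file A with one extent covering logical blocks 0 to 3 and a file B of 3 blocks at physical block 900, the new extent starts at logical block 4 and still points at physical block 900; no data block is touched.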
Step S3: delete the second file through a file system command to complete the merge of the first file and the second file.
After file B is deleted through the file system, the inode and the extent information of file B are removed. After the merge is complete, the file layout in the file system is shown in fig. 2.
The invention also places a requirement on the files: they must be aligned to 4K bytes. The invention targets a specific class of scenarios with high performance requirements, so the application is expected to align the content it processes to 4K. One possible application scenario is the non-linear editing of video, where data blocks can be padded to achieve 4K alignment. Many related applications already pad data for alignment, so this requirement is relatively easy to meet.
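Steps S1 to S3 can be walked through end to end on a toy in-memory model. The sketch below is not a real file system and makes several assumptions: `ToyFs` and its methods are hypothetical, each file starts with a single extent, and deleting B's inode simply leaves its blocks in place. How a real file system avoids freeing those blocks on deletion is not covered here.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 4096
Extent = Tuple[int, int, int]  # (logical_start, length, physical_start)


class ToyFs:
    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}         # physical block -> data
        self.inodes: Dict[str, List[Extent]] = {}  # file name -> extent list
        self.block_writes = 0                      # counts data block writes
        self._next_free = 0

    def create(self, name: str, data: bytes) -> None:
        """Store data in fresh contiguous blocks described by one extent."""
        assert len(data) % BLOCK_SIZE == 0, "toy model needs 4K-aligned data"
        n = len(data) // BLOCK_SIZE
        start = self._next_free
        for i in range(n):
            self.blocks[start + i] = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
            self.block_writes += 1
        self._next_free += n
        self.inodes[name] = [(0, n, start)]

    def read(self, name: str) -> bytes:
        """Follow the extents in logical order and return the file content."""
        out = []
        for _, length, phys in sorted(self.inodes[name]):
            out.extend(self.blocks[phys + i] for i in range(length))
        return b"".join(out)

    def merge(self, first: str, second: str) -> None:
        """Attach the second file's extents to the first, then drop its inode."""
        a = self.inodes[first]
        last_start, last_len, _ = a[-1]
        offset = last_start + last_len
        for logical, length, phys in self.inodes[second]:
            # S1 + S2: new extent, physical location copied from the second file.
            a.append((offset + logical, length, phys))
        # S3: remove the second file's inode; its blocks now belong to the first.
        del self.inodes[second]
```

In this model, reading the first file after `merge` returns the content of both files, while `block_writes` stays at the value it had before the merge: only metadata changed.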
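The alignment requirement follows from the fact that an extent maps whole blocks: if the first file ended in the middle of a block, the remainder of that block would sit between the two files' contents after the merge. A minimal sketch of the padding arithmetic is shown below; the helper names and the zero fill byte are assumptions, and a real application such as a video editor would use whatever filler its format permits.

```python
BLOCK_SIZE = 4096


def padded_size(size: int, block: int = BLOCK_SIZE) -> int:
    """Round size up to the next multiple of block."""
    return (size + block - 1) // block * block


def pad_to_block(data: bytes, fill: bytes = b"\x00",
                 block: int = BLOCK_SIZE) -> bytes:
    """Return data extended with fill bytes to a whole number of blocks."""
    return data + fill * (padded_size(len(data), block) - len(data))
```

Data that is already a multiple of 4096 bytes is returned unchanged, so applying the padding to every slice is harmless.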
Embodiments of a System for fast merging of files
Fig. 2 illustrates the principle of an embodiment of the system for rapidly merging files of the present invention. Referring to fig. 2, the system of this embodiment includes an extent structure creation module 1, an extent structure update module 2, and a file deletion module 3. The extent structure creation module 1 is connected to the extent structure update module 2, and the extent structure update module 2 is connected to the file deletion module 3.
The extent structure creation module 1 creates a new extent structure in the extent header of the inode of the first file.
The newly created extent structure indicates the logical blocks and the blocks mapped onto the physical device. For example, under the second-level ext4_extent_header of the inode of file A, a new ext4_extent structure is created.
In this structure, ee_block and ee_len together describe the logical blocks, and ee_start_hi and ee_start_lo together give the block mapped onto the physical device.
The extent structure update module 2 assigns the information in the extent header of the index node of the second file to the newly created extent structure of the first file, so that the content of the second file is concatenated to the first file.
The information in the second-level ext4_extent_header of file B is assigned to the ext4_extent newly created in step S1, so that the content of file B is concatenated to file A. Accordingly, ee_block in the newly created ext4_extent is the ee_block + ee_len + 1 of the last ext4_extent of file A, where "+1" marks the start of the part that follows;
ee_len in the newly created ext4_extent is the ee_len in the extent of file B;
the last two fields, ee_start_hi and ee_start_lo, in the newly created ext4_extent can be copied from the extent structure of file B.
The file deletion module 3 deletes the second file through a file system command to complete the merge of the first file and the second file. After file B is deleted through the file system, the inode and the extent information of file B are removed. After the merge is complete, the file layout in the file system is shown in fig. 2.
The invention also places a requirement on the files: they must be aligned to 4K bytes. The invention targets a specific class of scenarios with high performance requirements, so the application is expected to align the content it processes to 4K. The byte alignment module 4 aligns the first file and the second file to 4K bytes before merging. One possible application scenario is the non-linear editing of video, where data blocks can be padded to achieve 4K alignment. Many related applications already pad data for alignment, so this requirement is relatively easy to meet.
While, for purposes of simplicity of explanation, the methodologies are shown and described as a series of acts, it is to be understood and appreciated that the methodologies are not limited by the order of acts, as some acts may, in accordance with one or more embodiments, occur in different orders and/or concurrently with other acts from that shown and described herein or not shown and described herein, as would be understood by one skilled in the art.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software as a computer program product, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a web site, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (4)
1. A method for rapidly merging files is characterized by comprising the following steps:
creating a new extent structure in the extent header of the index node of a first file, wherein the newly created extent structure indicates the logical blocks and the blocks mapped onto the physical device;
assigning the information in the extent header of the index node of a second file to the newly created extent structure of the first file, so that the content of the second file is concatenated to the first file, wherein the assignment process comprises: assigning the logical blocks of the second file to the newly created extent structure, and assigning the blocks of the second file that are mapped onto the physical device to the newly created extent structure;
and deleting the second file through a file system command to complete the merge of the first file and the second file.
2. The method of claim 1, wherein the first file and the second file are 4K byte aligned before being merged.
3. A system for fast merging of files, comprising:
an extent structure creation module, which creates a new extent structure in the extent header of the index node of a first file, wherein the extent structure newly created by the extent structure creation module indicates the logical blocks and the blocks mapped onto the physical device;
an extent structure update module, which assigns the information in the extent header of the index node of a second file to the newly created extent structure of the first file, so that the content of the second file is concatenated to the first file, wherein the extent structure update module assigns the logical blocks of the second file to the newly created extent structure and assigns the blocks of the second file that are mapped onto the physical device to the newly created extent structure;
and a file deletion module, which deletes the second file through a file system command to complete the merge of the first file and the second file.
4. The system for fast merging of files according to claim 3, further comprising:
a byte alignment module, configured to align the first file and the second file to 4K bytes before merging.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611199551.0A CN108228673B (en) | 2016-12-22 | 2016-12-22 | Method and system for rapidly merging files |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611199551.0A CN108228673B (en) | 2016-12-22 | 2016-12-22 | Method and system for rapidly merging files |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108228673A CN108228673A (en) | 2018-06-29 |
CN108228673B CN108228673B (en) | 2021-09-03 |
Family
ID=62656250
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611199551.0A Active CN108228673B (en) | 2016-12-22 | 2016-12-22 | Method and system for rapidly merging files |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108228673B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102164161A (en) * | 2011-01-10 | 2011-08-24 | 清华大学 | Method and device for performing file layout extraction on parallel file system |
CN102707987A (en) * | 2011-03-15 | 2012-10-03 | 微软公司 | Extent virtualization |
CN102982151A (en) * | 2012-11-27 | 2013-03-20 | 南开大学 | Method for merging multiple physical files into one logic file |
CN103605726A (en) * | 2013-11-15 | 2014-02-26 | 中安消技术有限公司 | Method and system for accessing small files, control node and storage node |
CN104572670A (en) * | 2013-10-15 | 2015-04-29 | 方正国际软件(北京)有限公司 | Small file storage, query and deletion method and system |
CN105528348A (en) * | 2014-09-28 | 2016-04-27 | 阿里巴巴集团控股有限公司 | Media file processing method and apparatus |
CN105956183A (en) * | 2016-05-30 | 2016-09-21 | 广东电网有限责任公司电力调度控制中心 | Method and system for multi-stage optimization storage of a lot of small files in distributed database |
CN106021585A (en) * | 2016-06-02 | 2016-10-12 | 同济大学 | Traffic incident video access method and system based on time-space characteristics |
WO2016175880A1 (en) * | 2015-04-29 | 2016-11-03 | Hewlett Packard Enterprise Development Lp | Merging incoming data in a database |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104899225B (en) * | 2014-03-07 | 2018-10-16 | 北京四达时代软件技术股份有限公司 | Object Relation Mapping method, apparatus and processor |
CN105224607B (en) * | 2015-09-06 | 2019-05-24 | 浪潮(北京)电子信息产业有限公司 | A kind of Virtual File System design method for simulating cloud storage equipment |
Also Published As
Publication number | Publication date |
---|---|
CN108228673A (en) | 2018-06-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |
 | PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: Method and system for quickly merging files; Effective date of registration: 20220330; Granted publication date: 20210903; Pledgee: Societe Generale Bank Co.,Ltd. Qingpu Branch of Shanghai; Pledgor: SHANGHAI KAIXIANG INFORMATION TECHNOLOGY CO.,LTD.; Registration number: Y2022980003249 |