CN108228673B - Method and system for rapidly merging files - Google Patents

Method and system for rapidly merging files

Publication number
CN108228673B
Authority
CN
China
Prior art keywords
file
newly
merging
files
extended
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611199551.0A
Other languages
Chinese (zh)
Other versions
CN108228673A (en)
Inventor
丁晓杰
颜新波
曹敬涛
王磊
徐启亮
张海圆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Kaixiang Information Technology Co ltd
Original Assignee
Shanghai Kaixiang Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Kaixiang Information Technology Co ltd
Priority to CN201611199551.0A
Publication of CN108228673A
Application granted
Publication of CN108228673B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems

Abstract

The invention discloses a method and a system for rapidly merging files, so that merging consumes only a small part of the disk's read/write capacity and completes quickly. The technical scheme is as follows: creating a new extent structure under the extent header of the inode (index node) of a first file; assigning the information under the extent header of the inode of a second file to the newly created extent structure of the first file, so that the content of the second file is concatenated onto the first file; and deleting the second file through a file system command to complete the merge of the first file and the second file.

Description

Method and system for rapidly merging files
Technical Field
The invention relates to data storage technology in the computer field, and in particular to a method and a system for quickly merging different files.
Background
With the development of cloud computing, one application scenario has become increasingly common: a file is split into segments, the segments are processed concurrently, and the results are spliced back together once processing is finished.
The purpose of splitting the file is to speed up processing through concurrency. However, splicing requires reading the slices from disk and writing them back into a single file, and this logic can only run serially, so the overall efficiency is not noticeably improved.
Specifically, the file is sliced, the slices are processed concurrently, and processing produces a number of slice files. The common practice today is to open the first slice file and prepare to append content to it; open the second file, read its content, and append that content to the first slice file; continue in the same way through the last slice file; and finally delete every slice file except the first.
As this flow shows, every slice file except the first has to be read once and its data then written into the first slice file. This has two disadvantages: it consumes a large share of the disk's read/write capacity, and the reads and writes take time, so the merge takes a long time and degrades the performance of the whole job.
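The conventional flow described above can be sketched in a few lines of Python. This is only an illustration of the copy-based approach that the invention sets out to avoid; the function names `merge_by_copy` and `demo` and the slice file names are hypothetical and do not come from the patent.

```python
import os
import shutil
import tempfile


def merge_by_copy(slice_paths):
    """Append slices 2..N onto slice 1, then delete them (conventional approach)."""
    first, rest = slice_paths[0], slice_paths[1:]
    with open(first, "ab") as dst:
        for path in rest:
            with open(path, "rb") as src:
                # Every byte of every later slice is read and rewritten here,
                # which is the disk cost the background section points out.
                shutil.copyfileobj(src, dst)
    for path in rest:
        os.remove(path)
    return first


def demo():
    """Create three small slice files in a temp directory and merge them."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i, payload in enumerate([b"aaa", b"bbb", b"ccc"]):
            path = os.path.join(tmp, f"slice{i}.bin")
            with open(path, "wb") as f:
                f.write(payload)
            paths.append(path)
        merged = merge_by_copy(paths)
        with open(merged, "rb") as f:
            data = f.read()
        remaining = sorted(os.listdir(tmp))
    return data, remaining
```

The amount of data moved grows with the total size of all slices after the first, and the appends run one after another, which matches the serial behaviour described in the text.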
Disclosure of Invention
The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
The present invention aims to solve the above problems by providing a method and a system for rapidly merging files, so that merging consumes only a small part of the disk's read/write capacity and completes quickly.
The technical scheme of the invention is as follows: the invention discloses a method for rapidly merging files, comprising the following steps:
creating a new extent structure under the extent header of the inode (index node) of a first file;
assigning the information under the extent header of the inode of a second file to the newly created extent structure of the first file, so that the content of the second file is concatenated onto the first file;
and deleting the second file through a file system command to complete the merge of the first file and the second file.
According to an embodiment of the method for rapidly merging files of the present invention, the newly created extent structure indicates the logical blocks and the blocks mapped onto the physical device.
According to an embodiment of the method for rapidly merging files of the present invention, the assignment process includes: assigning the logical blocks of the second file to the newly created extent structure, and assigning the blocks of the second file that are mapped onto the physical device to the newly created extent structure.
According to an embodiment of the method for rapidly merging files of the present invention, the first file and the second file are 4K-byte aligned before merging.
The invention also discloses a system for rapidly merging files, comprising:
an extent structure creation module, configured to create a new extent structure under the extent header of the inode of a first file;
an extent structure updating module, configured to assign the information under the extent header of the inode of a second file to the newly created extent structure of the first file, so that the content of the second file is concatenated onto the first file;
and a file deleting module, configured to delete the second file through a file system command to complete the merge of the first file and the second file.
According to an embodiment of the system for rapidly merging files of the present invention, the extent structure newly created by the extent structure creation module indicates the logical blocks and the blocks mapped onto the physical device.
According to an embodiment of the system for rapidly merging files of the present invention, the extent structure updating module assigns the logical blocks of the second file to the newly created extent structure, and assigns the blocks of the second file that are mapped onto the physical device to the newly created extent structure.
According to an embodiment of the system for rapidly merging files of the present invention, the system further includes:
a byte alignment module, configured to perform 4K-byte alignment on the first file and the second file before merging.
Compared with the prior art, the invention has the following beneficial effects: the invention does not involve the file reads and writes of the conventional technique; it only needs to construct an extent and set a few parameters, which greatly improves the merging speed. In addition, because the contents of the two files do not need to be read or written, the merge places no load on the disk, and the disk can offer its full read/write capacity to other applications. For example, in non-linear video editing, the next editing task can proceed at the same time.
Drawings
FIG. 1 is a flow chart illustrating an embodiment of a method for fast merging files of the present invention.
FIG. 2 illustrates a schematic diagram of an embodiment of a system for fast file merging of the present invention.
Fig. 3 and 4 show example diagrams of file rapid merging.
Detailed Description
The above features and advantages of the present disclosure will be better understood upon reading the detailed description of embodiments of the disclosure in conjunction with the following drawings. In the drawings, components are not necessarily drawn to scale, and components having similar relative characteristics or features may have the same or similar reference numerals.
Embodiment of method for quickly merging files
FIG. 1 shows a flow chart of an embodiment of the method for rapidly merging files of the present invention. The implementation steps of the method of this embodiment are described in detail below with reference to Fig. 1. This embodiment uses the merging of two files as the example; merging more than two files is similar. The currently mainstream ext4 file system is used as the example file system.
The ext4 file system stores a file as an inode (index node) plus data. The inode holds metadata such as the file size, the file owner and the creation time, while the data itself is stored in data blocks. In ext4 the inode and the data blocks are associated through extents, where an extent is a run of data blocks with consecutive addresses. For example, a 10 MB file may be stored in consecutive data blocks, in which case a single extent is enough to record its location. If no contiguous region is large enough, several extent structures may be needed to describe the data. When more extents are needed (more than 3), they can be stored through cascaded extent levels. Taking two files A and B to be merged as an example, the way files A and B are stored on the disk is shown in Fig. 3.
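To make the extent idea concrete, the following Python sketch models an extent as a run of consecutive blocks and translates a logical block of a file into a physical block. It is a simplified model, not ext4 code: the `Extent` class, the `logical_to_physical` helper, the 4 KiB block size and the physical block numbers are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

BLOCK_SIZE = 4096  # assumed block size for this sketch


@dataclass
class Extent:
    logical_start: int   # first logical block of the file covered by this run
    length: int          # number of consecutive blocks in the run
    physical_start: int  # first block of the run on the device


def logical_to_physical(extents: List[Extent], logical_block: int) -> Optional[int]:
    """Return the device block backing a logical block, or None if unmapped."""
    for ext in extents:
        offset = logical_block - ext.logical_start
        if 0 <= offset < ext.length:
            return ext.physical_start + offset
    return None


# A 10 MiB file stored contiguously needs a single extent (2560 blocks of 4 KiB).
contiguous = [Extent(0, 10 * 1024 * 1024 // BLOCK_SIZE, 50_000)]

# The same file spread over two free regions needs two extents.
fragmented = [Extent(0, 1000, 50_000), Extent(1000, 1560, 90_000)]
```

With this model, looking up any block of the contiguous file touches one record, while the fragmented file needs one record per run; the data blocks themselves are never read during the lookup.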
Step S1: Create a new extent structure under the extent header of the inode of the first file.
The newly created extent structure indicates the logical blocks and the blocks mapped onto the physical device. For example, under the second-level ext4_extent_header of the inode of file A, a new ext4_extent structure is created:
[Figure: definition of the ext4_extent structure, with the fields ee_block, ee_len, ee_start_hi and ee_start_lo]
where ee_block and ee_len together describe the logical blocks, and ee_start_hi and ee_start_lo together give the block mapped onto the physical device.
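The structure definition itself is not reproduced in the text, so the sketch below assumes the field widths of the upstream Linux ext4 on-disk extent entry (32-bit ee_block, 16-bit ee_len, 16-bit ee_start_hi, 32-bit ee_start_lo, little-endian). The helper names `pack_extent` and `unpack_extent` are hypothetical; the point is only to show how the two ee_start fields combine into one physical block number.

```python
import struct

# Assumed layout: ee_block (u32), ee_len (u16), ee_start_hi (u16), ee_start_lo (u32)
EXTENT_FORMAT = "<IHHI"
EXTENT_SIZE = struct.calcsize(EXTENT_FORMAT)


def pack_extent(ee_block: int, ee_len: int, physical_block: int) -> bytes:
    """Serialize one extent entry, splitting the physical block into hi/lo parts."""
    ee_start_hi = (physical_block >> 32) & 0xFFFF   # upper 16 bits
    ee_start_lo = physical_block & 0xFFFFFFFF       # lower 32 bits
    return struct.pack(EXTENT_FORMAT, ee_block, ee_len, ee_start_hi, ee_start_lo)


def unpack_extent(raw: bytes) -> dict:
    """Parse one extent entry and recombine ee_start_hi/ee_start_lo."""
    ee_block, ee_len, ee_start_hi, ee_start_lo = struct.unpack(EXTENT_FORMAT, raw)
    return {
        "ee_block": ee_block,
        "ee_len": ee_len,
        "physical_block": (ee_start_hi << 32) | ee_start_lo,
    }
```

Under these assumed widths an entry occupies 12 bytes, and the hi/lo split allows physical block numbers of up to 48 bits.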
Step S2: Assign the information under the extent header of the inode of the second file to the newly created extent structure of the first file, so that the content of the second file is concatenated onto the first file.
The information under the second-level ext4_extent_header of file B is assigned to the ext4_extent created in step S1. Because the content of file B is to be concatenated onto file A, ee_block in the newly created ext4_extent equals ee_block + ee_len + 1 of the last ext4_extent of file A, where the "+1" marks the start of the part that follows;
ee_len in the newly created ext4_extent is the ee_len in the extent of file B;
the last two fields, ee_start_hi and ee_start_lo, in the newly created ext4_extent can be copied from the extent structure of file B.
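The assignments of steps S1 and S2 can be sketched on an in-memory model as follows. The `Extent` and `Inode` classes and the `append_extent_of` function are hypothetical, ee_start stands for the combined ee_start_hi/ee_start_lo value, and the sketch assumes file B has exactly one extent. The `gap` parameter defaults to 1 to follow the "+1" in the text; in the upstream Linux definition ee_len is a block count, so an implementation working on a real volume may need a gap of 0, which is why it is left configurable here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Extent:
    ee_block: int   # first logical block covered
    ee_len: int     # length of the run
    ee_start: int   # physical start (ee_start_hi and ee_start_lo combined)


@dataclass
class Inode:
    name: str
    extents: List[Extent] = field(default_factory=list)


def append_extent_of(a: Inode, b: Inode, gap: int = 1) -> Extent:
    """Steps S1 and S2: add an extent to A that points at B's data blocks."""
    last = a.extents[-1]
    src = b.extents[0]  # assumption: B is described by a single extent
    new = Extent(
        ee_block=last.ee_block + last.ee_len + gap,  # formula from the text
        ee_len=src.ee_len,                           # copied from B
        ee_start=src.ee_start,                       # copied from B
    )
    a.extents.append(new)
    return new
```

Only three integers are written, regardless of how large file B is, which is where the speed-up over copying comes from.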
Step S3: Delete the second file through a file system command to complete the merge of the first file and the second file.
After the file system deletes file B, the inode and the extent information of file B are removed. After the merge is completed, the file layout in the file system is as shown in Fig. 4.
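Putting steps S1 to S3 together, the toy model below shows the intended end state: file A reads back as A followed by B, file B's metadata is gone, and no data block was rewritten. `ToyFS` and its methods are hypothetical. The model does not track block allocation, so it says nothing about how a real file system keeps B's blocks from being released when B is deleted; the text does not cover that point and the sketch leaves it out.

```python
class ToyFS:
    """Minimal in-memory stand-in for a file system with extent-mapped files."""

    def __init__(self):
        self.disk = {}     # physical block number -> block contents
        self.inodes = {}   # file name -> list of (ee_block, ee_len, ee_start)

    def create(self, name, physical_start, blocks):
        """Store the blocks contiguously and describe them with one extent."""
        for i, data in enumerate(blocks):
            self.disk[physical_start + i] = data
        self.inodes[name] = [(0, len(blocks), physical_start)]

    def read(self, name):
        """Read a file by walking its extents in order."""
        out = b""
        for _ee_block, ee_len, ee_start in self.inodes[name]:
            for i in range(ee_len):
                out += self.disk[ee_start + i]
        return out

    def merge(self, first, second):
        """S1 + S2: add an extent to `first`; S3: drop the metadata of `second`."""
        last_block, last_len, _ = self.inodes[first][-1]
        _, src_len, src_start = self.inodes[second][0]
        self.inodes[first].append((last_block + last_len + 1, src_len, src_start))
        del self.inodes[second]
```

A short usage example: create A at blocks 100 and 101 and B at block 500, call `merge("A", "B")`, and `read("A")` returns the contents of all three blocks while the `disk` dictionary is unchanged.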
The invention also places a requirement on the files: they must be 4K-byte aligned. The invention targets a specific class of scenarios with high performance demands, where improving performance matters greatly, so the application is required to align the processed content to 4K. One possible application scenario is non-linear video editing, where data blocks can be padded to achieve 4K alignment. Many related applications are already able to pad their data for alignment, so this requirement is relatively easy to meet.
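The padding needed for 4K alignment is simple arithmetic, sketched below. The helper names and the zero fill byte are assumptions; a real application such as a video editor would use whatever filler its container format allows.

```python
ALIGNMENT = 4096  # 4K bytes, as required by the text


def padding_needed(size: int, alignment: int = ALIGNMENT) -> int:
    """Number of bytes to add so that `size` becomes a multiple of `alignment`."""
    return (-size) % alignment


def pad_to_alignment(data: bytes, alignment: int = ALIGNMENT, fill: bytes = b"\x00") -> bytes:
    """Return `data` extended with fill bytes up to the next aligned size."""
    return data + fill * padding_needed(len(data), alignment)
```

A file whose size is already a multiple of 4096 needs no padding, so well-behaved producers pay nothing extra.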
Embodiments of a System for fast merging of files
Fig. 2 illustrates the principle of an embodiment of the system for rapidly merging files of the present invention. Referring to Fig. 2, the system of this embodiment includes an extent structure creation module 1, an extent structure updating module 2 and a file deleting module 3. The extent structure creation module 1 is connected to the extent structure updating module 2, and the extent structure updating module 2 is connected to the file deleting module 3.
The extent structure creation module 1 creates a new extent structure under the extent header of the inode of the first file.
The newly created extent structure indicates the logical blocks and the blocks mapped onto the physical device. For example, under the second-level ext4_extent_header of the inode of file A, a new ext4_extent structure is created:
[Figure: definition of the ext4_extent structure, with the fields ee_block, ee_len, ee_start_hi and ee_start_lo]
where ee_block and ee_len together describe the logical blocks, and ee_start_hi and ee_start_lo together give the block mapped onto the physical device.
The extent structure updating module 2 assigns the information under the extent header of the inode of the second file to the newly created extent structure of the first file, so that the content of the second file is concatenated onto the first file.
The information under the second-level ext4_extent_header of file B is assigned to the ext4_extent created by the extent structure creation module 1. Because the content of file B is to be concatenated onto file A, ee_block in the newly created ext4_extent equals ee_block + ee_len + 1 of the last ext4_extent of file A, where the "+1" marks the start of the part that follows;
ee_len in the newly created ext4_extent is the ee_len in the extent of file B;
the last two fields, ee_start_hi and ee_start_lo, in the newly created ext4_extent can be copied from the extent structure of file B.
The file deleting module 3 deletes the second file through a file system command to complete the merge of the first file and the second file. After the file system deletes file B, the inode and the extent information of file B are removed. After the merge is completed, the file layout in the file system is as shown in Fig. 4.
The invention also places a requirement on the files: they must be 4K-byte aligned. The invention targets a specific class of scenarios with high performance demands, where improving performance matters greatly, so the application is required to align the processed content to 4K. The byte alignment module 4 performs 4K-byte alignment on the first file and the second file before merging. One possible application scenario is non-linear video editing, where data blocks can be padded to achieve 4K alignment. Many related applications are already able to pad their data for alignment, so this requirement is relatively easy to meet.
While, for purposes of simplicity of explanation, the methodologies are shown and described as a series of acts, it is to be understood and appreciated that the methodologies are not limited by the order of acts, as some acts may, in accordance with one or more embodiments, occur in different orders and/or concurrently with other acts from that shown and described herein or not shown and described herein, as would be understood by one skilled in the art.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software as a computer program product, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a web site, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk (disk) and disc (disc), as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and blu-ray disc, where disks (disks) usually reproduce data magnetically, while discs (discs) reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (4)

1. A method for rapidly merging files, characterized by comprising the following steps:
creating a new extent structure under the extent header of the inode of a first file, wherein the newly created extent structure indicates the logical blocks and the blocks mapped onto the physical device;
assigning the information under the extent header of the inode of a second file to the newly created extent structure of the first file, so that the content of the second file is concatenated onto the first file, wherein the assignment process comprises: assigning the logical blocks of the second file to the newly created extent structure, and assigning the blocks of the second file that are mapped onto the physical device to the newly created extent structure;
and deleting the second file through a file system command to complete the merge of the first file and the second file.
2. The method for rapidly merging files according to claim 1, wherein the first file and the second file are 4K-byte aligned before being merged.
3. A system for rapidly merging files, comprising:
an extent structure creation module, configured to create a new extent structure under the extent header of the inode of a first file, wherein the extent structure newly created by the extent structure creation module indicates the logical blocks and the blocks mapped onto the physical device;
an extent structure updating module, configured to assign the information under the extent header of the inode of a second file to the newly created extent structure of the first file, so that the content of the second file is concatenated onto the first file, wherein the extent structure updating module assigns the logical blocks of the second file to the newly created extent structure and assigns the blocks of the second file that are mapped onto the physical device to the newly created extent structure;
and a file deleting module, configured to delete the second file through a file system command to complete the merge of the first file and the second file.
4. The system for rapidly merging files according to claim 3, further comprising:
a byte alignment module, configured to perform 4K-byte alignment on the first file and the second file before merging.
CN201611199551.0A 2016-12-22 2016-12-22 Method and system for rapidly merging files Active CN108228673B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611199551.0A CN108228673B (en) 2016-12-22 2016-12-22 Method and system for rapidly merging files

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611199551.0A CN108228673B (en) 2016-12-22 2016-12-22 Method and system for rapidly merging files

Publications (2)

Publication Number Publication Date
CN108228673A (en) 2018-06-29
CN108228673B (en) 2021-09-03

Family

ID=62656250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611199551.0A Active CN108228673B (en) 2016-12-22 2016-12-22 Method and system for rapidly merging files

Country Status (1)

Country Link
CN (1) CN108228673B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102164161A (en) * 2011-01-10 2011-08-24 清华大学 Method and device for performing file layout extraction on parallel file system
CN102707987A (en) * 2011-03-15 2012-10-03 微软公司 Extent virtualization
CN102982151A (en) * 2012-11-27 2013-03-20 南开大学 Method for merging multiple physical files into one logic file
CN103605726A (en) * 2013-11-15 2014-02-26 中安消技术有限公司 Method and system for accessing small files, control node and storage node
CN104572670A (en) * 2013-10-15 2015-04-29 方正国际软件(北京)有限公司 Small file storage, query and deletion method and system
CN105528348A (en) * 2014-09-28 2016-04-27 阿里巴巴集团控股有限公司 Media file processing method and apparatus
CN105956183A (en) * 2016-05-30 2016-09-21 广东电网有限责任公司电力调度控制中心 Method and system for multi-stage optimization storage of a lot of small files in distributed database
CN106021585A (en) * 2016-06-02 2016-10-12 同济大学 Traffic incident video access method and system based on time-space characteristics
WO2016175880A1 (en) * 2015-04-29 2016-11-03 Hewlett Packard Enterprise Development Lp Merging incoming data in a database

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899225B (en) * 2014-03-07 2018-10-16 北京四达时代软件技术股份有限公司 Object Relation Mapping method, apparatus and processor
CN105224607B (en) * 2015-09-06 2019-05-24 浪潮(北京)电子信息产业有限公司 A kind of Virtual File System design method for simulating cloud storage equipment

Also Published As

Publication number Publication date
CN108228673A (en) 2018-06-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Method and system for quickly merging files

Effective date of registration: 20220330

Granted publication date: 20210903

Pledgee: Societe Generale Bank Co.,Ltd. Qingpu Branch of Shanghai

Pledgor: SHANGHAI KAIXIANG INFORMATION TECHNOLOGY CO.,LTD.

Registration number: Y2022980003249
