CN108959313B - Concurrent processing method and device for massive small files and storage medium - Google Patents


Info

Publication number
CN108959313B
CN108959313B (application CN201710370949.4A)
Authority
CN
China
Prior art keywords
virtual data
module
metadata
block
small files
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710370949.4A
Other languages
Chinese (zh)
Other versions
CN108959313A (en
Inventor
高丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Chongqing Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Chongqing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Chongqing Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN201710370949.4A priority Critical patent/CN108959313B/en
Publication of CN108959313A publication Critical patent/CN108959313A/en
Application granted granted Critical
Publication of CN108959313B publication Critical patent/CN108959313B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/545Interprogram communication where tasks reside in different layers, e.g. user- and kernel-space


Abstract

The invention discloses a concurrent processing method, apparatus and storage medium for massive small files. The method comprises the following steps: receiving concurrent processing requests of a plurality of nodes for the massive small files; based on the concurrent processing requests, calling a metadata block storing the massive small files; establishing temporary virtual data spaces for the plurality of nodes respectively, and virtualizing the metadata block in the temporary virtual data spaces to obtain first virtual data blocks; and receiving second virtual data blocks generated by the plurality of nodes performing virtual processing on the first virtual data blocks in the virtual data spaces, and integrating the second virtual data blocks. The embodiment of the invention can thus virtualize a single metadata block into a plurality of temporary virtual data blocks, so that small files originally processed by a single node can be processed concurrently in the temporary virtual data spaces on a plurality of nodes, which not only reduces hardware cost but also greatly improves the processing efficiency for massive small files.

Description

Concurrent processing method and device for massive small files and storage medium
Technical Field
The invention relates to the technical field of network communication, in particular to a concurrent processing method, a concurrent processing device and a storage medium for massive small files.
Background
With the popularity of the internet and the rapid development of the 4th Generation communication system (4G), data on the internet is growing geometrically. Network data such as users' General Packet Radio Service (GPRS) tickets form massive numbers of high-frequency small files. Users now frequently read, store and modify these small files, and the access and use of small-file data has become a common form of data processing.
Large files can be sliced with striping technology to improve the concurrency of user file access, but small files do not benefit from striping, so traditional data processing methods for small files generally store each small file on a single data server. However, when the number of small files reaches a certain order of magnitude, heavy repeated access to them imposes a performance burden and an input/output (I/O) bottleneck on the data server. Unlike large files, small files cannot be divided across a plurality of nodes, so data-processing performance cannot be improved by increasing the number of concurrent tasks.
In addition, most parallel file systems ensure consistency through a conventional locking mechanism. As the amount of data and the number of requesting nodes increase, lock requests cause lock contention and severe performance degradation, so highly concurrent small-file data processing remains a bottleneck. Moreover, when multiple tasks read and write the same data block at the same time, the traditional mechanism waits for the block's lock to be released before performing the next operation in order to guarantee data consistency, so the operations cannot proceed in parallel. This serial processing sharply reduces disk efficiency, and frequent reads and writes of large numbers of small files also tend to shorten the service life of the underlying storage.
How to perform effective concurrent processing for massive small files and reduce hardware overhead becomes a problem to be solved urgently in the industry.
Disclosure of Invention
In order to perform concurrent processing on massive small files effectively and reduce hardware overhead, embodiments of the present invention provide a concurrent processing method, apparatus and storage medium for massive small files.
In a first aspect, a concurrent processing method for massive small files is provided. The method comprises the following steps:
receiving concurrent processing requests of a plurality of nodes for the massive small files;
based on the concurrent processing request, calling a metadata block for storing the massive small files;
respectively establishing temporary virtual data spaces for a plurality of nodes, and virtualizing metadata blocks in the temporary virtual data spaces to obtain first virtual data blocks;
and receiving a second virtual data block generated by the plurality of nodes performing virtual processing on the first virtual data block in the virtual data space, and integrating the second virtual data block.
In a second aspect, a concurrent processing apparatus for massive small files is provided. The device includes:
the application interface module is used for receiving concurrent processing requests of a plurality of nodes for the massive small files;
the kernel extension module is used for calling the metadata block for storing the massive small files based on the concurrent processing request;
the temporary virtual space module is used for respectively establishing temporary virtual data spaces for the nodes and virtualizing metadata blocks in the temporary virtual data spaces to obtain first virtual data blocks;
and the cooperative working module is used for receiving a second virtual data block generated by the virtual processing of the first virtual data block in the virtual data space by the nodes and integrating the second virtual data block.
In a third aspect, a concurrent processing apparatus for massive small files is provided. The device includes:
a memory for storing a program;
a processor for executing the program stored by the memory, the program causing the processor to perform the method of the aspects described above.
In a fourth aspect, a storage medium is provided. The storage medium is computer readable. The storage medium has stored therein instructions that, when executed on a computer, cause the computer to perform the method of the above aspects.
Therefore, the embodiment of the invention establishes temporary virtual data spaces for the plurality of nodes respectively, virtualizes the metadata block in those spaces to obtain first virtual data blocks, has the plurality of nodes perform virtual processing on the first virtual data blocks, and integrates the results. A single metadata block can thus be virtualized into a plurality of temporary virtual data blocks, so that small files originally processed by a single node can be processed concurrently in the temporary virtual data spaces on a plurality of nodes, which reduces hardware cost and greatly improves the processing efficiency for massive small files.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic structural diagram of a concurrent processing system oriented to a large number of small files according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a concurrent processing method for massive small files according to an embodiment of the present invention;
fig. 3 is a schematic flow chart of a concurrent processing method for massive small files according to another embodiment of the present invention;
fig. 4 is a schematic structural diagram of a concurrent processing apparatus for massive small files according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a concurrent processing apparatus for massive small files according to another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 is a schematic structural diagram of a concurrent processing system oriented to a large amount of small files according to an embodiment of the present invention.
As shown in FIG. 1, the architecture employs a three-node parallel file system deployment. The architecture may include: node101, node102, node103, parallel file system 104, and storage disks disk105, disk106 and disk107. node101, node102 and node103 share the underlying disk105, disk106 and disk107 through storage area network (SAN) connections.
Here, node101, node102 and node103 may be various electronic devices such as application servers. An application (App) for processing the massive small files in the storage disks can be installed on each application server. The parallel file system 104 may support parallel applications; in a parallel file system environment, a plurality of nodes can read and write the same file at the same time. disk105, disk106 and disk107 may be used for the underlying storage of file data (e.g., the massive small files), so that node101, node102 and node103 can concurrently read and write the same file through the parallel file system 104.
It will be appreciated that the architecture may also include a number of auxiliary devices, such as network devices serving as the medium that provides communication links between the various electronic devices. In particular, the network may include various connection types, such as wired or wireless communication links, or fiber optic cables.
It should be understood that the number of devices in fig. 1 is merely illustrative and can be adjusted flexibly according to actual needs, for example by increasing the number of nodes and disks. The following embodiments may apply the system architecture of this embodiment to perform concurrent processing on massive small files.
It is understood that "massive small files" refers to a very large number of small files, for example 100,000 small files. "Massive" and "small" are relative terms, as opposed to "few" and "large"; this embodiment limits neither the number of files nor their size in bytes. The embodiment is also applicable to concurrent processing of a small number of small files or of a large file, and is not limited in this respect, although its advantages are greatest when the number of files is large and each file is small.
Fig. 2 is a flowchart illustrating a concurrent processing method for massive small files according to an embodiment of the present invention.
As shown in fig. 2, the method comprises the steps of: s210, receiving a concurrent processing request of a plurality of nodes for a large amount of small files; s220, calling metadata blocks for storing massive small files based on the concurrent processing request; s230, respectively establishing temporary virtual data spaces for a plurality of nodes, and virtualizing metadata blocks in the temporary virtual data spaces to obtain first virtual data blocks; s240, receiving a second virtual data block generated by the plurality of nodes performing virtual processing on the first virtual data block in the virtual data space, and integrating the second virtual data block.
In step S210, the plurality of nodes may be a plurality of application nodes, for example, a plurality of application servers. The small files may be massive high-frequency small files formed by network data such as users' GPRS tickets. The concurrent processing request may be, for example, a request by multiple application servers to access and change shared data simultaneously.
In step S220, the metadata block (e.g., denoted B1) may contain data information of the massive small files. The data information may include data attribute information such as data storage location information, block size information of the data block, and modification records.
In step S230, the temporary virtual data space may be used for temporarily storing virtual data, such as the data obtained by virtualizing the metadata block, i.e., a mapping of the metadata. For example, virtualizing metadata block B1 in node1's temporary virtual data space yields a first virtual data block node1B1; virtualizing B1 in node2's temporary virtual data space yields node2B1; and virtualizing B1 in node3's temporary virtual data space yields node3B1.
In step S240, each node may perform virtual processing, such as I/O processing, read/write processing, or changing a shared small file, on its first virtual data block in its respective virtual data space to obtain a second virtual data block. node1 virtually processes node1B1 in its virtual data space to obtain node1B1+ch1; node2 virtually processes node2B1 to obtain node2B1+ch2; and similarly, nodeN virtually processes nodeNB1 to obtain nodeNB1+chN. Integrating node1B1+ch1, node2B1+ch2, ..., nodeNB1+chN yields B1ch1ch2...chN, where N may be a natural number.
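The virtualize-then-integrate cycle of steps S210 to S240 can be sketched as follows. This is a minimal illustration only: the function names `virtualize` and `integrate` and the dictionary layout of the metadata block are assumptions for illustration, not the patent's implementation.

```python
# Minimal sketch of S230/S240: each node gets a private virtual copy of the
# shared metadata block B1 (first virtual data block), applies its change in
# isolation (second virtual data block), and the changes are integrated last.
from copy import deepcopy

def virtualize(metadata_block, node_id):
    # Create a per-node first virtual data block (a mapping of B1).
    block = deepcopy(metadata_block)
    block["node"] = node_id
    return block

def integrate(metadata_block, second_blocks):
    # Merge the changes (chN) from every second virtual data block into B1.
    merged = deepcopy(metadata_block)
    for block in second_blocks:
        merged["changes"].extend(block["changes"])
    return merged

b1 = {"name": "B1", "changes": []}

second_blocks = []
for n in (1, 2, 3):
    vb = virtualize(b1, n)          # first virtual data block nodeNB1
    vb["changes"].append(f"ch{n}")  # virtual processing -> nodeNB1+chN
    second_blocks.append(vb)

result = integrate(b1, second_blocks)  # B1ch1ch2ch3
print(result["changes"])               # ['ch1', 'ch2', 'ch3']
```

Because each node mutates only its own deep copy, the original block B1 stays untouched until integration, which is the property the patent relies on to avoid per-operation lock contention.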
Therefore, the embodiment of the invention establishes the temporary virtual data spaces for the plurality of nodes respectively, virtualizes the metadata block in the temporary virtual data spaces to obtain the first virtual data block, performs virtual processing on the first virtual data block by the plurality of nodes, and integrates the virtual processing result, so that the single metadata block can be virtualized into the plurality of temporary virtual data blocks, the original small files processed by the single node can be concurrently processed in the temporary virtual data spaces on the plurality of nodes, the hardware cost can be reduced, and the processing efficiency of massive small files can be greatly improved.
As a specific implementation of the embodiment of fig. 2, after the metadata block storing the massive small files is called, the following steps may be added: locking the metadata block in response to the call; and after the second virtual data block is integrated, unlocking the metadata block. This is a custom locking mechanism of the embodiment of the present invention, and may be implemented as follows: once the kernel extension module calls the stored data block, the locking mechanism is triggered; the lock module communicates with the metadata module through the kernel extension module, and the metadata of the stored data block in the metadata module is locked. After the virtual data blocks are integrated, the kernel extension module triggers the locking mechanism to close. While the locking mechanism is triggered, the kernel extension module calls the data attribute information of the metadata module according to the IO request and virtualizes a corresponding virtual storage data block in the temporary virtual space module. The respective hosts may then complete their entire IO modification operations in parallel in their respective virtual storage data blocks. The embodiment of the present invention can thus avoid the low disk read/write efficiency caused by disk lock protection when processing massive, highly concurrent small-file data.
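The custom locking mechanism described above differs from conventional per-operation locking: the lock is taken once when the metadata block is called and released only after integration, and the nodes never contend for it during their own modifications because each works on a private virtual block. A hedged sketch, with all class and method names being illustrative assumptions:

```python
# Sketch of the custom lock: one acquire on call, one release after
# integration; no per-node lock traffic in between.
import threading

class MetadataLock:
    def __init__(self):
        self._lock = threading.Lock()

    def on_call(self):
        # Triggered once when the kernel extension module calls the block.
        self._lock.acquire()

    def on_integrated(self):
        # Triggered once after the second virtual data blocks are integrated.
        self._lock.release()

    def locked(self):
        return self._lock.locked()

lock = MetadataLock()
lock.on_call()            # calling the metadata block triggers locking
assert lock.locked()
# ... nodes virtualize and modify their private blocks here, lock-free ...
lock.on_integrated()      # integration complete -> unlock
print(lock.locked())      # False
```

The point of the sketch is the lifetime of the lock: exactly one acquire/release pair per batch of concurrent operations, rather than one per node per operation.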
In some embodiments, after the second virtual data block is integrated, the following step may be added: performing a disk write-back operation on the second virtual data block. By merging a plurality of concurrent operations on a plurality of nodes into a single disk write-back operation, the embodiment of the present invention can reduce the number of storage reads and writes, relieve the disk read/write pressure on the underlying storage, and effectively extend its service life.
In some embodiments, before the metadata block storing the massive small files is called, the following step may be added: storing one or more of the following data attribute information of the massive small files in the metadata block in advance: file location information, data block size information, and file modification records.
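The pre-stored attribute information listed above could be modeled as a simple record; the field names below are assumptions based only on the three attributes the text names.

```python
# Illustrative shape of the metadata block's pre-stored attribute information.
from dataclasses import dataclass, field

@dataclass
class MetadataBlock:
    file_location: str            # file location information
    block_size: int               # data block size information
    modification_records: list = field(default_factory=list)  # file modification records

# Hypothetical instance for data block B1 on disk105.
b1_meta = MetadataBlock(file_location="/disk105/B1", block_size=4096)
b1_meta.modification_records.append("node1: ch1")
print(b1_meta.block_size)   # 4096
```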
In some embodiments, after the second virtual data block is integrated, the following step may be added: synchronizing the second virtual data block to the metadata block and the file system.
Fig. 3 is a flowchart illustrating a concurrent processing method for massive small files according to another embodiment of the present invention.
This embodiment can be applied to the highly concurrent data-processing scenario that often occurs when a simulation system processes massive small files. In this scenario, because the application calls massive small files, node1, node2 and node3 need to initiate IO modification operations on the underlying data block B1 (i.e., the metadata block) simultaneously. As shown in fig. 3, the method may include the following steps:
s310-1, an application initiates an IO request, a node1 host node simultaneously responds to the IO request of the data block B1 through an application interface, converts the IO request into an IO request which can be identified by a kernel extension module and feeds the IO request back to the kernel extension module for processing.
In some embodiments, the kernel extension module may also be replaced with a kernel module.
Similarly, in S310-2, the node2 host responds to the IO request for data block B1 through the application interface, converts it into an IO request recognizable by the kernel extension module, and feeds it back to the kernel extension module for processing.
S310-3, the node3 host likewise responds to the IO request for data block B1 through the application interface, converts it into a recognizable IO request, and feeds it back to the kernel extension module for processing.
S320, the kernel extension module opens file system management.
S330, the kernel extension module calls data information in the metadata module according to the IO request.
S340, the kernel extension module obtains the storage address of the data information through the metadata module, and the bottom storage data block B1 is called.
S350, calling B1 triggers the locking mechanism; the lock module communicates with the metadata module through the kernel extension module, and data block B1 is locked.
S360-1, the node1 kernel extension module calls the data information of the metadata module (data block B1) according to the IO request and virtualizes a corresponding virtual storage data block in the temporary virtual space module, obtaining node1B1 (a first virtual data block).
S360-2, the node2 kernel extension module calls the data information of the metadata module according to the IO request, and virtualizes a corresponding virtual storage data block node2B1 in the temporary virtual space module.
S360-3, with the metadata of data block B1 locked, the node3 kernel extension module calls the data information of the metadata module according to the IO request and virtualizes a corresponding virtual storage data block node3B1 in the temporary virtual space module.
S370-1, the application in node1 operates on node1B1 to generate node1B1+ch1.
S370-2, the application in node2 operates on node2B1 to generate node2B1+ch2.
S370-3, the application in node3 operates on node3B1 to generate node3B1+ch3.
S380, in the temporary virtual data spaces of the three nodes, the kernel extension modules of the three hosts communicate through the cooperative working module to complete the integration of the temporary virtual spaces. The IO operations of the three hosts on data block B1 are finally integrated into B1ch1ch2ch3.
S390, after the virtual data blocks are integrated, the kernel extension module triggers the locking mechanism to close, and the metadata of the stored data block in the metadata module is unlocked.
S3100, the multiple concurrent operations of node1, node2 and node3 on data block B1 are merged into a single disk write-back operation.
S3110, the data information of B1ch1ch2ch3 is synchronized to the metadata module and the file system management module, completing the concurrent IO operations on data block B1.
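The whole flow S310 through S3110 can be sketched end-to-end under the stated assumptions: three nodes request IO on B1, the block is locked once, each node receives a virtual copy and applies its change, the copies are integrated, the block is unlocked, and all three operations collapse into one disk write-back. The `Disk` class and the function name `concurrent_io` are hypothetical.

```python
# End-to-end sketch of the three-node flow; note write_backs counts exactly
# one physical write for three logical IO modification operations.
from copy import deepcopy

class Disk:
    def __init__(self):
        self.block = {"name": "B1", "changes": []}
        self.write_backs = 0

    def write_back(self, block):
        self.block = block
        self.write_backs += 1

def concurrent_io(disk, node_changes):
    locked = True                              # S350: calling B1 locks it
    virtual_blocks = []
    for node, change in node_changes.items():  # S360/S370, one per node
        vb = deepcopy(disk.block)              # first virtual data block
        vb["changes"].append(change)           # second virtual data block
        virtual_blocks.append(vb)
    merged = deepcopy(disk.block)              # S380: integrate via co-op module
    for vb in virtual_blocks:
        merged["changes"].extend(vb["changes"])
    locked = False                             # S390: unlock after integration
    disk.write_back(merged)                    # S3100: single write-back
    return merged                              # S3110: sync to metadata/FS

disk = Disk()
result = concurrent_io(disk, {"node1": "ch1", "node2": "ch2", "node3": "ch3"})
print(result["changes"], disk.write_backs)
```

The sketch runs the per-node loop sequentially for clarity; in the patent's scheme the nodes work in parallel, which is safe precisely because each operates on its own copy.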
In addition, where no conflict arises, those skilled in the art can flexibly adjust the order of the above steps or combine them according to actual needs. For brevity, the various implementations are not described again, and the contents of the various embodiments may be incorporated into one another by reference.
Fig. 4 is a schematic structural diagram of a concurrent processing apparatus for massive small files according to an embodiment of the present invention. As shown in fig. 4, the apparatus may include: an application interface module 410, a kernel extension module 420, a temporary virtual space module 430, and a co-operation module 440. The application interface module 410 may be configured to receive a concurrent processing request from multiple nodes for a large amount of small files; the kernel extension module 420 may be configured to call a metadata block storing a large number of small files based on a concurrent processing request; the temporary virtual space module 430 may be configured to respectively establish temporary virtual data spaces for the plurality of nodes, and virtualize a metadata block in the temporary virtual data spaces to obtain a first virtual data block; the cooperative work module 440 may be configured to receive a second virtual data block generated by a plurality of nodes performing virtual processing on the first virtual data block in the virtual data space, and integrate the second virtual data block.
It should be noted that the implementation manner of the functional modules shown in the present embodiment may be hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
In some embodiments, a lock module may be added on the basis of the embodiment of fig. 4. The lock module may be configured to lock the metadata block, and to unlock it after the second virtual data block is integrated. In some embodiments, the cooperative work module is further configured to perform a disk write-back operation on the second virtual data block.
In some embodiments, a metadata module may be added on the basis of the embodiment of fig. 4. The metadata module may be configured to store one or more of the following data attribute information of the massive small files in the metadata block in advance: file location information, data block size information, and file modification records.
In some embodiments, a file system management module may be added on the basis of the embodiment of fig. 4. The file system management module is used for synchronizing the second virtual data block to the metadata block and the file system.
It should be noted that the apparatuses in the foregoing embodiments can serve as the execution bodies of the methods in the foregoing embodiments and can implement the corresponding processes to achieve the same technical effects; for brevity, these details are not repeated here.
Fig. 5 is a schematic structural diagram of a concurrent processing apparatus for massive small files according to another embodiment of the present invention. As shown in fig. 5, the apparatus may include: an application interface module 410, a kernel extension module 420, a temporary virtual space module 430 and a co-operation module 440, a file system management module 450, a metadata module 460 and a lock module 470.
The application interface module 410 may be configured to respond to the IO requests of the various system applications, convert them into IO requests recognizable by the kernel extension module, and feed them back to the kernel extension module for processing.
The kernel extension module 420 may include an application interface module interface, a file system management module interface, a metadata module interface, a temporary virtual space module interface, a lock module interface, and a co-operation module interface. The kernel extension module 420 may connect to a system kernel. The kernel extension module 420 may call and control a file system management module, a temporary virtual space module, a metadata module, a collaborative work module, and a lock module. And the system kernel completes the calling and control of the whole system through the kernel extension module.
The file system management module 450 may primarily store the various configuration files and state information of the file system. It may communicate with the kernel extension module, and IO requests are passed to it through kernel control.
The metadata module 460 may primarily contain data location information, block size, modification records, and other data attribute information. The metadata module communicates with the kernel extension module, which calls its data information according to the IO request.
The temporary virtual space module 430 may consist of a plurality of temporary virtual data blocks pre-deployed by the system in memory space, simulating the disk data type for the nodes to call. The temporary virtual space communicates with the kernel extension module 420; after the locking mechanism is triggered, the kernel extension module 420 calls the data information of the metadata module 460 according to the IO request, virtualizes a corresponding virtual storage data block in the temporary virtual space module, and the entire IO modification operation is completed on that block.
The lock module 470 defines the locking mechanism, which is implemented mainly as follows: once the kernel extension module 420 calls a stored data block, the locking mechanism is triggered; the lock module 470 communicates with the metadata module 460 through the kernel extension module 420, and the metadata of the stored data block in the metadata module 460 is locked. When the integration of the virtual data blocks is complete, the kernel extension module 420 triggers the locking mechanism to close, and the metadata in the metadata module 460 is unlocked. While the locking mechanism is triggered, the kernel extension module calls the data attribute information of the metadata module according to the IO request, virtualizes a virtual storage data block in the temporary virtual space module, and completes the entire IO modification operation on that block.
The cooperative work module 440 may be used mainly to connect the kernel extension modules of the respective node hosts in order to complete the integration of the temporary virtual spaces. The cooperative work module communicates with the kernel extension module. After the IO modifications in the temporary virtual space modules of the node hosts are completed, the kernel extension module 420 of each node host integrates the data blocks of the temporary virtual space modules through the cooperative work module 440. After integration, the metadata of the stored data block in the metadata module 460 is unlocked and the integrated data block is written back to storage, completing the entire IO operation. Meanwhile, the kernel extension module 420 synchronizes the new data block information to the metadata module 460 and the file system management module 450 of each node.
As can be seen from the above, in one aspect of the embodiment of the present invention, a single metadata block is virtualized into a plurality of temporary virtual data blocks through the data processing performed by components such as the kernel extension module 420, the temporary virtual space module 430, the lock module 470, and the cooperative work module 440 on the parallel file system host. This changes the original mode in which a single task processes the small files: small files originally processed by a single node can now be processed concurrently in the temporary virtual data spaces on multiple nodes, greatly improving the parallel processing efficiency of massive, highly concurrent small-file data.
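The concurrency gain can be illustrated with a toy workload (names and sharding policy are hypothetical): the small files behind one metadata block are split across per-node virtual spaces and processed in parallel instead of sequentially on one node.

```python
# Sketch of the concurrency benefit: small files from a single block are
# distributed into 4 node-local virtual spaces and processed in parallel.
from concurrent.futures import ThreadPoolExecutor

small_files = [f"file-{i}".encode() for i in range(8)]

def process_on_node(node_id, files):
    # Each node works only on its own virtual data space (a local list);
    # uppercasing stands in for an arbitrary IO modification.
    return [f.upper() for f in files]

# Split the single block's files across 4 per-node virtual spaces.
shards = [small_files[i::4] for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_on_node, range(4), shards))
merged = [f for shard in results for f in shard]
```

Because each node touches only its own shard, no cross-node coordination is needed until the final merge, which mirrors the integration step performed by the cooperative work module.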
On the other hand, in the embodiment of the present invention, the temporary virtual space module 430 is a plurality of temporary virtual data blocks pre-deployed by the system in a memory space, and the module simulates a disk data type so that nodes can call it. The temporary virtual space module communicates with the kernel extension module 420; after the lock mechanism is triggered, the kernel extension module 420 calls the data information of the metadata module 460 according to the IO request, virtualizes a corresponding virtual storage data block in the temporary virtual space module 430, and completes the entire IO modification operation on that block.
In yet another aspect, the embodiment of the present invention customizes a lock mechanism, realized mainly as follows: once the kernel extension module 420 invokes a storage data block, the lock mechanism is triggered; the lock module 470 communicates with the metadata module 460 through the kernel extension module 420, and the metadata of the storage data block in the metadata module 460 is locked. When integration of the virtual data blocks is complete, the kernel extension module 420 triggers the lock mechanism to close, and the metadata of the storage block in the metadata module 460 is unlocked. While the lock mechanism is active, the kernel extension module calls the data attribute information of the metadata module 460 according to the IO request, virtualizes a virtual storage data block in the temporary virtual space module, and completes the entire IO modification operation on that block.
In another aspect, the kernel extension module 420 in the embodiment of the present invention may include an application interface module interface, a file system management module interface, a metadata module interface, a temporary virtual space module interface, a lock module interface, and a cooperative work module interface. The kernel extension module 420 interfaces with the system kernel, and calls and controls the file system management module 450, the temporary virtual space module 430, the metadata module 460, the cooperative work module 440, and the lock module 470. The system kernel completes the calling and control of the whole system through the kernel extension module 420.
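The role of the kernel extension module as the single dispatch point can be sketched as follows (a toy model with hypothetical interface names; the real module sits in the kernel and is not a Python object):

```python
# Sketch: the kernel extension holds one interface per module; the system
# kernel calls and controls everything only through this dispatcher.
class KernelExtension:
    def __init__(self, **interfaces):
        self.interfaces = interfaces
        self.trace = []           # record of module calls, for illustration

    def call(self, name, *args):
        self.trace.append(name)
        return self.interfaces[name](*args)

# Stub module interfaces standing in for lock, virtual space, etc.
kext = KernelExtension(
    lock=lambda blk: f"locked:{blk}",
    virtualize=lambda data: bytearray(data),
    unlock=lambda blk: f"unlocked:{blk}",
)
kext.call("lock", "blk0")                 # lock mechanism triggered
vblk = kext.call("virtualize", b"payload")  # virtual block created
kext.call("unlock", "blk0")               # lock closed after integration
```

Routing every module call through one object mirrors the description's design: the system kernel never talks to the file system, metadata, lock, or cooperative modules directly.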
On the other hand, in the embodiment of the present invention, the cooperative work module 440 mainly connects to the extension kernel of each node host and is configured to complete the integration of the temporary virtual spaces. The cooperative work module 440 communicates with the kernel extension module 420. After the IO modifications on the temporary virtual space modules of the node hosts are complete, the kernel extension module 420 of each node host integrates the data blocks of the temporary virtual space modules through the cooperative work module. After integration, the metadata of the storage block in the metadata module 460 is unlocked, the integrated data block is written back to storage, and the entire IO operation is complete. Meanwhile, the kernel extension module 420 synchronizes the new data block information to the metadata module 460 and the file system management module 450 of each node.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that incorporates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The above-described embodiments of the apparatus are merely illustrative. The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement this without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on this understanding, the above technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in parts of the embodiments.
Finally, it should be noted that the above examples are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced, and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (12)

1. A concurrent processing method for massive small files is characterized by comprising the following steps:
receiving concurrent processing requests of a plurality of nodes to the mass small files;
based on the concurrent processing request, calling a metadata block for storing the massive small files;
respectively establishing temporary virtual data spaces for the nodes, and virtualizing the metadata blocks in the temporary virtual data spaces to obtain first virtual data blocks;
and receiving second virtual data blocks generated by the plurality of nodes respectively performing virtual processing on the first virtual data blocks in the virtual data space, and integrating the second virtual data blocks.
2. The method of claim 1, wherein after the invoking of the metadata block for storing the massive small files, the method further comprises:
locking the metadata block in response to the call;
and after the second virtual data block is integrated, unlocking the metadata block.
3. The method of claim 2, wherein after the integrating of the second virtual data block, the method further comprises:
and performing disk write-back operation on the second virtual data block.
4. The method of claim 1, wherein before the invoking of the metadata block for storing the massive small files, the method further comprises:
one or more than two of the following data attribute information of the mass small files are stored in the metadata block in advance: file location information, data block size information, and file modification records.
5. The method of any of claims 1-4, wherein after the integrating of the second virtual data block, the method further comprises:
synchronizing the second virtual data block into the metadata block and a file system.
6. A concurrent processing device for massive small files is characterized by comprising:
the application interface module is used for receiving concurrent processing requests of a plurality of nodes for the mass small files;
the kernel extension module is used for calling the metadata block for storing the massive small files based on the concurrent processing request;
the temporary virtual space module is used for respectively establishing temporary virtual data spaces for the nodes and virtualizing the metadata blocks in the temporary virtual data spaces to obtain first virtual data blocks;
and the cooperative working module is used for receiving a second virtual data block generated by the plurality of nodes respectively performing virtual processing on the first virtual data block in the virtual data space, and integrating the second virtual data block.
7. The apparatus of claim 6, further comprising:
and the locking module is used for locking the metadata block and unlocking the metadata block after the second virtual data block is integrated.
8. The apparatus of claim 7, wherein the co-operating module is further configured to:
and performing disk write-back operation on the second virtual data block.
9. The apparatus of claim 6, further comprising:
a metadata module, configured to store one or more of the following data attribute information of the mass small files in the metadata block in advance: file location information, data block size information, and file modification records.
10. The apparatus of any one of claims 6-9, further comprising:
and the file system management module is used for synchronizing the second virtual data block to the metadata block and the file system.
11. A concurrent processing device for massive small files is characterized by comprising:
a memory for storing a program;
a processor for executing a program stored by the memory, the program causing the processor to perform the method of any of claims 1-5.
12. A storage medium, which is computer-readable, comprising instructions, which when run on a computer, cause the computer to perform the method of any one of claims 1-5.
CN201710370949.4A 2017-05-23 2017-05-23 Concurrent processing method and device for massive small files and storage medium Active CN108959313B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710370949.4A CN108959313B (en) 2017-05-23 2017-05-23 Concurrent processing method and device for massive small files and storage medium


Publications (2)

Publication Number Publication Date
CN108959313A CN108959313A (en) 2018-12-07
CN108959313B true CN108959313B (en) 2021-03-05

Family

ID=64493826

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710370949.4A Active CN108959313B (en) 2017-05-23 2017-05-23 Concurrent processing method and device for massive small files and storage medium

Country Status (1)

Country Link
CN (1) CN108959313B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109347763B (en) * 2018-09-11 2021-11-05 北京邮电大学 Data scheduling method, device and system based on data queue length

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102855239A (en) * 2011-06-28 2013-01-02 清华大学 Distributed geographical file system
CN103176754A (en) * 2013-04-02 2013-06-26 浪潮电子信息产业股份有限公司 Reading and storing method for massive amounts of small files
US8825652B1 (en) * 2012-06-28 2014-09-02 Emc Corporation Small file aggregation in a parallel computing system
CN105138571A (en) * 2015-07-24 2015-12-09 四川长虹电器股份有限公司 Distributed file system and method for storing lots of small files


Also Published As

Publication number Publication date
CN108959313A (en) 2018-12-07

Similar Documents

Publication Publication Date Title
US10209910B2 (en) Copy-redirect on write
US10649953B2 (en) Blockchain-based data migration method and apparatus
US10782880B2 (en) Apparatus and method for providing storage for providing cloud services
US11977734B2 (en) Storage block balancing using volume part migration
US11093148B1 (en) Accelerated volumes
US9678680B1 (en) Forming a protection domain in a storage architecture
US10042714B2 (en) Point-in-time copy on write for golden image
US20120047115A1 (en) Extent reference count update system and method
CN114064563A (en) Data migration method and server based on object storage
EP3317764B1 (en) Data access accelerator
CN111881476B (en) Object storage control method, device, computer equipment and storage medium
CN108470054A (en) A kind of data access method and system
KR20220125198A (en) Data additional writing method, apparatus, electronic device, storage medium and computer programs
US20140082275A1 (en) Server, host and method for reading base image through storage area network
CN114371811A (en) Method, electronic device and computer program product for storage management
CN114996750A (en) Data sharing method and device
CN115129625A (en) Enhanced storage protocol emulation in a peripheral device
CN108959313B (en) Concurrent processing method and device for massive small files and storage medium
CN110750221B (en) Volume cloning method, apparatus, electronic device and machine-readable storage medium
US11099767B2 (en) Storage system with throughput-based timing of synchronous replication recovery
US9864643B1 (en) Using locks of different scopes in a data storage system to optimize performance and complexity
US10831794B2 (en) Dynamic alternate keys for use in file systems utilizing a keyed index
CN114003342A (en) Distributed storage method and device, electronic equipment and storage medium
US11971855B2 (en) Supporting multiple operations in transaction logging for a cloud-enabled file system
CN110058790B (en) Method, apparatus and computer program product for storing data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant