CN108958659A - Small file aggregation method, device, and medium for a distributed storage system - Google Patents

Small file aggregation method, device, and medium for a distributed storage system

Info

Publication number
CN108958659A
Authority
CN
China
Prior art keywords
small files
aggregation
subdirectory
server
read
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810700396.9A
Other languages
Chinese (zh)
Inventor
李晓伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd filed Critical Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201810700396.9A priority Critical patent/CN108958659A/en
Publication of CN108958659A publication Critical patent/CN108958659A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a small file aggregation method, device, and medium for a distributed storage system. The method can be applied on a single server or on multiple servers. When applied on a single server, the method first obtains all small files and subdirectories under a source directory and then reads the small files in a loop: each time the data of one small file has been read, that data is written to an aggregated large file, which completes the aggregation of that small file, and the loop continues until the data of all small files has been written to the aggregated large file. The method can thus aggregate all small files under a specified directory into a large file. Because large files make full use of network bandwidth and of the servers' storage capacity and perform very well, the I/O pressure on the disks and the pressure on the metadata service are significantly reduced. The small file aggregation device and medium for a distributed storage system provided by the invention have the same effects.

Description

Small file aggregation method, device, and medium for a distributed storage system
Technical field
The present invention relates to the field of distributed storage systems, and in particular to a small file aggregation method, device, and medium for a distributed storage system.
Background
In today's distributed storage systems, data volume is growing geometrically, and small files in particular (files ranging from a few KB to a few tens of KB in size) are extremely numerous. If they are stored one by one through the normal flow, they consume a large number of I/O operations, which increases the pressure on the disks and degrades performance. More seriously, every I/O operation on a file, such as a read, write, or delete, must first request the file's metadata from the metadata service, and the capacity of a metadata service is limited, so when IOPS is high the metadata service becomes a bottleneck.
In the prior art, as the workload of a distributed storage system grows, a large number of small files are produced in the system, and as they accumulate the system's processing capacity becomes insufficient.
How to reduce the disk I/O pressure and the metadata service pressure caused by a large number of small files in a distributed storage system is therefore a problem that those skilled in the art urgently need to solve.
Summary of the invention
The object of the present invention is to provide a small file aggregation method, device, and medium for a distributed storage system that can aggregate the large number of small files in the system into large files, ultimately reducing the pressure on the metadata server and improving the I/O performance of the system.
To solve the above technical problem, the present invention provides a small file aggregation method for a distributed storage system, applied to a single server, comprising:
obtaining all small files and subdirectories under a source directory;
judging whether all the small files in the source directory have been read;
if not, reading the data of one of the remaining small files in the source directory;
writing the data that was read into an aggregated large file, and returning to the step of judging whether all the small files in the source directory have been read;
if so, ending.
Preferably, all small files and subdirectories under the source directory are obtained by calling the readdir function.
Preferably, reading the data of one of the remaining small files in the source directory specifically means reading the data of the remaining small files one after another in the order in which they appear in the source directory.
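The two preferred options above can be illustrated with a minimal Python sketch. It assumes that `os.scandir` stands in for the `readdir` call named in the text, and that sorting entries by name is an acceptable way to fix "the order in the source directory"; the function name `list_source_directory` is hypothetical and does not come from the patent.

```python
import os


def list_source_directory(source_dir):
    """Return (small_file_paths, subdirectory_paths) for one directory level.

    os.scandir plays the role that readdir plays in the text. Sorting by
    name is an assumption made here so that the order is reproducible.
    """
    files, subdirs = [], []
    with os.scandir(source_dir) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                files.append(entry.path)
    return files, subdirs
```

Only paths are collected at this stage; no file data is read, which matches the later remark that the acquisition step does not fetch the small files' contents.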
To solve the above technical problem, the present invention also provides a small file aggregation device for a distributed storage system, applied to a single server, comprising:
an acquiring unit, configured to obtain all small files and subdirectories under a source directory;
a judging unit, configured to judge whether all the small files in the source directory have been read and, if not, to trigger a reading unit;
the reading unit, configured to read the data of one of the remaining small files in the source directory;
a writing unit, configured to write the data that was read into an aggregated large file and to trigger the judging unit.
To solve the above technical problem, the present invention also provides a small file aggregation device for a distributed storage system, comprising a memory configured to store a computer program;
and a processor configured to implement the steps of the small file aggregation method for a distributed storage system described above when executing the computer program.
To solve the above technical problem, the present invention also provides a computer-readable storage medium on which a computer program is stored, the computer program implementing the steps of the small file aggregation method for a distributed storage system described above when executed by a processor.
To solve the above technical problem, the present invention also provides a small file aggregation method for a distributed storage system, applied to multiple servers, comprising:
the server side traverses the source directory to find the bottom-level subdirectories, and records the single small files encountered during the traversal;
the server side assigns the subdirectories to the clients in the C/S (client/server) mode;
the server side executes the following aggregation method on all the single small files:
judging whether all the single small files have been read;
if not, reading the data of one of the remaining single small files, writing the data that was read into an aggregated large file, and returning to the step of judging whether all the single small files have been read;
if so, the server side ends the aggregation of the single small files;
each client executes the following method on the subdirectories assigned to it:
obtaining all small files and subdirectories of the assigned subdirectory;
judging whether all the small files of the assigned subdirectory have been read;
if not, reading the data of one of the remaining small files in the assigned subdirectory;
writing the data that was read into an aggregated large file, and returning to the step of judging whether all the small files of the assigned subdirectory have been read;
if so, the aggregation task of the client ends.
Preferably, the method further comprises:
after completing its aggregation task, the client sends task-completion information to the server side.
Preferably, the method further comprises:
the server side stores the path names of the bottom-level subdirectories in a preset file;
the server side then assigns the subdirectories to the clients in the C/S mode specifically by:
splitting the lines of the preset file according to the number of clients;
starting a socket service to establish a communication connection with each client, and sending each client the corresponding start position and number of lines in the preset file according to the split result.
The small file aggregation method for a distributed storage system provided by the present invention can be implemented by a single server. The method first obtains all small files and subdirectories under the source directory and then reads the small files in a loop: each time the data of one small file has been read, that data is written to the aggregated large file, which completes the aggregation of that small file, and the loop continues until the data of all small files has been written to the aggregated large file. The method can thus aggregate all small files under a specified directory into a large file. Because large files make full use of network bandwidth and of the servers' storage capacity and perform very well, the I/O pressure on the disks and the pressure on the metadata service are significantly reduced.
In addition, the small file aggregation method for a distributed storage system provided by the present invention can be implemented by multiple servers, one of which acts as the server side while the rest act as clients. The server side traverses the source directory to find the bottom-level subdirectories and records the single small files encountered during the traversal; it assigns the subdirectories to the clients in the C/S mode, and it reads all the single small files in a loop and writes them into an aggregated large file. Each client reads, in a loop, all small files in the subdirectories assigned to it by the server side and writes them into an aggregated large file. By using the C/S mode for communication between the server side and the clients, the method distributes the task of aggregating all small files under the source directory across the server side and multiple clients. This makes full use of client resources, speeds up the aggregation task, greatly shortens the aggregation time, and reduces the impact on normal services.
Finally, the small file aggregation device and storage medium for a distributed storage system provided by the present invention have the same beneficial effects.
Brief description of the drawings
To describe the embodiments of the present invention more clearly, the drawings needed for the embodiments are briefly introduced below. The drawings described below obviously show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a small file aggregation method executed by a single server, provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of a small file aggregation device for a distributed storage system, provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of another small file aggregation device for a distributed storage system, provided by an embodiment of the present invention;
Fig. 4 is a flowchart of a small file aggregation method executed by multiple servers, provided by an embodiment of the present invention;
Fig. 5 is a flowchart of another small file aggregation method executed by multiple servers, provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
The core of the present invention is to provide a small file aggregation method, device, and medium for a distributed storage system that can aggregate the large number of small files in the system into large files, ultimately reducing the pressure on the metadata server and improving the I/O performance of the system.
To help those skilled in the art better understand the solution, the present invention is described in further detail below with reference to the drawings and specific embodiments.
It should be noted that the small file aggregation method for a distributed storage system described here can be applied to a single server in the system or to multiple servers in the system. When it is applied to multiple servers, one of them acts as the server side and the rest act as clients. A single server must aggregate all small files under the source directory into a large file by itself. With multiple servers, the server acting as the server side is responsible for distributing the aggregation task and for aggregating the scattered small files, while each server acting as a client aggregates all small files in the subdirectories that the server side assigned to it; together, the clients and the server side complete the aggregation of all small files under the source directory.
Embodiment one
Fig. 1 is a flowchart of a small file aggregation method executed by a single server, provided by an embodiment of the present invention. The method includes:
S10: obtain all small files and subdirectories under the source directory.
It should be noted that the small files obtained in this step may be small files located alongside the subdirectories or small files inside a subdirectory, and a subdirectory may itself contain further, nested subdirectories.
This step obtains only the number or the paths of the small files and subdirectories; it does not actually read the data of the small files. The source directory is a directory specified in the distributed storage system, which may be the root directory or some subdirectory under the root directory.
In a specific implementation, the method relies entirely on the file system library (lib) of the distributed storage system, which internally calls the mount function to mount the distributed storage system.
In a preferred embodiment, S10 obtains all small files and subdirectories under the source directory by calling the readdir function.
S11: judge whether all small files in the source directory have been read; if not, go to S12; if so, end.
S12: read the data of one of the remaining small files in the source directory.
S13: write the data that was read into the aggregated large file, and return to S11.
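Steps S10 to S13 can be sketched as a short loop. The Python below is a minimal illustration under stated assumptions, not the patent's program: it uses ordinary local file I/O in place of the distributed storage system's file system library, walks subdirectories with `os.walk`, and the function name `aggregate_small_files` is hypothetical.

```python
import os
from collections import deque


def aggregate_small_files(source_dir, aggregate_path):
    """Append every small file under source_dir to one large file.

    Returns the number of small files that were aggregated.
    """
    # S10: collect paths only; no file data is read at this point.
    pending = deque()
    for root, dirs, names in os.walk(source_dir):
        dirs.sort()  # in-place sort keeps the walk order reproducible
        for name in sorted(names):
            pending.append(os.path.join(root, name))

    count = 0
    with open(aggregate_path, "wb") as big:
        # S11: continue while unread small files remain.
        while pending:
            path = pending.popleft()
            # S12: read the data of one remaining small file.
            with open(path, "rb") as small:
                data = small.read()
            # S13: write that data into the aggregated large file.
            big.write(data)
            count += 1
    return count
```

The sketch assumes `aggregate_path` lies outside `source_dir`, and it stores raw bytes back to back; recording where each small file starts is a separate concern that the steps above do not cover.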
Because the number of small files is usually very large, step S11 must be executed repeatedly. It should also be noted that steps S11 and S12 may be interchanged. When S11 comes first, no small file in the source directory has been read at the time of the first judgment, so the result is "no" and the flow enters S12. When S12 comes first, only one small file in the source directory has been read after the first read, so when the flow enters S11 there are still small files remaining in the source directory; the result is again "no", the flow continues to S13, and then returns to the reading step.
It can be understood that S12 reads the data of a small file, and this data is what makes up the small file. Once the data has been written to the aggregated large file, the small file acquires the aggregated attribute and becomes part of the aggregated large file. Research on distributed storage systems shows that, for operations on large files, the system can make full use of network bandwidth and of the servers' storage capacity and performs very well. In addition, the steps above can be implemented in C++; the specific program is not described further.
The small file aggregation method for a distributed storage system provided by this embodiment is implemented by a single server. The method first obtains all small files and subdirectories under the source directory and then reads the small files in a loop: each time the data of one small file has been read, that data is written to the aggregated large file, which completes the aggregation of that small file, and the loop continues until the data of all small files has been written to the aggregated large file. The method can thus aggregate all small files under a specified directory into a large file. Because large files make full use of network bandwidth and of the servers' storage capacity and perform very well, the I/O pressure on the disks and the pressure on the metadata service are significantly reduced.
Embodiment two
On the basis of the previous embodiment, reading the data of one of the remaining small files in the source directory specifically means reading the data of the remaining small files one after another in the order in which they appear in the source directory.
It can be understood that if the files are read in the order of the source directory, they are also written to the aggregated large file in that order. When all the small files under the directory later need to be read sequentially again, they can be read from the aggregated large file, which improves the performance of sequential reads of small files.
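To make the sequential-read benefit concrete, the sketch below writes files in a given order and then reads them all back in a single forward pass over the large file. The offset index it keeps is purely an assumption for illustration; the source does not say how the position of each small file inside the aggregated large file is recorded. Both function names are hypothetical.

```python
def aggregate_with_index(paths, aggregate_path):
    """Write files in the given order; return {path: (offset, length)}.

    The index is a hypothetical helper, not something the patent describes.
    """
    index = {}
    offset = 0
    with open(aggregate_path, "wb") as big:
        for path in paths:
            with open(path, "rb") as small:
                data = small.read()
            big.write(data)
            index[path] = (offset, len(data))
            offset += len(data)
    return index


def read_all_in_order(aggregate_path, index):
    """Yield (path, data) in write order with one forward pass."""
    with open(aggregate_path, "rb") as big:
        # Dicts preserve insertion order, so offsets only ever increase here.
        for path, (offset, length) in index.items():
            big.seek(offset)
            yield path, big.read(length)
```

Because the write order equals the directory order, reading every small file back touches the large file strictly front to back instead of opening each small file separately.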
Embodiments one and two above describe the small file aggregation method for a distributed storage system. On this basis, the present invention also provides a small file aggregation device for a distributed storage system corresponding to the method. The device is described in two embodiments, one from the perspective of functional units and one from the perspective of hardware.
Embodiment three
Fig. 2 is a structural diagram of a small file aggregation device for a distributed storage system, provided by an embodiment of the present invention. The device is applied to a single server and, as shown in Fig. 2, comprises:
an acquiring unit 10, configured to obtain all small files and subdirectories under the source directory;
a judging unit 11, configured to judge whether all small files in the source directory have been read and, if not, to trigger the reading unit 12;
the reading unit 12, configured to read the data of one of the remaining small files in the source directory;
a writing unit 13, configured to write the data that was read into the aggregated large file and to trigger the judging unit 11.
Since the device embodiments correspond to the method embodiments, refer to the description of the method embodiments for details of the device embodiments, which are not repeated here.
The small file aggregation device for a distributed storage system provided by this embodiment is implemented by a single server. The device first obtains all small files and subdirectories under the source directory and then reads the small files in a loop: each time the data of one small file has been read, that data is written to the aggregated large file, which completes the aggregation of that small file, and the loop continues until the data of all small files has been written to the aggregated large file. The device can thus aggregate all small files under a specified directory into a large file. Because large files make full use of network bandwidth and of the servers' storage capacity and perform very well, the I/O pressure on the disks and the pressure on the metadata service are significantly reduced.
Embodiment four
Fig. 3 is a structural diagram of another small file aggregation device for a distributed storage system, provided by an embodiment of the present invention. The device is applied to a single server and, as shown in Fig. 3, comprises a memory 20, configured to store a computer program;
and a processor 21, configured to implement the steps of the small file aggregation method for a distributed storage system described in embodiment one or embodiment two when executing the computer program.
It can be understood that the memory 20 and the processor 21 in this embodiment may be added to the server separately or may be the server's own memory and processor. In a specific implementation, the processor 21 and the memory 20 may be connected by a bus or in some other way.
Since the device embodiments correspond to the method embodiments, refer to the description of the method embodiments for details of the device embodiments, which are not repeated here.
The small file aggregation device for a distributed storage system provided by this embodiment includes a memory and a processor, and the processor can execute the following method: first obtain all small files and subdirectories under the source directory, then read the small files in a loop; each time the data of one small file has been read, write that data to the aggregated large file, which completes the aggregation of that small file, and continue until the data of all small files has been written to the aggregated large file. The method can thus aggregate all small files under a specified directory into a large file. Because large files make full use of network bandwidth and of the servers' storage capacity and perform very well, the I/O pressure on the disks and the pressure on the metadata service are significantly reduced.
Embodiment five
On the basis of embodiment one and embodiment two, the present invention also provides a computer-readable storage medium. A computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor it implements the steps of the small file aggregation method for a distributed storage system described in embodiment one or embodiment two.
The functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment in essence, or the part that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium, and all or part of the steps of the methods of the embodiments of the present invention are executed when the medium is read. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The computer storage medium provided by this embodiment stores a program for executing the small file aggregation method for a distributed storage system. The method first obtains all small files and subdirectories under the source directory and then reads the small files in a loop: each time the data of one small file has been read, that data is written to the aggregated large file, which completes the aggregation of that small file, and the loop continues until the data of all small files has been written to the aggregated large file. The method can thus aggregate all small files under a specified directory into a large file. Because large files make full use of network bandwidth and of the servers' storage capacity and perform very well, the I/O pressure on the disks and the pressure on the metadata service are significantly reduced.
The embodiments above aggregate small files in a single process. Although this addresses the problems caused by a large number of small files in a distributed storage system, a single process aggregates slowly, and in terms of ease of use and availability it may be unable to handle small files numbering in the hundreds of millions. To make up for this, the following embodiments call the program corresponding to the above method from a script, so that available server resources can be fully used and small files are converted into aggregated files with multiple clients, multiple threads, and high concurrency. In a specific implementation, the script may be written in Python. Note that calling the program corresponding to the above method does not mean the processing is identical on every server: the server acting as the server side handles the scattered small files, while each server acting as a client handles all small files in the subdirectories assigned to it. Only the objects processed differ; the idea is the same. In other words, one task is divided into multiple subtasks, and each server executes one of them.
Embodiment six
Fig. 4 is a flowchart of a small file aggregation method executed by multiple servers, provided by an embodiment of the present invention. The method includes:
S20: the server side traverses the source directory to find the bottom-level subdirectories, and records the single small files encountered during the traversal.
In this step, a bottom-level subdirectory is a subdirectory that contains no subdirectories, only files. Once all bottom-level subdirectories have been obtained, they can be split among clients and processes so that each client handles roughly the same number of subdirectories. "Evenly" here does not mean an exact mathematical average; it only means that the numbers of subdirectories handled by the clients do not differ too much.
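The traversal in S20 might look like the following Python sketch. It rests on one assumption the text leaves open: any file sitting in a directory that still has subdirectories is recorded as a single (loose) small file, whereas the text only names files next to the first-level subdirectories. The function name `scan_source` is hypothetical.

```python
import os


def scan_source(source_dir):
    """Return (bottom_level_dirs, single_small_files) found under source_dir."""
    leaf_dirs, loose_files = [], []
    for root, dirs, names in os.walk(source_dir):
        dirs.sort()  # in-place sort keeps the walk order reproducible
        if not dirs:
            # No subdirectories below this one: a bottom-level subdirectory.
            leaf_dirs.append(root)
        else:
            # Assumption: files beside subdirectories are handled by the
            # server side as single small files.
            loose_files.extend(os.path.join(root, n) for n in sorted(names))
    return leaf_dirs, loose_files
```

With this rule every file is reached exactly once, either through a bottom-level subdirectory handed to a client or as a single small file kept by the server side.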
S21: the server side assigns the subdirectories to the clients in the C/S mode.
The C/S mode, also called the C/S structure, is the client-server structure. It is a software system architecture that makes full use of the hardware at both ends by distributing tasks sensibly between the client and the server side, which reduces the communication overhead of the system.
In another embodiment, the method further includes, between S20 and S21: the server side stores the path names of the bottom-level subdirectories in a preset file. S21 then specifically comprises:
splitting the lines of the preset file according to the number of clients;
starting a socket service to establish a communication connection with each client, and sending each client the corresponding start position and number of lines in the preset file according to the split result.
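The line split can be sketched as follows. This is a minimal illustration that assumes the "start position" is a 0-based line index into the preset file and that any remainder is spread over the first clients; the function name `split_lines` is hypothetical.

```python
def split_lines(total_lines, client_count):
    """Return one (start_line, line_count) pair per client.

    Counts differ by at most one, which matches the "roughly even"
    distribution described for the bottom-level subdirectories.
    """
    if client_count <= 0:
        raise ValueError("client_count must be positive")
    base, extra = divmod(total_lines, client_count)
    ranges = []
    start = 0
    for i in range(client_count):
        count = base + (1 if i < extra else 0)
        ranges.append((start, count))
        start += count
    return ranges
```

For example, ten subdirectory paths split among three clients give `[(0, 4), (4, 3), (7, 3)]`, so the first client reads four lines and the other two read three each.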
In this embodiment the C/S mode is implemented with sockets, so the resources of all clients can be used together, which speeds up the aggregation work and reduces the impact on normal services.
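As a rough illustration of the socket exchange, the sketch below sends one assignment from the server side to a client. The wire format (one JSON object per line with `start` and `count` fields) and both function names are assumptions; the source only says that a start position and a line count are sent. `socket.socketpair()` is used so the example runs in one process, whereas a real deployment would use listen/accept on the server side and connect on each client.

```python
import json
import socket


def send_assignment(conn, start_line, line_count):
    """Server side: send one client its slice of the preset file."""
    message = json.dumps({"start": start_line, "count": line_count}) + "\n"
    conn.sendall(message.encode("utf-8"))


def receive_assignment(conn):
    """Client side: read one newline-terminated assignment message."""
    buffer = b""
    while not buffer.endswith(b"\n"):
        chunk = conn.recv(1024)
        if not chunk:
            raise ConnectionError("server side closed the connection early")
        buffer += chunk
    payload = json.loads(buffer.decode("utf-8"))
    return payload["start"], payload["count"]


# In-process demonstration with a connected socket pair.
server_end, client_end = socket.socketpair()
send_assignment(server_end, 4, 3)
assignment = receive_assignment(client_end)
server_end.close()
client_end.close()
```

The same channel could carry the task-completion message that a client sends back when its aggregation task ends, although the source does not specify that message's format either.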
S22: The server performs aggregation on all the single small files:
Judging whether all the single small files have been read;
If not, reading the data of one of the remaining single small files, writing the data read into the aggregated large file, and returning to the step of judging whether all the single small files have been read.
If so, the server ends the aggregation of the single small files.
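The judge-read-write loop of S22 can be sketched as follows. The function name `aggregate_files` and the returned `(path, offset, length)` index are assumptions: the source text describes only the loop itself and does not say how file boundaries inside the aggregated large file are recorded.

```python
import os

def aggregate_files(paths, big_file_path):
    """Append each file's bytes to the aggregated file.

    Returns a hypothetical index of (path, offset, length) tuples so that
    individual files could be located again inside the large file.
    """
    remaining = list(paths)
    index = []
    with open(big_file_path, "ab") as big:
        offset = big.seek(0, os.SEEK_END)   # start after any existing data
        while remaining:                    # judge: is anything still unread?
            path = remaining.pop(0)         # take one remaining small file
            with open(path, "rb") as small:
                data = small.read()
            big.write(data)                 # write it into the large file
            index.append((path, offset, len(data)))
            offset += len(data)
    return index
```

Reading each file fully into memory is acceptable here only because the inputs are small files by definition.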
S20-S21 amount to the server distributing the tasks. Besides distributing work, the server also needs to aggregate the single small files it has read. Because the server aggregates only the single small files, the object of the judging step differs from that in Embodiment One; it will be understood, however, that this is an adaptive change, that is, the two belong to the same inventive concept and satisfy unity of invention. The single small files mentioned in this embodiment are the small files that sit alongside the first-level subdirectories in the source directory, not the small files under any subdirectory.
After finishing the aggregation of the single small files, the server only needs to wait for the clients to complete their aggregation tasks.
S23: Each client performs aggregation on the subdirectories assigned to it. The aggregation proceeds as follows:
Obtaining all the small files and subdirectories of the assigned subdirectory;
Judging whether all the small files of the assigned subdirectory have been read;
If not, reading the data of one of the remaining files in the assigned subdirectory;
Writing the data read into the aggregated large file, and returning to the step of judging whether all the small files of the assigned subdirectory have been read;
If so, the aggregation task of the client ends.
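On the client side, the same loop runs over the entries of one assigned subdirectory. The sketch below uses Python's `os.scandir` as a stand-in for a directory-listing call; the function name `aggregate_directory` is hypothetical, and the sketch assumes the aggregated large file lives outside the subdirectory being read.

```python
import os
import shutil

def aggregate_directory(subdir, big_file_path):
    """Append every regular file directly under subdir to the large file.

    Returns the number of small files that were aggregated.
    """
    with os.scandir(subdir) as entries:
        # Keep regular files only; nested directories are skipped.
        files = sorted(e.path for e in entries if e.is_file(follow_symlinks=False))
    with open(big_file_path, "ab") as big:
        for path in files:
            with open(path, "rb") as small:
                shutil.copyfileobj(small, big)  # stream the bytes across
    return len(files)
```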
The clients are independent of one another when performing aggregation. When the client and the server implement the C/S model through sockets, the client needs, before performing aggregation, to start its socket service in advance, connect to the server at the specified IP, and receive from the server the starting position for reading the preset file and the number of lines to read. It then starts a thread pool and distributes the subdirectories it has read evenly across the specified number of threads, so that each thread handles a certain number of subdirectories. Each thread calls the above-mentioned C program to convert the ordinary small files in all of its subdirectories into aggregated small files.
Based on the starting position and the number of lines received, the client reads the subdirectories it needs to process from the preset file stored in the storage cluster.
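The client-side steps above (read the assigned lines, then spread the subdirectories over a thread pool) can be sketched as follows. All names are hypothetical. The sketch assumes the starting position is a zero-based line index, and `handle_dir` is a placeholder for whatever performs the aggregation of one subdirectory; in the source text that role is played by a C program, which is not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def read_assigned_dirs(preset_file, start_line, line_count):
    """Read this client's slice of subdirectory paths from the preset file."""
    with open(preset_file, "r", encoding="utf-8") as f:
        selected = islice(f, start_line, start_line + line_count)
        return [line.rstrip("\n") for line in selected]

def _run_chunk(chunk, handle_dir):
    # One thread works through its own share of subdirectories in order.
    return [handle_dir(d) for d in chunk]

def process_in_pool(subdirs, num_threads, handle_dir):
    """Spread subdirs over num_threads workers; return all results."""
    # Round-robin slicing keeps the per-thread counts within one of each other.
    chunks = [subdirs[i::num_threads] for i in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(_run_chunk, chunk, handle_dir) for chunk in chunks]
        return [result for fut in futures for result in fut.result()]
```

Results come back grouped by thread rather than in the original line order, which is sufficient when each subdirectory is processed independently.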
So that every server can conveniently run the aggregation task, a script is usually written in Python to call the corresponding programs of the above method. The script contains both the server and the client programs; at run time one only needs to specify the form in which it runs.
The script is first run in server form. It first traverses the source directory; once traversal is complete, the program enters a state of waiting to receive messages on the socket. The server also aggregates files according to the method described above, and waits for all clients to finish.
The script is then run in client form on each of the clients in turn, which start aggregating the small files in all assigned subdirectories into large files.
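One way to let a single script run in either form is a pair of subcommands. The sketch below is an assumption about how such a script might be organized; the subcommand names, option names, and default port are illustrative and do not come from the source text.

```python
import argparse

def build_parser():
    """Build a command line with hypothetical 'server' and 'client' modes."""
    parser = argparse.ArgumentParser(description="Small-file aggregation (sketch)")
    modes = parser.add_subparsers(dest="mode", required=True)

    server = modes.add_parser("server", help="traverse, distribute, aggregate")
    server.add_argument("--source-dir", required=True)
    server.add_argument("--clients", type=int, required=True)
    server.add_argument("--port", type=int, default=9000)

    client = modes.add_parser("client", help="aggregate assigned subdirectories")
    client.add_argument("--server-ip", required=True)
    client.add_argument("--port", type=int, default=9000)
    client.add_argument("--threads", type=int, default=4)
    return parser
```

With this layout the server would be started first, for example as `server --source-dir /data --clients 3`, and each client afterwards as `client --server-ip <server address>`.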
The small-file aggregation method for a distributed storage system provided by this embodiment of the present invention is implemented by multiple servers, one of which acts as the server and the rest as clients. The server traverses the source directory to find the bottom-level subdirectories and records the single small files encountered during traversal; it distributes the subdirectories to the clients using the C/S model, and reads all the single small files one by one and writes them into the aggregated large file. Each client reads all the small files in the subdirectories assigned to it by the server one by one and writes them into the aggregated large file. The method thus uses the C/S mode to establish communication between the server and the clients, so that the task of aggregating all the small files under the source directory is spread across the server and multiple clients. This makes full use of client resources, speeds up the aggregation task, greatly shortens the aggregation time, and reduces the impact on regular services.
Embodiment seven
Fig. 5 is a flowchart of another method, provided by an embodiment of the present invention, in which multiple servers perform small-file aggregation. As shown in Fig. 5, the method further includes:
S24: After completing its aggregation task, the client sends task-completion information to the server.
After waiting for all of its threads to complete their tasks, each client sends task-completion information to the server, so that the server can determine whether all current clients have completed their respective aggregation tasks, which avoids waiting indefinitely.
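The server-side wait can be sketched as below. The `DONE` message, the function name, and the per-connection timeout are assumptions; the source text says only that a completion message is sent so that the server does not wait indefinitely.

```python
import socket

def wait_for_completion(connections, timeout_seconds):
    """Return the names of clients that did not report completion.

    connections maps a client name to an already connected socket.
    The timeout applies to each connection in turn, so the total wait
    can be up to len(connections) * timeout_seconds.
    """
    unfinished = []
    for name, conn in connections.items():
        conn.settimeout(timeout_seconds)
        try:
            data = conn.recv(16)
        except socket.timeout:
            data = b""                      # nothing arrived in time
        if data.strip() != b"DONE":
            unfinished.append(name)
    return unfinished
```

An empty return value means every client has reported completion and the server can finish.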
The small-file aggregation method, device, and medium for a distributed storage system provided by the present invention have been described in detail above. The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and the relevant details can be found in the description of the method. It should be noted that those of ordinary skill in the art may make several improvements and modifications to the present invention without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of the present invention.
It should also be noted that, in this specification, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.

Claims (9)

1. A small-file aggregation method for a distributed storage system, applied to a single server, characterized by comprising:
Obtaining all small files and subdirectories under a source directory;
Judging whether all the small files in the source directory have been read;
If not, reading the data of one of the remaining small files in the source directory;
Writing the data read into an aggregated large file, and returning to the step of judging whether all the small files in the source directory have been read;
If so, ending.
2. The small-file aggregation method for a distributed storage system according to claim 1, characterized in that obtaining all small files and subdirectories under the source directory is specifically: obtaining all small files and subdirectories under the source directory by calling the readdir function.
3. The small-file aggregation method for a distributed storage system according to claim 1, characterized in that reading the data of one of the remaining small files in the source directory is specifically: reading the data of the remaining small files one at a time, in their order in the source directory.
4. A small-file aggregation device for a distributed storage system, applied to a single server, characterized by comprising:
An acquiring unit, configured to obtain all small files and subdirectories under a source directory;
A judging unit, configured to judge whether all the small files in the source directory have been read and, if not, to trigger a reading unit;
The reading unit, configured to read the data of one of the remaining small files in the source directory;
A writing unit, configured to write the data read into an aggregated large file and to trigger the judging unit.
5. A small-file aggregation device for a distributed storage system, characterized by comprising a memory, configured to store a computer program;
A processor, configured to implement, when executing the computer program, the steps of the small-file aggregation method for a distributed storage system according to any one of claims 1 to 3.
6. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the small-file aggregation method for a distributed storage system according to any one of claims 1 to 3.
7. A small-file aggregation method for a distributed storage system, applied to multiple servers, characterized by comprising:
A server traversing a source directory to find the bottom-level subdirectories, and recording the single small files encountered during traversal;
The server distributing the subdirectories to the clients using the C/S model;
The server performing the following aggregation on all the single small files:
Judging whether all the single small files have been read;
If not, reading the data of one of the remaining single small files, writing the data read into an aggregated large file, and returning to the step of judging whether all the single small files have been read;
If so, the server ending the aggregation of the single small files;
Each client performing the following method on the subdirectory assigned to it:
Obtaining all the small files and subdirectories of the assigned subdirectory;
Judging whether all the small files of the assigned subdirectory have been read;
If not, reading the data of one of the remaining files in the assigned subdirectory;
Writing the data read into the aggregated large file, and returning to the step of judging whether all the small files of the assigned subdirectory have been read;
If so, the aggregation task of the client ending.
8. The small-file aggregation method for a distributed storage system according to claim 7, characterized by further comprising:
The client sending task-completion information to the server after completing its aggregation task.
9. The small-file aggregation method for a distributed storage system according to claim 7, characterized by further comprising:
The server storing the path names of the bottom-level subdirectories in a preset file;
The server distributing the subdirectories to the clients using the C/S model then specifically being:
Splitting the lines of the preset file according to the number of clients;
Starting a socket service to establish a communication connection with each client and, according to the split result, sending each client its corresponding starting position in the preset file and its number of lines.
CN201810700396.9A 2018-06-29 2018-06-29 A kind of small documents polymerization, device and the medium of distributed memory system Pending CN108958659A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810700396.9A CN108958659A (en) 2018-06-29 2018-06-29 A kind of small documents polymerization, device and the medium of distributed memory system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810700396.9A CN108958659A (en) 2018-06-29 2018-06-29 A kind of small documents polymerization, device and the medium of distributed memory system

Publications (1)

Publication Number Publication Date
CN108958659A true CN108958659A (en) 2018-12-07

Family

ID=64484733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810700396.9A Pending CN108958659A (en) 2018-06-29 2018-06-29 A kind of small documents polymerization, device and the medium of distributed memory system

Country Status (1)

Country Link
CN (1) CN108958659A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181207