CN108520016A

CN108520016A - Data storage method and system based on a clock timer and multiple upload servers

Info

Publication number: CN108520016A
Application number: CN201810235939.4A
Authority: CN
Inventors: 魏晓林
Original assignee: Sichuan Feixun Information Technology Co Ltd
Current assignee: Sichuan Feixun Information Technology Co Ltd
Priority date: 2018-03-21
Filing date: 2018-03-21
Publication date: 2018-09-11

Abstract

The invention discloses a data storage method and system based on a clock timer and multiple upload servers. The method includes: step S1, periodically creating a preset number of threads according to a clock timer; step S2, creating a corresponding merge file for each thread; step S3, according to the current time, distributing the processed data under the corresponding directory among the merge files and merging it into them; step S4, when a merge file needs to be put into storage, selecting an upload server to put that merge file into storage. By setting a clock timer and selecting different upload servers to perform the storage operation, the invention spreads the resource load, improves storage efficiency, and thereby meets the high real-time requirements of subsequent operations on the processed data.

Description

Data storage method and system based on a clock timer and multiple upload servers

Technical field

The present invention relates to the field of data processing, and in particular to a data storage method and system based on a clock timer and multiple upload servers.

Background art

In big data services, timeliness and accuracy are two important indicators of data, and timeliness has always been a goal of log data processing strategies in big data architectures. Log data processing generally consists of two parts: 1. processing the raw log data to obtain processed log data; 2. uploading the processed log data to a specified database for storage.

Existing storage of processed log data (storage here includes merging, reporting and compressed storage of the data) often follows the same T+1 policy that is used for processing raw log data, i.e. the data processed on the previous day can only be put into storage on the current day. This storage mode has poor timeliness for processed data and cannot meet real-time processing requirements.

In addition, real-time requirements keep rising. Even if the data is merged in real time before it is reported, using only one server to report the merged data means that this server frequently invokes the reporting system command, exhausts its processing resources, and ends up delaying storage instead.

Summary of the invention

The object of the present invention is to provide a data storage method and system based on a clock timer and multiple upload servers that shortens the time taken by storage operations and can meet demanding real-time requirements.

Technical solution provided by the invention is as follows：

A data storage method based on a clock timer and multiple upload servers, including: step S1, periodically creating a preset number of threads according to a clock timer; step S2, creating a corresponding merge file for each thread; step S3, according to the current time, distributing the processed data under the corresponding directory among the merge files and merging it into them; step S4, when a merge file needs to be put into storage, selecting an upload server to put that merge file into storage.

In the above technical solution, the clock timer is set so that merging runs at the same time as the data processing servers: the data processed during the previous period is periodically merged in a distributed manner, which improves the merging efficiency of the processed data, reduces the time of subsequent operations, and effectively improves the real-time performance and regularity of data processing.

In addition, the upload-and-store operation is spread across multiple upload servers. This solves the problem of a single server exhausting its processing resources by frequently invoking the system command "hdfs dfs -put list.txt", shortens the time taken to upload and store the data, and further improves the efficiency of subsequent operations on the processed data.

Further, step S4 includes: step S41, when a merge file needs to be put into storage, obtaining the file name of that merge file, the file name being a numerical value; step S42, taking the remainder of the file name divided by the preset number of servers to obtain a remainder result; step S43, selecting the upload server whose number equals the remainder result to put the merge file into storage.

In the above technical solution, the upload server is selected from the file name of the merge file, so the upload servers take turns performing storage operations. This reduces how often each upload server invokes the system command, improves storage efficiency, shortens the processing time of subsequent operations, and improves the real-time performance of data processing.

Further, step S2 specifically includes: step S21, a thread that has no corresponding merge file obtains the value of a preset tag; step S22, when the value of the preset tag obtained by a thread is the preset unlocked value, the thread updates the preset tag to the preset locked value and then creates its merge file according to the current file value; after the merge file has been created, the thread updates the current file value and sets the preset tag back to the preset unlocked value; step S23, when the value of the preset tag obtained by a thread is the preset locked value, the thread waits for a preset creation time and then executes step S21 again.

In the above technical solution, the value of the preset tag implements a lock mechanism and the current file value is used to name the merge files, so threads do not conflict when creating their merge files. This lays the foundation for using multiple threads to merge the processed data.

Further, step S3 includes: step S31, according to the current time, a thread whose merge file is smaller than a preset maximum capacity obtains processed data from the corresponding directory and merges it into its merge file; step S32, when the capacity of a thread's merge file is not less than the preset maximum capacity, that merge file needs to be put into storage.

In the above technical solution, a reasonably chosen preset maximum capacity lets each thread merge processed data at its most efficient, which safeguards the efficiency of data processing and meets the high real-time requirement.

Further, step S3 also includes: step S33, when the capacity of a thread's merge file is less than the preset maximum capacity and no processed data could be obtained, performing the corresponding operation according to the server-completed exclusive document under the corresponding directory.

In the above technical solution, each corresponding directory can contain one server-completed exclusive document. When a thread cannot obtain any new processed data, it uses this document to decide what to do next.

Further, step S33 specifically includes: step S331, when the number of lines in the server-completed exclusive document under the corresponding directory equals a preset number of lines, the merge file needs to be put into storage; step S332, when that number of lines is less than the preset number of lines, the thread waits for a preset merge time and then executes step S31 again.

The above technical solution specifies how the different situations encountered during merging are handled, so that every thread can eventually put its merge file into storage, realizing real-time data processing.
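As a non-limiting illustration, the branching of steps S31 to S33 can be sketched in Python as follows. The names `Action` and `next_action` and all parameter names are hypothetical. The source only specifies the cases where the line count is equal to or less than the preset line count; treating a larger count like the equal case is an assumption of this sketch.

```python
from enum import Enum


class Action(Enum):
    MERGE = "merge"    # S31: keep merging processed data
    UPLOAD = "upload"  # S32 / S331: the merge file needs to be put into storage
    WAIT = "wait"      # S332: wait for the preset merge time, then retry S31


def next_action(file_size: int, max_size: int, got_record: bool,
                done_lines: int, preset_lines: int) -> Action:
    """Decide what a merge thread does next (hypothetical helper)."""
    if file_size >= max_size:
        return Action.UPLOAD          # S32: capacity not less than the maximum
    if got_record:
        return Action.MERGE           # S31: processed data was obtained
    # S33: nothing obtained, so consult the server-completed exclusive document
    if done_lines >= preset_lines:    # ">" is not covered by the source (assumption)
        return Action.UPLOAD          # S331
    return Action.WAIT                # S332
```

A thread would call such a helper after each attempt to obtain a record and act on the returned value.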

The present invention also provides a data storage system based on a clock timer and multiple upload servers, including a server cluster composed of a merge server and several upload servers. The merge server includes: a thread creation module, configured to periodically create a preset number of threads according to a clock timer; a file creation module, configured to create a corresponding merge file for each thread; a file merging module, configured to distribute, according to the current time, the processed data under the corresponding directory among the merge files and merge it into them; and a server selection module, configured to select, when a merge file needs to be put into storage, an upload server to put that merge file into storage.

Further, the server selection module includes: an acquisition submodule, configured to obtain, when a merge file needs to be put into storage, the file name of that merge file, the file name being a numerical value; a remainder submodule, configured to take the remainder of the file name divided by the preset number of servers to obtain a remainder result; and a selection submodule, configured to select the upload server whose number equals the remainder result to put the merge file into storage.

Further, the file creation module is configured to: obtain the value of a preset tag when a thread has no corresponding merge file; when the obtained value of the preset tag is the preset unlocked value, update the preset tag to the preset locked value and then create the corresponding merge file according to the current file value; after the merge file has been created, update the current file value and set the preset tag back to the preset unlocked value; and, when the obtained value of the preset tag is the preset locked value, wait for a preset creation time and then obtain the value of the preset tag again.

Further, the file merging module is configured to: when the capacity of a thread's merge file is less than a preset maximum capacity, obtain processed data from the corresponding directory according to the current time and merge it into that thread's merge file; and, when the capacity of a thread's merge file is not less than the preset maximum capacity, determine that the merge file needs to be put into storage.

Further, the file merging module is configured to: when the capacity of a thread's merge file is less than the preset maximum capacity and no processed data could be obtained, perform the corresponding operation according to the server-completed exclusive document under the corresponding directory.

Further, the file merging module is configured to: when the number of lines in the server-completed exclusive document under the corresponding directory equals a preset number of lines, determine that the merge file needs to be put into storage; and, when that number of lines is less than the preset number of lines, have the thread wait for a preset merge time and then again obtain processed data from the corresponding directory and merge it into the thread's merge file.

Compared with the prior art, the data storage method and system of the present invention based on a clock timer and multiple upload servers have the following beneficial effects:

By setting a clock timer, the invention periodically and efficiently invokes multiple threads to merge the processed data, improving the real-time performance and efficiency of data processing. Second, each thread handles the various situations that can arise during processing, which ensures that merge files can be stored normally and that the storage scheme is practicable. In addition, when merge files need to be put into storage, different upload servers can be selected to do so, which spreads the resource load and improves storage efficiency, thereby meeting the high real-time requirements of subsequent operations on the processed data.

Brief description of the drawings

The preferred embodiments are described below in a clear and easily understood manner with reference to the drawings, to further explain the above characteristics, technical features, advantages and implementations of the data storage method and system based on a clock timer and multiple upload servers.

Fig. 1 is a flowchart of one embodiment of the data storage method of the present invention based on a clock timer and multiple upload servers;

Fig. 2 is a flowchart of another embodiment of the data storage method of the present invention based on a clock timer and multiple upload servers;

Fig. 3 is a flowchart of a further embodiment of the data storage method of the present invention based on a clock timer and multiple upload servers;

Fig. 4 is a structural schematic diagram of one embodiment of the data storage system of the present invention based on a clock timer and multiple upload servers;

Fig. 5 is a structural schematic diagram of another embodiment of the data storage system of the present invention based on a clock timer and multiple upload servers.

Explanation of reference numerals:

10. thread creation module, 20. file creation module, 30. file merging module, 40. server selection module, 41. acquisition submodule, 42. remainder submodule, 43. selection submodule.

Detailed description of the embodiments

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, specific embodiments of the invention are described below with reference to the drawings. Obviously, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings, and other embodiments, from them without creative effort.

For simplicity, each figure only schematically shows the parts relevant to the present invention; the figures do not represent the actual structure of a product. In addition, to keep the figures simple and easy to understand, where several components in a figure have the same structure or function, only one of them is drawn or labelled. Herein, "one" does not only mean "only one" but can also mean "more than one".

The present invention is implemented on an architecture based on HDFS (Hadoop Distributed File System). A timer is set so that the log processing results under a particular path are periodically uploaded to HDFS with put, after which the Hive compressed-storage step is started so that the log processing results are stored in the Hive database in real time. This meets the real-time data extraction and data analysis requirements of downstream services such as real-time advertisement delivery, while relieving pressure on the log data processing servers and making reasonable use of server resources.

Each data processing server mounts the same storage server, which enables information sharing. The big data analysis platform server cluster can also access the data on this storage server.

As services demand ever more real-time data processing, the clock timer interval for data processing has been shortened from one hour to 15 minutes or less, gradually approaching real-time processing. This effectively shortens the time needed to process uplink files and thus the time spent at the key step, the data processing node. The processed data correspondingly becomes more fine-grained. However, if the subsequent operations on the processed data are performed by only one analysis platform server, then once the processed data is refined beyond a certain point, that server frequently invokes the system command "hdfs dfs -put list.txt". Reporting the data then exhausts the server's processing resources (mainly I/O and memory) and ends up delaying the subsequent operations.

Therefore, the subsequent operations (merging, uploading and compressed storage of the processed data) can be completed jointly by different analysis platform servers in the big data analysis platform server cluster, to meet the high real-time requirements of data processing.

In one embodiment of the present invention, as shown in Fig. 1, a data storage method based on a clock timer and multiple upload servers includes:

Step S1: periodically creating a preset number of threads according to a clock timer;

Step S2: creating a corresponding merge file for each thread;

Step S3: according to the current time, distributing the processed data under the corresponding directory among the merge files and merging it into them;

Step S4: when a merge file needs to be put into storage, selecting an upload server to put that merge file into storage.

Specifically, this embodiment mainly concerns the storage of log data. In a Hadoop architecture, a big data analysis platform server cluster can perform the subsequent operations on processed data (for example, processed log files), so that the processed data is quickly uploaded to HDFS and appended, compressed, to a partition of the Hive database. One node (i.e. one server) in the cluster merges files using multiple threads, and multiple nodes (i.e. multiple servers) then upload and store them; these nodes can also use multiple threads when uploading. For ease of distinction, the node that merges files is called the merge server, and the servers that upload and store are called upload servers.

To improve the real-time performance of data processing, a clock timer is provided on the merge server. The clock timer is generally set according to the processing performance of the data processing servers (which process raw uplink files into processed data). Optionally, it can also be set according to the capacity of the big data analysis platform server cluster (that is, the merge server) to handle the processed data of a specific period (for example, one hour). The timer should be set so that 1) each time the timer fires, the raw uplink files of the previous period can be fully processed; and 2) the subsequent operations on the processed data (merging, uploading and compressed storage) can also be completed within the current timer interval. Since the processed data can be handled by multiple threads, the clock timer is in practice set mainly according to the processing performance of the data processing servers.

The subsequent operations are on the critical path of the whole data processing flow, so their duration must be kept as short as possible to ensure real-time data processing. The merge server therefore uses multiple threads to merge the processed data and selects different upload servers to upload the merge files.

When the clock timer fires, each data server processes the raw uplink files and, at the same time, the merge server in the big data analysis platform server cluster starts the subsequent operations on the log data (i.e. on the processed data). Note that each data server saves its processed data under a particular folder: as soon as a raw uplink file has been processed, the resulting processed data is saved there, ready for the subsequent operations.

For the merge operation, the main process first creates a preset number of threads to enable multi-threaded processing. The preset number of threads can be set according to actual conditions, provided that within one timer interval the threads can merge all the data processed in the specific period (for example, the previous hour) and have the corresponding upload servers put it into storage. For example, the clock timer is set to fire on the hour. At 16:00, each data server starts processing the raw uplink files received between 15:00 and 16:00, while the main process of the big data analysis platform server cluster creates 5 threads to merge the processed data for 15:00 to 16:00 produced by the data servers.
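As a non-limiting illustration, one round of the timer-driven thread creation described above might look like the following Python sketch. The function names `seconds_until_next_hour` and `run_round` and the `merge_task` callback are hypothetical, and the on-the-hour schedule follows the example in the text rather than any requirement of the method.

```python
import threading
from datetime import datetime, timedelta


def seconds_until_next_hour(now: datetime) -> float:
    """Delay until the next on-the-hour firing of the clock timer."""
    next_hour = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return (next_hour - now).total_seconds()


def run_round(thread_count: int, merge_task) -> None:
    """One timer round: start `thread_count` threads and wait for all of them."""
    threads = [
        threading.Thread(target=merge_task, args=(index,), name=f"merge-{index}")
        for index in range(thread_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

A scheduler could sleep for `seconds_until_next_hour(datetime.now())` and then call `run_round(5, merge_task)`, where `merge_task` carries out steps S2 to S4 for one thread.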

Each thread needs to create its own merge file, into which it merges the processed data it obtains record by record, for a single unified upload.

Processed data is stored in a particular structure that mirrors that of its raw uplink files. For example, each day's raw uplink files are stored in 24 folders under one particular directory, and the corresponding processed data is stored in 24 folders under another particular directory. When a thread starts, it can therefore use the current time to find the processed data under the corresponding directory and perform the subsequent operations on it.

For example, the raw uplink files for 14:00 to 15:00 are stored in folder /2018/0308/1400 on drive D of the storage server, and the corresponding processed data is stored in folder /2018/0308/1400 on drive E (assuming all raw uplink files are stored on drive D and all processed data on drive E, each in 24 folders per day). At 15:00, 6 threads are created. After creating its merge file, each thread obtains processed data from folder /2018/0308/1400 on drive E, one record at a time. The 6 threads thus merge the processed data in this folder simultaneously, which improves merging efficiency and greatly reduces merging time.
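As a non-limiting illustration, the mapping from the current time to the folder of the previous hour in the example above can be sketched in Python. The function name `processed_dir_for` and the `root` parameter are hypothetical, and the path layout is simply the one used in the example.

```python
from datetime import datetime, timedelta


def processed_dir_for(now: datetime, root: str = "E:") -> str:
    """Return the folder holding the data processed in the previous hour.

    Follows the example layout /YYYY/MMDD/HH00 under `root`.
    """
    previous = now.replace(minute=0, second=0, microsecond=0) - timedelta(hours=1)
    return f"{root}/{previous:%Y}/{previous:%m%d}/{previous:%H}00"
```

With this layout, a thread started at 15:00 on 8 March 2018 would read from E:/2018/0308/1400, matching the example.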

Multiple threads simultaneously merge the processed data under the same directory in a distributed manner. When a thread's merge file needs to be put into storage, that thread selects an upload server to store it. Preferably, each thread selects an upload server to upload its own merge file to HDFS and append it, compressed, to a partition of the Hive database. Hive is a data warehouse tool based on Hadoop that can map structured data files to database tables; it enables partitioned management of the data and makes subsequent retrieval easier. The upload server can be selected according to the time at which the merge file needs to be put into storage; for example, if a merge file needs to be put into storage at 16:25, the upload server numbered 5 is selected. Alternatively, the upload servers can be arranged in order and selected in turn; for example, when merge file A is to be stored, the first upload server is selected, when merge file B is to be stored, the second upload server is selected, and so on.
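As a non-limiting illustration, the second alternative (selecting the upload servers in turn) can be sketched in Python. The class name `RoundRobinSelector` and the server labels are hypothetical. The time-based alternative is not sketched because the source does not state how the time is mapped to a server number.

```python
import itertools
import threading


class RoundRobinSelector:
    """Hands out upload servers in a fixed, repeating order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))
        # Several merge threads may ask for a server at the same time.
        self._lock = threading.Lock()

    def next_server(self):
        with self._lock:
            return next(self._cycle)
```

Each thread would call `next_server()` once its merge file is ready, so that consecutive merge files go to consecutive upload servers.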

In this embodiment, the clock timer is set so that merging runs at the same time as the data processing servers: the data processed during the previous period is periodically merged in a distributed manner, which improves the merging efficiency of the processed data, reduces the time of subsequent operations, and effectively improves the real-time performance and regularity of data processing.

In another embodiment of the present invention, which improves on the previous embodiment, as shown in Fig. 2, step S2 specifically includes:

Step S21: a thread that has no corresponding merge file obtains the value of a preset tag;

Step S22: when the value of the preset tag obtained by a thread is the preset unlocked value, the thread updates the preset tag to the preset locked value and then creates its merge file according to the current file value; after the merge file has been created, the thread updates the current file value and sets the preset tag back to the preset unlocked value;

Step S23: when the value of the preset tag obtained by a thread is the preset locked value, the thread waits for a preset creation time and then executes step S21 again.

Specifically, after the main process has created the threads, each thread creates its own merge file. Each thread first obtains the value of the preset tag and creates its merge file according to that value.

Setting different values of the preset tag implements a lock mechanism that provides exclusive access. Once the preset number of threads has been created, the current file value can be set so that each thread can subsequently create its own merge file.

For example, there are 4 threads A, B, C and D; the preset unlocked value is 0, the preset locked value is 1, and the current file value is initially list=0. All threads start obtaining the value of the preset tag, and inevitably do so in some order. Suppose thread A obtains the value 0, i.e. tag=0, meaning the tag is not locked; it immediately changes the preset tag to the preset locked value, i.e. tag=1. When the other threads then obtain the preset tag, they find tag=1, cannot create their merge files, and try again after waiting for a period of time. Returning to thread A: after setting tag=1, it obtains the current file value, whose default initial value is 0, i.e. list=0, and uses the value of list as the name of the merge file it creates, i.e. list.txt, which here is 0.txt. After creation, it updates the current file value so that a later thread does not try to use the same name and fail to create its merge file; this can be done with list++, giving list=1. It then sets the preset tag back to the preset unlocked value, i.e. tag=0. Thread A has now created its merge file and goes on to merge processed data. Threads B, C and D repeat the above steps after waiting for the preset creation time (for example, 100 ms), and one of the three obtains tag=0. Suppose it is thread C: it does the same as thread A, except that the current file value it obtains is 1 and the merge file it creates is 1.txt; then list++ gives list=2, and so on until every thread has its merge file. For example, thread A's merge file is 0.txt, thread B's is 3.txt, thread C's is 1.txt, and thread D's is 2.txt.
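As a non-limiting illustration, the tag-based lock and numeric naming in this example can be sketched in Python. The class name `MergeFileNamer` is hypothetical and file creation itself is omitted. The source does not say how the read-then-update of the tag is made atomic; this sketch guards it with a `threading.Lock`, which is an assumption.

```python
import threading
import time

UNLOCKED, LOCKED = 0, 1


class MergeFileNamer:
    """Hands out unique numeric merge-file names using a tag as a lock flag."""

    def __init__(self, retry_delay: float = 0.1):
        self.tag = UNLOCKED              # preset tag
        self.current = 0                 # current file value ("list" in the text)
        self.retry_delay = retry_delay   # preset creation time (100 ms in the example)
        self._guard = threading.Lock()   # assumption: makes the tag check-and-set atomic

    def _try_lock(self) -> bool:
        with self._guard:
            if self.tag == UNLOCKED:
                self.tag = LOCKED
                return True
            return False

    def create_name(self) -> str:
        while not self._try_lock():      # S23: tag is locked, wait and retry S21
            time.sleep(self.retry_delay)
        try:
            name = f"{self.current}.txt"  # S22: name the file after the current value
            self.current += 1             # list++ so the next thread gets a new name
            return name
        finally:
            self.tag = UNLOCKED           # set the tag back to the unlocked value
```

Four threads calling `create_name()` on one shared instance would receive 0.txt to 3.txt in whatever order they acquire the tag.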

Note that the current file value can be reset to a default value, for example 0, each time the clock timer fires, or once a day. As long as merge files can be created successfully and the subsequent operations on the processed data are not affected, the way the current file value is managed is not limited here.

The merge files created by the threads can be stored under a particular directory on the storage server, for example ~/meger//YYYY-MM-DD/HH/list.txt. The same storage structure as for the processed data can also be used, to make related lookups easier.

In this embodiment, the value of the preset tag implements a lock mechanism and the current file value is used to name the merge files, so threads do not conflict when creating their merge files. This lays the foundation for using multiple threads to merge the processed data.

Preferably, step S4 includes:

Step S41: when a merge file needs to be put into storage, obtaining the file name of that merge file, the file name being a numerical value;

Step S42: taking the remainder of the file name divided by the preset number of servers to obtain a remainder result;

Step S43: selecting the upload server whose number equals the remainder result to put the merge file into storage.

Specifically, when the file name of each merge file is a numerical value, the upload server that stores a merge file can be selected from its file name.

First, each upload server must be numbered. For example, with 5 upload servers, they can be numbered by IP address: the first, IP0; the second, IP1; the third, IP2; the fourth, IP3; the fifth, IP4.

Second, the file name of the merge file to be stored is obtained. For example, if thread A's merge file 0 needs to be put into storage, thread A obtains its file name, i.e. 0.

Then, the upload server is selected by taking the remainder of the file name divided by the preset number of servers. For example, 0 modulo 5 gives 0, so the upload server numbered 0 is selected to store merge file 0. The remainder can be computed as slave_number = int(list) % M, where list is 0 and M (the preset number of servers) is 5.
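As a non-limiting illustration, the selection by remainder can be sketched in Python with the remainder operator `%`. The function name `select_upload_server` is hypothetical, and accepting a name with or without a ".txt" extension is a convenience of this sketch.

```python
def select_upload_server(merge_file_name: str, server_count: int) -> int:
    """Return the number of the upload server for a numerically named merge file."""
    stem = merge_file_name.split(".", 1)[0]   # "7.txt" -> "7"
    return int(stem) % server_count           # S42: remainder by the server count
```

With 5 upload servers, merge files 0.txt to 4.txt map to servers 0 to 4, and 5.txt wraps around to server 0 again.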

Finally, once the upload server has been determined, the thread runs the corresponding code through a script so that the selected upload server stores the merge file. For example, thread A runs ssh root:IPslave_number "hdfs dfs -put list.txt ~/**/**/" through a script, so that the upload server numbered IP0 uploads merge file 0. After uploading merge file 0, the IP0 upload server appends it, compressed, to a partition of the Hive database (partitioned by day), and thread A then ends its task.
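As a non-limiting illustration, the remote upload command could be assembled in Python as follows. The function name `build_upload_command`, the host label and the HDFS target directory are hypothetical, and the sketch assumes the usual OpenSSH `user@host` form rather than the exact string quoted above. The command is only built, not executed.

```python
import shlex


def build_upload_command(host: str, local_file: str, hdfs_dir: str,
                         user: str = "root") -> list[str]:
    """Build an ssh command that runs `hdfs dfs -put` on the chosen upload server."""
    remote = f"hdfs dfs -put {shlex.quote(local_file)} {shlex.quote(hdfs_dir)}"
    return ["ssh", f"{user}@{host}", remote]
```

The resulting list can be passed to `subprocess.run(command, check=True)` by the merge thread once the upload server has been selected.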

In the same way, every thread on the merge server performs the above operations. Once the selected upload server has uploaded and stored, with compression, a thread's merge file, that thread ends its task. When all threads have ended their tasks, this round of subsequent operations is finished, and a new round starts the next time the clock timer fires.

In this embodiment, the upload server is selected from the file name of the merge file, so the upload servers take turns performing storage operations. This reduces how often each upload server invokes the system command, improves storage efficiency, shortens the processing time of subsequent operations, and improves the real-time performance of data processing.

In another embodiment of the present invention, an improvement based on any of the above embodiments, as shown in Fig. 3, step S3 includes:

Step S31: according to the current time, each thread whose merge file is below the preset maximum capacity obtains processed data from the corresponding directory and merges it into its merge file;

Step S32: when the capacity of a thread's merge file is not less than the preset maximum capacity, that merge file needs to be stored.

Specifically, the preset maximum capacity of a merge file needs to be set reasonably. If it is too large, merging takes too long; if it is too small, the put command must be called frequently, which degrades the performance of the upload servers. For example, for merge files built from processed log files, the preset maximum capacity can be set to 200M, i.e., MaxMergerFileSize=200M.

When a thread fetches processed data from the corresponding directory, it generally fetches one item at a time: it fetches one item of processed data from the corresponding directory and merges it into its merge file, then fetches another item and merges it, and so on. When the capacity of a thread's merge file is not less than the preset maximum capacity, that merge file needs to be stored.

Preferably, the process in step S31 by which a thread whose merge file is below the preset maximum capacity obtains processed data from the corresponding directory and merges it into its merge file is specifically:

Step S311: each time an item of processed data is merged into the merge file, that item is deleted from the corresponding directory.

As each thread obtains processed data item by item, it deletes each item from the corresponding directory after obtaining and merging it, which prevents other threads from fetching and merging the same item again and doing duplicate work.
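Steps S31, S311 and S32 can be sketched for a single worker as follows. This is a minimal illustration, assuming each item of processed data is a separate file in the source directory and that the merge file lives outside that directory; the function name merge_until_full is hypothetical. The sketch tolerates an item disappearing because another worker deleted it, but it does not add any stronger coordination between concurrent workers.

```python
import os


def merge_until_full(src_dir: str, merge_path: str, max_bytes: int) -> bool:
    """Merge items one at a time; return True once the merge file needs storing."""

    def is_full() -> bool:
        return os.path.exists(merge_path) and os.path.getsize(merge_path) >= max_bytes

    for name in sorted(os.listdir(src_dir)):
        # Step S32: stop as soon as the merge file is not below the maximum.
        if is_full():
            return True
        item = os.path.join(src_dir, name)
        try:
            with open(item, "rb") as src:
                data = src.read()
        except FileNotFoundError:
            continue  # another worker already merged and deleted this item
        # Step S31: append the item to this worker's merge file.
        with open(merge_path, "ab") as dst:
            dst.write(data)
        # Step S311: delete the item right after merging it.
        os.remove(item)
    return is_full()
```

A return value of False means the directory ran out of items before the merge file reached the maximum capacity, which is the situation the exclusive-document check is meant to resolve.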

For example, suppose there are 5 threads A, B, C, D, and E whose merge files are 0.txt, 1.txt, 2.txt, 3.txt, and 4.txt. The merge files are all empty when created, and the threads fetch processed data from the same directory concurrently. Thread A fetches the first item of processed data, merges it into 0.txt, and deletes it from the directory so that other threads do not fetch and merge it again. Thread B fetches the second item, merges it into 1.txt, and deletes it from the directory, and so on. Thread A then fetches the sixth item, merges it into 0.txt, and deletes it from the directory, and the threads continue in turn in this way. Because the items of processed data differ in size, the merge files of the threads reach the preset maximum capacity at different times. Suppose 1.txt of thread B is the first to reach 200M. Merge file 1.txt then needs to be stored: thread B takes 1 modulo the preset number of servers (for example, 4), selects the corresponding upload server to upload 1.txt to HDFS and append it, compressed, to a partition of the Hive database (partitioned by day), and then ends its task. The remaining data is handled by the other threads.

When all threads have ended their tasks, this round of processing is complete; the system waits for the next clock timer to fire and then starts a new round.

A reasonably set preset maximum capacity lets each thread merge processed data as efficiently as possible, which keeps data processing efficient and meets the high real-time requirement.

Preferably, step S3 further includes:

Step S33: when the capacity of a thread's merge file is less than the preset maximum capacity and no processed data can be obtained, the thread performs the corresponding operation according to the server-completed exclusive document in the corresponding directory.

Specifically, while processed data is being merged, the corresponding directory sometimes contains no processed data at all. There are generally two possible causes: 1) all processed data has already been handled; 2) the data processing servers are still working, so no new processed data has yet been placed in the directory.

The content of the server-completed exclusive document helps determine which of the two situations has occurred, so that the appropriate operation can be performed. Each corresponding directory can contain one server-completed exclusive document, and a thread that cannot obtain new processed data uses this document to decide how to proceed.

Preferably, step S33 specifically includes:

Step S331: when the number of lines in the server-completed exclusive document in the corresponding directory equals the preset line count, the merge file needs to be stored;

Step S332: when the number of lines in the server-completed exclusive document in the corresponding directory is less than the preset line count, the thread waits for the preset merge time and then executes step S31 again.

Specifically, to make lookup easy, after a data processing server finishes processing the raw data assigned to it, it writes its identifier into the server-completed exclusive document, one identifier per line, which simplifies the later check.

The preset line count is the number of data processing servers. If 10 data processing servers process the raw data to produce the processed data, the preset line count is set to 10.
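The decision in steps S331 and S332 can be sketched as follows. This is a minimal illustration that takes the text of the server-completed exclusive document as a string with one identifier per line; the names NextAction and decide_when_empty are hypothetical. The source does not say what happens if the document holds more lines than there are data processing servers, so the sketch treats that case as an error, which is an assumption.

```python
import enum


class NextAction(enum.Enum):
    STORE = "store"  # step S331: all servers are done, store the merge file
    WAIT = "wait"    # step S332: wait the preset merge time, then retry step S31


def decide_when_empty(done_file_text: str, preset_line_count: int) -> NextAction:
    """Decide what a thread does when it finds no new processed data."""
    # One identifier per line; blank lines are ignored.
    finished = [line for line in done_file_text.splitlines() if line.strip()]
    if len(finished) == preset_line_count:
        return NextAction.STORE
    if len(finished) < preset_line_count:
        return NextAction.WAIT
    # Assumption: more lines than servers is not covered by the source.
    raise ValueError("more completion lines than data processing servers")
```

A thread that receives WAIT would sleep for the preset merge time and then try to fetch processed data again.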

For example, suppose there are 4 threads A, B, C, and D. The merge file of thread A has reached 200M (assuming the preset maximum capacity is 200M) and has been uploaded, so thread A has ended its task. When thread B tries to obtain processed data from the corresponding directory and finds no new processed data, it reads the server-completed exclusive document in that directory and checks whether its line count equals 10. If it does, all data processing servers have finished processing, so no new processed data will be generated. The merge file of thread B can therefore be stored: thread B selects the corresponding upload server to store its merge file and ends its task without waiting any longer. Threads C and D proceed in the same way. This round is then complete, and the system waits for the next clock timer to fire before starting a new task.

The preset merge time is how long a thread waits when it cannot obtain new processed data but not all data processing servers have finished. It can be set according to actual needs, for example to 100ms. The preset merge time and the preset creation time may be the same or different, and are set by the engineer according to the actual situation.

For example, suppose there are 4 threads A, B, C, and D. After merging one item of processed data, thread C tries to obtain another and finds no new processed data. It reads the server-completed exclusive document and finds 7 lines, fewer than the preset line count of 12, so it waits 100ms and then tries to obtain processed data again. Threads A, B, and D behave in the same way.

This embodiment gives the specific handling of the different situations encountered during merging. It takes all cases into account, ensures that each thread can successfully store its merge file, and achieves real-time data processing.

In another embodiment of the present invention, as shown in Fig. 4, a data warehousing system based on a clock timer and multiple upload servers includes: a server cluster composed of a merge server and several upload servers;

The merge server includes:

A thread creation module 10, configured to create a preset number of threads at timed intervals according to the clock timer;

A file creation module 20, configured to create the merge file corresponding to each thread;

A file merging module 30, configured to distribute and merge the processed data in the corresponding directory into the merge files according to the current time;

A server selection module 40, configured to select an upload server to store a merge file when that merge file needs to be stored.

Regarding the merge operation, the main process first creates the preset number of threads to achieve multithreading. The preset number of threads can be set according to actual conditions, provided that, within the clock timer interval, the threads can merge all the processed data of a specific period (for example, the previous hour) and have the corresponding upload servers store it.

Multiple threads simultaneously distribute and merge the processed data in the same directory. When a thread's merge file needs to be stored, that thread selects an upload server to store it. Preferably, each thread selects an upload server to upload its own merge file to HDFS and append it, compressed, to a partition of the Hive database. The upload server can be selected according to the time at which the merge file needs to be stored; for example, if a merge file needs to be stored at 16:25, the upload server numbered 5 is selected. Alternatively, the upload servers can be arranged in order and selected in sequence; for example, when merge file A is to be stored, the 1st upload server is selected, when merge file B is to be stored, the 2nd upload server is selected, and so on in turn.
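The two alternative selection strategies can be sketched as follows. The source gives only the single example that 16:25 maps to server 5 and does not state the rule behind it, so taking the minute modulo the number of servers is purely an assumption that happens to match the example when there are 10 servers. The function names select_by_minute and round_robin are hypothetical.

```python
import datetime
import itertools


def select_by_minute(when: datetime.time, num_servers: int) -> int:
    """Pick a server number from the storage time."""
    # Assumption (not stated in the source): minute modulo the server count.
    return when.minute % num_servers


def round_robin(num_servers: int):
    """Yield server positions 1, 2, ..., num_servers, 1, 2, ... in turn."""
    # Positions start at 1 to match "the 1st upload server" in the text.
    return itertools.cycle(range(1, num_servers + 1))
```

With round_robin, successive calls to next() on the returned iterator give the server for merge file A, then merge file B, and so on, wrapping around after the last server.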

In an actual database architecture, the function of the thread creation module can be performed by the main process of the merge server, and the functions of the file creation module, the file merging module, and the server selection module can be performed by each thread itself.

In another embodiment of the present invention, an improvement based on the above embodiment, as shown in Fig. 5, the file creation module is configured to obtain the value of the preset flag when a thread has no corresponding merge file; and

when the obtained value of the preset flag is the preset unlocked value, to update the value of the preset flag to the preset locked value and then create the corresponding merge file according to the current file value; after the merge file has been created, to update the current file value and update the value of the preset flag to the preset unlocked value; and

when the obtained value of the preset flag is the preset locked value, to wait for the preset creation time and then obtain the value of the preset flag again.
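The flag-guarded creation of merge files can be sketched as follows. This is a minimal illustration under stated assumptions: the flag values 0 and 1, the default wait of 0.1 seconds, the .txt naming, and the class name MergeFileCreator are all hypothetical, and a threading.Lock is added here only to make the read-and-update of the flag atomic, which the source does not describe.

```python
import os
import threading
import time

UNLOCKED, LOCKED = 0, 1  # assumed encodings of the preset flag values


class MergeFileCreator:
    def __init__(self, directory: str, create_wait_s: float = 0.1):
        self.directory = directory
        self.create_wait_s = create_wait_s  # preset creation time (assumed value)
        self.flag = UNLOCKED                # preset flag
        self.current_file_value = 0         # numeric name of the next merge file
        self._guard = threading.Lock()      # assumption: makes flag check-and-set atomic

    def _try_lock_flag(self) -> bool:
        with self._guard:
            if self.flag == UNLOCKED:
                self.flag = LOCKED
                return True
            return False

    def create(self) -> str:
        """Create one empty merge file named after the current file value."""
        while not self._try_lock_flag():
            time.sleep(self.create_wait_s)  # flag is locked: wait, then read it again
        try:
            name = "{}.txt".format(self.current_file_value)
            path = os.path.join(self.directory, name)
            open(path, "wb").close()        # create the empty merge file
            self.current_file_value += 1    # update the current file value
            return path
        finally:
            with self._guard:
                self.flag = UNLOCKED        # release the flag for other threads
```

Because only one thread at a time holds the locked flag, each thread receives a distinct numeric filename, which is what the remainder-based server selection relies on.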

Preferably, as shown in Fig. 5, the server selection module 40 includes:

An acquisition submodule 41, configured to obtain the filename of the merge file that needs to be stored when a merge file needs to be stored, the filename being a numeric value;

A remainder submodule 42, configured to take the filename modulo the preset number of servers to obtain a remainder;

A selection submodule 43, configured to select the upload server whose number equals the remainder to store the merge file.

In another embodiment of the present invention, an improvement based on any of the above embodiments, the file merging module is configured to, when the capacity of a thread's merge file is less than the preset maximum capacity, obtain processed data from the corresponding directory according to the current time and merge it into that thread's merge file; and, when the capacity of a thread's merge file is not less than the preset maximum capacity, to determine that the merge file needs to be stored.

Preferably, the process by which the file merging module, when the capacity of a thread's merge file is less than the preset maximum capacity, obtains processed data from the corresponding directory according to the current time and merges it into that thread's merge file is specifically: each time an item of processed data is merged into the merge file, that item is deleted from the corresponding directory.

Specifically, as each thread obtains processed data item by item, it deletes each item from the corresponding directory after obtaining and merging it, which prevents other threads from fetching and merging the same item again and doing duplicate work.

Preferably, the file merging module is configured to, when the capacity of a thread's merge file is less than the preset maximum capacity and no processed data can be obtained, perform the corresponding operation according to the server-completed exclusive document in the corresponding directory.

Preferably, the file merging module is configured to determine that the merge file needs to be stored when the number of lines in the server-completed exclusive document in the corresponding directory equals the preset line count; and

when the number of lines in the server-completed exclusive document in the corresponding directory is less than the preset line count, after the thread has waited for the preset merge time, to obtain processed data from the corresponding directory again and merge it into that thread's merge file.

It should be noted that the specific implementation of each system embodiment is the same as that of the method embodiments above and is not described again here.

It should be noted that the above embodiments can be freely combined as needed. The above are only preferred embodiments of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications without departing from the principle of the invention, and such improvements and modifications shall also fall within the protection scope of the present invention.

Claims

1. A data storage method based on a clock timer and multiple upload servers, characterized by comprising:

2. The data storage method based on a clock timer and multiple upload servers according to claim 1, characterized in that step S4 includes:

Step S41: when a merge file needs to be stored, obtaining the filename of the merge file that needs to be stored, the filename being a numeric value;

3. The data storage method based on a clock timer and multiple upload servers according to claim 1, characterized in that step S2 specifically includes:

Step S22: when the value of the preset flag obtained by a thread is the preset unlocked value, updating the value of the preset flag to the preset locked value and then creating the corresponding merge file according to the current file value; after the merge file has been created, updating the current file value and updating the value of the preset flag to the preset unlocked value;

Step S23: when the value of the preset flag obtained by a thread is the preset locked value, waiting for the preset creation time and then executing step S21.

4. The data storage method based on a clock timer and multiple upload servers according to claim 1, characterized in that step S3 includes:

Step S31: according to the current time, each thread whose merge file is below the preset maximum capacity obtains processed data from the corresponding directory and merges it into its merge file;

Step S32: when the capacity of a thread's merge file is not less than the preset maximum capacity, the merge file needs to be stored.

5. The data storage method based on a clock timer and multiple upload servers according to claim 4, characterized in that step S3 further includes:

Step S33: when the capacity of a thread's merge file is less than the preset maximum capacity and no processed data can be obtained, performing the corresponding operation according to the server-completed exclusive document in the corresponding directory.

6. The data storage method based on a clock timer and multiple upload servers according to claim 5, characterized in that step S33 specifically includes:

Step S331: when the number of lines in the server-completed exclusive document in the corresponding directory equals the preset line count, the merge file needs to be stored;

Step S332: when the number of lines in the server-completed exclusive document in the corresponding directory is less than the preset line count, the thread waits for the preset merge time and then executes step S31 again.

7. A data warehousing system based on a clock timer and multiple upload servers, characterized by comprising: a server cluster composed of a merge server and several upload servers;

The merge server includes:

A thread creation module, configured to create a preset number of threads at timed intervals according to the clock timer;

A file creation module, configured to create the merge file corresponding to each thread;

A file merging module, configured to distribute and merge the processed data in the corresponding directory into the merge files according to the current time;

A server selection module, configured to select an upload server to store a merge file when that merge file needs to be stored.

8. The data warehousing system based on a clock timer and multiple upload servers according to claim 7, characterized in that the server selection module includes:

An acquisition submodule, configured to obtain the filename of the merge file that needs to be stored when a merge file needs to be stored, the filename being a numeric value;

A remainder submodule, configured to take the filename modulo the preset number of servers to obtain a remainder;

A selection submodule, configured to select the upload server whose number equals the remainder to store the merge file.

9. The data warehousing system based on a clock timer and multiple upload servers according to claim 7, characterized in that:

The file creation module is configured to obtain the value of the preset flag when a thread has no corresponding merge file; and

when the obtained value of the preset flag is the preset unlocked value, to update the value of the preset flag to the preset locked value and then create the corresponding merge file according to the current file value; after the merge file has been created, to update the current file value and update the value of the preset flag to the preset unlocked value; and

when the obtained value of the preset flag is the preset locked value, to wait for the preset creation time and then obtain the value of the preset flag again.

10. The data warehousing system based on a clock timer and multiple upload servers according to claim 7, characterized in that:

The file merging module is configured to, when the capacity of a thread's merge file is less than the preset maximum capacity, obtain processed data from the corresponding directory according to the current time and merge it into that thread's merge file; and

when the capacity of a thread's merge file is not less than the preset maximum capacity, the merge file needs to be stored.

11. The data warehousing system based on a clock timer and multiple upload servers according to claim 10, characterized in that:

The file merging module is configured to, when the capacity of a thread's merge file is less than the preset maximum capacity and no processed data can be obtained, perform the corresponding operation according to the server-completed exclusive document in the corresponding directory.

12. The data warehousing system based on a clock timer and multiple upload servers according to claim 11, characterized in that:

The file merging module is configured to determine that the merge file needs to be stored when the number of lines in the server-completed exclusive document in the corresponding directory equals the preset line count; and

when the number of lines in the server-completed exclusive document in the corresponding directory is less than the preset line count, after the thread has waited for the preset merge time, to obtain processed data from the corresponding directory again and merge it into that thread's merge file.