CN104462484A - Data processing method, data processor and system - Google Patents


Info

Publication number
CN104462484A
Authority
CN
China
Prior art keywords
data
database cluster
source database
write
target database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410797613.2A
Other languages
Chinese (zh)
Other versions
CN104462484B (en)
Inventor
杨艳杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd, Qizhi Software Beijing Co Ltd
Priority to CN201410797613.2A
Publication of CN104462484A
Application granted
Publication of CN104462484B
Legal status: Expired - Fee Related
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data processing method, a data processor and a system. The method comprises: monitoring the write operations of a plurality of source database clusters respectively; when a write operation is detected in any source database cluster, obtaining the data written to that source database cluster by the current write operation; writing the obtained data into a target database cluster, wherein the storage capacity of the target database cluster is greater than or equal to the sum of the storage capacities of the plurality of source database clusters; and processing the data in the target database cluster according to the respective operation instructions directed at the data in the plurality of source database clusters. With this data processing method, the different data in the plurality of source database clusters can be written into the same target database cluster, so that the data can be processed in a unified manner in that target database cluster.

Description

Data processing method, data processor and system
Technical field
The present invention relates to the field of data processing, and in particular to a data processing method, a data processor and a system.
Background art
To address the problem of storing the large volumes of data generated by a business, the data of a single business is usually distributed across different source database clusters for processing. For example, if the data volume of a business is 10 TB, then, to support the operation of the business, its data is placed in ten different source database clusters, with 1 TB of data in each.
When statistical operations such as cross analysis need to be performed on the data of this business, the data must be extracted from the different source database clusters one by one before the cross analysis can be carried out, which makes the operation large and complex and reduces the accuracy of the statistics. For some businesses, the data volume is so large that statistical analysis by one-by-one extraction may not be possible at all.
In summary, in the prior art, data scattered across different source database clusters can only be processed (for example, statistically analyzed) in a dispersed manner, which reduces the accuracy of the processing; for businesses with larger data volumes, the processing may not be possible at all. These problems prevent the operating efficiency of the business from being improved and hinder its further development.
Summary of the invention
In view of the above problems, embodiments of the present invention propose a data processing method, a data processor and a system that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a data processing method is provided for performing integrated processing on data in a plurality of mutually independent source database clusters. The method comprises: monitoring the write operations of the plurality of source database clusters respectively; when a write operation is detected in any source database cluster, obtaining the data written to that source database cluster by this write operation; writing the obtained data into a target database cluster, wherein the storage capacity of the target database cluster is not less than the sum of the storage capacities of the plurality of source database clusters; and processing the data in the target database cluster according to the respective operation instructions directed at the data in the plurality of source database clusters.
Optionally, the step of monitoring the write operations of the plurality of source database clusters respectively further comprises, for each source database cluster: obtaining the oplog information of the source database cluster at predetermined intervals, the oplog information being the response log of the source database cluster; comparing the oplog information obtained this time with the oplog information obtained last time to obtain a comparison result; and, if the comparison result shows that the oplog information obtained on the two occasions is inconsistent, determining that a write operation exists in the source database cluster.
Optionally, the step of writing the obtained data into the target database cluster further comprises: deleting the identification information of the obtained data; writing the obtained data into the target database cluster; and reassigning identification information to the obtained data according to the identification information of the data already stored in the target database cluster.
Optionally, the identification information is the underscore ID (_id) of the data.
Optionally, the method further comprises: after the obtained data is written into the target database cluster, judging whether this write operation to the target database cluster succeeded; and, if not, repeating the write operation until the obtained data has been successfully written into the target database cluster.
Optionally, the operation instructions comprise at least one of the following: a data statistics instruction, a data analysis instruction, a data calculation instruction, and a data deletion instruction.
Optionally, the source database clusters and the target database cluster are all MongoDB clusters.
According to another aspect of the present invention, a data processor is also provided for performing integrated processing on data in a plurality of mutually independent source database clusters. The data processor comprises: a monitoring module, adapted to monitor the write operations of the plurality of source database clusters respectively; an acquisition module, adapted to obtain, when a write operation is detected in any source database cluster, the data written to that source database cluster by this write operation; a writing module, adapted to write the obtained data into a target database cluster, wherein the storage capacity of the target database cluster is not less than the sum of the storage capacities of the plurality of source database clusters; and a processing module, adapted to process the data in the target database cluster according to the respective operation instructions directed at the data in the plurality of source database clusters.
Optionally, the monitoring module is further adapted, for each source database cluster, to: obtain the oplog information of the source database cluster at predetermined intervals, the oplog information being the response log of the source database cluster; compare the oplog information obtained this time with the oplog information obtained last time to obtain a comparison result; and, if the comparison result shows that the oplog information obtained on the two occasions is inconsistent, determine that a write operation exists in the source database cluster.
Optionally, the data processor further comprises a determination module, adapted to judge, after the obtained data is written into the target database cluster, whether this write operation to the target database cluster succeeded; and, if not, to notify the writing module to repeat the write operation until the obtained data has been successfully written into the target database cluster.
Optionally, the writing module is further adapted to: delete the identification information of the obtained data; write the obtained data into the target database cluster; and reassign identification information to the obtained data according to the identification information of the data already stored in the target database cluster.
Optionally, the identification information is the underscore ID (_id) of the data.
Optionally, the data processor is developed in the object-oriented programming language Java.
According to another aspect of the present invention, a data processing system is also provided, comprising a target database cluster, a plurality of mutually independent source database clusters that provide data for the target database cluster, and the data processor described above, wherein the plurality of source database clusters are adapted to store different data and to write updated data into the database clusters; and the target database cluster is adapted to receive data from the data processor and to provide, for that data, storage space whose capacity is not less than the sum of the storage capacities of the plurality of source database clusters.
The data processing method according to the embodiments of the present invention can be used to perform integrated processing on data in a plurality of mutually independent source database clusters. In this method, the write operations of the plurality of source database clusters are monitored respectively; when a write operation is detected in any source database cluster, the data written to that source database cluster by this write operation is obtained; the obtained data is written into a target database cluster; and the data in the target database cluster is processed according to the respective operation instructions directed at the data in the plurality of source database clusters. This solves the prior-art problem that data scattered across different source database clusters cannot be processed in a unified manner. Because the method can write data scattered across different source database clusters into the same target database cluster, when processing such as statistical analysis needs to be performed on the data of the source database clusters according to operation instructions, the data in the target database cluster can be processed directly according to the respective operation instructions directed at the data in the plurality of source database clusters. When facing huge data volumes, there is no need to extract the data of each source database cluster one by one for statistics; only the data in the single target database cluster needs to be analyzed, which greatly simplifies statistical analysis and improves the accuracy of its results. In addition, when complex processing such as complex statistical analysis needs to be performed on the data of the source database clusters (for example, extracting different portions of data from different source database clusters for cross validation and then comparing the statistical results), the analysis can be performed directly in the target database cluster, which solves the problem that such complex processing cannot be performed directly on data residing in the source database clusters.
In summary, the data processing method according to the embodiments of the present invention can write the different data in a plurality of source database clusters into the same target database cluster, and then perform corresponding processing such as statistical analysis on the data in that target database cluster according to the respective operation instructions directed at the data in the plurality of source database clusters. After statistical analysis has been performed on the data, the business can be adjusted according to the results, in particular with respect to data whose statistics are relatively poor. This ensures the stability and security of business operation, further improves the operating efficiency of the business to which the data belongs, and promotes the development of the business.
The above is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented in accordance with the content of the specification, and so that the above and other objects, features and advantages of the present invention become more apparent, specific embodiments of the present invention are set out below.
From the following detailed description of specific embodiments of the present invention, taken in conjunction with the accompanying drawings, those skilled in the art will better understand the above and other objects, advantages and features of the present invention.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present invention. Throughout the drawings, the same reference symbols denote the same parts. In the drawings:
Fig. 1 shows a flowchart of a data processing method according to an embodiment of the present invention;
Fig. 2 shows a flowchart of monitoring the write operations of source database cluster A according to an embodiment of the present invention;
Fig. 3A shows a flowchart of a data processing method according to another embodiment of the present invention;
Fig. 3B shows a flowchart of monitoring the write operations of each source database cluster according to an embodiment of the present invention;
Fig. 4 shows a schematic structural diagram of a data processor according to an embodiment of the present invention;
Fig. 5 shows a schematic structural diagram of a data processor according to another embodiment of the present invention; and
Fig. 6 shows a schematic structural diagram of a data processing system according to an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
To solve the above technical problems, an embodiment of the present invention provides a data processing method for performing integrated processing on data in a plurality of mutually independent source database clusters. Fig. 1 shows a flowchart of a data processing method according to an embodiment of the present invention. Referring to Fig. 1, the flow comprises at least the following steps:
Step S102: monitor the write operations of the plurality of source database clusters respectively;
Step S104: when a write operation is detected in any source database cluster, obtain the data written to that source database cluster by this write operation;
Step S106: write the obtained data into a target database cluster, wherein the storage capacity of the target database cluster is not less than the sum of the storage capacities of the plurality of source database clusters;
Step S108: process the data in the target database cluster according to the respective operation instructions directed at the data in the plurality of source database clusters.
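Steps S102 to S108 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `Cluster` class is an in-memory stand-in for a database cluster, its `oplog` list stands in for the response log, and `sync_once` and `process` are hypothetical helper names.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for a database cluster."""
    name: str
    capacity: int                                  # storage capacity, arbitrary units
    records: list = field(default_factory=list)
    oplog: list = field(default_factory=list)      # one entry per write

    def write(self, record):
        self.records.append(record)
        self.oplog.append(record)


def sync_once(sources, target, seen):
    """Run steps S102-S106 once.

    `seen` maps a source name to the number of oplog entries already copied.
    """
    # Capacity condition of step S106: target >= sum of the sources.
    if target.capacity < sum(s.capacity for s in sources):
        raise ValueError("target cluster is too small")
    for src in sources:
        already = seen.get(src.name, 0)
        if len(src.oplog) > already:               # S102: a write was detected
            new_data = src.oplog[already:]         # S104: obtain the written data
            for record in new_data:                # S106: write it to the target
                target.write(record)
            seen[src.name] = len(src.oplog)


def process(target, command):
    """S108: apply an operation instruction (only 'count' here) to the target."""
    if command == "count":
        return len(target.records)
    raise ValueError(f"unsupported command: {command}")
```

Calling `sync_once` repeatedly copies only the records written since the previous call, so the target accumulates the data of all sources without duplicates in this simplified model.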
The data processing method according to the embodiments of the present invention can be used to perform integrated processing on data in a plurality of mutually independent source database clusters. In this method, the write operations of the plurality of source database clusters are monitored respectively; when a write operation is detected in any source database cluster, the data written to that source database cluster by this write operation is obtained; the obtained data is written into a target database cluster; and the data in the target database cluster is processed according to the respective operation instructions directed at the data in the plurality of source database clusters. This solves the prior-art problem that data scattered across different source database clusters cannot be processed in a unified manner. Because the method can write data scattered across different source database clusters into the same target database cluster, when processing such as statistical analysis needs to be performed on the data of the source database clusters according to operation instructions, the data in the target database cluster can be processed directly according to the respective operation instructions directed at the data in the plurality of source database clusters. When facing huge data volumes, there is no need to extract the data of each source database cluster one by one for statistics; only the data in the single target database cluster needs to be analyzed, which greatly simplifies statistical analysis and improves the accuracy of its results. In addition, when complex processing such as complex statistical analysis needs to be performed on the data of the source database clusters (for example, extracting different portions of data from different source database clusters for cross validation and then comparing the statistical results), the analysis can be performed directly in the target database cluster, which solves the problem that such complex processing cannot be performed directly on data residing in the source database clusters.
In summary, the data processing method according to the embodiments of the present invention can write the different data in a plurality of source database clusters into the same target database cluster, and then perform corresponding processing such as statistical analysis on the data in that target database cluster according to the respective operation instructions directed at the data in the plurality of source database clusters. After statistical analysis has been performed on the data, the business can be adjusted according to the results, in particular with respect to data whose statistics are relatively poor. This ensures the stability and security of business operation, further improves the operating efficiency of the business to which the data belongs, and promotes the development of the business.
As mentioned above, in the embodiments of the present invention the write operations of the plurality of source database clusters are monitored respectively (step S102). To monitor the write operations of each source database cluster, the response log (such as the oplog information) of each monitored source database cluster is obtained at predetermined intervals, and whether a write operation has occurred is determined from the response logs obtained. For example, Fig. 2 shows a flowchart of monitoring the write operations of source database cluster A according to an embodiment of the present invention. Referring to Fig. 2, the flow comprises at least the following steps:
Step S202: obtain the oplog information of source database cluster A, referred to in this example as the first oplog information;
Step S204: after an interval of 30 seconds (the preset time), obtain the oplog information of source database cluster A again, referred to in this example as the second oplog information;
Step S206: compare the first oplog information with the second oplog information to obtain a comparison result; when the comparison result shows that the first oplog information and the second oplog information are inconsistent, determine that a write operation exists in source database cluster A.
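The polling comparison of steps S202 to S206 might be sketched as follows. This is an illustrative sketch only: `read_oplog` is a hypothetical callable that returns some comparable snapshot of a cluster's oplog (for example, its latest entry), and the injectable `sleep` parameter is an added assumption so that the function can be exercised without actually waiting.

```python
import time


def detect_write(read_oplog, interval_seconds=30.0, sleep=time.sleep):
    """Return True if the oplog changed during one polling interval.

    `read_oplog` is a hypothetical callable returning a comparable
    snapshot of the cluster's oplog information.
    """
    first = read_oplog()          # S202: first oplog information
    sleep(interval_seconds)       # S204: wait for the preset time
    second = read_oplog()         # S204: second oplog information
    return first != second        # S206: inconsistent means a write occurred
```

The 30-second default mirrors the preset time used in this example; the text notes elsewhere that the interval can be set differently.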
When it is determined that a write operation exists in any source database cluster, the embodiment of the present invention obtains the data written by that write operation and writes the data into the target database cluster. When data is written into a source database cluster, the cluster generates identification information for the data, such as an underscore ID (Identification, hereinafter ID); the underscore ID may also be called the primary key. When the data is then written into the target database cluster, retaining the identification information it had in the source database cluster may cause a conflict with the identification information of data that other source database clusters have written into the target database cluster. Taking the underscore ID as an example: if the underscore ID that the data had in the source database cluster is retained, it may duplicate the underscore ID of data written into the target database cluster from another source database cluster, and data whose underscore ID duplicates that of other data cannot be written into the target database cluster. To avoid this, in this example, when data is written into the target database cluster, the underscore ID in the obtained data is deleted first, and the data whose underscore ID has been successfully deleted is then written into the target database cluster. After the data has been written, the target database cluster assigns a new underscore ID to the written data according to the underscore IDs of the data it already stores, so that no two items of data in the target database cluster have the same underscore ID.
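The ID handling described above can be sketched in Python as follows. The sketch rests on stated assumptions: `TargetCollection` is a hypothetical in-memory stand-in for the target cluster that rejects duplicate `_id` values, and the `X1`, `X2`, ... numbering is invented for illustration rather than taken from the patent.

```python
import itertools


class TargetCollection:
    """Hypothetical target store that rejects duplicate `_id` values."""

    def __init__(self):
        self.docs = {}
        self._next_id = itertools.count(1)

    def insert(self, doc):
        doc = dict(doc)
        if "_id" not in doc:
            # The target assigns a fresh ID based on what it already stores.
            doc["_id"] = f"X{next(self._next_id)}"
        if doc["_id"] in self.docs:
            raise KeyError(f"duplicate _id: {doc['_id']}")
        self.docs[doc["_id"]] = doc
        return doc["_id"]


def copy_without_id(doc, target):
    """Drop the source `_id`, then let the target assign a new one."""
    stripped = {k: v for k, v in doc.items() if k != "_id"}
    return target.insert(stripped)
```

Two documents that carry the same `_id` in different source clusters would collide if inserted unchanged; passing each through `copy_without_id` gives them distinct IDs in the target.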
In addition, if data transmission is interrupted while data is being written into the target database cluster, the data may not be completely written, and/or the data written into the target database cluster may suffer serious loss. To ensure the success rate of write operations, after writing data into the target database cluster, the embodiment of the present invention judges whether this write operation succeeded. Specifically, judging whether the write operation succeeded includes judging whether any of the written data is missing and whether the write operation was interrupted during writing; if the write operation was interrupted, it can also be determined whether to resume the transfer from the breakpoint. When the write operation is judged to have failed, the embodiment of the present invention repeats the write operation until the obtained data has been successfully written into the target database cluster. By judging whether the operation of writing the obtained data into the target database cluster succeeded, the embodiment of the present invention improves the accuracy of write operations and avoids problems such as incomplete or missing data in the target database cluster, which in turn helps ensure the accuracy of the results of subsequent processing, such as statistical analysis, performed on the data in the target database cluster.
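A minimal sketch of the verify-and-retry behaviour is given below. The callables `write` and `read_back` are hypothetical stand-ins for operations on the target cluster, and `write` is assumed to be idempotent so that repeating it does not duplicate data. The text repeats the write until it succeeds; the `max_attempts` cap is an added assumption so that the sketch cannot loop forever.

```python
def write_until_verified(records, write, read_back, max_attempts=5):
    """Repeat the write until the target holds every record, or give up.

    `write` and `read_back` are hypothetical callables for the target
    cluster. The attempt cap is an assumption added for this sketch.
    """
    for attempt in range(1, max_attempts + 1):
        write(records)
        stored = read_back()
        # Success check: no record is missing from the target.
        if all(r in stored for r in records):
            return attempt
    raise RuntimeError("write did not succeed within max_attempts")
```

The function returns the number of attempts that were needed, which a caller could log to spot an unreliable link to the target cluster.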
As shown in step S108 of Fig. 1, in this example, after the data in the plurality of source database clusters has been written into the target database cluster, the data in the target database cluster is processed according to the respective operation instructions directed at the data in the plurality of source database clusters. For example, in target database cluster A, data X is deleted according to a data deletion instruction directed at data X, and input-output data Y is calculated according to an input-output calculation instruction directed at input-output data Y. It should be noted that the operation instructions mentioned in the above embodiment are only examples. In this example, besides data calculation instructions such as the data deletion instruction and input-output calculation instruction mentioned above, the operation instructions directed at the data in the plurality of source databases may also be data statistics instructions, data analysis instructions, or any other instructions for operations that can be performed on data; the embodiments of the present invention place no limit on this.
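How operation instructions might be dispatched against the consolidated data can be sketched as follows. The instruction format (a dict with an `op` key) and the three supported operations are hypothetical choices made for this illustration; the patent does not specify a format.

```python
def apply_command(target, command):
    """Apply one operation instruction to `target`, a list of dict records.

    The command format ({"op": ..., ...}) is a hypothetical illustration.
    """
    op = command["op"]
    if op == "delete":
        # Data deletion instruction: remove records matching a key/value pair.
        key, value = command["key"], command["value"]
        target[:] = [r for r in target if r.get(key) != value]
        return len(target)
    if op == "count":
        # Data statistics instruction: count all records.
        return len(target)
    if op == "sum":
        # Data calculation instruction: sum one numeric field.
        return sum(r.get(command["field"], 0) for r in target)
    raise ValueError(f"unknown operation: {op}")
```

Because all records sit in one place, each instruction runs once against the target instead of once per source cluster.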
In addition, the data processing method provided by the embodiments of the present invention is applicable to MongoDB clusters; that is, the source database clusters and the target database cluster in the embodiments may all be MongoDB clusters. MongoDB is a non-relational database commonly used in the Internet technology (IT) industry. In MongoDB, the basic operations of a database or of the persistence layer of a software system (such as create, read, update and delete operations) can be performed on all data by direct calls, and each record in MongoDB is a document object, which fits the object-oriented approach well. Applying the data processing method to MongoDB clusters builds on the efficient data operations of MongoDB clusters and further improves the convenience of processing the data in those clusters.
Embodiment 1
To set out the data processing method provided by the preferred embodiments above more clearly, a specific embodiment is now provided to describe it. In this embodiment, there are assumed to be four source database clusters A, B, C and D, and one target database cluster X. Fig. 3A shows a flowchart of a data processing method according to another embodiment of the present invention. Referring to Fig. 3A, the data processing method of this embodiment comprises at least the following steps:
Step S302: monitor the write operations of each source database cluster respectively.
Specifically, Fig. 3B shows a flowchart of monitoring the write operations of each source database cluster according to an embodiment of the present invention. Referring to Fig. 3B, the flow of monitoring the write operations of each source database cluster comprises the following steps:
Step S3021: obtain the oplog information of each of source database clusters A, B, C and D, namely oplogA1, oplogB1, oplogC1 and oplogD1.
Step S3022: in this example the preset time is 3 seconds, so after 3 seconds the oplog information of each of source database clusters A, B, C and D is obtained again, namely oplogA2, oplogB2, oplogC2 and oplogD2. It should be noted that the preset time in the present invention can be set differently according to the role and/or type and/or format of the data in the source database clusters; the embodiments of the present invention place no limit on this.
Step S3023: compare oplogA1 with oplogA2, oplogB1 with oplogB2, oplogC1 with oplogC2, and oplogD1 with oplogD2. The comparison shows that oplogA1 and oplogA2 are consistent and oplogC1 and oplogC2 are consistent, whereas oplogB1 and oplogB2 are inconsistent and oplogD1 and oplogD2 are inconsistent; it is therefore determined that write operations exist in source database clusters B and D.
Step S304: judge whether a write operation has been detected in any source database cluster; if so, perform step S306; if not, return to step S302.
Step S306: when a write operation is detected in a source database cluster, obtain the data written to the source database cluster in which the write operation exists.
Specifically, in this example, the data obtained from source database cluster B is data-B1.2, and the data obtained from source database cluster D is data-D2.1, where B1.2 and D2.1 are the underscore IDs of the written data in their respective source database clusters.
Step S308: delete the underscore ID in the obtained data.
Step S310: write the data, with its underscore ID deleted, into target database cluster X.
Step S312: assign new underscore IDs to the data written this time according to the data already stored in target database cluster X.
For example, according to the data already stored in target database cluster X, the data written this time is assigned new underscore IDs and becomes data-X2.1.2 and data-X2.1.3.
Step S314: judge whether this write operation succeeded. If so, perform step S316; if not, return to step S310.
Step S316: process the data in target database cluster X according to the respective operation instructions directed at the data in source database clusters A, B, C and D.
It should be noted that step S316 may be executed at any time in this flow: whenever one or more operation instructions directed at the data in the source database clusters are received, the corresponding processing can be performed on the data currently in the target database cluster according to those instructions. Step S316 is placed last in the flow only for brevity of description; this does not restrict when the step is executed in practice.
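The comparison in step S3023 and the branch in step S304 can be illustrated with a short Python sketch. The snapshot values below are invented placeholders (for instance, the position of the latest oplog entry), and `changed_clusters` is a hypothetical helper name.

```python
def changed_clusters(before, after):
    """Return the names of clusters whose oplog snapshot differs between polls."""
    return sorted(name for name in before if before[name] != after.get(name))


# Placeholder snapshots standing in for oplogA1..oplogD1 and oplogA2..oplogD2.
first_poll = {"A": 10, "B": 20, "C": 30, "D": 40}
second_poll = {"A": 10, "B": 21, "C": 30, "D": 42}

written = changed_clusters(first_poll, second_poll)
if written:
    # S304 "yes" branch: continue with step S306 for these clusters.
    print("write detected in:", ", ".join(written))
else:
    # S304 "no" branch: go back to monitoring (step S302).
    print("no write detected")
```

With these placeholder values, clusters B and D are reported as changed, matching the outcome described for step S3023.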
Based on the data processing method provided by the embodiments above, and on the same inventive concept, an embodiment of the present invention provides a data processor for comprehensively processing data in a plurality of mutually independent source database clusters. Fig. 4 shows a schematic structural diagram of a data processor according to an embodiment of the present invention. Referring to Fig. 4, the data processor at least comprises a monitoring module 410, an acquisition module 420, a writing module 430 and a processing module 440.
The functions of the devices or components of the data processor of the embodiment of the present invention, and the connections between them, are now described:
Monitoring module 410: adapted to monitor the write operations of the plurality of source database clusters respectively;
Acquisition module 420: coupled to the monitoring module 410, and adapted to obtain, when a write operation is detected in any source database cluster, the data written to that source database cluster by the write operation;
Writing module 430: coupled to the acquisition module 420, and adapted to write the obtained data into the target database cluster, wherein the storage capacity of the target database cluster is not less than the sum of the storage capacities of the plurality of source database clusters;
Processing module 440: coupled to the writing module 430, and adapted to process the data in the target database cluster according to the operation instructions directed respectively at the data in the plurality of source database clusters.
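The coupling between the four modules can be sketched as a small pipeline. This is only an illustration under stated assumptions: the interface names, the use of `String` for cluster names and records, and the method `pollOnce` are hypothetical and do not come from the patent text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical wiring of modules 410 to 440; every name here is illustrative. */
class DataProcessorSketch {
    interface MonitoringModule { boolean writeDetected(String sourceCluster); }       // 410
    interface AcquisitionModule { List<String> fetchWritten(String sourceCluster); }  // 420
    interface WritingModule { void writeToTarget(List<String> records); }             // 430

    private final MonitoringModule monitor;
    private final AcquisitionModule acquirer;
    private final WritingModule writer;

    DataProcessorSketch(MonitoringModule monitor, AcquisitionModule acquirer, WritingModule writer) {
        this.monitor = monitor;
        this.acquirer = acquirer;
        this.writer = writer;
    }

    /** One pass over all sources (monitor, acquire, write); returns the number of records forwarded. */
    int pollOnce(List<String> sourceClusters) {
        int forwarded = 0;
        for (String source : sourceClusters) {
            if (monitor.writeDetected(source)) {
                List<String> records = acquirer.fetchWritten(source);
                writer.writeToTarget(records);
                forwarded += records.size();
            }
        }
        return forwarded;
    }

    /** Processing module 440: apply one operation instruction to the target data. */
    static <R> R process(List<String> targetData, Function<List<String>, R> instruction) {
        return instruction.apply(targetData);
    }

    public static void main(String[] args) {
        List<String> target = new ArrayList<>();
        DataProcessorSketch processor = new DataProcessorSketch(
                source -> source.equals("B") || source.equals("D"), // pretend only B and D saw writes
                source -> List.of("data-" + source),
                target::addAll);
        processor.pollOnce(List.of("A", "B", "C", "D"));
        int count = process(target, List::size); // a simple "statistics" instruction
        System.out.println(count + " records in target: " + target);
    }
}
```

Passing the instruction as a function keeps module 440 independent of which statistic or analysis is requested.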
In another embodiment, the monitoring module 410 is further adapted to:
For each source database cluster:
obtain the oplog information of the source database cluster at predetermined intervals, the oplog information being the response log of the source database cluster;
compare the oplog information obtained this time with the oplog information obtained last time to obtain a comparison result; and
if the comparison result shows that the oplog information obtained the two times is inconsistent, determine that a write operation exists in the source database cluster.
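The compare-with-last-poll logic can be sketched without any database driver. In the sketch below, the oplog reader is a hypothetical callback that returns some string summarizing the current oplog of a cluster (for instance its latest entry); how that summary is obtained from a real cluster is deliberately left out, and the class and method names are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/** Polling-based change detector; the oplog reader is a hypothetical callback. */
class OplogPollerSketch {
    private final Map<String, String> lastSeen = new HashMap<>();
    private final Function<String, String> readOplog; // assumption: returns a summary of the cluster's oplog

    OplogPollerSketch(Function<String, String> readOplog) {
        this.readOplog = readOplog;
    }

    /** Returns true when the current snapshot differs from the previous poll of the same cluster. */
    boolean writeDetected(String cluster) {
        String current = readOplog.apply(cluster);
        boolean seenBefore = lastSeen.containsKey(cluster);
        String previous = lastSeen.put(cluster, current);
        // The first poll only records a baseline; there is nothing to compare against yet.
        return seenBefore && !Objects.equals(previous, current);
    }

    public static void main(String[] args) {
        Map<String, String> fakeOplog = new HashMap<>();
        fakeOplog.put("B", "entry-1");
        OplogPollerSketch poller = new OplogPollerSketch(fakeOplog::get);
        System.out.println(poller.writeDetected("B")); // baseline poll
        fakeOplog.put("B", "entry-2");                 // simulate a write to cluster B
        System.out.println(poller.writeDetected("B")); // snapshot changed since last poll
    }
}
```

In a running system, `writeDetected` could be invoked at the predetermined interval by a scheduler such as `ScheduledExecutorService.scheduleAtFixedRate`.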
Fig. 5 shows a schematic structural diagram of a data processor according to another embodiment of the present invention. Referring to Fig. 4 and Fig. 5 together, compared with the data processor shown in Fig. 4, the data processor of this embodiment (shown in Fig. 5) further comprises:
Determination module 450: coupled to the writing module 430, and adapted to judge, after the writing module 430 writes the obtained data into the target database cluster, whether this write operation to the target database cluster succeeded; and
if not, to trigger the writing module 430 to repeat the write operation until the obtained data is successfully written into the target database cluster.
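The judge-and-repeat behaviour of the determination module can be sketched as a retry loop. The text describes repeating until the write succeeds; the upper bound `maxAttempts` below is an addition of this sketch, not part of the described embodiment, and the names `WriteRetrySketch` and `writeUntilSuccess` are hypothetical.

```java
import java.util.function.BooleanSupplier;

/** Retry loop standing in for determination module 450; names are hypothetical. */
class WriteRetrySketch {
    /**
     * Repeats the write until it reports success or maxAttempts is exhausted.
     * Returns the number of attempts used, or -1 if every attempt failed.
     */
    static int writeUntilSuccess(BooleanSupplier writeOnce, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (writeOnce.getAsBoolean()) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated write that fails twice and succeeds on the third call.
        int attempts = writeUntilSuccess(() -> ++calls[0] >= 3, 5);
        System.out.println("succeeded after " + attempts + " attempts");
    }
}
```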
In another embodiment, the writing module 430 is further adapted to:
delete the identification information of the obtained data;
write the obtained data into the target database cluster; and
reassign identification information to the obtained data according to the identification information of the data stored in the target database cluster.
In another embodiment, the identification information is the underscore identification information (ID) of the data.
In another embodiment, the data processor is developed in the object-oriented programming language Java.
Based on the data processing method and the data processor provided by the embodiments above, and on the same inventive concept, an embodiment of the present invention provides a data processing system. Fig. 6 shows a schematic structural diagram of a data processing system according to an embodiment of the present invention. Referring to Fig. 6, the data processing system at least comprises a target database cluster 610, a plurality of mutually independent source database clusters 620 that provide data for the target database cluster, and the data processor 630 described above. It should be noted that the three source database clusters shown in Fig. 6 are only an example; in practical applications the system may contain any integer number of source database clusters, and the embodiment of the present invention places no limit on this.
The functions of the devices or components of the data processing system of the embodiment of the present invention, and the connections between them, are now described:
Plurality of source database clusters 620: adapted to store different data, and to write updated data into the database clusters;
Data processor 630: adapted to monitor the write operations of each of the plurality of source database clusters respectively; to obtain, when a write operation is detected in any source database cluster, the data written to that source database cluster by the write operation; to write the obtained data into the target database cluster; and to process the data in the target database cluster according to the operation instructions directed respectively at the data in the plurality of source database clusters;
Target database cluster 610: adapted to receive the data from the data processor, and to provide for that data a storage space whose capacity is not less than the sum of the storage capacities of the plurality of source database clusters.
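The capacity condition on the target database cluster is a simple inequality: the target capacity must be at least the sum of the source capacities. A minimal sketch of that check follows; the class and method names and the choice of a plain `long` as the capacity unit are assumptions for illustration only.

```java
import java.util.List;

/** Checks the stated capacity condition; names and units are hypothetical. */
class CapacityCheckSketch {
    /** True when the target capacity is not less than the sum of the source capacities. */
    static boolean targetIsLargeEnough(long targetCapacity, List<Long> sourceCapacities) {
        long sum = 0;
        for (long capacity : sourceCapacities) {
            sum = Math.addExact(sum, capacity); // throws ArithmeticException on overflow instead of wrapping
        }
        return targetCapacity >= sum;
    }

    public static void main(String[] args) {
        List<Long> sources = List.of(100L, 200L, 300L);
        System.out.println(targetIsLargeEnough(600L, sources));
        System.out.println(targetIsLargeEnough(599L, sources));
    }
}
```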
Note that the data processor 630 can apply the data processing method described above; for brevity, the details are not repeated here. Having read the embodiments above, those skilled in the art should understand the details of how the data processor 630 implements that method, and those details are likewise covered by the scope of the present invention.
According to any one of the above embodiments, or a combination of several of them, the embodiments of the present invention can achieve the following beneficial effects:
The data processing method according to the embodiments of the present invention can be used to comprehensively process data in a plurality of mutually independent source database clusters. In this method, the write operations of the plurality of source database clusters are monitored respectively; when a write operation is detected in any source database cluster, the data written by that write operation is obtained; the obtained data is written into a target database cluster; and the data in the target database cluster is processed according to the operation instructions directed respectively at the data in the plurality of source database clusters. This solves the prior-art problem that data scattered across different database clusters cannot be processed in a unified way. Because the method writes data scattered across different database clusters into one and the same target database cluster, when processing such as statistical analysis needs to be performed on the data of each source database cluster according to operation instructions, the data in the target database cluster can be processed directly according to those instructions. When facing a huge amount of data, there is no need to extract the data of each source database cluster one by one for statistics; only the data in the single target database cluster needs to be analyzed, which significantly simplifies statistical analysis and improves the accuracy of its results. In addition, when complex statistical analysis needs to be performed on the data of the source database clusters (for example, extracting different portions of data from different database clusters for cross-validation and then comparing the statistical results), the analysis can be carried out directly in the target database cluster, which solves the problem that such complex processing cannot be performed directly on data residing in the source database clusters.
In summary, the data processing method according to the embodiments of the present invention can write the different data held in a plurality of source database clusters into one and the same target database cluster, and then perform the corresponding processing, such as statistical analysis, on the data in that target database cluster according to the operation instructions directed respectively at the data in the plurality of source database clusters. After the statistical analysis has been performed, the business can be adjusted according to its results, in particular for the data whose statistics are comparatively poor. This helps ensure the stability and security of the running business, further improves the operating efficiency of the business to which the data corresponds, and promotes the development of the business.
Numerous specific details are set forth in the description provided herein. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be appreciated that, in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all the features of a single foregoing disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of the embodiments may be combined into a single module, unit or component, and may furthermore be divided into a plurality of sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all the features disclosed in this specification (including the accompanying claims, abstract and drawings), and all the processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
Furthermore, those skilled in the art will understand that, although some embodiments described herein include some features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the device or apparatus according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program or a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such a signal may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any order; these words may be interpreted as names.
Thus, those skilled in the art will recognize that, although a number of exemplary embodiments of the invention have been shown and described in detail herein, many other variations or modifications consistent with the principles of the invention may still be directly determined or derived from the disclosure without departing from the spirit and scope of the invention. Accordingly, the scope of the invention should be understood and regarded as covering all such other variations or modifications.
The invention also discloses A1. A data processing method for comprehensively processing data in a plurality of mutually independent source database clusters, the method comprising:
monitoring the write operations of the plurality of source database clusters respectively;
when a write operation is detected in any source database cluster, obtaining the data written to that source database cluster by the write operation;
writing the obtained data into a target database cluster, wherein the storage capacity of the target database cluster is not less than the sum of the storage capacities of the plurality of source database clusters; and
processing the data in the target database cluster according to operation instructions directed respectively at the data in the plurality of source database clusters.
A2. The method according to A1, wherein the step of monitoring the write operations of the plurality of source database clusters respectively further comprises:
for each source database cluster:
obtaining the oplog information of the source database cluster at predetermined intervals, wherein the oplog information is the response log of the source database cluster;
comparing the oplog information obtained this time with the oplog information obtained last time to obtain a comparison result; and
if the comparison result shows that the oplog information obtained the two times is inconsistent, determining that a write operation exists in the source database cluster.
A3. The method according to A1 or A2, wherein the step of writing the obtained data into the target database cluster further comprises:
deleting the identification information of the obtained data;
writing the obtained data into the target database cluster; and
reassigning identification information to the obtained data according to the identification information of the data stored in the target database cluster.
A4. The method according to A3, wherein the identification information is the underscore identification information (ID) of the data.
A5. The method according to any one of A1 to A4, further comprising:
after the obtained data is written into the target database cluster, judging whether this write operation to the target database cluster succeeded; and
if not, repeating the write operation until the obtained data is successfully written into the target database cluster.
A6. The method according to any one of A1 to A5, wherein the operation instructions comprise at least one of the following:
a data statistics instruction, a data analysis instruction, a data calculation instruction, and a data deletion instruction.
A7. The method according to any one of A1 to A6, wherein the source database clusters and the target database cluster are all mongodb clusters.
The invention also discloses B8. A data processor for comprehensively processing data in a plurality of mutually independent source database clusters, the data processor comprising:
a monitoring module, adapted to monitor the write operations of the plurality of source database clusters respectively;
an acquisition module, adapted to obtain, when a write operation is detected in any source database cluster, the data written to that source database cluster by the write operation;
a writing module, adapted to write the obtained data into a target database cluster, wherein the storage capacity of the target database cluster is not less than the sum of the storage capacities of the plurality of source database clusters; and
a processing module, adapted to process the data in the target database cluster according to operation instructions directed respectively at the data in the plurality of source database clusters.
B9. The data processor according to B8, wherein the monitoring module is further adapted to:
for each source database cluster:
obtain the oplog information of the source database cluster at predetermined intervals, wherein the oplog information is the response log of the source database cluster;
compare the oplog information obtained this time with the oplog information obtained last time to obtain a comparison result; and
if the comparison result shows that the oplog information obtained the two times is inconsistent, determine that a write operation exists in the source database cluster.
B10. The data processor according to B8 or B9, further comprising:
a determination module, adapted to judge, after the obtained data is written into the target database cluster, whether this write operation to the target database cluster succeeded; and
if not, to notify the writing module to repeat the write operation until the obtained data is successfully written into the target database cluster.
B11. The data processor according to any one of B8 to B10, wherein the writing module is further adapted to:
delete the identification information of the obtained data;
write the obtained data into the target database cluster; and
reassign identification information to the obtained data according to the identification information of the data stored in the target database cluster.
B12. The data processor according to B11, wherein the identification information is the underscore identification information (ID) of the data.
B13. The data processor according to any one of B8 to B12, wherein the data processor is developed in the object-oriented programming language Java.
The invention also discloses C14. A data processing system, comprising a target database cluster, a plurality of mutually independent source database clusters that provide data for the target database cluster, and the data processor according to any one of B8 to B13, wherein
the plurality of source database clusters are adapted to store different data, and to write updated data into the database clusters; and
the target database cluster is adapted to receive the data from the data processor, and to provide for that data a storage space whose capacity is not less than the sum of the storage capacities of the plurality of source database clusters.

Claims (10)

1. A data processing method for comprehensively processing data in a plurality of mutually independent source database clusters, the method comprising:
monitoring the write operations of the plurality of source database clusters respectively;
when a write operation is detected in any source database cluster, obtaining the data written to that source database cluster by the write operation;
writing the obtained data into a target database cluster, wherein the storage capacity of the target database cluster is not less than the sum of the storage capacities of the plurality of source database clusters; and
processing the data in the target database cluster according to operation instructions directed respectively at the data in the plurality of source database clusters.
2. The method according to claim 1, wherein the step of monitoring the write operations of the plurality of source database clusters respectively further comprises:
for each source database cluster:
obtaining the oplog information of the source database cluster at predetermined intervals, wherein the oplog information is the response log of the source database cluster;
comparing the oplog information obtained this time with the oplog information obtained last time to obtain a comparison result; and
if the comparison result shows that the oplog information obtained the two times is inconsistent, determining that a write operation exists in the source database cluster.
3. The method according to claim 1 or 2, wherein the step of writing the obtained data into the target database cluster further comprises:
deleting the identification information of the obtained data;
writing the obtained data into the target database cluster; and
reassigning identification information to the obtained data according to the identification information of the data stored in the target database cluster.
4. The method according to claim 3, wherein the identification information is the underscore identification information (ID) of the data.
5. The method according to any one of claims 1-4, further comprising:
after the obtained data is written into the target database cluster, judging whether this write operation to the target database cluster succeeded; and
if not, repeating the write operation until the obtained data is successfully written into the target database cluster.
6. The method according to any one of claims 1-5, wherein the operation instructions comprise at least one of the following:
a data statistics instruction, a data analysis instruction, a data calculation instruction, and a data deletion instruction.
7. The method according to any one of claims 1-6, wherein the source database clusters and the target database cluster are all mongodb clusters.
8. A data processor for comprehensively processing data in a plurality of mutually independent source database clusters, the data processor comprising:
a monitoring module, adapted to monitor the write operations of the plurality of source database clusters respectively;
an acquisition module, adapted to obtain, when a write operation is detected in any source database cluster, the data written to that source database cluster by the write operation;
a writing module, adapted to write the obtained data into a target database cluster, wherein the storage capacity of the target database cluster is not less than the sum of the storage capacities of the plurality of source database clusters; and
a processing module, adapted to process the data in the target database cluster according to operation instructions directed respectively at the data in the plurality of source database clusters.
9. The data processor according to claim 8, wherein the monitoring module is further adapted to:
for each source database cluster:
obtain the oplog information of the source database cluster at predetermined intervals, wherein the oplog information is the response log of the source database cluster;
compare the oplog information obtained this time with the oplog information obtained last time to obtain a comparison result; and
if the comparison result shows that the oplog information obtained the two times is inconsistent, determine that a write operation exists in the source database cluster.
10. A data processing system, comprising a target database cluster, a plurality of mutually independent source database clusters that provide data for the target database cluster, and the data processor according to claim 8 or 9, wherein
the plurality of source database clusters are adapted to store different data, and to write updated data into the database clusters; and
the target database cluster is adapted to receive the data from the data processor, and to provide for that data a storage space whose capacity is not less than the sum of the storage capacities of the plurality of source database clusters.
CN201410797613.2A 2014-12-18 2014-12-18 Data processing method, data processor and system Expired - Fee Related CN104462484B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410797613.2A CN104462484B (en) 2014-12-18 2014-12-18 Data processing method, data processor and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410797613.2A CN104462484B (en) 2014-12-18 2014-12-18 Data processing method, data processor and system

Publications (2)

Publication Number Publication Date
CN104462484A true CN104462484A (en) 2015-03-25
CN104462484B CN104462484B (en) 2018-05-22

Family

ID=52908519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410797613.2A Expired - Fee Related CN104462484B (en) 2014-12-18 2014-12-18 Data processing method, data processor and system

Country Status (1)

Country Link
CN (1) CN104462484B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101997884A (en) * 2009-08-18 2011-03-30 升东网络科技发展(上海)有限公司 Distributed storage system and method
CN102262662A (en) * 2011-07-22 2011-11-30 浪潮(北京)电子信息产业有限公司 System, device and method for realizing database data migration in heterogeneous platform
CN102917072A (en) * 2012-10-31 2013-02-06 北京奇虎科技有限公司 Device, system and method for carrying out data migration between data server clusters
CN102982085A (en) * 2012-10-31 2013-03-20 北京奇虎科技有限公司 System and method of data migration
US20130080386A1 (en) * 2011-09-23 2013-03-28 International Business Machines Corporation Database caching utilizing asynchronous log-based replication

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446296A (en) * 2016-11-28 2017-02-22 泰康保险集团股份有限公司 Method for processing trading messages and trading system
CN106446296B (en) * 2016-11-28 2019-11-15 泰康保险集团股份有限公司 For handling the method and transaction system of transaction message
CN107169069A (en) * 2017-05-08 2017-09-15 山大地纬软件股份有限公司 Distributed hierarchical extracts many application processes and data pick-up applicator
CN107169069B (en) * 2017-05-08 2020-01-07 山大地纬软件股份有限公司 Distributed hierarchical extraction multi-application method and data extraction applicator

Also Published As

Publication number Publication date
CN104462484B (en) 2018-05-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180522

Termination date: 20211218
