CN104750749B - Data processing method and device - Google Patents
- Publication number: CN104750749B
- Application number: CN201310751401.6A
- Authority: CN (China)
- Prior art keywords: data, intermediate data, identification, node, key
- Legal status: Active (Google notes that this status is an assumption, not a legal conclusion; it has performed no legal analysis and makes no representation as to its accuracy)
- Landscapes: Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application provides a data processing method and device. The method includes: performing stream processing, by one or more computing nodes, on received stream data; storing the results of the stream processing as intermediate data in a primary data table and a secondary data table of a database; and, when one or more of the computing nodes restart, loading for each such computing node, according to its node identifier, the intermediate data corresponding to that node from the secondary data table, so that stream processing of subsequently received stream data can continue on the basis of that intermediate data. The technical solution of the application increases the speed at which a distributed stream computing system, on startup, queries the intermediate data corresponding to each computing node, which speeds up data loading and, in turn, the startup of the distributed stream computing system.
Description
Technical field
The application relates to the field of data processing, and in particular to a data processing method and device for a distributed stream computing system.
Background
While running, a distributed stream computing device keeps a large amount of intermediate computation data, usually in memory. This data is indispensable for computing the final results, so it is typically persisted to disk during operation in case the device is interrupted and has to restart for any of various reasons. A traditional relational database is one option for storing the intermediate computation data of distributed stream computing, but it is not suited to massive data: once the stored data exceeds about 100 million records, the query performance of most traditional relational databases deteriorates markedly and can no longer meet application requirements. In recent years the big data field has seen the rise of NoSQL (non-relational database) technology, whose most important feature is fast querying over massive data. When the data volume is very large, a NoSQL database is therefore a very suitable choice for storing the computation results of distributed stream computing products; the current mainstream NoSQL databases include HBase, Cassandra, and others.
There are generally three ways to query data in a NoSQL database: (1) Key-Value lookup, i.e., retrieving one record by a globally unique key. This is the most efficient mode, taking roughly a few tens of milliseconds. (2) Range scan, i.e., specifying a start position and an end position over the key index to retrieve multiple records. This mode is likewise very efficient, on the order of milliseconds. (3) Full table scan, in which every record in the table must be scanned to obtain the desired records. This mode is inefficient: for data volumes above 100 million records it takes on the order of hours.
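The three access modes can be illustrated with a toy in-memory table whose row keys are kept sorted, as row keys are in HBase. The `SortedTable` class and its method names are hypothetical and are not the API of HBase, Cassandra, or any other NoSQL database; the sketch only shows why a point lookup and a range scan can exploit key order while a full scan must visit every row.

```python
import bisect


class SortedTable:
    """Toy stand-in for a NoSQL table whose row keys are kept in sorted order.

    Hypothetical class for illustration only; not a real database API.
    """

    def __init__(self):
        self._keys = []  # row keys in ascending order (the "index")
        self._rows = {}  # row key -> stored value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)  # keep the index sorted on insert
        self._rows[key] = value

    def get(self, key):
        # (1) Key-Value lookup: one globally unique key, at most one record.
        return self._rows.get(key)

    def scan(self, start, stop):
        # (2) Range scan: rows with start <= key < stop, located by binary
        # search, so only the rows inside the range are touched.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

    def full_scan(self, predicate):
        # (3) Full table scan: every row is examined to find the matches.
        return [(k, self._rows[k]) for k in self._keys if predicate(k, self._rows[k])]
```

With n rows, `scan` costs O(log n) plus the size of its result, whereas `full_scan` always costs O(n), which is why mode (3) becomes slow on tables with hundreds of millions of rows.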
At present, a distributed stream computing device is typically combined with a NoSQL database to form a real-time computing system. When such a system has to be stopped and restarted for some reason during operation, the distributed stream computing device sometimes needs to load a large amount of intermediate computation data from the NoSQL database.
Fig. 1 shows the structure of an existing distributed stream computing system. As shown in Fig. 1, the system consists of N computing nodes 110-1, ..., 110-i, ..., 110-N distributed across a network, and a NoSQL database 120. The intermediate computation data of each computing node 110 is independent: the data of different nodes do not intersect. When the system restarts, each computing node 110 needs to load the part of the intermediate computation data related to itself.
However, real-time data consumers access the data stored in the NoSQL database 120 mainly by Key-Value lookup. The data in database 120 is therefore usually stored under keys that real-time data consumers can recognize and that are related to business data, whereas the computing nodes 110 cannot recognize these business-related keys. Consequently, a computing node can load its own intermediate computation data neither by Key-Value lookup nor by range scan, but only by a full table scan: each computing node 110 must scan all of the data to determine which records belong to it before loading them. Once the table holds more than 100 million records, a full table scan becomes very slow, which hurts the startup speed of the real-time computing system and, in severe cases, may prevent the system from starting at all.
One existing solution is to load the intermediate computation data lazily (delayed loading). When a message stream enters the distributed stream computing system, the system checks whether the intermediate computation data corresponding to that message stream can be found in memory; if so, it uses that data for the subsequent computation. If not, it checks whether the corresponding intermediate computation data can be found in the NoSQL database; if so, it loads the data into memory and uses it for the subsequent computation. If the data cannot be found there either, the message stream is deemed a new stream in the business, new intermediate computation data for it is added to memory, and that data is used for the subsequent computation.
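The delayed-loading flow described above can be sketched with plain dictionaries standing in for a node's memory and for the NoSQL store. The function name `lazy_lookup` and its returned query count are assumptions made for illustration; the count is there to make the database traffic visible.

```python
from typing import Any, Callable, Dict, Tuple


def lazy_lookup(stream_id: str,
                memory: Dict[str, Any],
                database: Dict[str, Any],
                new_state: Callable[[], Any]) -> Tuple[Any, int]:
    """Return (intermediate data, number of database queries made).

    Hypothetical sketch of the delayed-loading strategy: memory first,
    then the database, and only then treat the stream as new.
    """
    if stream_id in memory:            # found in memory: no database access
        return memory[stream_id], 0
    if stream_id in database:          # memory miss: one database query
        memory[stream_id] = database[stream_id]
        return memory[stream_id], 1
    # Miss in both places: the stream is new, but one query was still spent.
    memory[stream_id] = new_state()
    return memory[stream_id], 1
```

For example, the first messages of 1,000 distinct new streams cost 1,000 database queries even though every one of them finds nothing.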
With this delayed-loading approach, no intermediate computation data needs to be loaded from the NoSQL database at startup, so real-time computation can begin immediately. However, the approach is hard to apply in some scenarios, for example when most message streams are new streams in the business. Whenever the intermediate computation data for a message stream cannot be found in memory, the NoSQL database must be queried once more before the message stream can be identified as new. When most of the message streams from some message source are new, the stream computing program queries the NoSQL database frequently, generating a large amount of disk I/O and degrading performance.
In summary, there is a need for a more widely applicable scheme that increases the data-loading speed of a distributed stream computing system at startup.
Summary of the invention
The main purpose of the application is to provide a data processing method and device that solve the prior-art problem of slow startup of a distributed stream computing system caused by slow data loading at startup. Specifically:
The application provides a data processing method, including: performing stream processing, by one or more computing nodes, on received stream data; storing the results of the stream processing as intermediate data in a primary data table and a secondary data table of a database, where the intermediate data is stored in the primary data table under a key related to the data identifier of the intermediate data, and in the secondary data table under a key related to the node identifier of the computing node corresponding to the intermediate data; and, when one or more of the computing nodes restart, loading for each such computing node, according to its node identifier, the intermediate data corresponding to that node from the secondary data table, so as to continue stream processing of subsequently received stream data on the basis of that intermediate data.
Another aspect of the application provides a data processing device, including: a processing module for performing, by one or more computing nodes, stream processing on received stream data; a storage module for storing the results of the stream processing as intermediate data in a primary data table and a secondary data table of a database, the intermediate data being stored in the primary data table under a key related to its data identifier and in the secondary data table under a key related to the node identifier of its corresponding computing node; and a loading module for loading, when one or more of the computing nodes restart, the intermediate data corresponding to each such computing node from the secondary data table according to its node identifier, so as to continue stream processing of subsequently received stream data on the basis of that intermediate data.
Compared with the prior art, the technical solution of the application increases the speed at which a distributed stream computing system, on startup, queries the intermediate data corresponding to each computing node, thereby increasing the data-loading speed and, in turn, the startup speed of the distributed stream computing system.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the application and form part of it. The illustrative embodiments of the application and their description serve to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a structural diagram of a prior-art distributed stream computing system;
Fig. 2 is a flowchart of the data processing method according to an embodiment of the application;
Fig. 3 is a flowchart of loading, according to an embodiment of the application, the intermediate data corresponding to a computing node from the secondary data table according to the node identifier of the computing node;
Fig. 4 is a detailed flowchart of the step, according to an embodiment of the application, of obtaining the corresponding intermediate data from the primary data table according to a query request and returning it as the query result;
Fig. 5 is a structural block diagram of the data processing device according to an embodiment of the application; and
Fig. 6 is a structural diagram of the distributed stream computing system to which the technical solution of the application is directed.
Detailed description of the embodiments
The main idea of the application is as follows: in a distributed stream computing system, the intermediate data produced by each computing node is written, under different keys, into both a primary data table and a secondary data table of a database. When the distributed system restarts, the intermediate data corresponding to each node can be located in the secondary data table by the corresponding key and loaded, which increases the data-loading speed. Moreover, because the scheme loads each computing node's intermediate data immediately at system startup, it is widely applicable and not restricted by the application scenario.
The technical solution of the application can be applied to a distributed stream computing system. Referring to Fig. 6, the distributed stream computing system 600 may include one or more computing nodes 610-1, ..., 610-i, ..., 610-N and a database 620, the database 620 including a primary data table 621 and a secondary data table 622. During data processing, the intermediate data produced by each computing node 610-i is written into the primary data table 621 and the secondary data table 622 under different keys. For convenience, the figure shows only the relationship between one computing node 610-i and the primary data table 621 and secondary data table 622; it will be understood that every other computing node has a similar relationship with the two tables.
To make the purpose, technical solution, and advantages of the application clearer, the technical solution is described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the application.
According to an embodiment of the application, a data processing method is provided.
The data processing method of the application can be applied to data processing in a distributed stream computing system, where the system may include one or more computing nodes and a database for storing the intermediate data corresponding to those computing nodes. Here, intermediate data means the intermediate computation data produced by the computation. The distributed stream computing system may be a real-time system, i.e., one that runs continuously; for a real-time system, all computed data is intermediate computation data. For a real-time data consumer, however, the intermediate computation data obtained from the system at a given moment (the real-time computation data as of that moment) can be regarded as final result data, i.e., the result data at that moment. The database may be a non-relational (NoSQL) database.
The intermediate computation data of the computing nodes of the distributed stream computing system are mutually independent, with no necessary association between the data of different nodes. When the system restarts, each node therefore needs to load only the part of the intermediate computation data related to itself.
Fig. 2 is a flowchart of the data processing method according to an embodiment of the application.
At step S201, one or more computing nodes perform stream processing on received stream data.
The distributed stream computing system may be a real-time system into which data is continuously input. The system assigns received stream data to one or more computing nodes, and each computing node performs stream processing on the stream data it receives. Data related to the needs of real-time data consumers, produced in real time, is input into the distributed stream computing system in the form of stream data. Within the system, the processing logic for the stream data can be determined according to those needs, and the input stream data is assigned to the appropriate computing nodes for processing. The computing node that processes a given piece of stream data can be determined from the data identifier of the stream data and/or the processing logic; for example, each computing node may handle the stream data for a different processing logic, or each computing node may handle the stream data corresponding to one or more data identifiers.
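The routing rule is left open above (data identifier and/or processing logic). One common choice, assumed here rather than taken from the source, is to route by a stable hash of the data identifier, so that every record with the same identifier always reaches the same node and the nodes' intermediate data stay disjoint. Node numbers 1..N follow the numbering used later in this description; `assign_node` is a hypothetical helper.

```python
import zlib


def assign_node(data_id: str, num_nodes: int) -> int:
    """Map a data identifier to a computing node numbered 1..num_nodes.

    Assumption: hash-based routing. zlib.crc32 is used because it gives the
    same value in every process, unlike Python's salted built-in str hash().
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be positive")
    return zlib.crc32(data_id.encode("utf-8")) % num_nodes + 1
```

With this rule, `assign_node("abc", 400)` returns the same node number on every call and on every machine. Changing the node count, however, remaps most identifiers, so a node count that differs between runs would not line up with rows keyed under the old node identifiers.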
A real-time data consumer may be an application serving user requests. For example, a seller on an online shopping platform typically wants to follow the transaction information, traffic information, and so on of their shop in real time. To meet these needs, the relevant data that the seller generates on the platform is input into the distributed stream computing system in real time: as soon as the seller generates new relevant data, it is input into the system, which performs the corresponding processing on that stream data.
At step S202, the results of the stream processing are stored as intermediate data in a primary data table and a secondary data table of a database.
The intermediate data is stored in the primary data table under a key related to its data identifier. Because real-time data consumers can recognize keys related to data identifiers, they can query the intermediate data from the primary data table by such keys.
According to an embodiment of the application, the data identifier of the intermediate data may be its object identifier, i.e., the identifier of the data object the intermediate data belongs to, such as the user identifier of the corresponding user, for example a seller's account on an online shopping platform. If a seller's account is "abc", then "abc" can serve as the key of that seller's intermediate data. When a real-time data consumer wants the seller's intermediate data, it can, at predetermined intervals (e.g., every 5 seconds), look up the storage location of the intermediate data keyed "abc" in the index of the primary data table, and so obtain the seller's result data in real time. The result data may be the real-time transaction or traffic information the seller needs, and it can also be displayed to the seller.
The intermediate data is also stored in the secondary data table under a key related to the node identifier of its corresponding computing node. Because keys related to each node identifier can be recognized, when the system restarts the intermediate data corresponding to a computing node can be found in the secondary data table by such keys and loaded for that node, so that the node can continue stream processing.
For example, if the N computing nodes of the distributed stream computing system are identified by the node identifiers 1, 2, ..., i, ..., N, each computing node's node identifier can be used as the key of that node's intermediate data.
According to an embodiment of the application, the key of the intermediate data in the secondary data table may also contain its key in the primary data table. Specifically, the secondary-table key may consist of characters related to the node identifier of the corresponding computing node together with the key of the intermediate data in the primary data table.
In a specific embodiment, the key of the intermediate data in the secondary data table may consist of the node identifier of the corresponding computing node, a separator, and the key of the intermediate data in the primary data table.
For example, if a computing node's node identifier is 18 and the key of its intermediate data in the primary data table is "abc", the key "18 abc" can be formed from the node identifier "18" and the primary-table key "abc", and the intermediate data is written into the secondary data table under the key "18 abc". The characters related to the node identifier and the intermediate data's primary-table key may be separated by any separator; in this example, a space is used.
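The dual write of step S202 and the key layout of this example can be sketched as follows. Dictionaries stand in for the two tables, and the names `SEP`, `secondary_key`, and `store_intermediate` are hypothetical; the space separator and the "18 abc" layout come from the example above.

```python
from typing import Any, Dict

SEP = " "  # separator between node identifier and primary-table key


def secondary_key(node_id: int, primary_key: str, sep: str = SEP) -> str:
    """Secondary-table key: node identifier + separator + primary-table key."""
    return f"{node_id}{sep}{primary_key}"


def store_intermediate(primary: Dict[str, Any], secondary: Dict[str, Any],
                       node_id: int, data_id: str, value: Any) -> None:
    """Write one intermediate result to both tables.

    primary   is keyed by the data identifier, which consumers can recognize;
    secondary is keyed by node identifier + separator + data identifier,
              which the node can recognize and range-scan on restart.
    """
    primary[data_id] = value
    secondary[secondary_key(node_id, data_id)] = value
```

Each update therefore costs two writes instead of one, which appears to be the modest runtime overhead the scheme accepts in exchange for faster restarts.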
At step S203, when one or more of the computing nodes restart, the intermediate data corresponding to each such computing node is loaded for it from the secondary data table according to its node identifier, so that stream processing of subsequently received stream data can continue on the basis of that intermediate data.
Fig. 3 is a flowchart of loading the intermediate data corresponding to a computing node from the secondary data table according to the node identifier of that computing node.
Step S301: according to the node identifier of the computing node, look up the keys related to that node identifier in the index of the secondary data table.
Specifically, the intermediate data of each of the one or more computing nodes is stored in the secondary data table with data related to that node's identifier as the key, so the keys related to each node identifier can be looked up in the index of the secondary data table.
According to an embodiment of the application, the keys related to a computing node's identifier can be found in the index of the secondary data table by a range scan, i.e., by specifying the start and end positions of the lookup and searching the index between them.
The application may use a NoSQL database to store the intermediate data. Because the indexes created in a NoSQL database are ordered, specifying only a start position and an end position is enough to find, by range scan, the keys of each computing node's intermediate data in the index of the secondary data table.
The range scan may be left-closed and right-open: scanning starts at the start position in the index and stops on reaching the end position, without scanning the data at the end position itself.
For example, all intermediate data of node 18 has secondary-table keys of the form "18 + separator + primary-table key". If a piece of node 18's intermediate data has the key "abc" in the primary data table, its key in the secondary data table is "18 abc", where a space separates the node identifier of the corresponding computing node from the primary-table key. When looking up the keys of node 18's intermediate data in the index of the secondary data table, the start and end positions of the lookup can be set as follows:
Start position: "18 ", i.e., 18 followed by the separator (a space), "18 + separator"
End position: "19 ", i.e., 19 followed by the separator (a space), "19 + separator"
The separator here is the same one used between the node identifier and the primary-table key, i.e., the space (" ") in "18 abc".
Because "19 + separator" is the end position, once the scan has covered the keys containing "18" (i.e., the storage locations of all intermediate data corresponding to node 18) and reaches a key containing "19", it stops; keys containing "19" are not scanned.
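The start and end rows of this example can be checked against a sorted key list with a left-closed, right-open scan. The sketch below (hypothetical helper names) computes both the bounds given above and an alternative end row that stops right after the prefix "18 ". The comparison shows a point worth checking in any implementation: with variable-width decimal node identifiers, the keys of node 180 (e.g. "180 zzz") sort between "18 " and "19 ", so the end row "19 " is only safe when such collisions cannot occur, for example when node identifiers are zero-padded to a fixed width. Python compares strings by code point, which matches byte order for ASCII keys like these.

```python
import bisect
from typing import List, Tuple

SEP = " "


def bounds_as_described(node_id: int, sep: str = SEP) -> Tuple[str, str]:
    """Start "18 " and exclusive end "19 ", as in the example above."""
    return f"{node_id}{sep}", f"{node_id + 1}{sep}"


def prefix_bounds(node_id: int, sep: str = SEP) -> Tuple[str, str]:
    """Alternative (an assumption, not from the source): the exclusive end is
    the prefix with its last character incremented, so "18 " ends at "18!"."""
    start = f"{node_id}{sep}"
    return start, start[:-1] + chr(ord(start[-1]) + 1)


def range_scan(sorted_keys: List[str], start: str, stop: str) -> List[str]:
    """Left-closed, right-open scan: keys k with start <= k < stop."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


keys = sorted(["17 x", "18 abc", "18 def", "180 zzz", "19 abc"])
print(range_scan(keys, *bounds_as_described(18)))  # ['18 abc', '18 def', '180 zzz']
print(range_scan(keys, *prefix_bounds(18)))         # ['18 abc', '18 def']
```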
Step S302: according to the keys related to the node identifier, determine the storage location of the intermediate data corresponding to the computing node. That is, the storage location of the node's intermediate data is determined from the keys related to its node identifier that were found in the index of the secondary data table.
It should be understood that the above describes one embodiment of looking up, by key, the intermediate data corresponding to one or more computing nodes in the secondary data table of the database, for the case where the secondary-table key consists of characters related to the node identifier of the corresponding computing node, a separator, and the key of the intermediate data in the primary data table. In practice, other suitable lookup methods may be used depending on how the keys in the secondary data table are structured.
Furthermore, the ways in which the application looks up, by key, the intermediate data corresponding to one or more computing nodes in the secondary data table are not limited to the above embodiment; any other suitable method may be used.
Step S303: load the corresponding intermediate data from the storage locations. That is, after the storage locations of the intermediate data corresponding to the computing node have been found in the secondary data table, the intermediate data found there is loaded into the corresponding computing node, i.e., into that computing node's memory.
Once the intermediate data corresponding to the computing node has been loaded from the secondary data table according to its node identifier through steps S301 to S303, stream processing of subsequently received stream data can continue on the basis of that intermediate data.
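Steps S301 to S303 can be combined into one restart routine. In this sketch, a sorted list of row keys plays the role of the secondary table's index and a dictionary plays the role of its storage; the function name `load_node_data` is hypothetical. The exclusive end row is taken as the prefix with its last character incremented (an assumption, not the "node identifier + 1" end row of the example), which keeps, for instance, node 180's rows out of node 18's range.

```python
import bisect
from typing import Any, Dict, List


def load_node_data(index: List[str], rows: Dict[str, Any],
                   node_id: int, sep: str = " ") -> Dict[str, Any]:
    """Load the rows belonging to node_id from a toy secondary table.

    index: secondary-table row keys in ascending order
    rows:  row key -> stored intermediate data
    Returns {secondary_key: intermediate data} for the node's memory.
    """
    start = f"{node_id}{sep}"
    stop = start[:-1] + chr(ord(sep) + 1)  # first key past the node's prefix
    # S301: locate the node's key range in the sorted index (binary search).
    lo = bisect.bisect_left(index, start)
    hi = bisect.bisect_left(index, stop)
    # S302: each key in the range identifies the storage location of one row.
    # S303: read those rows and return them for loading into node memory.
    return {key: rows[key] for key in index[lo:hi]}
```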
According to an embodiment of the application, the key of the intermediate data in the primary data table can be parsed out of its key in the secondary data table, and the mapping between that primary-table key and the intermediate data can be stored for use in subsequent data processing.
Specifically, parsing removes from the intermediate data's secondary-table key the characters related to the node identifier of the corresponding computing node, together with the separator between those characters and the primary-table key, leaving the key of the intermediate data in the primary data table. For example, if a key of node 18 in the secondary data table is "18 abc", removing "18" and the separator " " (space) yields "abc", the key of the intermediate data in the primary data table.
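The parsing step amounts to splitting each secondary-table key at its first separator and keeping the remainder. The sketch assumes, as in the example, that node identifiers never contain the separator; primary-table keys may contain it, because only the first occurrence is split. The helper names are hypothetical.

```python
from typing import Any, Dict


def primary_key_of(secondary_key: str, sep: str = " ") -> str:
    """Strip "node identifier + separator" from a secondary-table key."""
    node_part, found, primary_key = secondary_key.partition(sep)
    if not found or not node_part:
        raise ValueError(f"malformed secondary key: {secondary_key!r}")
    return primary_key


def rekey_by_primary(loaded: Dict[str, Any], sep: str = " ") -> Dict[str, Any]:
    """Re-index loaded rows by primary-table key (the data identifier),
    which is the key that later stream records carry."""
    return {primary_key_of(k, sep): v for k, v in loaded.items()}
```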
After the distributed stream computing system has started, the key of the intermediate data in the primary data table can be its data identifier. During subsequent processing, each computing node can therefore use the data identifier of the stream data assigned to it (i.e., the primary-table key of the intermediate data) to find the corresponding intermediate data in its memory, and use that intermediate data to continue processing subsequently received stream data.
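Continuing after a restart then needs only the in-memory map keyed by data identifier. The aggregation below (a running trade-volume total) and the function name `process_record` are assumptions chosen to match the seller example; the sketch also writes each new result through to both tables, as in step S202.

```python
from typing import Dict


def process_record(memory: Dict[str, float],
                   primary: Dict[str, float],
                   secondary: Dict[str, float],
                   node_id: int, data_id: str, amount: float,
                   sep: str = " ") -> float:
    """Update one running total and persist it (hypothetical processing logic).

    memory:    the node's in-memory intermediate data, keyed by data identifier
    primary:   primary table, keyed by data identifier
    secondary: secondary table, keyed by node identifier + sep + data identifier
    """
    total = memory.get(data_id, 0.0) + amount  # reuse state restored at startup
    memory[data_id] = total
    primary[data_id] = total
    secondary[f"{node_id}{sep}{data_id}"] = total
    return total
```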
According to an embodiment of the application, the method may further include the following step: in response to a query request for the results of the stream processing, obtaining the corresponding intermediate data from the primary data table according to the data identifier of the intermediate data that constitutes those results, and returning it as the query result. This step is described in detail below with reference to Fig. 4.
As shown in Fig. 4, at step S401, the key related to the data identifier contained in the query request is looked up in the index of the primary data table. For example, for a real-time data consumer's request to query the trading volume of the seller whose account is "abc", the key related to "abc" is looked up in the index of the primary data table.
Next, at step S402, the storage location of the intermediate data corresponding to the data identifier is determined according to the key related to the data identifier. That is, once that key has been found in the index of the primary data table, the storage location of the corresponding intermediate data is determined from the index.
Then, at step S403, the corresponding intermediate data is obtained from the storage location and returned as the query result. That is, the intermediate data is read from the storage location corresponding to the data identifier and returned to the real-time data consumer as the query result.
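On the consumer side, steps S401 to S403 reduce to a point lookup on the primary table. In this sketch a sorted list of (key, location) pairs stands in for the table's index and a list stands in for its storage; `query_result` is a hypothetical name, and returning `None` for an unknown identifier is an assumed convention.

```python
import bisect
from typing import Any, List, Optional, Tuple


def query_result(index: List[Tuple[str, int]], storage: List[Any],
                 data_id: str) -> Optional[Any]:
    """Serve a consumer query from a toy primary table.

    index:   (primary key, storage location) pairs sorted by key
    storage: stored intermediate data, addressed by location
    """
    # S401: binary-search the index for the key equal to the data identifier.
    pos = bisect.bisect_left(index, (data_id,))
    if pos == len(index) or index[pos][0] != data_id:
        return None  # no intermediate data for this identifier
    # S402: the index entry gives the storage location of the intermediate data.
    location = index[pos][1]
    # S403: read the data at that location and return it as the query result.
    return storage[location]
```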
The data processing method according to the embodiments of the application has now been described with reference to Figs. 1 to 4. With the technical solution of the application, for a distributed stream computing system with N concurrent programs, the time spent loading intermediate data at startup is about 1/N of the time a full table scan would take. Suppose a distributed stream computing system has 400 concurrent computing nodes and that loading immediately at startup by full table scan takes 2 hours; in theory, the loading strategy of the application then takes about 18 seconds. At the cost of a small amount of additional runtime overhead, the technical solution of the application therefore solves the problem that a distributed stream computing system using a non-relational database as its storage takes too long, or even fails, to start when it loads data immediately at startup.
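The 18-second figure follows from the stated 1/N relationship. As a worked check, with t_full the full-table-scan loading time and N the number of concurrent computing nodes:

```latex
t_{\text{load}} \approx \frac{t_{\text{full}}}{N}
  = \frac{2\,\text{h}}{400}
  = \frac{7200\,\text{s}}{400}
  = 18\,\text{s}
```

The estimate appears to assume that each node's range scan reads about 1/N of the rows at the same per-node rate as a full scan, with all nodes loading in parallel.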
Correspondingly, an embodiment of the application also provides a data processing device.
Fig. 5 schematically shows a structural block diagram of a data processing device 500 according to an embodiment of the application. The device 500 may include a processing module 510, a storage module 520, and a loading module 530.
The processing module 510 may be used to perform, by one or more computing nodes, stream processing on received stream data.
The storage module 520 may be used to store the results of the stream processing as intermediate data in a primary data table and a secondary data table of a database, the intermediate data being stored in the primary data table under a key related to its data identifier and in the secondary data table under a key related to the node identifier of its corresponding computing node.
The loading module 530 may be configured to, when one or more of the computing nodes restart, load from the secondary data table, according to the node identifier of each such computing node, the intermediate data corresponding to that computing node, so that stream data processing of subsequently received stream data continues on the basis of the intermediate data.
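The interplay of the three modules can be sketched in Python. This is a minimal, hypothetical in-memory model rather than the application's implementation: plain dictionaries stand in for the database's primary and secondary data tables, a running count per key is an invented example of stream data processing, and the names `IntermediateStore`, `ComputingNode`, `save`, `load_for_node` and `restart` are illustrative only. It also assumes that stream keys are partitioned so that each key is owned by exactly one computing node.

```python
from typing import Dict, Iterable


class IntermediateStore:
    """Hypothetical stand-in for the database's two data tables."""

    def __init__(self) -> None:
        # Primary data table, keyed by the data identifier.
        self.primary: Dict[str, int] = {}
        # Secondary data table, keyed first by the node identifier.
        self.secondary: Dict[str, Dict[str, int]] = {}

    def save(self, node_id: str, data_id: str, value: int) -> None:
        # Storage module: write the same intermediate datum to both tables.
        self.primary[data_id] = value
        self.secondary.setdefault(node_id, {})[data_id] = value

    def load_for_node(self, node_id: str) -> Dict[str, int]:
        # Loading module: fetch only this node's rows; return a copy so the
        # node's working state does not alias the stored table.
        return dict(self.secondary.get(node_id, {}))


class ComputingNode:
    """Hypothetical node whose intermediate data is a running count per key."""

    def __init__(self, node_id: str, store: IntermediateStore) -> None:
        self.node_id = node_id
        self.store = store
        self.counts: Dict[str, int] = {}

    def restart(self) -> None:
        # Reload this node's intermediate data from the secondary table.
        self.counts = self.store.load_for_node(self.node_id)

    def process(self, events: Iterable[str]) -> None:
        # Processing module: update the count, then persist it as intermediate data.
        for key in events:
            self.counts[key] = self.counts.get(key, 0) + 1
            self.store.save(self.node_id, key, self.counts[key])


store = IntermediateStore()
node = ComputingNode("node-1", store)
node.process(["a", "b", "a"])

# Simulate a crash: a fresh node object has lost its in-memory state.
node = ComputingNode("node-1", store)
node.restart()
node.process(["a"])
print(node.counts)  # {'a': 3, 'b': 1}
```

Because the restarted node reads only the rows stored under its own node identifier, the count for `a` resumes from 2 instead of starting over, and no other node's data is touched.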
According to an embodiment of the present application, the device 500 may further include a query module, which may be configured to, in response to a query request for a result of the stream data processing, obtain the corresponding intermediate data from the primary data table according to the data identifier of the intermediate data that constitutes the result, and return that intermediate data as the query result.
The query module may further include a lookup submodule, a determination submodule and an acquisition submodule.
The lookup submodule may be configured to look up, in the index of the primary data table, the key related to the data identifier contained in the query request.
The determination submodule may be configured to determine, from the key related to the data identifier, the storage location of the intermediate data corresponding to that data identifier.
The acquisition submodule may be configured to obtain the corresponding intermediate data from that storage location and return it as the query result.
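The three query submodules can be sketched as three steps over a sorted index. The sketch assumes, purely for illustration, that the primary data table consists of a sorted list of `(key, location)` entries plus an append-only storage area, and that the key is the data identifier with a `data/` prefix; `PrimaryTable`, `put` and `query` are hypothetical names, not part of the application.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


class PrimaryTable:
    """Hypothetical primary data table: sorted index plus storage area."""

    def __init__(self) -> None:
        self.index: List[Tuple[str, int]] = []  # (key, storage location), sorted by key
        self.storage: List[str] = []            # storage location -> intermediate datum

    @staticmethod
    def key_for(data_id: str) -> str:
        # Assumed key scheme: the data (object) identifier with a fixed prefix.
        return "data/" + data_id

    def _find(self, key: str) -> int:
        # Position of the first index entry whose key is >= key.
        return bisect_left(self.index, (key,))

    def put(self, data_id: str, value: str) -> None:
        key = self.key_for(data_id)
        self.storage.append(value)
        location = len(self.storage) - 1
        i = self._find(key)
        if i < len(self.index) and self.index[i][0] == key:
            self.index[i] = (key, location)  # newer value replaces the older one
        else:
            self.index.insert(i, (key, location))

    def query(self, data_id: str) -> Optional[str]:
        # Lookup submodule: find the key related to the data identifier.
        key = self.key_for(data_id)
        i = self._find(key)
        if i == len(self.index) or self.index[i][0] != key:
            return None
        # Determination submodule: the index entry yields the storage location.
        location = self.index[i][1]
        # Acquisition submodule: read the datum and return it as the result.
        return self.storage[location]


table = PrimaryTable()
table.put("order-42", "total=10")
table.put("order-7", "total=3")
table.put("order-42", "total=15")
print(table.query("order-42"))  # total=15
```

The binary search keeps each query logarithmic in the size of the index, so serving a query never requires scanning the whole table.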
According to an embodiment of the present application, the loading module 530 may further include a lookup submodule, a determination submodule and a loading submodule.
The lookup submodule may be configured to look up, in the index of the secondary data table, the key related to the node identifier of the computing node.
The determination submodule may be configured to determine, from the key related to the node identifier, the storage location of the intermediate data corresponding to the computing node.
The loading submodule may be configured to load the corresponding intermediate data from that storage location.
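Loading by node identifier can be sketched the same way, as a prefix range scan over a sorted secondary index, which is how a sorted non-relational key-value store would typically serve such a request. This is an assumption-laden sketch: the separator character `SEP`, the `SecondaryTable` class and its methods are hypothetical, and node identifiers are assumed never to contain the separator. Only the index entries of the restarted node are visited, which is the source of the roughly 1/N loading time estimated for this scheme.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

SEP = "\x00"  # hypothetical separator; assumed never to occur in node identifiers


class SecondaryTable:
    """Hypothetical secondary data table keyed by node identifier + primary key."""

    def __init__(self) -> None:
        self.index: List[Tuple[str, int]] = []  # (key, storage location), sorted by key
        self.storage: List[str] = []

    def put(self, node_id: str, primary_key: str, value: str) -> None:
        key = node_id + SEP + primary_key
        self.storage.append(value)
        location = len(self.storage) - 1
        i = bisect_left(self.index, (key,))
        if i < len(self.index) and self.index[i][0] == key:
            self.index[i] = (key, location)  # newer value replaces the older one
        else:
            self.index.insert(i, (key, location))

    def load_node(self, node_id: str) -> Dict[str, str]:
        prefix = node_id + SEP
        # Lookup submodule: jump to the first key >= the node's prefix.
        i = bisect_left(self.index, (prefix,))
        loaded: Dict[str, str] = {}
        # Keys sharing the prefix are contiguous in sorted order, so the scan
        # stops at the first key belonging to a different node.
        while i < len(self.index) and self.index[i][0].startswith(prefix):
            key, location = self.index[i]
            # Determination + loading submodules: location -> intermediate datum.
            loaded[key[len(prefix):]] = self.storage[location]
            i += 1
        return loaded


table = SecondaryTable()
table.put("node-1", "data/a", "v1")
table.put("node-10", "data/a", "x")
table.put("node-1", "data/b", "v2")
print(table.load_node("node-1"))  # {'data/a': 'v1', 'data/b': 'v2'}
```

The separator matters: without it, a scan for `node-1` would also pick up the rows of `node-10`.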
According to an embodiment of the present application, the data identifier of the intermediate data may include the object identifier of the intermediate data.
According to an embodiment of the present application, the key of the intermediate data in the secondary data table may include the key of the intermediate data in the primary data table.
According to an embodiment of the present application, the key of the intermediate data in the secondary data table includes characters related to the node identifier of the computing node corresponding to the intermediate data, together with the key of the intermediate data in the primary data table.
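One possible encoding of such a secondary key is shown below, under the assumption (not stated in the application) that the node-identifier characters are a fixed-width, zero-padded node index. A fixed width makes lexicographic order agree with numeric node order, so all of one node's rows are contiguous, and it lets the primary-table key be recovered by slicing off the prefix. `NODE_WIDTH`, `secondary_key` and `split_secondary_key` are hypothetical names.

```python
from typing import List, Tuple

NODE_WIDTH = 5  # hypothetical width of the node-identifier characters


def secondary_key(node_index: int, primary_key: str) -> str:
    """Build a secondary-table key: node characters followed by the primary key."""
    if not 0 <= node_index < 10 ** NODE_WIDTH:
        raise ValueError("node index does not fit in NODE_WIDTH digits")
    return f"{node_index:0{NODE_WIDTH}d}{primary_key}"


def split_secondary_key(key: str) -> Tuple[int, str]:
    """Recover the node index and the primary-table key from a secondary key."""
    return int(key[:NODE_WIDTH]), key[NODE_WIDTH:]


keys: List[str] = sorted(
    secondary_key(n, k) for n, k in [(12, "data/b"), (7, "data/z"), (7, "data/a")]
)
print(keys)  # ['00007data/a', '00007data/z', '00012data/b']
print(split_secondary_key(keys[1]))  # (7, 'data/z')
```

Embedding the primary key in this way means a row found during loading also identifies the matching row in the primary data table.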
The functions implemented by the device of this embodiment essentially correspond to the method embodiments shown in Fig. 1 to Fig. 4. For details not covered in the description of this embodiment, refer to the related description in the foregoing embodiments; they are not repeated here.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface and memory.
Memory may take the form of non-persistent memory among computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data.
Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassette tape, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.
The foregoing are merely embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent substitution or improvement made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.
Claims (10)
- 1. A data processing method, characterized by comprising: performing, by one or more computing nodes, stream data processing on received stream data; storing the results of the stream data processing, as intermediate data, in a primary data table and a secondary data table of a database, wherein the intermediate data is stored in the primary data table under a key related to the data identifier of the intermediate data, and in the secondary data table under a key related to the node identifier of the computing node corresponding to the intermediate data; and when one or more of the computing nodes restart, loading from the secondary data table, according to the node identifier of each such computing node, the intermediate data corresponding to that computing node, so that stream data processing of subsequently received stream data continues on the basis of the intermediate data.
- 2. The method according to claim 1, characterized by further comprising: in response to a query request for a result of the stream data processing, obtaining the corresponding intermediate data from the primary data table according to the data identifier of the intermediate data serving as the result of the stream data processing, and returning the intermediate data as a query result.
- 3. The method according to claim 2, characterized in that obtaining, in response to the query request for the result of the stream data processing and according to the data identifier of the intermediate data serving as that result, the corresponding intermediate data from the primary data table and returning it as the query result further comprises: looking up, according to the data identifier contained in the query request, a key related to the data identifier in an index of the primary data table; determining, according to the key related to the data identifier, the storage location of the intermediate data corresponding to the data identifier; and obtaining the corresponding intermediate data from the storage location and returning it as the query result.
- 4. The method according to claim 1, wherein loading from the secondary data table, according to the node identifier of the computing node, the intermediate data corresponding to the computing node further comprises: looking up, according to the node identifier of the computing node, a key related to the node identifier in an index of the secondary data table; determining, according to the key related to the node identifier, the storage location of the intermediate data corresponding to the computing node; and loading the corresponding intermediate data from the storage location.
- 5. The method according to claim 1, wherein the data identifier of the intermediate data comprises the object identifier of the intermediate data.
- 6. The method according to any one of claims 1-5, wherein the key of the intermediate data in the secondary data table comprises the key of the intermediate data in the primary data table.
- 7. The method according to any one of claims 1-5, wherein the key of the intermediate data in the secondary data table comprises the node identifier of the computing node corresponding to the intermediate data and the key of the intermediate data in the primary data table.
- 8. A data processing device, characterized by comprising: a processing module, configured to perform, by one or more computing nodes, stream data processing on received stream data; a storage module, configured to store the results of the stream data processing, as intermediate data, in a primary data table and a secondary data table of a database, wherein the intermediate data is stored in the primary data table under a key related to the data identifier of the intermediate data, and in the secondary data table under a key related to the node identifier of the computing node corresponding to the intermediate data; and a loading module, configured to, when one or more of the computing nodes restart, load from the secondary data table, according to the node identifier of each such computing node, the intermediate data corresponding to that computing node, so that the one or more computing nodes continue stream data processing of subsequently received stream data on the basis of the intermediate data.
- 9. The device according to claim 8, characterized by further comprising: a query module, configured to, in response to a query request for a result of the stream data processing, obtain the corresponding intermediate data from the primary data table according to the data identifier of the intermediate data serving as the result of the stream data processing, and return the intermediate data as a query result.
- 10. The device according to claim 9, characterized in that the query module comprises: a lookup submodule, configured to look up, according to the data identifier contained in the query request, a key related to the data identifier in an index of the primary data table; a determination submodule, configured to determine, according to the key related to the data identifier, the storage location of the intermediate data corresponding to the data identifier; and an acquisition submodule, configured to obtain the corresponding intermediate data from the storage location and return it as the query result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310751401.6A CN104750749B (en) | 2013-12-31 | 2013-12-31 | Data processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104750749A (en) | 2015-07-01 |
CN104750749B (en) | 2018-04-03 |
Family ID: 53590444
Family Applications (1)
Application Number | Priority Date | Filing Date | Title | Status |
---|---|---|---|---|
CN201310751401.6A CN104750749B (en) | 2013-12-31 | 2013-12-31 | Data processing method and device | Active |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104750749B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106202378A (en) * | 2016-07-08 | 2016-12-07 | China University of Geosciences (Wuhan) | Method and system for real-time processing of streaming meteorological data |
CN108932313B (en) * | 2018-06-20 | 2021-06-04 | Banma Network Technology Co., Ltd. | Data processing method and device, electronic equipment and storage medium |
CN110909024A (en) * | 2018-09-14 | 2020-03-24 | Alibaba Group Holding Limited | Data processing method, data processing device, computing equipment and stream computing system |
CN110908995B (en) * | 2018-09-17 | 2023-04-11 | Alibaba Group Holding Limited | Data processing method, device and equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102089741A (en) * | 2007-09-17 | 2011-06-08 | International Business Machines Corporation | Executing computer-intensive database user-defined programs on an attached high-performance parallel computer |
CN103049519A (en) * | 2012-12-18 | 2013-04-17 | Dawning Information Industry (Beijing) Co., Ltd. | Data uploading method and data uploading device |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090043750A1 (en) * | 2007-08-07 | 2009-02-12 | Barsness Eric L | Query Optimization in a Parallel Computer System with Multiple Networks |
US8996469B2 (en) * | 2010-08-30 | 2015-03-31 | Adobe Systems Incorporated | Methods and apparatus for job state tracking in cluster computing |
- 2013-12-31: application CN201310751401.6A filed in CN (CN104750749B, Active)
Also Published As
Publication number | Publication date |
---|---|
CN104750749A (en) | 2015-07-01 |
Similar Documents
Publication | Title |
---|---|
CN105718455B (en) | Data query method and device |
CN107015985B (en) | Data storage and acquisition method and device |
WO2020023828A1 (en) | Blockchain-based cross-chain data operation method and apparatus |
CN110287197B (en) | Data storage method, migration method and device |
US7539689B2 (en) | Bundling database |
KR20190136053A (en) | Method and device for writing service data to blockchain system |
CN106407207B (en) | Real-time newly-added data updating method and device |
WO2015195830A2 (en) | Data query method and apparatus |
CN104750749B (en) | Data processing method and device |
CN108959510B (en) | Partition level connection method and device for distributed database |
CN106897342B (en) | Data verification method and equipment |
CN105930479A (en) | Data skew processing method and apparatus |
CN106326309A (en) | Data query method and device |
CN108399175A (en) | Data storage and query method and device |
CN110427364A (en) | Data processing method and device, electronic equipment and storage medium |
CN113220717A (en) | Block chain-based data verification method and device and electronic equipment |
CN114329096A (en) | Method and system for processing native map database |
CN110263184A (en) | Data processing method and related device |
CN104166649A (en) | Caching method and device for search engine |
CN107562533B (en) | Data loading processing method and device |
CN110019544B (en) | Data query method and system |
CN116361287A (en) | Path analysis method, device and system |
CN106886546B (en) | Construction method and equipment of data website |
CN115114289A (en) | Data query method and device and electronic equipment |
CN114691610A (en) | Directory processing method and device, storage medium and processor |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |