CN104424186B - Method and device for implementing persistence in a stream computing application - Google Patents

Method and device for implementing persistence in a stream computing application

Info

Publication number
CN104424186B
Authority
CN
China
Prior art keywords
persistence
message
starting
offset
batch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310362269.XA
Other languages
Chinese (zh)
Other versions
CN104424186A (en
Inventor
刘健男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba South China Technology Co.,Ltd.
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201310362269.XA
Publication of CN104424186A
Application granted
Publication of CN104424186B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Retry When Errors Occur (AREA)

Abstract

This application discloses a method and device for implementing persistence in a stream computing application. After the current batch of messages is consumed successfully, whether a persistence operation is needed is determined from the first starting offset and a preset persistence interval. When persistence is needed, it is performed from the message position indicated by the second starting offset, and after it succeeds, the first starting offset and the second starting offset are both updated to the starting offset of the next batch of messages. Because persistence is performed only once per persistence interval, the time between disk writes grows and real-time computing efficiency improves greatly. On fault recovery, at most the batches within one persistence interval need to be consumed again. This avoids the performance bottleneck caused by frequent disk writes in existing synchronous persistence, and the message throughput of real-time computation improves by an order of magnitude. At the same time, the delay caused by fault recovery is reduced to the order of seconds and does not affect real-time behavior.

Description

Method and device for implementing persistence in a stream computing application
Technical field
This application relates to stream computing technology, and in particular to a method and device for implementing persistence in a stream computing application.
Background
In stream computing, a data stream is generally referred to as messages, and the series of computations and processing applied to the data stream is referred to as consumption.
Stream computing products are mainly used for real-time computation. Real-time computation is usually carried out in memory, and the results must be saved and displayed by some means. At present, results are mainly saved in one of two ways: caching, or persisting to disk, for example in a database (not an in-memory database). Because caching involves no physical disk input/output (I/O), it offers unmatched message throughput. However, because the results are not persisted, caching has almost no fault tolerance: once the application is interrupted, the server goes down, or the cache is cleared, the results held in the cache cannot be recovered. Persisting to disk achieves the highest level of fault tolerance, but it involves a large number of disk writes, which slows stream computation; its execution efficiency is roughly an order of magnitude lower than that of caching.
Fig. 1 is a schematic diagram of the data flow in an existing basic fault-tolerant stream computing application. As shown in Fig. 1, the message middleware cluster sends a stream of individual messages. To facilitate fault tolerance, the stream computing cluster in a typical stream computing product, as in Fig. 1, consumes the message stream in units of batches: several messages are bundled into one batch, and each batch has a unique identifier (ID). A batch is marked as successfully consumed only after every message in it has been consumed successfully. If even one message in a batch fails to be consumed, the message middleware resends the whole batch and the stream computing cluster consumes it again.
Storing the finally processed message stream to disk is called persistence, and this step is essential for fault recovery. Once the application is interrupted or the server goes down, only persistence can guarantee that the real-time results are not lost. On fault recovery, when the real-time computing application restarts, it must reload the intermediate data and result data of the real-time computation from disk and restore all state to a correct point in time before the fault. The ZooKeeper cluster shown in Fig. 1 stores the offset, within the message queue, of the messages sent by the message middleware cluster. When the message middleware cluster sends a batch of messages to the stream computing cluster, the starting offset of that batch in the message queue is recorded in ZooKeeper. If the stream computing cluster consumes the batch successfully, the message middleware cluster sends the next batch, and the offset recorded in ZooKeeper is updated to the starting offset of the next batch in the message queue. If consumption of the batch fails, the stream computing cluster re-reads the offset of the batch from the ZooKeeper cluster and requests the batch again from the message middleware cluster, which implements retransmission on failure.
Summary of the invention
To solve the above technical problem, this application provides a method and device for implementing persistence in a stream computing application, which ensure safe recovery of data after a fault and improve real-time computing efficiency.
To achieve this purpose, this application provides a method for implementing persistence in a stream computing application, including:
after the current batch of messages is consumed successfully, determining whether a persistence operation is needed according to a preset persistence interval and a first starting offset, which stores the starting position in the message queue of the batch currently being consumed;
when a persistence operation is needed, performing persistence from the message position indicated by a second starting offset, which stores the starting position in the message queue of the batch following the most recent persistence operation;
after the persistence operation succeeds, updating the first starting offset and the second starting offset to the starting offset of the next batch of messages.
When the stream computing application starts normally, or starts after fault recovery, the method further includes:
requesting messages according to the second starting offset, and setting the value of the first starting offset to the value of the second starting offset.
When the value of the second starting offset is empty, or no second starting offset has been saved, the current batch of messages is located at the starting position of the message queue of the message middleware;
in that case the method also includes setting the value of the first starting offset to empty.
When the persistence operation fails, the method further includes: consuming the messages of the current batch again, as indicated by the first starting offset.
Determining whether a persistence operation is needed includes: dividing the ID of the current batch by the persistence interval, and determining that a persistence operation is needed when the remainder is zero;
where the batch ID is an integer that starts from 1 and increases in steps of 1.
When the stream computing application starts normally, or starts after fault recovery, the batch ID continues to increase in steps of 1 from the ID of the last batch that was successfully persisted before the application stopped.
This application also discloses a device for implementing persistence in a stream computing application, including at least a storage module, a judgment module, and a processing module, where:
the storage module stores the persistence interval, the first starting offset, which stores the starting position in the message queue of the batch currently being consumed, and the second starting offset, which stores the starting position in the message queue of the batch following the most recent persistence operation;
the judgment module, after the current batch of messages is consumed successfully, determines from the first starting offset stored in the storage module and the preset persistence interval whether a persistence operation is needed, and if so sends a persistence notification to the processing module;
the processing module receives the persistence notification from the judgment module and performs the persistence operation from the message position indicated by the second starting offset stored in the storage module; after the persistence operation succeeds, it updates the first starting offset and the second starting offset to the starting offset of the next batch of messages.
The processing module is further configured to:
when the stream computing application starts normally, or starts after fault recovery, request messages from the message middleware according to the second starting offset stored in the storage module, and set the value of the first starting offset stored in the storage module to the value of the second starting offset.
The processing module is further configured to, when the persistence operation fails, consume the messages of the current batch from the message middleware again, as indicated by the first starting offset stored in the storage module.
The judgment module is specifically configured to: divide the ID of the current batch indicated by the first starting offset by the persistence interval, and, when the remainder is zero, determine that a persistence operation is needed and send a persistence notification to the processing module; where the batch ID is an integer that starts from 1 and increases in steps of 1.
In the scheme provided by this application, after the current batch of messages is consumed successfully, whether a persistence operation is needed is determined according to the preset persistence interval and the first starting offset, which stores the starting position in the message queue of the batch currently being consumed. When a persistence operation is needed, persistence is performed from the message position indicated by the second starting offset, which stores the starting position in the message queue of the batch following the most recent persistence operation. After persistence succeeds, the first starting offset and the second starting offset are updated to the starting offset of the next batch of messages. The persistence operation in this application is not performed after every successfully consumed batch, but once per preset interval, the persistence interval. This lengthens the time between disk writes and greatly improves real-time computing efficiency. On fault recovery, at most the batches within one persistence interval need to be consumed again. Compared with existing synchronous persistence, this avoids the performance bottleneck caused by frequent disk writes; the message throughput of real-time computation improves by an order of magnitude and reaches the same order of magnitude as the caching scheme. At the same time, the delay caused by fault recovery is reduced to the order of seconds and does not affect real-time behavior.
Other features and advantages will be set forth in the description that follows, and in part will become apparent from the description or be understood by practicing the application. The objectives and other advantages of the application can be realized and obtained through the structures particularly pointed out in the description, the claims, and the drawings.
Brief description of the drawings
The accompanying drawings provide a further understanding of the technical solution of this application and form part of the specification. Together with the embodiments of the application, they serve to explain the technical solution and do not limit it.
Fig. 1 is a schematic diagram of the data flow in an existing basic fault-tolerant stream computing application;
Fig. 2 is a flowchart of the method for implementing persistence in a stream computing application according to this application;
Fig. 3 is a flow diagram of an embodiment of implementing persistence in a stream computing application according to this application;
Fig. 4 is a schematic structural diagram of the device for implementing persistence in a stream computing application according to this application.
Detailed description
To make the purpose, technical solution, and advantages of this application clearer, the embodiments of the application are described in detail below with reference to the drawings. Note that, where no conflict arises, the embodiments of this application and the features in the embodiments can be combined with one another.
In a typical configuration of this application, a computing device includes one or more processors (CPU), an input/output interface, a network interface, and memory.
Memory may include volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape and disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The steps shown in the flowcharts of the drawings can be executed in a computer system, for example as a set of computer-executable instructions. Also, although a logical order is shown in the flowcharts, in some cases the steps shown or described can be performed in a different order.
At present, fault-tolerant stream computing schemes broadly fall into the following types:
One approach writes the real-time results to a cache and achieves disaster tolerance by deploying two identical computing clusters. Its advantage is that, with no disk I/O, execution is fast and concurrent access performance is high. Its drawbacks are equally obvious: deployment cost doubles, and if both computing clusters fail at the same time, the results are still lost.
Another approach starts one real-time computing application to perform the normal business computation and write the real-time results to a cache, and at the same time starts a second, independent real-time computing application that persists the received original messages to disk. Because the application that persists to disk and the application that performs the business computation are independent of each other, high execution efficiency can be achieved. When a fault such as a cluster outage occurs, the messages backed up on disk must be consumed again during recovery. Although no data is lost, when the message volume is very large the recovery process causes a severe delay in the real-time computing application, which then loses its real-time value.
A third approach improves on the second by using synchronous persistence: each time the real-time computing application consumes a batch of messages, it persists the real-time results (not the original messages, but the messages after consumption and processing) to disk, and the message middleware cluster sends the next batch only after both consumption and persistence of the current batch have succeeded. This is currently the common stream computing scheme for fault recovery. Specifically:
Synchronous persistence means that every batch performs one disk write. As shown in Fig. 1, the offset stored in the ZooKeeper cluster is updated, each time a batch is consumed successfully, to the starting position in the message queue of the next batch. Without synchronous persistence, if the application is interrupted or the cluster goes down, part of the data is lost once the fault is repaired and the application restarts. For example, suppose that after successfully consuming batch T the real-time application does not persist, that is, does not save the results of this batch to disk, and goes on to consume batch (T+1). The offset stored in the ZooKeeper cluster has by then been updated to the starting position in the message queue of batch (T+1). If the server goes down at this point, then after fault recovery the restarted real-time application obtains the offset from the ZooKeeper cluster and requests messages from the message middleware cluster, and what it requests is clearly batch (T+1). Because the results of batch T were never saved to disk (no synchronous persistence was performed), batch T is inevitably lost.
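The loss scenario above can be reproduced with a toy simulation. This is a minimal sketch rather than anything taken from the patent: the function name `consume_then_crash`, the list standing in for the message queue, and the dictionaries standing in for memory and disk are all hypothetical, and "consuming" a batch is assumed to mean summing its messages.

```python
from typing import Dict, List, Tuple


def consume_then_crash(queue: List[int], batch_size: int, batches: int,
                       synchronous: bool) -> Tuple[Dict[int, int], int]:
    """Consume `batches` batches, then simulate a crash.

    Returns what survives the crash: the disk contents and the offset
    recorded in the coordination service (ZooKeeper in Fig. 1).
    """
    disk: Dict[int, int] = {}      # survives the crash
    memory: Dict[int, int] = {}    # lost in the crash
    recorded_offset = 0            # start of the next batch to request
    for batch_no in range(batches):
        batch = queue[recorded_offset:recorded_offset + batch_size]
        memory[batch_no] = sum(batch)          # hypothetical "consumption"
        if synchronous:
            disk[batch_no] = memory[batch_no]  # one disk write per batch
        recorded_offset += len(batch)          # offset moves to the next batch
    return disk, recorded_offset
```

Without synchronous persistence the recorded offset has already moved past the consumed batches while the disk holds none of their results, so a restart resumes after data that no longer exists anywhere.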
In the synchronous persistence scheme, every batch requires a disk write. Batches are sent at intervals of roughly 400 milliseconds to 2 seconds, and the real-time application spends up to about 1 second computing each batch in memory. Disk writes are therefore very frequent. Extensive experience shows that the disk write for each batch accounts for 50% or more of the total time spent consuming the batch, so disk writes become the main bottleneck for real-time computing efficiency.
Fig. 2 is a flowchart of the method for implementing persistence in a stream computing application according to this application. As shown in Fig. 2, the method includes the following steps:
Step 200: After the current batch of messages is consumed successfully, determine whether a persistence operation is needed according to the preset persistence interval and the first starting offset, which stores the starting position in the message queue of the batch currently being consumed.
In this step, consumption of the current batch of messages belongs to the prior art; its implementation is outside the scope of protection of this application and is not described further here.
In this step, the persistence interval N is a preset integer greater than 1, for example 50. Typically, N can be set to an integer between 10 and 100.
Determining whether a persistence operation is needed in this step includes: dividing the ID of the current batch by the persistence interval N, and determining that a persistence operation is needed when the remainder is zero. The batch ID is an integer that starts from 1 and increases in steps of 1.
In this step, the persistence operation is not performed after every successfully consumed batch, but once every persistence interval N. On fault recovery, at most N batches of messages need to be consumed again. Compared with existing synchronous persistence, this avoids the performance bottleneck caused by frequent disk writes; the message throughput of real-time computation improves by an order of magnitude and reaches the same order of magnitude as schemes that do no persistence, such as caching. At the same time, the delay caused by fault recovery is reduced to the order of seconds.
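The remainder test of step 200 can be sketched as follows. The function names `needs_persistence` and `unpersisted_batches` are hypothetical, and the second function additionally assumes that every earlier persistence point succeeded, so that persistence always falls on multiples of N.

```python
def needs_persistence(batch_id: int, interval: int) -> bool:
    """Step 200: persist when the batch ID is a multiple of the interval N."""
    if batch_id < 1:
        raise ValueError("batch IDs start from 1")
    if interval <= 1:
        raise ValueError("the persistence interval N is an integer greater than 1")
    return batch_id % interval == 0


def unpersisted_batches(last_consumed_batch_id: int, interval: int) -> int:
    """Batches consumed since the last persistence point.

    Assumes every earlier persistence operation succeeded. These are the
    batches whose results exist only in memory and must be replayed
    if a fault occurs now.
    """
    return last_consumed_batch_id % interval
```

With N = 50, batches 50, 100, and 150 trigger a disk write, and a fault right after batch 149 leaves 49 already-consumed batches to replay, which stays within the bound of N batches stated above.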
Step 201: When a persistence operation is needed, perform persistence from the message position indicated by the second starting offset, which stores the starting position in the message queue of the batch following the most recent persistence operation.
Persistence processing means that the stream computing application writes the computed result data held in the data buffer to disk, starting from the message position indicated by the second starting offset. The specific implementation is a conventional technique for those skilled in the art. The difference is that the results to be persisted here are indicated by the second starting offset, which stores the starting position in the message queue of the batch following the most recent persistence operation. In other words, persistence starts from the first batch after the last successful persistence and covers the results, held in the data buffer, of N batches of messages, where N is the persistence interval. That is, the persistence operation in this application is not performed after every successfully consumed batch, but once every N batches.
Step 202: After persistence succeeds, update the first starting offset and the second starting offset to the starting offset of the next batch of messages. This update ensures that the next persistence operation starts from the successfully consumed batches that have not yet been persisted.
When the stream computing application starts normally, or starts after fault recovery, the method of this application further includes: requesting messages according to the second starting offset, and setting the value of the first starting offset to the value of the second starting offset. If the value of the second starting offset is empty, or no second starting offset has been saved, messages are requested from the starting position of the message queue of the message middleware, and the value of the first starting offset is set to empty.
Also, when the stream computing application starts normally, or starts after fault recovery, the batch ID continues to increase from the ID of the last batch successfully persisted before the application stopped, rather than starting again from 1. This keeps batch IDs unique within the same real-time application.
If the persistence operation fails, the method of this application further includes: consuming the messages of the current batch again, as indicated by the first starting offset.
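The startup and retry rules can be written as two small functions. This is a sketch under stated assumptions: offsets are modeled as integers, an empty offset as `None`, and the start of the message queue as the hypothetical constant `QUEUE_START`; none of these representations comes from the patent.

```python
from typing import Optional, Tuple

QUEUE_START = 0  # assumed representation of "starting position of the message queue"


def offsets_on_startup(second_offset: Optional[int]) -> Tuple[int, Optional[int]]:
    """Return (offset to request messages from, new value of the first starting offset).

    Applies on a normal start and on a start after fault recovery.
    """
    if second_offset is None:
        # Nothing has been persisted yet: read from the head of the queue
        # and leave the first starting offset empty.
        return QUEUE_START, None
    return second_offset, second_offset


def offset_for_retry(first_offset: Optional[int]) -> int:
    """After a failed persistence operation, re-request the batch at the first starting offset."""
    return QUEUE_START if first_offset is None else first_offset
```

Keeping the two offsets separate is what lets a retry redo only the current batch, while a restart goes back to the first batch whose results were never written to disk.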
The method of the invention is described in detail below with reference to an embodiment. Fig. 3 is a flow diagram of an embodiment of implementing persistence in a stream computing application according to this application. In this embodiment, Storm is used as the stream computing framework, the stream computing application is developed in Java, and the description takes the fault-tolerant stream computing application shown in Fig. 1 as an example. As shown in Fig. 3, the embodiment includes:
Steps 300 to 301: The stream computing application starts, either normally or after fault recovery. It reads the second starting offset from ZooKeeper and then requests messages from the message middleware at the message position indicated by the second starting offset. At the same time, it sets the value of the first starting offset in ZooKeeper to the value of the second starting offset just read.
In this step, if ZooKeeper holds no second starting offset, or its value is empty, messages are requested from the starting position of the message queue of the message middleware, and the value of the first starting offset in ZooKeeper is set to empty.
Step 302: Request messages from the message middleware.
Step 303: The computing units of the stream computing cluster consume the current batch of messages received. If consumption fails, go to step 308; otherwise go to step 304.
Step 304: After the current batch of messages is consumed successfully, determine whether the remainder of the current batch ID divided by the preset persistence interval N equals 0. If it does not equal 0, persistence is not needed at this point; go to step 309. Otherwise go to step 305.
Step 305: When a persistence operation is needed, perform persistence and save the real-time results to disk.
Step 306: Determine whether the persistence operation succeeded. If it succeeded, go to step 307; if it failed, go to step 310. Determining whether persistence succeeded is a conventional technique for those skilled in the art: database software usually provides a persistence interface, and success is determined from the return code after the interface is called.
Step 307: The persistence operation succeeded. Obtain the starting offset of the next batch of messages from the message middleware, save it as both the first starting offset and the second starting offset, and then return to step 302.
Step 308: If consumption of the current batch fails, re-read the first starting offset from ZooKeeper and return to step 302. If the value of the first starting offset is empty at this point, return to step 302 and request messages from the starting position of the message queue of the message middleware.
Step 309: When no persistence operation is needed, obtain the starting offset of the next batch of messages from the message middleware, save it as the first starting offset, and return to step 302 to continue consuming messages.
Step 310: If the persistence operation fails, re-read the first starting offset from ZooKeeper and return to step 302.
The flow shown in Fig. 3 stops when the stream computing application receives a termination command.
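The success path of Fig. 3 can be walked through with an in-memory simulation. This is a minimal sketch, written in Python for brevity rather than the Java and Storm of the embodiment. `OffsetStore` is a hypothetical stand-in for the ZooKeeper node, a dictionary stands in for the disk, "consuming" a batch is assumed to mean summing its messages, and the failure branches (steps 308 and 310) are omitted.

```python
from typing import Dict, List, Optional


class OffsetStore:
    """Hypothetical in-memory stand-in for the offsets kept in ZooKeeper."""

    def __init__(self) -> None:
        self.first_offset: Optional[int] = None   # start of the batch being consumed
        self.second_offset: Optional[int] = None  # start of the first unpersisted batch
        self.last_persisted_batch_id = 0          # batch IDs resume after this value


def run(queue: List[int], store: OffsetStore, disk: Dict[int, int],
        interval: int, batch_size: int = 1,
        crash_after_batch_id: Optional[int] = None) -> bool:
    """Run the loop until the queue is drained (True) or a simulated crash (False)."""
    # Steps 300-301: start from the second offset, or from the head of the queue.
    offset = 0 if store.second_offset is None else store.second_offset
    store.first_offset = store.second_offset
    batch_id = store.last_persisted_batch_id + 1
    buffer: Dict[int, int] = {}                     # results held only in memory
    while offset < len(queue):
        batch = queue[offset:offset + batch_size]   # step 302: request a batch
        buffer[batch_id] = sum(batch)               # step 303: consume it
        next_offset = offset + len(batch)
        if batch_id % interval == 0:                # step 304: remainder test
            disk.update(buffer)                     # step 305: write buffered results
            buffer.clear()
            store.first_offset = next_offset        # step 307: update both offsets
            store.second_offset = next_offset
            store.last_persisted_batch_id = batch_id
        else:
            store.first_offset = next_offset        # step 309: update first offset only
        if batch_id == crash_after_batch_id:
            return False                            # buffer contents are lost here
        offset = next_offset
        batch_id += 1
    return True
```

For example, with ten single-message batches and an interval of 3, a crash after batch 5 leaves batches 1 to 3 on disk and the second starting offset at 3. Calling `run` again with the same store and disk resumes at batch 4 and replays only batches 4 and 5 before continuing.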
Fig. 4 is a schematic structural diagram of the device for implementing persistence in a stream computing application according to this application. As shown in Fig. 4, the device includes at least a storage module, a judgment module, and a processing module, where:
the storage module stores the persistence interval, the first starting offset, which stores the starting position in the message queue of the batch currently being consumed, and the second starting offset, which stores the starting position in the message queue of the batch following the most recent persistence operation;
the judgment module, after the current batch of messages is consumed successfully, determines from the first starting offset stored in the storage module and the preset persistence interval whether a persistence operation is needed, and if so sends a persistence notification to the processing module. Specifically, it divides the ID of the current batch indicated by the first starting offset by the persistence interval and determines that a persistence operation is needed when the remainder is zero, where the batch ID is an integer that starts from 1 and increases in steps of 1.
the processing module receives the persistence notification from the judgment module and performs persistence from the message position indicated by the second starting offset stored in the storage module; after persistence succeeds, it updates the first starting offset and the second starting offset to the starting offset of the next batch of messages.
The processing module is further configured to: when the stream computing application starts normally, or starts after fault recovery, request messages from the message middleware according to the second starting offset stored in the storage module, and set the value of the first starting offset stored in the storage module to the value of the second starting offset.
The processing module is further configured to, when the persistence operation fails, consume the messages of the current batch from the message middleware again, as indicated by the first starting offset stored in the storage module.
Taking the architecture shown in Fig. 1 as an example, the storage module of the device can be placed in ZooKeeper, and the judgment module and the processing module can be placed in the stream computing cluster. In practice, other software such as HBase or MySQL can be used in place of ZooKeeper; alternatively, the storage module can be implemented in the message middleware.
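Because the storage module only has to hold the persistence interval and the two offsets, it can be hidden behind a small interface so that the backing store is swappable. The sketch below is one possible shape, not the patent's design: the names `PersistenceState`, `StorageModule`, and `InMemoryStorage` are hypothetical, and a real backend would implement the same two methods on top of its own client library.

```python
from dataclasses import dataclass, replace
from typing import Optional, Protocol


@dataclass
class PersistenceState:
    """The three values the storage module keeps."""
    interval: int
    first_offset: Optional[int] = None
    second_offset: Optional[int] = None


class StorageModule(Protocol):
    """Interface that any backing store would need to satisfy."""

    def load(self) -> PersistenceState: ...

    def save(self, state: PersistenceState) -> None: ...


class InMemoryStorage:
    """Hypothetical test double; it survives nothing and is for illustration only."""

    def __init__(self, interval: int) -> None:
        self._state = PersistenceState(interval)

    def load(self) -> PersistenceState:
        return replace(self._state)   # return a copy so callers cannot mutate the store

    def save(self, state: PersistenceState) -> None:
        self._state = replace(state)
```

The judgment module and the processing module would then depend only on `StorageModule`, which matches the statement above that the choice of store can vary.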
Those skilled in the art should understand that the components of the device and the steps of the method provided by the embodiments of the present application may be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they may each be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. The present application is therefore not limited to any specific combination of hardware and software.
Although the embodiments of the present application are disclosed above, the content described is only an embodiment adopted to make the application easier to understand and does not limit the application. Any person skilled in the art to which the application pertains may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the application, but the scope of patent protection of the application shall still be as defined by the appended claims.

Claims (10)

1. A method for implementing persistence in a stream computing application, characterized by comprising:
after the current batch of messages has been consumed successfully, determining whether a persistence operation is needed according to a first starting offset, which records the starting position in the message queue of the batch of messages currently being consumed, and a preset persistence interval;
when a persistence operation is needed, performing persistence processing according to the message position indicated by a second starting offset, which records the starting position in the message queue of the next batch of messages after the most recent persistence operation;
after the persistence operation succeeds, updating the first starting offset and the second starting offset respectively to the starting offset of the next batch of messages.
2. The method according to claim 1, characterized in that, when the stream computing application starts normally or restarts after fault recovery, the method further comprises:
requesting messages according to the second starting offset, and at the same time changing the value of the first starting offset to the value of the second starting offset.
3. The method according to claim 2, characterized in that, when the value of the second starting offset is empty or no second starting offset has been stored, the current batch of messages is located at the starting position of the message queue of the message middleware;
the method further comprises: setting the value of the first starting offset to empty.
4. The method according to claim 1, characterized in that, when the persistence operation fails, the method further comprises: consuming the messages of the current batch again as indicated by the first starting offset.
5. The method according to claim 2 or 4, characterized in that determining whether a persistence operation is needed comprises: dividing the ID of the current batch by the persistence interval, and determining that a persistence operation is needed when the remainder is zero;
wherein batch IDs are integers that start at 1 and increase in steps of 1.
6. The method according to claim 5, characterized in that, when the stream computing application starts normally or restarts after fault recovery, batch IDs continue to increase in steps of 1 from the ID of the last batch that was successfully persisted before the stream computing application stopped.
7. A device for implementing persistence in a stream computing application, characterized by comprising at least a memory module, a judge module, and a processing module, wherein:
the memory module stores the persistence interval, a first starting offset that records the starting position in the message queue of the batch of messages currently being consumed, and a second starting offset that records the starting position in the message queue of the next batch of messages after the most recent persistence operation;
the judge module, after the current batch of messages has been consumed successfully, determines from the first starting offset stored in the memory module and the preset persistence interval whether a persistence operation is needed and, if so, sends a persistence notification to the processing module;
the processing module receives the persistence notification from the judge module and performs the persistence operation according to the message position indicated by the second starting offset stored in the memory module; after the persistence operation succeeds, it updates the first starting offset and the second starting offset to the starting offset of the next batch of messages.
8. The device according to claim 7, characterized in that the processing module is further configured to:
when the stream computing application starts normally, or restarts after fault recovery, request messages from the message middleware according to the second starting offset stored in the memory module, and at the same time change the value of the first starting offset stored in the memory module to the value of the second starting offset.
9. The device according to claim 7, characterized in that the processing module is further configured to, when the persistence operation fails, consume the messages of the current batch from the message middleware again, as indicated by the first starting offset stored in the memory module.
10. The device according to any one of claims 7 to 9, characterized in that the judge module is specifically configured to: divide the ID of the current batch indicated by the first starting offset by the persistence interval and, when the remainder is zero, determine that a persistence operation is needed and send a persistence notification to the processing module; wherein batch IDs are integers that start at 1 and increase in steps of 1.
CN201310362269.XA 2013-08-19 2013-08-19 The method and device of persistence is realized in a kind of stream calculation application Active CN104424186B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310362269.XA CN104424186B (en) 2013-08-19 2013-08-19 The method and device of persistence is realized in a kind of stream calculation application

Publications (2)

Publication Number Publication Date
CN104424186A CN104424186A (en) 2015-03-18
CN104424186B true CN104424186B (en) 2018-04-03

Family

ID=52973190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310362269.XA Active CN104424186B (en) 2013-08-19 2013-08-19 The method and device of persistence is realized in a kind of stream calculation application

Country Status (1)

Country Link
CN (1) CN104424186B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106598473B (en) * 2015-10-15 2020-09-04 南京中兴新软件有限责任公司 Message persistence method and device
CN107783728B (en) * 2016-08-31 2021-07-23 百度在线网络技术(北京)有限公司 Data storage method, device and equipment
CN106789741B (en) * 2016-12-26 2020-02-18 北京奇虎科技有限公司 Consumption method and device of message queue
CN107273228B (en) * 2017-07-13 2020-09-04 焦点科技股份有限公司 Message transmission method based on star topology architecture
CN107295106B (en) * 2017-07-31 2020-08-14 杭州多麦电子商务股份有限公司 Message data service cluster
CN108418879B (en) * 2018-02-26 2021-03-02 新疆熙菱信息技术股份有限公司 High-reliability massive heterogeneous data transmission method and system
CN108509299B (en) * 2018-03-29 2022-08-12 广西电网有限责任公司 Message processing method, device and computer readable storage medium
CN108984770A (en) * 2018-07-23 2018-12-11 北京百度网讯科技有限公司 Method and apparatus for handling data
CN111931025B (en) * 2020-07-20 2023-08-15 武汉美和易思数字科技有限公司 Data continuous grabbing method and system based on Actor model
CN112000489A (en) * 2020-07-29 2020-11-27 新华三大数据技术有限公司 Kafka data processing method and server

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1975684A (en) * 2006-12-13 2007-06-06 天津理工大学 Distributing real-time data bank fault recovering method capable of supporting serving and recovering simultaneously
CN101510838A (en) * 2009-02-26 2009-08-19 北京北纬点易信息技术有限公司 Method for implementing perdurable data queue
US8145859B2 (en) * 2009-03-02 2012-03-27 Oracle International Corporation Method and system for spilling from a queue to a persistent store

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
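一体化电能实时信息采集和管理分析系统 [Integrated real-time electric energy information collection, management and analysis system]; 陈凯平; 《中国优秀硕士学位论文全文数据库 信息科技辑》 [China Master's Theses Full-text Database, Information Science and Technology]; 20100515; main text, p. 49 *
使用storm实现实时大数据分析 [Real-time big data analysis using Storm]; 真实的归宿; 《http://blog.csdn.net/hguisu/article/details/8454368》; 20121231; pp. 4-10 *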

Also Published As

Publication number Publication date
CN104424186A (en) 2015-03-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211105

Address after: Room 233, building 14, No. 788, Guangzhou Avenue South, Haizhu District, Guangzhou City, Guangdong Province

Patentee after: Alibaba South China Technology Co.,Ltd.

Address before: Fourth floor, P.O. Box 847, Capital Building, Grand Cayman, Cayman Islands (British)

Patentee before: ALIBABA GROUP HOLDING Ltd.
