CN105354089B - Stream data processing unit and system supporting iterative calculation - Google Patents

Stream data processing unit and system supporting iterative calculation

Info

Publication number
CN105354089B
Authority
CN
China
Prior art keywords
iterative
operator
stream data
processing
data message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510664968.9A
Other languages
Chinese (zh)
Other versions
CN105354089A (en)
Inventor
林学练
申阳
王家兴
马帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ruihang Zhizhen Technology Co ltd
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN201510664968.9A
Publication of CN105354089A
Application granted
Publication of CN105354089B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention provides a stream data processing unit and system supporting iterative calculation. The processing unit includes at least one input adapter and multiple processing nodes. The processing nodes include streaming operators and iterative operators. The input adapter is connected to a streaming operator or an iterative operator by a directed edge, and a streaming operator is connected to an iterative operator or to another streaming operator by a directed edge. The stream data processing unit and system provided by the invention can meet basic iterative calculation needs in stream data processing, solving the problem of iterative calculation in stream data processing.

Description

Stream data processing unit and system supporting iterative calculation
Technical field
The present invention relates to the technical field of parallel and distributed computing, and in particular to a stream data processing unit and system supporting iterative calculation.
Background
With the development of big data processing technology, batch data processing systems represented by Hadoop increasingly fail to meet all application needs. In particular, applications in fields such as financial product trading and Internet information processing mostly require strong real-time processing capability to cope with continuously generated data flows, so a number of stream data processing systems have come into wide use.
Stream data is a time-ordered data sequence that can be regarded as the union of historical data and continually growing update data. Stream data processing systems generally do not depend on external storage; they compute in memory to achieve better timeliness. Existing stream data processing systems include Storm, S4, and Timestream.
Fig. 1 is a schematic diagram of the stream data processing unit in the existing Storm system. The unit shown in Fig. 1 is a directed acyclic graph (Directed Acyclic Graph, hereinafter DAG) made up of a series of processing nodes S and processing nodes B, which are linked by data flows. A processing node S continuously reads data from an external data source and sends it, in the form of data tuples (Tuple), to the corresponding processing node B. A processing node B performs calculations on the data flow it receives, implementing concrete functions such as filtering, aggregation, and querying; processing nodes B can be cascaded and can also send out a data flow after calculation.
Fig. 2 is a schematic diagram of the stream data processing unit in the existing S4 system. The unit shown in Fig. 2 is a DAG formed by the logical connections among multiple processing elements (PE). A PE is the basic computing unit in the S4 system. In S4, a data flow is an ordered sequence of events; events are computed within each PE and flow between PEs, and the final data flow is obtained at PE8.
The stream data processing unit in the existing Timestream system is likewise implemented as a DAG of multiple processing nodes. Fig. 3 is a schematic diagram of one processing node in that unit: after obtaining an input data flow, each processing node v in the data flow DAG triggers the relevant operation f, generates a new data flow o, and updates the state of processing node v.
As the three systems above show, when stream data is processed, a processing node generally computes on the data in memory as soon as it arrives and then outputs the result, which satisfies real-time requirements well. In practice, however, stream data processing has also come to require iterative calculation. For example, short-text processing of microblog data may involve user influence calculations using the page ranking algorithm (PageRank) or the user ranking algorithm (TunkRank), and real-time traffic applications involve iterative calculations such as real-time shortest-path planning. Although the existing stream data processing systems above meet real-time requirements, they cannot perform iterative calculation in stream data processing.
Summary of the invention
The present invention provides a stream data processing unit and system supporting iterative calculation, to solve the problem of iterative calculation in stream data processing.
In a first aspect, the present invention provides a stream data processing unit supporting iterative calculation, comprising:
at least one input adapter and multiple processing nodes, the processing nodes including a streaming operator and an iterative operator, wherein the input adapter is connected to the streaming operator or the iterative operator by a directed edge, and the streaming operator is connected to the iterative operator or to another streaming operator by a directed edge;
the input adapter is configured to: receive a data flow, encapsulate the received data flow into stream data messages according to a preset encapsulation strategy, and send them to the connected processing node;
the streaming operator is configured to:
put a received stream data message into a first processing queue, call a preset first stream processing function to process it, and encapsulate the stream data message generated by the processing and send it to the connected processing node or output it;
the iterative operator is configured to:
on receiving a stream data message, put the received stream data message into a second processing queue, call a preset second stream processing function to process it, and encapsulate the stream data message generated by the processing and send it to the connected processing node or output it, and/or send the iterative data message generated by the processing to itself or to an iterative operator parallelized with itself;
on receiving an iterative data message, put the received iterative data message into a third processing queue and call a preset iterative processing function to process it;
wherein the stream data message includes data to be processed, and the iterative data message includes data to be processed, an iteration round number, and a maximum iteration round number.
Further, the iterative operator is also configured to:
after calling the preset iterative processing function, judge whether a new iterative data message has been generated; if so, add one to the iteration round number in the new iterative data message; end the iteration if the incremented iteration round number is greater than the maximum iteration round number, and otherwise send the new iterative data message, with its incremented round number, to itself or to an iterative operator parallelized with itself.
Further, the processing nodes also include:
a gathering operator, the gathering operator being configured to:
put a received stream data message into a fourth processing queue and judge whether stream data messages have been received from all upstream processing nodes connected to the gathering operator; if so, call a preset third stream processing function to process all the cached stream data messages, and encapsulate the stream data message generated by the processing and send it to the connected processing node or output it; if not, continue waiting to receive new stream data messages.
Further, the gathering operator is also configured to:
preprocess and merge received stream data messages before putting them into the fourth processing queue.
Further, the iterative operator is also configured to:
preprocess and merge received stream data messages before putting them into the second processing queue, or,
preprocess and merge received iterative data messages before putting them into the third processing queue.
Further, the message format of the stream data message is (f1, f2, ..., fN), where fX denotes the content of the X-th field and there are N fields in total;
the message format of the iterative data message is (f1, f2, ..., fN, Num, MaxNum), where Num and MaxNum are the iteration round number and the maximum iteration round number, respectively.
In a second aspect, the present invention provides a stream data processing system supporting iterative calculation, including one master machine and multiple computing machines. The master machine is responsible for status monitoring and resource allocation for the whole cluster formed by the computing machines; it also receives data flows, analyzes them, and schedules and distributes them to the computing machines for processing;
the stream data processing unit supporting iterative calculation described in the first aspect is provided inside each computing machine; the computing machine receives the data flow scheduled and distributed by the master machine and runs its internal stream data processing unit to process the received data flow.
In the stream data processing unit and system supporting iterative calculation provided by the present invention, the stream data processing unit includes an input adapter, streaming operators, and iterative operators. A streaming operator performs stream computation, and an iterative operator can send iterative data messages to itself, triggering self-loop iterative calculation. Basic iterative calculation needs in stream data processing can therefore be met, solving the problem of iterative calculation in stream data processing.
Brief description of the drawings
To describe the technical solutions of the present invention or of the prior art more clearly, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the stream data processing unit in the existing Storm system;
Fig. 2 is a schematic diagram of the stream data processing unit in the existing S4 system;
Fig. 3 is a schematic diagram of one processing node in the stream data processing unit in the existing Timestream system;
Fig. 4 is a structural schematic diagram of embodiment one of the stream data processing unit supporting iterative calculation according to the present invention;
Fig. 5 is a structural schematic diagram of embodiment two of the stream data processing unit supporting iterative calculation according to the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions of the invention are described below clearly and completely with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 4 is a structural schematic diagram of embodiment one of the stream data processing unit supporting iterative calculation. As shown in Fig. 4, the stream data processing unit of this embodiment may include at least one input adapter 11 and multiple processing nodes. The processing nodes include streaming operators and iterative operators (streaming operator 12 and iterative operator 13 in Fig. 4), which are two different types of processing node. The input adapter 11 is connected to a streaming operator or an iterative operator by a directed edge, and a streaming operator is connected to an iterative operator or to another streaming operator by a directed edge (in Fig. 4, directed edges connect input adapter 11 to streaming operator 12, and streaming operator 12 to iterative operator 13). Note that the structure shown in Fig. 4 is only one example of such a structure.
When a big data processing unit has multiple input data sources, the sources can be distinguished by data type. If the input sources have the same data type, a single logical input adapter is designed; if they have different data types, a separate input adapter is designed for each kind of input source and each is handled separately. At run time, one logical input adapter (corresponding to input sources of one data type) may be executed as multiple concurrent tasks distributed across different processing nodes. The degree of concurrency of each input adapter is given by the application program and defaults to 1 if not specified.
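The connection rules described for Fig. 4 can be sketched as a small directed graph. This is a minimal illustration only: the class `Topology`, its methods, and the node names are hypothetical, and modelling the iterative operator's message to itself as a self-edge is an assumption made here for clarity.

```python
from collections import defaultdict

class Topology:
    """Directed graph of one input adapter and its processing nodes (hypothetical names)."""

    def __init__(self):
        self.kinds = {}                 # node name -> "adapter" | "streaming" | "iterative"
        self.edges = defaultdict(list)  # node name -> list of downstream node names

    def add_node(self, name, kind):
        self.kinds[name] = kind

    def connect(self, src, dst):
        # Nothing feeds an input adapter; it only emits stream data messages.
        if self.kinds[dst] == "adapter":
            raise ValueError("an input adapter cannot be a downstream node")
        # Only an iterative operator may send messages back to itself.
        if src == dst and self.kinds[src] != "iterative":
            raise ValueError("only an iterative operator may have a self-edge")
        self.edges[src].append(dst)

topo = Topology()
topo.add_node("adapter11", "adapter")
topo.add_node("streaming12", "streaming")
topo.add_node("iterative13", "iterative")
topo.connect("adapter11", "streaming12")
topo.connect("streaming12", "iterative13")
topo.connect("iterative13", "iterative13")  # self-loop carries iterative data messages
```

Keeping the node kind next to each node lets the graph reject edges that the description does not allow, such as a streaming operator looping back to itself.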
Note that an operator in the present invention is the basic computing unit of the stream data processing unit; it maintains an operator state and implements a calculation function. The operator state is optional and user-defined: if the user keeps application data in memory, the operator is stateful; if the user keeps no application data, the operator is stateless.
The input adapter 11 is configured to: receive a data flow, encapsulate the received data flow into stream data messages according to a preset encapsulation strategy, and send them to the connected processing node.
The streaming operator is configured to: put a received stream data message into a first processing queue, call a preset first stream processing function to process it, and encapsulate the stream data message generated by the processing and send it to the connected processing node or output it.
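The queue-then-process behaviour of a streaming operator can be sketched as follows. This is an illustrative sketch, not the patented implementation: `StreamingOperator`, `receive`, `run_once`, and the sample processing functions are all hypothetical names, and messages are modelled as plain tuples.

```python
from collections import deque

class StreamingOperator:
    def __init__(self, process_fn, downstream=None):
        self.queue = deque()          # plays the role of the "first processing queue"
        self.process_fn = process_fn  # preset stream processing function: message -> list of messages
        self.downstream = downstream if downstream is not None else []
        self.outputs = []             # results emitted when there is no downstream node

    def receive(self, message):
        self.queue.append(message)

    def run_once(self):
        """Take one message off the queue, process it, and forward the results."""
        if not self.queue:
            return False
        message = self.queue.popleft()
        for out in self.process_fn(message):
            if self.downstream:
                for node in self.downstream:
                    node.receive(out)
            else:
                self.outputs.append(out)
        return True

sink = StreamingOperator(lambda m: [m])
upper = StreamingOperator(lambda m: [(m[0].upper(),)], downstream=[sink])
upper.receive(("hello",))
upper.run_once()
sink.run_once()
```

After these calls, `sink.outputs` holds the single upper-cased message. A real system would run `run_once` on a processing thread per operator instead of calling it by hand.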
The iterative operator is configured to: on receiving a stream data message, put the received stream data message into a second processing queue, call a preset second stream processing function to process it, and encapsulate the stream data message generated by the processing and send it to the connected processing node or output it, and/or send the iterative data message generated by the processing to itself or to an iterative operator parallelized with itself. An iterative operator parallelized with itself means another processing node produced by parallelizing the same logical node.
On receiving an iterative data message, the iterative operator puts the received iterative data message into a third processing queue and calls a preset iterative processing function to process it.
The stream data message includes data to be processed; the iterative data message includes data to be processed, an iteration round number, and a maximum iteration round number. For example, the message format of the stream data message is (f1, f2, ..., fN), where fX denotes the content of the X-th field and there are N fields in total. The message format of the iterative data message is (f1, f2, ..., fN, Num, MaxNum), where Num and MaxNum are the iteration round number and the maximum iteration round number, respectively.
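The two message formats can be written down as simple record types. The sketch below is one possible representation; the class and attribute names (`StreamMessage`, `IterativeMessage`, `fields`, `num`, `max_num`) and the default limit of 10 rounds are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class StreamMessage:
    fields: Tuple          # (f1, f2, ..., fN)

@dataclass(frozen=True)
class IterativeMessage:
    fields: Tuple          # (f1, f2, ..., fN)
    num: int = 0           # Num: current iteration round
    max_num: int = 10      # MaxNum: maximum iteration round (10 is an arbitrary default)

    def as_tuple(self):
        """Flatten to the wire format (f1, ..., fN, Num, MaxNum)."""
        return self.fields + (self.num, self.max_num)

msg = IterativeMessage(fields=("userB", 0.25), num=0, max_num=5)
```

Carrying `num` and `max_num` inside each message means an operator needs no extra bookkeeping to decide whether an iteration chain may continue.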
Further, the iterative operator is also configured to: after calling the preset iterative processing function, judge whether a new iterative data message has been generated; if so, add one to the iteration round number in the new iterative data message; end the iteration if the incremented iteration round number is greater than the maximum iteration round number, and otherwise send the new iterative data message, with its incremented round number, to itself or to an iterative operator parallelized with itself.
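The round-counting rule above can be sketched as a loop over a queue of self-sent messages. This is a simplified single-process illustration: `run_iterations` and `iterate_fn` are hypothetical names, and each message is modelled as a `(payload, num, max_num)` tuple.

```python
from collections import deque

def run_iterations(first_message, iterate_fn):
    """Drive self-sent iterative messages until none remain.

    Each message is a tuple (payload, num, max_num). iterate_fn(payload)
    returns a list of new payloads, which may be empty.
    Returns the number of messages processed.
    """
    queue = deque([first_message])   # plays the role of the "third processing queue"
    processed = 0
    while queue:
        payload, num, max_num = queue.popleft()
        processed += 1
        for new_payload in iterate_fn(payload):
            new_num = num + 1
            if new_num > max_num:
                continue             # round limit exceeded: the message is not sent
            queue.append((new_payload, new_num, max_num))
    return processed

# A function that always produces one new message would never stop on its own;
# with max_num=3 it processes rounds 0, 1, 2, and 3 and then ends.
count = run_iterations(("x", 0, 3), lambda p: [p])
```

The limit guards against iterative functions that never converge, which matters in a streaming system where new input keeps arriving.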
Optionally, the processing nodes also include a gathering operator. The gathering operator is configured to: put a received stream data message into a fourth processing queue and judge whether stream data messages have been received from all upstream processing nodes connected to the gathering operator; if so, call a preset third stream processing function to process all the cached stream data messages, and encapsulate the stream data message generated by the processing and send it to the connected processing node or output it; if not, continue waiting to receive new stream data messages. For example, when multiple upstream processing nodes are connected to the gathering operator, the gathering operator can handle in one pass the processed data flows received from all upstream nodes within a certain period, which broadens the application scenarios of the stream data processing unit.
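The wait-for-all-upstreams behaviour can be sketched as below. The sketch assumes, for simplicity, that each upstream node contributes exactly one message per batch; `GatheringOperator`, `receive`, and the use of `sum` as the processing function are hypothetical choices.

```python
class GatheringOperator:
    def __init__(self, upstream_ids, process_fn):
        self.upstream_ids = set(upstream_ids)
        self.process_fn = process_fn   # preset stream processing function: list of messages -> result
        self.cache = {}                # upstream id -> cached message

    def receive(self, upstream_id, message):
        """Cache the message; return a result only once every upstream has reported."""
        self.cache[upstream_id] = message
        if set(self.cache) != self.upstream_ids:
            return None                # keep waiting for the remaining upstream nodes
        result = self.process_fn(list(self.cache.values()))
        self.cache.clear()             # start a fresh batch
        return result

gather = GatheringOperator(["A", "B"], process_fn=sum)
first = gather.receive("A", 3)    # B has not reported yet, so nothing is produced
second = gather.receive("B", 4)   # both upstreams have reported, so processing runs
```

Here `first` is `None` and `second` is the sum of the two cached values.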
Preferably, the gathering operator is also configured to preprocess and merge received stream data messages before putting them into the fourth processing queue. This reduces the execution time of the subsequent call to the preset third stream processing function and achieves better real-time performance.
Preferably, the iterative operator is also configured to preprocess and merge received stream data messages before putting them into the second processing queue, or to preprocess and merge received iterative data messages before putting them into the third processing queue. This reduces the execution time of the subsequent call to the preset second stream processing function or the preset iterative processing function and achieves better real-time performance.
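As one possible form of the preprocessing and merging step, messages that target the same key can be summed before they enter the processing queue. The function name `combine` and the `(key, value)` message shape are assumptions for illustration; the text does not prescribe a particular merge rule.

```python
def combine(messages):
    """Merge (key, value) messages that share a key by summing their values."""
    merged = {}
    for key, value in messages:
        merged[key] = merged.get(key, 0) + value
    return list(merged.items())   # dicts keep insertion order, so first-seen order is preserved

incoming = [("userB", 1), ("userC", 2), ("userB", 3)]
queue = combine(incoming)
```

Three incoming messages become two queued messages, so the later processing function is called fewer times.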
Note that iterative calculation in the stream data processing unit provided by the present invention works best on small data batches. For iterative calculations that run for a long time or involve large data volumes, the invention proposes splitting large batches of data and increasing the parallelism of the processing node, so that the data is computed in parallel across multiple parallel processing nodes. Before parallel computation, the data must be partitioned according to the hash value of a field in the data, so that different state data and iterative messages are sent to different parallel processing nodes. For long-running iterative calculations, the invention proposes, on the one hand, reducing the amount of data handled in a single iteration and, on the other hand, limiting the maximum number of iterations.
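Partitioning by the hash of a field can be sketched as follows. The choice of the first field as the partitioning key, the CRC32 hash, and the names `partition_for` and `route` are assumptions made for this example; the text only requires that a field's hash value decides the target node.

```python
import zlib

def partition_for(key, parallelism):
    """Map a field value to one of `parallelism` parallel processing nodes."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % parallelism

def route(messages, parallelism):
    """Distribute messages into one bucket per parallel node, keyed on the first field."""
    buckets = [[] for _ in range(parallelism)]
    for message in messages:
        buckets[partition_for(message[0], parallelism)].append(message)
    return buckets

buckets = route([("userA", 1), ("userB", 2), ("userA", 3)], parallelism=4)
```

Because the mapping depends only on the key, every message about the same user reaches the node that holds that user's state.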
During job execution, each kind of operator may be run as multiple concurrent computing tasks distributed across multiple machines. Every parallel computing task executes the same operator but maintains the data of a different partition and processes the data flow of that partition.
The stream data processing unit supporting iterative calculation provided in this embodiment includes an input adapter, streaming operators, and iterative operators. A streaming operator performs stream computation, and an iterative operator can send iterative data messages to itself, triggering self-loop iterative calculation. Basic iterative calculation needs in stream data processing can therefore be met, solving the problem of iterative calculation in stream data processing.
Specifically, in terms of programming interfaces, the input adapter needs an Input() interface that handles external input of the data flow, preprocessing, and sending; the data flow to be sent is then encapsulated according to the preset encapsulation strategy and sent to the subsequent operators. The streaming operator needs a stream processing function interface, Execute(), which implements the calculation; this function can process the input data flow and send a new data flow. This basic stream processing function is also implemented and executed in the gathering operator and the iterative operator, with the following differences. In the gathering operator, the trigger timing of the stream processing function must be controlled: statistical processing (calculation) can start only after all upstream data has arrived. The gathering operator can also be given an optional Combine() interface, which can be used each time a data unit is received to perform simple preprocessing and merging of the data, reducing the execution time of the later Execute() calculation. In the iterative operator, Execute() is run first to perform the basic calculation, and iterative calculation is then triggered under certain conditions; in each round of iterative calculation the operator can send a data message to itself to trigger the next round, until the iteration termination condition is met or the maximum number of iterations is exceeded. Iterative calculation requires an iExecute() interface to perform the iteration and an iCondition() interface to judge the iteration termination condition. Similarly, the iterative operator can be given an optional preprocessing interface, iPreprocess(), to choose whether some data preprocessing and data merging is performed before the iteration starts.
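The interfaces of the iterative operator named above can be pictured as methods on a base class. The sketch below is a hypothetical Python rendering: the snake_case method names stand in for Execute(), iPreprocess(), iExecute(), and iCondition(), and the `Halving` operator and `drive` helper are invented purely to exercise them.

```python
class IterativeOperator:
    """Hypothetical base class; method names mirror the interfaces in the text."""

    def execute(self, message):        # Execute(): basic stream processing
        raise NotImplementedError

    def i_preprocess(self, messages):  # iPreprocess(): optional, default is a no-op
        return messages

    def i_execute(self, message):      # iExecute(): one round of iteration
        raise NotImplementedError

    def i_condition(self, message):    # iCondition(): True when iteration should stop
        raise NotImplementedError

class Halving(IterativeOperator):
    """Toy operator: keeps halving a value until it drops below 1."""

    def execute(self, message):
        return message                 # the stream message seeds the iteration

    def i_execute(self, message):
        return message / 2

    def i_condition(self, message):
        return message < 1

def drive(op, message, max_num):
    """Run Execute() once, then iterate until the condition holds or max_num rounds pass."""
    value, num = op.execute(message), 0
    while not op.i_condition(value) and num < max_num:
        value, num = op.i_execute(value), num + 1
    return value, num

result = drive(Halving(), 8, max_num=10)
```

Starting from 8, the toy operator stops after four rounds with the value 0.5; with `max_num=2` it would stop early at 2.0 because the round limit is reached first.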
A specific example is used below to describe the technical solution of the embodiment shown in Fig. 4 in detail: calculating user influence with the user ranking algorithm (TunkRank) in short-text processing of microblog data. The user influence calculation uses the TunkRank algorithm, which makes this a stream computing application with simple iterative operations. Fig. 5 is a structural schematic diagram of embodiment two of the stream data processing unit supporting iterative calculation. As shown in Fig. 5, I is the input adapter, A, B, and D are streaming operators, and C is an iterative operator. This embodiment includes user influence calculation and tweet word count (Tweets word count) calculation. The process is as follows:
First, an external tweet data flow is fed into input adapter I. After preprocessing the received data flow (for example, filtering out invalid data), input adapter I encapsulates it into stream data messages according to the preset encapsulation strategy and sends them to the subsequent streaming operator A. Each such stream data message has a single field, that is, "(field1)", where field1 is the content of one qualifying tweet in a format such as "userA:sentence#topic#@userB". Here userA is the user who sent the tweet, sentence is the body of the tweet, topic is the topic, and userB is another user who is mentioned ("@").
Second, streaming operator A receives the stream data message and puts it into its processing queue. A processing thread monitors the queue, takes out stream data messages in order, and calls the stream processing function on each. The processing takes field1 from the upstream data and parses the tweet content into a "user relationship pair" of the form "userA@userB". The new stream data message generated has two fields, that is, "(field1, field2)", where field1 is userA and field2 is userB. The new stream data message is then encapsulated and sent to iterative operator C, and the tweet text is sent to streaming operator B.
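The parsing done by streaming operator A can be sketched with a small function. It assumes the tweet format "userA:sentence#topic#@userB" quoted above; the function name `parse_tweet`, the regular expression, and the handling of several mentions in one tweet are assumptions added for illustration.

```python
import re

def parse_tweet(tweet):
    """Return (sender, mentioned_user) pairs found in one tweet string."""
    sender, _, body = tweet.partition(":")       # text before the first ":" is the sender
    mentions = re.findall(r"@(\w+)", body)       # every "@name" in the body is a mention
    return [(sender.strip(), mentioned) for mentioned in mentions]

pairs = parse_tweet("userA:nice weather#topic#@userB")
```

Each returned pair corresponds to one "(field1, field2)" stream data message sent on to iterative operator C; a tweet without any mention yields no pairs.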
Third, iterative operator C maintains the data of a user relationship graph. This is a directed weighted graph in which each point represents a user and each edge represents a relationship between users; a point weight represents the user's influence, and an edge weight represents the strength of the relationship between users. The point weights are the user influence values being sought. Real-time user influence results can be obtained by, for example, querying the graph manually or periodically writing the point weights to a file.
After receiving a stream data message from the upstream operator A, iterative operator C likewise puts it into a processing queue to await stream processing. The processing converts the "user relationship pair" data into an edge in the graph. For example, on receiving the data (userA, userB), it finds the two points A and B corresponding to the two users and modifies the weight of edge AB: if there is no edge AB, a new edge AB with weight 1 is created; if edge AB already exists, its weight is increased by one. It then calculates the "influence that A passes to B" along edge AB according to the following formula, in which B corresponds to X and A corresponds to Y:
where Y is the user being influenced, p is a forwarding constant, N(Y) denotes the set of points adjacent to Y, and wt(Y, X) is the weight of edge YX.
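The formula itself does not appear in the text above. For orientation only, the first expression below is the commonly published TunkRank definition, and the second is one plausible edge-weighted per-edge form that uses the symbols p, N(Y), and wt(Y, X). The second expression is an assumption made here, not the formula of the patent.

```latex
% Published TunkRank definition (unweighted):
\mathrm{Influence}(X) \;=\; \sum_{Y \in \mathrm{Followers}(X)}
    \frac{1 + p \cdot \mathrm{Influence}(Y)}{\lvert \mathrm{Following}(Y) \rvert}

% Assumed edge-weighted contribution of Y to X along edge YX:
\mathrm{pushRank}(Y \to X) \;=\;
    \bigl(1 + p \cdot \mathrm{Influence}(Y)\bigr) \cdot
    \frac{wt(Y, X)}{\sum_{Z \in N(Y)} wt(Y, Z)}
```

In the assumed weighted form, the share of Y's contribution that reaches X is proportional to the weight of edge YX among all edges leaving Y, and it reduces to the unweighted definition when every edge weight is 1.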
The new pushRank value is then compared with the previous pushRank value (for a newly created edge AB, the previous pushRank value is 0). If the difference exceeds a threshold, it is converted into a rank push request and packaged as an iterative data message sent to operator C itself. The message format is (field1, field2, Num, MaxNum), where field1 is the user point (B in this example), field2 is the change to the point weight, namely pushRank, Num is 0 (because this is a first-round iterative message), and MaxNum is the configured maximum number of rounds.
On receiving an iterative data message, iterative operator C puts the message into the iterative processing queue to await iterative processing. The iterative calculation finds the corresponding point according to field1 and field2 in the message and modifies its weight (for example, B). It then calculates a new pushRank value for each point adjacent to this point (B now corresponds to Y in the formula, and each adjacent point is X). As before, for each pushRank exceeding the threshold, a rank push request is generated and packaged as an iterative data message. The Num of the new iterative message is increased by one and checked against MaxNum: if it does not exceed MaxNum, the message is sent to this operator; if it does, the iteration ends.
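The push-style iteration described above can be sketched as a loop over self-sent messages of the form (point, change, Num). This is a simplified illustration under stated assumptions: the propagation rule `p * change * weight / total` is an assumed edge-weighted TunkRank-style rule rather than the formula of the patent, and `propagate`, the default constants, and the two-point graph are hypothetical.

```python
from collections import deque

def propagate(edges, ranks, start, delta, p=0.5, threshold=0.01, max_num=5):
    """Apply a weight change at `start` and push its effect to adjacent points.

    edges[y][x] is the weight of edge y->x. Returns the highest round number processed.
    """
    queue = deque([(start, delta, 0)])        # iterative messages: (point, change, Num)
    last_round = 0
    while queue:
        point, change, num = queue.popleft()
        ranks[point] = ranks.get(point, 0.0) + change   # modify the point weight
        last_round = max(last_round, num)
        neighbours = edges.get(point, {})
        total = sum(neighbours.values())
        for neighbour, weight in neighbours.items():
            push = p * change * weight / total          # assumed propagation rule
            # Send a new message only for changes above the threshold
            # and only while Num + 1 does not exceed MaxNum.
            if push > threshold and num + 1 <= max_num:
                queue.append((neighbour, push, num + 1))
    return last_round

edges = {"A": {"B": 1.0}, "B": {"A": 1.0}}   # a two-point cycle
ranks = {}
last_round = propagate(edges, ranks, "B", 1.0)
```

On this cycle the pushed change halves each round, and the iteration stops at round 5 because the next message would exceed the round limit, even though its value is still above the threshold.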
Fourth, streaming operator B receives the tweet text, performs word segmentation, and sends the word data to streaming operator D.
Fifth, streaming operator D receives the word data and counts word frequencies.
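The word-count branch formed by streaming operators B and D can be sketched as below. Whitespace splitting stands in for real word segmentation, and the names `tokenize` and `WordCounter` are hypothetical; operator D is shown as a stateful operator because it keeps its counts in memory.

```python
from collections import Counter

def tokenize(text):
    """Role of streaming operator B: split tweet text into words (whitespace split only)."""
    return text.lower().split()

class WordCounter:
    """Role of streaming operator D: a stateful operator holding word frequencies."""

    def __init__(self):
        self.counts = Counter()

    def receive(self, word):
        self.counts[word] += 1

counter = WordCounter()
for tweet in ["stream data stream", "iterative stream"]:
    for word in tokenize(tweet):
        counter.receive(word)
```

After both tweets are processed, `counter.counts` records "stream" three times and "data" and "iterative" once each.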
The present invention also provides a stream data processing system supporting iterative calculation, including one master machine and multiple computing machines. The master machine is responsible for status monitoring and resource allocation for the whole cluster formed by the computing machines; it also receives data flows, analyzes them, and schedules and distributes them to the computing machines for processing. The stream data processing unit supporting iterative calculation of the embodiment shown in Fig. 4 or Fig. 5 is provided inside each computing machine; the computing machine receives the data flow scheduled and distributed by the master machine and runs its internal stream data processing unit to process the received data flow.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments. The storage medium includes ROM, RAM, magnetic disks, optical discs, and other media capable of storing program code.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A stream data processing unit supporting iterative calculation, characterized by comprising:
at least one input adapter and multiple processing nodes, the processing nodes comprising streaming operators and iterative operators, wherein the input adapter is connected to a streaming operator or an iterative operator by a directed edge, and a streaming operator is connected to an iterative operator or to another streaming operator by a directed edge;
the input adapter is configured to: receive a data stream, encapsulate the received data stream into stream data messages according to a preset assembly strategy, and send them to the connected processing node;
the streaming operator is configured to:
put a received stream data message into a first processing queue, call a preset first stream processing function to process it, and encapsulate the stream data message generated by the processing and send it to the connected processing node or output it;
the iterative operator is configured to:
when receiving a stream data message, put the received stream data message into a second processing queue, call a preset second stream processing function to process it, encapsulate the stream data message generated by the processing and send it to the connected processing node or output it, and/or send the iterative data message generated by the processing to itself or to an iterative operator parallelized with itself;
when receiving an iterative data message, put the received iterative data message into a third processing queue and call a preset iterative processing function to process it;
wherein the stream data message comprises data to be processed, and the iterative data message comprises data to be processed, an iteration round number, and a maximum iteration round number;
the iterative operator is further configured to:
after calling the preset iterative processing function, judge whether a new iterative data message is generated; if so, increment the iteration round number in the new iterative data message by one; terminate the iteration when the incremented iteration round number is greater than the maximum iteration round number; otherwise send the new iterative data message, after the increment, to itself or to an iterative operator parallelized with itself.
2. The stream data processing unit supporting iterative calculation according to claim 1, characterized in that the processing nodes further comprise:
a gathering operator, the gathering operator being configured to:
put a received stream data message into a fourth processing queue and judge whether the stream data messages sent by all upstream processing nodes connected to the gathering operator have been received; if so, call a preset third stream processing function to process all cached stream data messages, and encapsulate the stream data message generated by the processing and send it to the connected processing node or output it; if not, continue waiting to receive new stream data messages.
3. The stream data processing unit supporting iterative calculation according to claim 2, characterized in that the gathering operator is further configured to:
preprocess and merge the received stream data message before putting it into the fourth processing queue.
4. The stream data processing unit supporting iterative calculation according to claim 3, characterized in that the iterative operator is further configured to:
preprocess and merge the received stream data message before putting it into the second processing queue, or,
preprocess and merge the received iterative data message before putting it into the third processing queue.
5. The stream data processing unit supporting iterative calculation according to claim 1, characterized in that the message format of the stream data message is (f1, f2, ..., fN), where fX denotes the content of the X-th field and there are N fields in total;
the message format of the iterative data message is (f1, f2, ..., fN, Num, Max Num), where Num and Max Num are the iteration round number and the maximum iteration round number, respectively.
6. A stream data processing system supporting iterative calculation, characterized by comprising one host machine and multiple computing machines, wherein the host machine is responsible for the status monitoring and resource allocation of the entire cluster formed by the multiple computing machines, and the host machine is further configured to receive a data stream, analyze the received data stream, and schedule and distribute it to the computing machines for processing;
each computing machine is internally provided with the stream data processing unit supporting iterative calculation according to any one of claims 1 to 5, and the computing machine is configured to receive the data stream dispatched by the host machine and to run the internally provided stream data processing unit supporting iterative calculation to process the received data stream.
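As an illustration of the gathering operator recited in claim 2, the sketch below caches incoming stream data messages and calls its processing function only once every connected upstream node has sent at least one message. It is a simplified reading under stated assumptions, not the claimed implementation: the class name `GatheringOperator`, the `receive` method, the explicit upstream identifier, and the choice to clear the cache after each batch are all hypothetical.

```python
class GatheringOperator:
    """Caches stream data messages until all upstream nodes have reported."""

    def __init__(self, upstream_ids, process_fn):
        self.upstream_ids = set(upstream_ids)  # connected upstream processing nodes
        self.process_fn = process_fn           # stands in for the third stream processing function
        self.queue = []                        # stands in for the fourth processing queue
        self.seen = set()                      # upstream nodes heard from so far

    def receive(self, upstream_id, message):
        """Accept one message, given as a tuple (f1, ..., fN); return a result or None."""
        if upstream_id not in self.upstream_ids:
            raise ValueError(f"unknown upstream node: {upstream_id}")
        self.queue.append(message)
        self.seen.add(upstream_id)
        if self.seen != self.upstream_ids:
            return None                        # keep waiting for the remaining upstream nodes
        batch, self.queue = self.queue, []     # assumed: the cache is reset after each batch
        self.seen = set()
        return self.process_fn(batch)          # the caller would forward or output this
```

For example, with two upstream nodes and a function that sums the second field of each cached message, the operator returns nothing until both nodes have sent a message, then returns the sum over everything it cached.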
CN201510664968.9A 2015-10-15 2015-10-15 Support the stream data processing unit and system of iterative calculation Active CN105354089B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510664968.9A CN105354089B (en) 2015-10-15 2015-10-15 Support the stream data processing unit and system of iterative calculation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510664968.9A CN105354089B (en) 2015-10-15 2015-10-15 Support the stream data processing unit and system of iterative calculation

Publications (2)

Publication Number Publication Date
CN105354089A CN105354089A (en) 2016-02-24
CN105354089B true CN105354089B (en) 2019-02-01

Family

ID=55330063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510664968.9A Active CN105354089B (en) 2015-10-15 2015-10-15 Support the stream data processing unit and system of iterative calculation

Country Status (1)

Country Link
CN (1) CN105354089B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107819693B (en) * 2016-09-12 2019-05-07 北京百度网讯科技有限公司 Data flow processing method and device for data flow system
CN108270805B (en) * 2016-12-30 2021-03-05 中国移动通信集团河北有限公司 Resource allocation method and device for data processing
CN107463595A (en) * 2017-05-12 2017-12-12 中国科学院信息工程研究所 A kind of data processing method and system based on Spark
CN109714222A (en) * 2017-10-26 2019-05-03 创盛视联数码科技(北京)有限公司 The distributed computer monitoring system and its monitoring method of High Availability
CN110990059B (en) * 2019-11-28 2021-11-19 中国科学院计算技术研究所 Stream type calculation engine operation method and system for tilt data
CN113127182A (en) * 2019-12-30 2021-07-16 中国移动通信集团上海有限公司 Deep learning scheduling configuration system and method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7890649B2 (en) * 2006-05-04 2011-02-15 International Business Machines Corporation System and method for scalable processing of multi-way data stream correlations
CN103699442A (en) * 2013-12-12 2014-04-02 深圳先进技术研究院 Iterable data processing method under MapReduce calculation framework
CN104267939A (en) * 2014-09-17 2015-01-07 华为技术有限公司 Business processing method, device and system
CN104504143A (en) * 2015-01-04 2015-04-08 华为技术有限公司 Flow graph optimizing method and device

Also Published As

Publication number Publication date
CN105354089A (en) 2016-02-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230802

Address after: 100043 No.291, commercial building, 2nd floor, building 1, jianxiyuan Zhongli, Haidian District, Beijing

Patentee after: Beijing Ruihang Zhizhen Technology Co.,Ltd.

Address before: 100191 box 7-28, Beijing University of Aeronautics and Astronautics, Haidian District, Beijing

Patentee before: BEIHANG University
