CN105354089A - Streaming data processing model and system supporting iterative calculation - Google Patents

Streaming data processing model and system supporting iterative calculation

Info

Publication number
CN105354089A
Authority
CN
China
Prior art keywords
iterative
operator
stream
stream data
data message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510664968.9A
Other languages
Chinese (zh)
Other versions
CN105354089B (en)
Inventor
林学练
申阳
王家兴
马帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ruihang Zhizhen Technology Co ltd
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN201510664968.9A priority Critical patent/CN105354089B/en
Publication of CN105354089A publication Critical patent/CN105354089A/en
Application granted granted Critical
Publication of CN105354089B publication Critical patent/CN105354089B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24568Data stream processing; Continuous queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention provides a streaming data processing model and system supporting iterative calculation. The processing model comprises at least one input adapter and a plurality of processing nodes, where each processing node comprises a streaming operator and an iterative operator; the input adapter is connected to the streaming operator or the iterative operator by a directed edge; and the streaming operator is connected to the iterative operator or another streaming operator by a directed edge. The streaming data processing model and system supporting iterative calculation can meet the basic requirements of iterative calculation in streaming data processing and thereby solve the iterative-calculation problem in streaming data processing.

Description

Streaming data processing model and system supporting iterative computation
Technical field
The present invention relates to the technical field of parallel and distributed computing, and in particular to a streaming data processing model and system supporting iterative computation.
Background technology
With the development of big-data processing technology, batch data processing systems represented by Hadoop can no longer meet every application demand. Applications in fields such as financial product trading and Internet information processing mostly require higher real-time processing capability to cope with continuously produced data streams, so a number of streaming data processing systems have come into wide use.
Streaming data is a time-ordered data sequence that can be seen as the union of historical data and ever-growing update data. Streaming data processing systems generally do not rely on external storage but compute in memory, in order to obtain better timeliness. Existing streaming data processing systems include Storm, S4 and TimeStream. Fig. 1 is a schematic diagram of the streaming data processing model in the existing Storm system. As shown in Fig. 1, the model is a directed acyclic graph (Directed Acyclic Graph, hereinafter DAG) composed of a series of processing nodes S and processing nodes B, with the association between them realized by data streams. A processing node S continuously reads data from an external data source and sends it in the form of data tuples (Tuple) to the corresponding processing nodes B; a processing node B computes over the received data stream to realize concrete functions such as filtering, aggregation and querying, can be cascaded, and can also send the computed data stream outwards. Fig. 2 is a schematic diagram of the streaming data processing model in the existing S4 system. As shown in Fig. 2, the model is a DAG logically composed of multiple processing elements (PE), the basic computing units of the S4 system. In S4, a data stream consists of an ordered sequence of events; events are computed in each PE, flow between PEs, and the final data stream is obtained at PE8. The streaming data processing model in the existing TimeStream system is likewise realized as a DAG composed of multiple processing nodes. Fig. 3 is a schematic diagram of one processing node in that model: each processing node v in the data-stream DAG, after obtaining an input data stream i, triggers the associated operation fv, produces a new data stream o, and updates the state of processing node v.
It can be seen that when the processing systems built from the above three streaming data processing models process streaming data, they generally compute over the data in memory in real time as soon as it arrives at a processing node and then output the result directly, so the demand for real-time processing is well met. In practical applications, however, there is a demand for iterative computation when processing streaming data. For example, a short-text processing application over microblog data may apply the page-rank algorithm (PageRank) or compute user influence with a user-ranking algorithm (TunkRank), and real-time traffic applications involve iterative computations such as real-time shortest-path planning. Although the existing streaming data processing systems above meet the real-time demand, they cannot solve the iterative-computation problem in streaming data processing.
Summary of the invention
The present invention provides a streaming data processing model and system supporting iterative computation, so as to solve the iterative-computation problem in streaming data processing.
In a first aspect, the present invention provides a streaming data processing model supporting iterative computation, comprising:
at least one input adapter and a plurality of processing nodes, the processing nodes comprising streaming operators and iterative operators, wherein the input adapter is connected to a streaming operator or an iterative operator by a directed edge, and a streaming operator is connected to an iterative operator or another streaming operator by a directed edge;
the input adapter is configured to: receive a data stream, encapsulate the received data stream into streaming data messages according to a preset assembly strategy, and send them to the connected processing node;
the streaming operator is configured to:
put a received streaming data message into a first processing queue, call a preset first stream-processing function to process it, and encapsulate the streaming data message produced by the processing and send it to the connected processing node or to the output;
the iterative operator is configured to:
when a streaming data message is received, put it into a second processing queue, call a preset second stream-processing function to process it, encapsulate the streaming data message produced by the processing and send it to the connected processing node or to the output, and/or send the iterative data message produced by the processing to itself or to an iterative operator parallelized with itself;
when an iterative data message is received, put it into a third processing queue and call a preset iterative-processing function to process it;
wherein the streaming data message comprises data to be processed, and the iterative data message comprises data to be processed, an iteration round number and a maximum iteration round number.
Further, the iterative operator is also configured to:
after calling the preset iterative-processing function, judge whether a new iterative data message is produced; if so, add one to the iteration round number in the new iterative data message, end the iteration when the incremented iteration round number is judged to be greater than the maximum iteration round number, and otherwise send the incremented new iterative data message to itself or to an iterative operator parallelized with itself.
Further, the processing node also comprises:
a gathering operator, the gathering operator being configured to:
put a received streaming data message into a fourth processing queue and judge whether the streaming data messages sent by all upstream processing nodes connected to the gathering operator have been received; if so, call a preset third stream-processing function to process all the buffered streaming data messages, and encapsulate the streaming data message produced by the processing and send it to the connected processing node or to the output; if not, continue to wait for new streaming data messages.
Further, the gathering operator is also configured to:
pre-process and merge received streaming data messages before putting them into the fourth processing queue.
Further, the iterative operator is also configured to:
pre-process and merge received streaming data messages before putting them into the second processing queue, or
pre-process and merge received iterative data messages before putting them into the third processing queue.
Further, the message format of the streaming data message is (f1, f2, ..., fN), where fX denotes the content of the X-th field and there are N fields in total;
the message format of the iterative data message is (f1, f2, ..., fN, Num, MaxNum), where Num and MaxNum are the iteration round number and the maximum iteration round number respectively.
In a second aspect, the present invention provides a streaming data processing system supporting iterative computation, comprising a master machine and a plurality of computing machines, wherein the master machine is responsible for status monitoring and resource allocation of the whole cluster formed by the computing machines, and the master machine is also configured to receive data streams and to analyze, schedule and distribute the received data streams to the computing machines;
each computing machine is internally provided with the streaming data processing model supporting iterative computation described in the first aspect, receives the data stream scheduled and distributed by the master machine, and processes the received data stream by executing that internally provided model.
In the streaming data processing model and system supporting iterative computation provided by the present invention, the model comprises an input adapter together with streaming operators and iterative operators: the streaming operators perform streaming computation, while an iterative operator can send iterative data messages to itself and thereby trigger self-looping iterative computation. The model can therefore satisfy the basic demand for iterative computation in streaming data processing, solving the iterative-computation problem in streaming data processing.
Brief description of the drawings
To explain the technical schemes of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the streaming data processing model in the existing Storm system;
Fig. 2 is a schematic diagram of the streaming data processing model in the existing S4 system;
Fig. 3 is a schematic diagram of one processing node in the streaming data processing model in the existing TimeStream system;
Fig. 4 is a structural schematic diagram of embodiment one of the streaming data processing model supporting iterative computation of the present invention;
Fig. 5 is a structural schematic diagram of embodiment two of the streaming data processing model supporting iterative computation of the present invention.
Detailed description of the embodiments
To make the objects, technical schemes and advantages of the present invention clearer, the technical schemes of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 4 is a structural schematic diagram of embodiment one of the streaming data processing model supporting iterative computation of the present invention. As shown in Fig. 4, the streaming data processing model of this embodiment may comprise at least one input adapter 11 and a plurality of processing nodes, the processing nodes comprising streaming operators and iterative operators (streaming operator 12 and iterative operator 13 in Fig. 4). Streaming operators and iterative operators are two different types of processing node. The input adapter 11 is connected to a streaming operator or an iterative operator by a directed edge, and a streaming operator is connected to an iterative operator or another streaming operator by a directed edge (in Fig. 4, input adapter 11 is connected to streaming operator 12, and streaming operator 12 to iterative operator 13, by directed edges). It should be noted that the structure shown in Fig. 4 is only one example. In a big-data processing model with multiple input data sources, the sources can be distinguished by data type: if the source data types are identical, one logical input adapter is designed; if the source data types differ, a separate input adapter is designed for each kind of source and they are processed separately. A given logical input adapter (the one corresponding to input sources of the same data type) may at execution time be parallelized into multiple tasks distributed over different processing nodes; the degree of parallelism of each input adapter is given by the application program, and defaults to 1 if unspecified.
It should also be noted that an operator in the present invention is the basic computing unit of the streaming data processing model: it maintains an operator state and realizes a computing function. The operator state is optional and user-defined: if the user keeps application data in memory, the operator is stateful; if the user keeps no application data, the operator is stateless.
The input adapter 11 receives a data stream, encapsulates the received data stream into streaming data messages according to a preset assembly strategy, and sends them to the connected processing node.
The streaming operator puts a received streaming data message into a first processing queue, calls a preset first stream-processing function to process it, and encapsulates the streaming data message produced by the processing and sends it to the connected processing node or to the output.
The iterative operator, when receiving a streaming data message, puts it into a second processing queue, calls a preset second stream-processing function to process it, encapsulates the streaming data message produced by the processing and sends it to the connected processing node or to the output, and/or sends the iterative data message produced by the processing to itself or to an iterative operator parallelized with itself. "An iterative operator parallelized with itself" refers to the other processing nodes obtained by parallelizing the same logical node.
When receiving an iterative data message, the iterative operator puts it into a third processing queue and calls a preset iterative-processing function to process it.
A streaming data message comprises the data to be processed, while an iterative data message comprises the data to be processed, an iteration round number and a maximum iteration round number. For example, the message format of a streaming data message is (f1, f2, ..., fN), where fX denotes the content of the X-th field and there are N fields in total; the message format of an iterative data message is (f1, f2, ..., fN, Num, MaxNum), where Num and MaxNum are the iteration round number and the maximum iteration round number respectively.
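As an illustration, the two message formats can be sketched in Python. This is a minimal sketch only; the class and field names are not part of the invention and are chosen purely for illustration:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class StreamMessage:
    # (f1, f2, ..., fN): the N data fields of a streaming data message
    fields: List[str]

@dataclass
class IterativeMessage:
    # (f1, ..., fN, Num, MaxNum): data fields plus iteration bookkeeping
    fields: List[str]
    num: int       # Num: current iteration round number
    max_num: int   # MaxNum: maximum iteration round number

# A relationship-pair stream message and a first-round iterative message
msg = StreamMessage(fields=["userA", "userB"])
it_msg = IterativeMessage(fields=["B", "0.42"], num=0, max_num=10)
```

The only structural difference between the two message kinds is the pair of round-number fields carried by the iterative message.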
Further, the iterative operator, after calling the preset iterative-processing function, judges whether a new iterative data message is produced; if so, it adds one to the iteration round number in the new iterative data message, ends the iteration when the incremented iteration round number is judged to be greater than the maximum iteration round number, and otherwise sends the incremented new iterative data message to itself or to an iterative operator parallelized with itself.
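The round-number check just described can be sketched as follows. This is a hedged illustration: `send` stands in for whatever mechanism delivers a message to the operator itself or to a parallel instance, and the names are not taken from the invention:

```python
class IterMsg:
    def __init__(self, fields, num, max_num):
        self.fields, self.num, self.max_num = fields, num, max_num

def forward_iteration(msg, send):
    """Increment Num; end the iteration once Num exceeds MaxNum,
    otherwise re-send the message to trigger the next round."""
    msg.num += 1
    if msg.num > msg.max_num:
        return False          # iteration finished
    send(msg)
    return True

sent = []
m = IterMsg(["B", "0.5"], num=9, max_num=10)
forward_iteration(m, sent.append)  # Num becomes 10, still within MaxNum
forward_iteration(m, sent.append)  # Num becomes 11, iteration ends
```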
Optionally, the processing node also comprises a gathering operator. The gathering operator puts a received streaming data message into a fourth processing queue and judges whether the streaming data messages sent by all upstream processing nodes connected to it have been received; if so, it calls a preset third stream-processing function to process all the buffered streaming data messages, and encapsulates the streaming data message produced by the processing and sends it to the connected processing node or to the output; if not, it continues to wait for new streaming data messages. For example, when a gathering operator has several connected upstream processing nodes, adding it allows the data streams received from all upstream nodes within some time period to be processed at once. The application scenarios of the streaming data processing model are thereby made more extensive.
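The wait-for-all behaviour of the gathering operator can be sketched as follows. The class name and the `execute` callback are illustrative; `execute` stands in for the preset third stream-processing function:

```python
class GatheringOperator:
    """Buffer one message per upstream node and fire only once every
    connected upstream has delivered a message."""
    def __init__(self, upstream_ids, execute):
        self.upstream = set(upstream_ids)
        self.pending = {}          # upstream id -> buffered message
        self.execute = execute     # the preset stream-processing function

    def on_message(self, upstream_id, message):
        self.pending[upstream_id] = message
        if set(self.pending) == self.upstream:
            result = self.execute(list(self.pending.values()))
            self.pending.clear()   # ready for the next round of upstream data
            return result
        return None                # keep waiting for the other upstreams

gather = GatheringOperator(["nodeA", "nodeB"], execute=sum)
first = gather.on_message("nodeA", 1)    # None: nodeB has not arrived yet
second = gather.on_message("nodeB", 2)   # all upstreams arrived, fires
```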
Preferably, the gathering operator also pre-processes and merges received streaming data messages before putting them into the fourth processing queue. This shortens the subsequent execution of the preset third stream-processing function and achieves better real-time performance.
Preferably, the iterative operator also pre-processes and merges received streaming data messages before putting them into the second processing queue, or pre-processes and merges received iterative data messages before putting them into the third processing queue. This shortens the subsequent execution of the preset second stream-processing function or of the preset iterative-processing function, achieving better real-time performance.
It should be noted that the iterative computation in the streaming data processing model supporting iterative computation provided by the present invention is preferably small-batch iterative computation. If there is a long-standing demand for iterative computation over large data volumes, the present invention proposes splitting the large batch of data and increasing the parallelism of the processing nodes, so that the data is computed in parallel across multiple processing nodes. Before the parallel computation, the data must be partitioned according to the hash value of a field in the data, so that different state data and iterative messages are sent to different parallelized processing nodes. For long-running iterative computation, the present invention proposes, on the one hand, reducing the data volume of each iteration and, on the other hand, controlling the maximum number of iterations.
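The field-hash partitioning mentioned above can be sketched as follows. CRC32 is used here only so the example is deterministic; the invention does not prescribe a particular hash function, and the record layout is assumed:

```python
import zlib

def partition(record, key_index, n_tasks):
    """Route a record to one of n_tasks parallel operator instances by
    hashing one field, so that state data and iterative messages for
    the same key always reach the same parallel instance."""
    key = record[key_index].encode("utf-8")
    return zlib.crc32(key) % n_tasks

# The same user always maps to the same parallel processing node
p1 = partition(("userA", "userB"), 0, 4)
p2 = partition(("userA", "userC"), 0, 4)
```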
During job execution, each operator of each type may be parallelized into multiple computing tasks distributed over multiple machines. Each parallel computing task executes the same operator but maintains the data of a different partition and processes the data stream of that partition.
In the streaming data processing model supporting iterative computation provided by this embodiment, the model comprises an input adapter together with streaming operators and iterative operators: the streaming operators perform streaming computation, while an iterative operator can send iterative data messages to itself and thereby trigger self-looping iterative computation. The model can therefore satisfy the basic demand for iterative computation in streaming data processing, solving the iterative-computation problem in streaming data processing.
Specifically, at the programming-interface level, the input adapter needs an Input() interface to perform the input, pre-processing and transmission of the external data stream; the data stream to be sent is encapsulated according to the preset assembly strategy and can then be sent to the subsequent operators. For the streaming operator, a stream-processing function interface Execute() realizes the computation; within this function the input data stream can be processed and new data streams sent. This basic stream-processing function is also implemented and executed in the gathering operator and the iterative operator, with the following differences. In the gathering operator, the trigger timing of the stream-processing function must be controlled: processing (computation) can only begin once all upstream data has arrived and been collected. At the same time, an optional Combine() interface can be added to the gathering operator, which can be used to perform simple pre-processing and merging whenever a data unit is received, reducing the subsequent execution time of Execute(). In the iterative operator, Execute() is first executed for the basic computation, after which iterative computation is triggered under certain conditions; in each round of iterative computation, the operator can send a data message to itself to trigger the next round, until the iteration termination condition is met or the maximum number of iterations is exceeded. To realize iterative computation, an iExecute() interface must be provided to perform the iterative computation, and an iCondition() interface judges the termination condition. Similarly, an optional pre-processing interface iPreprocess() can be added to the iterative operator to choose whether some data pre-processing and merging work needs to be performed before the iteration starts.
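Under the assumption that the interfaces above map onto methods of operator base classes, they could be organised roughly as follows. This is a sketch only; the actual programming interface is defined by the implementation, and the method and class names here merely mirror the interface names in the text:

```python
class StreamOperator:
    def execute(self, message):
        """Execute(): process one stream message, return new messages."""
        raise NotImplementedError

class GatherOperator(StreamOperator):
    def combine(self, messages):
        """Optional Combine(): pre-merge buffered units before Execute()."""
        return messages

class IterativeOperator(StreamOperator):
    def i_preprocess(self, message):
        """Optional iPreprocess(): merge/clean data before iterating."""
        return message
    def i_execute(self, message):
        """iExecute(): perform one round of iterative computation."""
        raise NotImplementedError
    def i_condition(self, message):
        """iCondition(): True when the iteration should terminate."""
        return message.num > message.max_num

# A trivial concrete streaming operator for illustration
class Doubler(StreamOperator):
    def execute(self, message):
        return [x * 2 for x in message]
```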
A specific embodiment is used below to describe the technical scheme of the embodiment of Fig. 4 in detail, taking as an example the computation of user influence with the user-ranking algorithm (TunkRank) in a short-text processing application over microblog data. The user-influence computation uses the TunkRank algorithm; this is a streaming-computation application with a simple iterative computation. Fig. 5 is a structural schematic diagram of embodiment two of the streaming data processing model supporting iterative computation of the present invention. As shown in Fig. 5, I is the input adapter; A, B and D are streaming operators; and C is an iterative operator. This embodiment comprises both the user-influence computation and a tweet word count (Tweets word count). The process is as follows.
One: the external tweet data stream enters input adapter I, which pre-processes the received data stream (for example, screening out invalid data), encapsulates it into streaming data messages according to the preset assembly strategy, and sends them to the subsequent streaming operator A. Each such streaming data message has only one field, i.e. "(field1)", where field1 is one qualified tweet of the form "userA:sentence#topic#userB": userA is the user sending the tweet, sentence is the tweet body, topic is the topic, and userB is another user being mentioned.
Two: streaming operator A receives the streaming data message and puts it into its processing queue. A processing thread monitors the queue, takes out one message at a time, and calls its stream-processing function. The processing takes field1 from the upstream data, parses the tweet's "user-relationship pair" content into "userA userB", and produces a new streaming data message with two fields, i.e. "(field1, field2)", where field1 is userA and field2 is userB. This new streaming data message is then encapsulated and sent to iterative operator C, and the tweet text is sent to streaming operator B.
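The parsing step performed by operator A can be sketched as follows. The record layout "userA:sentence#topic#userB" is taken from the description above; the function name and example data are illustrative:

```python
def parse_tweet(field1):
    """Split a tweet record into the (sender, mentioned-user)
    relationship pair and the tweet body."""
    sender, rest = field1.split(":", 1)
    body, topic, mentioned = rest.split("#")
    return (sender, mentioned), body

pair, body = parse_tweet("alice:hello world#news#bob")
# pair is the relationship pair sent on to iterative operator C,
# body is the tweet text sent on to streaming operator B
```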
Three: iterative operator C maintains the data of a user-relationship graph, a directed weighted graph in which each vertex represents a user and each edge represents a relationship between users; a vertex weight expresses a user's influence, and an edge weight expresses the strength of the relationship between two users. The vertex weights are the required user influence; the real-time user-influence results can be queried by, for example, querying the graph manually or periodically writing the vertex weights to a file.
After receiving a streaming data message sent by upstream operator A, iterative operator C likewise puts it into its processing queue to await stream processing. The processing converts the "user-relationship pair" data into an edge in the graph. For example, on receiving the data (userA, userB), it finds the two vertices A and B corresponding to the two users and modifies the weight of edge AB: if edge AB does not exist, a new edge AB with weight 1 is created; if it exists, its weight is increased by one. Then, according to the following formula (in which B corresponds to X and A to Y), "the influence A transmits to B" along edge AB is computed:
pushRank(Y, X) = (1 + p · I(Y)) · wt(Y, X) / Σ_{Z ∈ N(Y)} wt(Y, Z)
where Y is the user transmitting the influence, p is a retweet constant, N(Y) denotes the set of vertices adjacent to Y, and wt(Y, X) is the weight of edge YX.
The new pushRank value is then compared with the previous pushRank value (0 for a newly created edge AB). If the difference exceeds a threshold, it is converted into a rank-push request and packaged into an iterative data message sent to operator C itself. The message format is (field1, field2, Num, maxNum), where field1 is the user vertex (B in this example), field2 is the vertex-weight modification value, i.e. the pushRank value, Num is 0 (because this is a first-round iterative message), and maxNum is the configured maximum round number.
On receiving an iterative data message, iterative operator C puts it into the iterative-processing queue to await iterative processing. The iterative computation finds the corresponding vertex from field1 and field2 in the message and modifies its weight (for example B), then computes a new pushRank value for each vertex adjacent to that vertex (B now corresponds to Y in the formula and each adjacent vertex to X). Similarly, for each pushRank exceeding the threshold, a rank-push request is generated and packaged as an iterative data message; the Num of the new iterative message is incremented by one and compared with maxNum: if it does not exceed maxNum, the message is sent to operator C itself; if it does, the iteration ends.
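The graph maintenance and pushRank computation of operator C can be sketched as follows. The retweet constant p is an assumed value for illustration only; the invention does not fix it, and the class name is hypothetical:

```python
from collections import defaultdict

P = 0.5  # retweet constant p (assumed value for illustration)

class UserGraph:
    def __init__(self):
        self.influence = defaultdict(float)  # vertex weight I(.)
        self.wt = defaultdict(dict)          # wt[Y][X]: weight of edge YX

    def add_relation(self, a, b):
        """Create edge AB with weight 1, or add one to an existing edge."""
        self.wt[a][b] = self.wt[a].get(b, 0) + 1

    def push_rank(self, y, x):
        """pushRank(Y,X) = (1 + p*I(Y)) * wt(Y,X) / sum over Z of wt(Y,Z)."""
        total = sum(self.wt[y].values())
        return (1 + P * self.influence[y]) * self.wt[y][x] / total

g = UserGraph()
g.add_relation("A", "B")
g.add_relation("A", "B")   # second (A, B) pair: edge weight becomes 2
g.add_relation("A", "C")
rank = g.push_rank("A", "B")   # (1 + 0.5 * 0) * 2 / 3
```

In a full implementation, `rank` would be compared against the previous value and, if the difference exceeds the threshold, packaged into an iterative data message as described above.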
Four: streaming operator B receives the tweet text, performs word segmentation, and sends the word data to streaming operator D.
Five: streaming operator D receives the word data and counts word frequencies.
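Steps four and five together amount to a streaming word count. A minimal sketch follows; plain whitespace splitting stands in for real word segmentation, and the names are illustrative:

```python
from collections import Counter

def tokenize(text):
    # Operator B: split the tweet body into words
    return text.lower().split()

class WordCountOperator:
    # Operator D: accumulate word frequencies across incoming messages
    def __init__(self):
        self.counts = Counter()
    def on_words(self, words):
        self.counts.update(words)

op = WordCountOperator()
op.on_words(tokenize("Streaming data needs streaming operators"))
```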
The present invention also provides a streaming data processing system supporting iterative computation, comprising a master machine and a plurality of computing machines. The master machine is responsible for status monitoring and resource allocation of the whole cluster formed by the computing machines; it also receives data streams and analyzes, schedules and distributes them to the computing machines. Each computing machine is internally provided with the streaming data processing model supporting iterative computation of the embodiment of Fig. 4 or Fig. 5, receives the data stream scheduled and distributed by the master machine, and processes the received data stream by executing that internally provided model.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium; when executed, it performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical discs.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features replaced with equivalents, without departing in essence from the scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. A streaming data processing model supporting iterative computation, characterized in that it comprises:
at least one input adapter and multiple processing nodes, wherein the processing nodes comprise streaming operators and iterative operators; the input adapter is connected to a streaming operator or an iterative operator by a directed edge, and each streaming operator is connected to an iterative operator or another streaming operator by a directed edge;
the input adapter is configured to receive a data stream, encapsulate the received data stream into stream data messages according to a preset assembly strategy, and send them to the connected processing node;
the streaming operator is configured to:
put a received stream data message into a first processing queue, call a preset first stream-processing function to process it, and encapsulate the stream data message produced by the processing before sending it to the connected processing node or to the output;
the iterative operator is configured to:
upon receiving a stream data message, put it into a second processing queue, call a preset second stream-processing function to process it, and encapsulate the stream data message produced by the processing before sending it to the connected processing node or to the output, and/or send the iterative data message produced by the processing to itself or to an iterative operator parallelized with itself;
upon receiving an iterative data message, put it into a third processing queue and call a preset iterative-processing function to process it;
wherein the stream data message comprises pending data, and the iterative data message comprises pending data, an iteration round number, and a maximum iteration round number.
2. The streaming data processing model supporting iterative computation according to claim 1, characterized in that the iterative operator is further configured to:
after calling the preset iterative-processing function, judge whether a new iterative data message is produced; if so, increment the iteration round number in the new iterative data message by one, end the iteration when the incremented iteration round number exceeds the maximum iteration round number, and otherwise send the incremented new iterative data message to itself or to an iterative operator parallelized with itself.
3. The streaming data processing model supporting iterative computation according to claim 1 or 2, characterized in that the processing nodes further comprise:
an aggregation operator, the aggregation operator being configured to:
put a received stream data message into a fourth processing queue and judge whether stream data messages have been received from all upstream processing nodes connected to the aggregation operator; if so, call a preset third stream-processing function to process all buffered stream data messages and encapsulate the stream data message produced by the processing before sending it to the connected processing node or to the output; if not, continue waiting to receive new stream data messages.
4. The streaming data processing model supporting iterative computation according to claim 3, characterized in that the aggregation operator is further configured to:
preprocess and merge received stream data messages before putting them into the fourth processing queue.
5. The streaming data processing model supporting iterative computation according to claim 4, characterized in that the iterative operator is further configured to:
preprocess and merge received stream data messages before putting them into the second processing queue, or
preprocess and merge received iterative data messages before putting them into the third processing queue.
6. The streaming data processing model supporting iterative computation according to claim 1, characterized in that the message format of the stream data message is (f1, f2 ... fN), where fX denotes the content of the X-th field and there are N fields in total;
the message format of the iterative data message is (f1, f2 ... fN, Num, MaxNum), where Num and MaxNum are the iteration round number and the maximum iteration round number, respectively.
7. A streaming data processing system supporting iterative computation, characterized in that it comprises a host machine and multiple compute machines, wherein the host machine is responsible for status monitoring and resource allocation for the cluster formed by the compute machines, and is further configured to receive a data stream and analyze, schedule, and distribute it to the compute machines;
each compute machine is internally configured with the streaming data processing model supporting iterative computation according to any one of claims 1 to 6; the compute machine receives the data stream dispatched by the host machine and processes it by executing the internally configured processing model.
CN201510664968.9A 2015-10-15 2015-10-15 Support the stream data processing unit and system of iterative calculation Active CN105354089B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510664968.9A CN105354089B (en) 2015-10-15 2015-10-15 Support the stream data processing unit and system of iterative calculation


Publications (2)

Publication Number Publication Date
CN105354089A true CN105354089A (en) 2016-02-24
CN105354089B CN105354089B (en) 2019-02-01

Family

ID=55330063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510664968.9A Active CN105354089B (en) 2015-10-15 2015-10-15 Support the stream data processing unit and system of iterative calculation

Country Status (1)

Country Link
CN (1) CN105354089B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463595A (en) * 2017-05-12 2017-12-12 中国科学院信息工程研究所 A kind of data processing method and system based on Spark
CN107819693A (en) * 2016-09-12 2018-03-20 北京百度网讯科技有限公司 data flow processing method and device for data flow system
CN108270805A (en) * 2016-12-30 2018-07-10 中国移动通信集团河北有限公司 For the resource allocation methods and device of data processing
CN109714222A (en) * 2017-10-26 2019-05-03 创盛视联数码科技(北京)有限公司 The distributed computer monitoring system and its monitoring method of High Availabitity
CN110990059A (en) * 2019-11-28 2020-04-10 中国科学院计算技术研究所 Stream type calculation engine operation method and system for tilt data
CN113127182A (en) * 2019-12-30 2021-07-16 中国移动通信集团上海有限公司 Deep learning scheduling configuration system and method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7890649B2 (en) * 2006-05-04 2011-02-15 International Business Machines Corporation System and method for scalable processing of multi-way data stream correlations
CN103699442A (en) * 2013-12-12 2014-04-02 深圳先进技术研究院 Iterable data processing method under MapReduce calculation framework
CN104267939A (en) * 2014-09-17 2015-01-07 华为技术有限公司 Business processing method, device and system
CN104504143A (en) * 2015-01-04 2015-04-08 华为技术有限公司 Flow graph optimizing method and device



Also Published As

Publication number Publication date
CN105354089B (en) 2019-02-01

Similar Documents

Publication Publication Date Title
CN105354089A (en) Streaming data processing model and system supporting iterative calculation
US11580441B2 (en) Model training method and apparatus
Vats et al. Performance evaluation of K-means clustering on Hadoop infrastructure
Hidalgo et al. Self-adaptive processing graph with operator fission for elastic stream processing
US8595732B2 (en) Reducing the response time of flexible highly data parallel task by assigning task sets using dynamic combined longest processing time scheme
CN106202092B (en) Data processing method and system
US9712646B2 (en) Automated client/server operation partitioning
CN108089921A (en) Server for cloud big data operation architecture and operation resource optimization method thereof
CN103942108B (en) Resource parameters optimization method under Hadoop isomorphism cluster
JP6908643B2 (en) Systems and methods for scheduling a collection of non-preemptive tasks in a multi-robot environment
CN106909554B (en) Method and device for loading database text table data
Zhang et al. Hybrid genetic algorithm for a type-II robust mixed-model assembly line balancing problem with interval task times
Almeida et al. Dyno: Dynamic onloading of deep neural networks from cloud to device
CN110019298A (en) Data processing method and device
CN113297188B (en) Data processing method and device
CN106610989B (en) Search keyword clustering method and device
CN117311998B (en) Large model deployment method and system
CN110728372B (en) Cluster design method and cluster system for dynamic loading of artificial intelligent model
CN109783202A (en) Event-handling method, system, equipment and storage medium
Nagarajan et al. Malleable scheduling for flows of jobs and applications to MapReduce
CN111211938B (en) Biological information software monitoring system and method
Chen et al. Using deep learning to predict and optimize hadoop data analytic service in a cloud platform
CN108108235B (en) Task processing method and device
Bengre et al. A learning-based scheduler for high volume processing in data warehouse using graph neural networks
Slimani et al. Hardware Fuzzy Scheduler for Real-Time Independent Tasks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20230802

Address after: 100043 No.291, commercial building, 2nd floor, building 1, jianxiyuan Zhongli, Haidian District, Beijing

Patentee after: Beijing Ruihang Zhizhen Technology Co.,Ltd.

Address before: 100191 box 7-28, Beijing University of Aeronautics and Astronautics, Haidian District, Beijing

Patentee before: BEIHANG University