CN113360261A - System, method, medium, and apparatus for processing data blocking in stream processing

Info

Publication number
CN113360261A
Authority
CN
China
Prior art keywords
module
data
processor
processing
processor module
Prior art date
Legal status
Pending
Application number
CN202110625832.2A
Other languages
Chinese (zh)
Inventor
杜匡俊
蔡晓华
Current Assignee
Shanghai Netis Technologies Co ltd
Original Assignee
Shanghai Netis Technologies Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Netis Technologies Co ltd
Priority to CN202110625832.2A
Publication of CN113360261A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485 Task life-cycle, e.g. stopping, restarting, resuming execution

Abstract

The invention provides a system, a method, a medium and a device for processing data blocking in stream processing. The system comprises: a dispatcher module, which receives a data stream and distributes it to the processor modules; a processor module, which performs calculation on the data according to preset calculation logic; an aggregator module, which receives the calculation results of all processor modules and sorts them by the time order in which the data entered the dispatcher module; a monitor module, which monitors the states of all processor modules and performs exception recovery on any processor module judged abnormal; and a state management module, which stores the running state of the processor modules, separates the calculation logic of a processor module from its data state, and restores the calculation logic and data state after the processor module is rebuilt. By restarting only the affected processor module, the invention recovers the system from an exception at minimal cost: only the data that caused the blockage is discarded, which greatly reduces the cost of exception recovery.

Description

System, method, medium, and apparatus for processing data blocking in stream processing
Technical Field
The present invention relates to the field of data processing technologies, and in particular to a system, a method, a medium and a device for processing data blocking in stream processing.
Background
Stream processing is the continuous incorporation of new data to compute a result. In stream processing, the input data is unbounded and has no predetermined start or end. A data producer writes data records into an ordered data stream; the data processor reads data from the stream, applies transformation operations, and writes the processed results, in input order, to the consumer's data stream for the next stage of processing. Because stream processing is real-time and scalable, it has many application scenarios in fields such as network monitoring, risk management, fraud detection and algorithmic trading.
To improve processing performance, a stream processing system is usually designed to process data concurrently, following the MapReduce model. A common stream processing implementation is shown in FIG. 1:
1) an ordered, unbounded data stream enters the dispatcher component sequentially;
2) the dispatcher component splits the data across the pending queues of a plurality of processors;
3) each processor component takes data from its pending queue and performs the predefined transformation operation;
4) the processor component sends the transformed result to the aggregator component;
5) the aggregator component collects and sorts the results generated by the processors and outputs them to the consumer in input order.
Since the output data order must be consistent with the input data order, the aggregator has to wait for every earlier result to arrive before it can send result data to the consumer. If an exception occurs in one processor component, or illegal data is input, that processor may never produce its result. The whole pipeline is then blocked, which can bring the entire system to an abnormal stop. Such problems can usually be avoided only by adequate testing at the development stage or by rigorous checking of the input data. However, as business scenarios become more complex, it becomes increasingly difficult to rule out pipeline blocking during development.
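The blocking behaviour described above can be illustrated with a minimal sketch of an order-preserving aggregator. The class name `OrderedAggregator` and the use of integer sequence numbers are assumptions made for illustration only; the source does not specify how the aggregator tracks input order.

```python
class OrderedAggregator:
    """Releases results strictly in input order (hypothetical sketch)."""

    def __init__(self):
        self.next_seq = 0   # sequence number the consumer expects next
        self.pending = {}   # out-of-order results waiting for their turn

    def submit(self, seq, result):
        """Accept one result and return the results that can now be emitted."""
        self.pending[seq] = result
        released = []
        # Emit as long as the next expected result is available.
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released


agg = OrderedAggregator()
first = agg.submit(0, "r0")    # released at once: it is the expected result
second = agg.submit(2, "r2")   # held back: result 1 has not arrived
third = agg.submit(3, "r3")    # also held back behind the missing result 1
```

As long as result 1 never arrives, everything after it stays in `pending`, which is the pipeline blockage the background section describes.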
Patent document CN1553671A (application number: CN03131801.0) discloses a method for non-blocking transmission of user data in an inverse multiplexing process, comprising a transmission processing step and a reception processing step, wherein the transmission processing step comprises: extracting user cells carried by ICP cells according to VPI and VCI values; determining the number of segments for splitting one cell according to the number of links in the group, and loading the segments into ICP cells; sending the ICP cell; the receiving processing step includes: extracting the ICP cell; extracting user data from the ICP cell, filtering and storing the extracted user data; and restoring the filtered data into user cells.
Disclosure of Invention
In view of the defects in the prior art, the present invention aims to provide a processing system, a method, a medium and a device for data blocking in stream processing.
The processing system for data blocking in stream processing provided by the invention comprises:
a dispatcher module: receives a data stream and distributes the data to the processor modules, either by round-robin (polling) distribution or by distribution according to a characteristic value of the data;
a processor module: receives data provided by the dispatcher module, performs calculation according to preset calculation logic, and sends the calculation result to the aggregator module;
an aggregator module: receives the calculation results of all processor modules and sorts them by the time order in which the data entered the dispatcher module;
a monitor module: monitors the states of all processor modules by periodically querying the task each processor module is currently processing, recording the start time of the task and comparing it with the current system time; if the processing time of a task exceeds a preset threshold, the corresponding processor module is judged abnormal and the blocking recovery flow is entered to perform exception recovery;
a state management module: stores the running state of the processor modules, separates the calculation logic of a processor module from its data state, and restores the calculation logic and data state after the processor module is rebuilt.
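The two distribution modes named for the dispatcher module can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `key_of` parameter and the use of a CRC32 hash to map a characteristic value to a processor index are hypothetical and are not taken from the source.

```python
import itertools
import zlib


def round_robin_dispatcher(num_processors):
    """Polling distribution: processors are chosen in turn."""
    counter = itertools.cycle(range(num_processors))

    def choose(record):
        return next(counter)

    return choose


def key_dispatcher(num_processors, key_of):
    """Distribution by characteristic value: equal keys reach the same processor."""

    def choose(record):
        key = str(key_of(record)).encode("utf-8")
        # CRC32 is used here only as a stable, process-independent hash.
        return zlib.crc32(key) % num_processors

    return choose


rr = round_robin_dispatcher(3)
targets = [rr(record) for record in ("a", "b", "c", "d")]

by_user = key_dispatcher(4, key_of=lambda record: record["user"])
same = by_user({"user": "alice", "v": 1}) == by_user({"user": "alice", "v": 2})
```

Round-robin spreads load evenly, while key-based distribution keeps all records with the same characteristic value on one processor, which matters when that processor holds state for the key.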
The method for processing data blockage in stream processing provided by the invention comprises the following steps:
step 1: starting the monitor module when the system starts, querying the states of all processor modules at a preset time interval, and detecting whether any processor module is blocked;
step 2: querying and recording the ID of the task being processed by each processor module;
step 3: determining whether the current task ID differs from the task ID obtained by the previous query;
step 4: determining whether the processor module is abnormal;
step 5: performing exception recovery on the abnormal processor module.
Preferably, step 3 comprises: if the task ID has changed, updating the task ID recorded in the monitor module as currently processed by the processor module, and recording the start time of the task;
if the task ID is the same as the task ID obtained by the previous query, obtaining the start time of that task and calculating the difference between the current system time and the start time as the processing time of the task.
Preferably, step 4 comprises: if the processing time is greater than or equal to a preset threshold, judging the corresponding processor module to be abnormal; otherwise, the monitor module goes to sleep until it is next woken.
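Steps 2 to 4 can be sketched as a single polling check. The sketch below is a simplified assumption rather than the patented implementation: the `Monitor` class, its `check` method and the injectable `clock` are hypothetical names, and the start time of a task is approximated by the moment the monitor first observes its ID.

```python
import time


class Monitor:
    """Flags processors whose current task has run for too long (sketch)."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock
        self.seen = {}  # processor name -> (task ID, observed start time)

    def check(self, current_tasks):
        """current_tasks maps processor name -> ID of the task in progress.

        Returns the names of the processors judged abnormal.
        """
        now = self.clock()
        abnormal = []
        for name, task_id in current_tasks.items():
            last = self.seen.get(name)
            if last is None or last[0] != task_id:
                # Task ID changed: record the new task and its start time.
                self.seen[name] = (task_id, now)
                continue
            # Same task as in the previous query: compare elapsed time.
            if now - last[1] >= self.timeout:
                abnormal.append(name)
        return abnormal
```

In a running system `check` would be called from a timer at the preset interval, and each name it returns would be handed to the recovery flow of step 5.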
Preferably, the step 5 comprises:
step 5.1: when a processor module is judged abnormal, stopping the processor module so that no new data enters it for processing, and dispatching the incoming data stream to the other processor modules for processing;
step 5.2: constructing null data for the task that caused the timeout and sending it to the aggregator module, so that the output data order remains consistent with the input data order after the processor module recovers from the exception;
step 5.3: re-creating the processor module according to the preset calculation logic and, once it is created, obtaining the state management module from the stopped processor module and loading it, so as to restore the calculation logic and data state from before the stop;
step 5.4: binding the new processor module to the dispatcher module, replacing the stopped abnormal processor module with the newly created processor module, and then starting the newly created processor module to resume processing the data stream.
According to the present invention, a computer-readable storage medium storing a computer program is provided, wherein the computer program is configured to implement the steps of the method described above when executed by a processor.
According to the invention, the processing device for data blocking in stream processing comprises: a controller;
the controller includes the computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of a method of handling data blocking in the stream processing; alternatively, the controller comprises a processing system for data blocking in the stream processing.
Compared with the prior art, the invention has the following beneficial effects:
(1) the invention solves the problem of a system becoming unavailable in a production environment when the stream processing program is blocked by a coding error or by illegal input data;
(2) the invention recovers the system from an exception at minimal cost by restarting only the processor module. The conventional recovery method is to restart the whole system, which discards a large amount of data because of the caches and historical state held in the system; the new method discards only the data that caused the blockage, which greatly reduces the cost of exception recovery.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a diagram of a stream processing model;
FIG. 2 is a diagram of a stream processing model with blocking self-healing;
FIG. 3 is a flow chart of blocking recovery.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will help those skilled in the art to further understand the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.
Examples
Referring to fig. 2, the system for processing data blocking in stream processing according to the present invention includes:
a dispatcher module: receives a data stream and distributes the data to the processor modules according to a user-defined distribution algorithm, typically round-robin (polling) distribution or distribution according to a characteristic value;
a processor module: receives data provided by the dispatcher, performs calculation according to user-defined calculation logic, and sends the calculation result to the aggregator module;
an aggregator module: receives the calculation results of all processors, sorts them in the order in which the data entered the dispatcher, and outputs them to the next stage;
a monitor module: monitors the states of all processor modules by periodically querying the ID of the task each processor module is currently processing, recording the start time of the task and comparing it with the current system time; if the processing time of a task exceeds a threshold, the processor module is judged abnormal and the blocking recovery flow is entered to perform exception recovery;
a state management module: stores the running state of the processor modules, separating the calculation logic from the data state that were combined in the processor module of the original model. In the new method, the processor module only performs calculation and does not keep the calculation state; the state is handed to the state management module for management. When a processor module is re-created, the state management module is retained and is loaded by the new processor module once it has been built. In a stream processing system, the state of a processor changes as the input data stream is processed, and the output result depends on that state. The state must therefore be restorable after a processor module is rebuilt, so that the system continues to behave correctly. Consider, for example, an accumulator whose input is a sequence of random numbers and whose output is the current running total. The processor must maintain the running total as state: for each input it adds the input value to the running total and outputs the new total. When an exception occurs, the running total must be recoverable afterwards; otherwise the accumulator would be reset.
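The accumulator example can be sketched as follows to show why the state is kept outside the processor. The class names `StateManager` and `AccumulatorProcessor` and the `"total"` key are hypothetical; the source describes the separation of calculation logic and state only in prose.

```python
class StateManager:
    """Holds processor state outside the processor itself (sketch)."""

    def __init__(self):
        self._values = {}

    def get(self, key, default=None):
        return self._values.get(key, default)

    def put(self, key, value):
        self._values[key] = value


class AccumulatorProcessor:
    """Pure calculation logic: adds each input to the running total."""

    def __init__(self, state):
        self.state = state  # the processor keeps no total of its own

    def process(self, value):
        total = self.state.get("total", 0) + value
        self.state.put("total", total)
        return total


state = StateManager()
old = AccumulatorProcessor(state)
outputs = [old.process(v) for v in (3, 4)]

# The old processor is discarded; a rebuilt one loads the same state manager
# and continues from the previous running total instead of starting at zero.
rebuilt = AccumulatorProcessor(state)
resumed = rebuilt.process(5)
```

Had the total been an attribute of the processor object, rebuilding the processor would have reset it, which is exactly the failure the state management module is meant to prevent.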
Referring to fig. 3, the method for recovering data blocking in stream processing includes the following steps:
step 1: starting the monitor module when the system starts; the monitor module starts a timed task that queries the states of all processor modules at regular intervals and detects whether any processor module is blocked;
step 2: querying and recording the ID of the task being processed by each processor;
step 3: determining whether the current task ID differs from the task ID obtained by the previous query: if it has changed, updating the ID of the task currently processed by the processor module as recorded in the monitor module, and recording the start time of the task; if it is the same as the task ID obtained by the previous query, obtaining the start time of that task and calculating the difference between the current system time and the start time as the processing time of the task, and, if the processing time is greater than or equal to the configured task timeout threshold, marking the processor module as timed out;
step 4: determining whether the processor module has timed out: if it has not, the monitor module goes to sleep until it is next woken; if the processor module is marked as timed out, entering the processor recovery flow;
step 5: starting the processor exception recovery flow to recover the abnormal processor module identified in step 4;
step 5.1: stopping the timed-out processor module so that no new data enters it for processing; as a result of the stop operation, subsequent input data is scheduled to the other processor modules for processing;
step 5.2: constructing null data for the task ID that caused the timeout and sending it to the aggregator module. A stream processor must keep the output data order consistent with the input data order and must not lose data. When the abnormal processor is recovered, the task it was processing is discarded, so a null data item carrying that task ID must be constructed and sent to the aggregator module; this allows the data stream to continue being processed instead of waiting indefinitely for the result of the discarded data;
step 5.3: creating a new processor module according to the predefined calculation logic and, after it is created successfully, obtaining the state manager from the previously stopped processor module and loading it to restore the earlier calculation state, so that the calculation results for the subsequent data stream are correct;
step 5.4: binding the new processor to the dispatcher, replacing the originally stopped abnormal processor module with the newly created processor module, and, after successful replacement, starting the newly created processor to resume processing the data stream.
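Steps 5.1 to 5.4 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the `Processor` and `Pipeline` classes, the `SKIPPED` placeholder standing in for the null data, and the `results` dictionary standing in for the aggregator are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

SKIPPED = object()  # placeholder ("null data") for a discarded task


@dataclass
class Processor:
    logic: Callable[..., Any]   # predefined calculation logic
    state: Dict[str, Any]       # externally managed state
    running: bool = True


@dataclass
class Pipeline:
    processors: Dict[str, Processor]
    # Stand-in for the aggregator: task ID -> result received so far.
    results: Dict[int, Any] = field(default_factory=dict)

    def dispatch_targets(self):
        """Only running processors may receive new data."""
        return [n for n, p in self.processors.items() if p.running]

    def recover(self, name, stuck_task_id):
        old = self.processors[name]
        old.running = False                      # 5.1: stop the processor
        self.results[stuck_task_id] = SKIPPED    # 5.2: null data unblocks ordering
        new = Processor(logic=old.logic,         # 5.3: rebuild with the same logic
                        state=old.state,         #      and reload the old state
                        running=False)
        self.processors[name] = new              # 5.4: bind, replace, then start
        new.running = True
        return new
```

Because the placeholder is delivered under the ID of the stuck task, an order-preserving aggregator can move past that position, so only the one task that caused the blockage is lost.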
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. A system for handling data blocking in stream processing, comprising:
a dispatcher module: receives a data stream and distributes the data to the processor modules, either by round-robin (polling) distribution or by distribution according to a characteristic value of the data;
a processor module: receives data provided by the dispatcher module, performs calculation according to preset calculation logic, and sends the calculation result to the aggregator module;
an aggregator module: receives the calculation results of all processor modules and sorts them by the time order in which the data entered the dispatcher module;
a monitor module: monitors the states of all processor modules by periodically querying the task each processor module is currently processing, recording the start time of the task and comparing it with the current system time; if the processing time of a task exceeds a preset threshold, the corresponding processor module is judged abnormal and the blocking recovery flow is entered to perform exception recovery;
a state management module: stores the running state of the processor modules, separates the calculation logic of a processor module from its data state, and restores the calculation logic and data state after the processor module is rebuilt.
2. A method for processing data blocking in stream processing, characterized in that it uses the system for processing data blocking in stream processing according to claim 1 and comprises the following steps:
step 1: starting the monitor module when the system starts, querying the states of all processor modules at a preset time interval, and detecting whether any processor module is blocked;
step 2: querying and recording the ID of the task being processed by each processor module;
step 3: determining whether the current task ID differs from the task ID obtained by the previous query;
step 4: determining whether the processor module is abnormal;
step 5: performing exception recovery on the abnormal processor module.
3. The method for processing data blocking in stream processing according to claim 2, wherein step 3 comprises: if the task ID has changed, updating the task ID recorded in the monitor module as currently processed by the processor module, and recording the start time of the task;
if the task ID is the same as the task ID obtained by the previous query, obtaining the start time of that task and calculating the difference between the current system time and the start time as the processing time of the task.
4. The method for processing data blocking in stream processing according to claim 3, wherein step 4 comprises: if the processing time is greater than or equal to a preset threshold, judging the corresponding processor module to be abnormal; otherwise, the monitor module goes to sleep until it is next woken.
5. The method for processing data blocking in stream processing according to claim 2, wherein step 5 comprises: step 5.1: when a processor module is judged abnormal, stopping the processor module so that no new data enters it for processing, and dispatching the incoming data stream to the other processor modules for processing.
6. The method for processing data blocking in stream processing according to claim 5, wherein step 5 comprises: step 5.2: constructing null data for the task that caused the timeout and sending it to the aggregator module, so that the output data order remains consistent with the input data order after the processor module recovers from the exception.
7. The method for processing data blocking in stream processing according to claim 6, wherein step 5 comprises: step 5.3: re-creating the processor module according to the preset calculation logic and, once it is created, obtaining the state management module from the stopped processor module and loading it, so as to restore the calculation logic and data state from before the stop.
8. The method for processing data blocking in stream processing according to claim 7, wherein step 5 comprises: step 5.4: binding the new processor module to the dispatcher module, replacing the stopped abnormal processor module with the newly created processor module, and then starting the newly created processor module to resume processing the data stream.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 2 to 8.
10. A processing device for data blocking in stream processing, comprising: a controller;
the controller comprises the computer-readable storage medium of claim 9 storing a computer program which, when executed by a processor, implements the steps of the method of handling data blocking in stream processing of any one of claims 2 to 8; alternatively, the controller comprises the processing system of data blocking in stream processing of claim 1.
CN202110625832.2A 2021-06-04 2021-06-04 System, method, medium, and apparatus for processing data blocking in stream processing Pending CN113360261A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110625832.2A CN113360261A (en) 2021-06-04 2021-06-04 System, method, medium, and apparatus for processing data blocking in stream processing

Publications (1)

Publication Number Publication Date
CN113360261A true CN113360261A (en) 2021-09-07

Family

ID=77532359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110625832.2A Pending CN113360261A (en) 2021-06-04 2021-06-04 System, method, medium, and apparatus for processing data blocking in stream processing

Country Status (1)

Country Link
CN (1) CN113360261A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210907)