CN113360261A - System, method, medium, and apparatus for processing data blocking in stream processing - Google Patents
- Publication number
- CN113360261A
- Authority
- CN
- China
- Prior art keywords
- module
- data
- processor
- processing
- processor module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/485—Task life-cycle, e.g. stopping, restarting, resuming execution
Abstract
The invention provides a system, a method, a medium, and a device for handling data blocking in stream processing. The system comprises: a dispatcher module, which receives a data stream and distributes the data to processor modules; a processor module, which performs calculation on the data according to preset calculation logic; an aggregator module, which receives the calculation results of all processors and sorts them in the order in which the data entered the dispatcher module; a monitor module, which monitors the states of all processor modules and performs exception recovery on any processor module judged to be abnormal; and a state management module, which stores the running state of the processor modules, separates the calculation logic of a processor module from its data state, and restores the calculation logic and data state after a processor module is rebuilt. The invention recovers the system from an exception at minimal cost by restarting only the affected processor module and discarding only the data that caused the blocking, which greatly reduces the cost of exception recovery.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular to a system, a method, a medium, and a device for handling data blocking in stream processing.
Background
Stream processing continually incorporates new data to compute a result. The input data is unbounded and has no predetermined start or end. A data producer writes data records into an ordered data stream; a data processor reads data from the stream, applies a conversion operation, and writes the processed results to the consumer's data stream in input order for the next stage of processing. Because stream processing is real-time and scalable, it is widely applied in fields such as network monitoring, risk management, fraud detection, and algorithmic trading.
To improve processing performance, a stream processing system is usually designed for concurrent processing in the MapReduce style. A common stream processing implementation is shown in FIG. 1:
1) an ordered, unbounded data stream enters the dispatcher component in sequence;
2) the dispatcher component distributes the data to a plurality of processor components to await processing;
3) each processor component takes data from its pending queue and performs the predefined conversion operation;
4) the processor component sends the converted result to the aggregator component;
5) the aggregator component collects and sorts the results produced by the processors and outputs them to the consumer in input order.
Because the output order must match the input order, the aggregator must wait for every earlier result to arrive before it can send later results to the consumer. If a processor component hits an exception or receives illegal input data and never produces its result, the whole pipeline blocks, which can bring the entire system to an abnormal stop. Such problems can usually be avoided only by adequate testing during development or by rigorous checking of the input data. As business scenarios grow more complex, however, avoiding pipeline blocking during development becomes increasingly difficult.
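The blocking described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each input carries a consecutive integer sequence number and that the aggregator buffers results in a dictionary keyed by that number; the function name `drain_in_order` is hypothetical and does not come from the patent.

```python
from typing import Dict, List, Tuple


def drain_in_order(pending: Dict[int, str], next_seq: int) -> Tuple[List[str], int]:
    """Emit buffered results in input order, stopping at the first gap."""
    emitted: List[str] = []
    while next_seq in pending:
        emitted.append(pending.pop(next_seq))
        next_seq += 1
    return emitted, next_seq


# Results for inputs 0, 2 and 3 have arrived; the processor handling input 1 is stuck.
pending = {0: "r0", 2: "r2", 3: "r3"}
emitted, next_seq = drain_in_order(pending, 0)
# Only "r0" can be emitted; "r2" and "r3" stay buffered behind the missing result 1.
```

As long as the result for input 1 never arrives, every later result remains in the buffer, which is the pipeline stall the background section describes.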
Patent document CN1553671A (application number: CN03131801.0) discloses a method for non-blocking transmission of user data in an inverse multiplexing process, comprising a transmission processing step and a reception processing step, wherein the transmission processing step comprises: extracting user cells carried by ICP cells according to VPI and VCI values; determining the number of segments for splitting one cell according to the number of links in the group, and loading the segments into ICP cells; sending the ICP cell; the receiving processing step includes: extracting the ICP cell; extracting user data from the ICP cell, filtering and storing the extracted user data; and restoring the filtered data into user cells.
Disclosure of Invention
In view of the defects in the prior art, the present invention aims to provide a processing system, a method, a medium and a device for data blocking in stream processing.
The processing system for data blocking in stream processing provided by the invention comprises:
a dispatcher module: receives the data stream and distributes the data to the processor modules, either by round-robin (polling) distribution or by distribution according to a feature value;
a processor module: receives data provided by the dispatcher module, performs the calculation according to preset calculation logic, and sends the calculation result to the aggregator module;
an aggregator module: receives the calculation results of all processor modules and sorts them in the order in which the data entered the dispatcher module;
a monitor module: monitors the states of all processor modules by periodically querying the task that each processor module is currently processing, recording the start time of the task, and comparing it with the current system time; if the processing time of a task exceeds a preset threshold, the corresponding processor module is judged to be abnormal and the blocking recovery flow is entered to perform exception recovery;
a state management module: stores the running state of the processor modules, separates the calculation logic of a processor module from its data state, and restores the calculation logic and data state after a processor module is rebuilt.
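The two distribution modes named for the dispatcher module, polling (round-robin) and distribution by feature value, might be sketched in Python as follows. The function names, the use of `zlib.crc32` as the hash, and the idea of extracting the feature with a caller-supplied function are all assumptions made for this example; the patent does not specify how the feature value is mapped to a processor.

```python
import zlib
from itertools import count
from typing import Callable


def round_robin(n_processors: int) -> Callable[[bytes], int]:
    """Return a picker that cycles through processor indexes 0..n-1."""
    counter = count()

    def pick(_record: bytes) -> int:
        return next(counter) % n_processors

    return pick


def by_feature(n_processors: int,
               feature: Callable[[bytes], bytes]) -> Callable[[bytes], int]:
    """Return a picker that maps records with the same feature value
    to the same processor index (hash of the feature, modulo n)."""

    def pick(record: bytes) -> int:
        return zlib.crc32(feature(record)) % n_processors

    return pick
```

Round-robin spreads load evenly regardless of content, while feature-based distribution keeps all records that share a feature value on one processor, which matters when that processor holds state for the value.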
The method for processing data blockage in stream processing provided by the invention comprises the following steps:
step 1: start the monitor module when the system starts, query the states of all processor modules at a preset time interval, and detect whether any processor module is blocked;
step 2: query and record the ID of the task being processed by each processor module;
step 3: determine whether the current task ID differs from the task ID obtained by the previous query;
step 4: determine whether the processor module is abnormal;
step 5: perform exception recovery on the abnormal processor module.
Preferably, step 3 comprises: if the task ID has changed, updating the task ID recorded in the monitor module as currently processed by the processor module, and recording the start time of the task;
if the task ID is the same as the task ID obtained by the previous query, obtaining the start time of the task and calculating the difference between the current system time and the start time of the task as the processing time of the task.
Preferably, step 4 comprises: if the processing time is greater than or equal to a preset threshold, judging the corresponding processor module to be abnormal; otherwise, the monitor module sleeps until it is next woken.
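Steps 2 to 4 can be sketched as a single polling function. This is a minimal illustration under stated assumptions: the class and field names (`BlockMonitor`, `Observation`) are hypothetical, the clock value is passed in as `now` so the logic is easy to exercise, and the start time of a task is approximated by the moment the monitor first observes its ID.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Observation:
    task_id: Optional[int]   # task ID seen at the last poll (None when idle)
    start_time: float        # time at which this task ID was first observed


class BlockMonitor:
    def __init__(self, threshold_s: float) -> None:
        self.threshold_s = threshold_s
        self.last_seen: Dict[str, Observation] = {}

    def poll(self, current_tasks: Dict[str, Optional[int]], now: float) -> List[str]:
        """Return the names of processors judged abnormal at time `now`."""
        abnormal: List[str] = []
        for name, task_id in current_tasks.items():
            seen = self.last_seen.get(name)
            if seen is None or seen.task_id != task_id:
                # Step 3, ID changed: record the new task ID and its start time.
                self.last_seen[name] = Observation(task_id, now)
            elif task_id is not None and now - seen.start_time >= self.threshold_s:
                # Steps 3 and 4, ID unchanged: processing time reached the threshold.
                abnormal.append(name)
        return abnormal
```

A caller would invoke `poll` at the preset interval and sleep in between; any name it returns is handed to the recovery flow of step 5.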
Preferably, step 5 comprises:
step 5.1: when a processor module is judged to be abnormal, stopping the processor module so that no new data enters it for processing, and dispatching the incoming data stream to the other processor modules for processing;
step 5.2: constructing null data for the task that caused the timeout and sending it to the aggregator module, so that the output data order remains consistent with the input data order after the processor module is recovered;
step 5.3: re-creating the processor module according to the preset calculation logic and, once it is created, obtaining the state management module from the stopped processor module and loading it, in order to restore the calculation logic and data state that existed before the stop;
step 5.4: binding the new processor module to the dispatcher module, replacing the stopped abnormal processor module with the newly created processor module, and then starting the newly created processor module to resume processing the data stream.
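The four recovery sub-steps can be laid out as one function. The sketch below is illustrative only: `Processor`, `Dispatcher`, `recover`, and the list standing in for the aggregator's input are hypothetical names, the state is modelled as a plain dictionary, and stopping a processor is reduced to clearing a flag rather than terminating a real thread or process.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Processor:
    logic: Callable[[dict, int], int]    # calculation logic only
    state: dict                          # data state, held outside the logic
    running: bool = True
    current_task: Optional[int] = None   # ID of the task in flight, if any


@dataclass
class Dispatcher:
    processors: Dict[str, Processor] = field(default_factory=dict)

    def targets(self) -> List[str]:
        """Names of processors that may currently receive new data."""
        return [name for name, p in self.processors.items() if p.running]


def recover(name: str, dispatcher: Dispatcher,
            aggregator_inbox: List[Tuple[int, Optional[int]]]) -> Processor:
    old = dispatcher.processors[name]
    # 5.1: stop the abnormal processor; the dispatcher no longer targets it.
    old.running = False
    # 5.2: send null data for the discarded task so ordering can continue.
    if old.current_task is not None:
        aggregator_inbox.append((old.current_task, None))
    # 5.3: rebuild from the same logic and load the preserved state.
    new = Processor(logic=old.logic, state=old.state, running=False)
    # 5.4: bind the replacement to the dispatcher, then start it.
    dispatcher.processors[name] = new
    new.running = True
    return new
```

Because the replacement shares the preserved state object, results computed after recovery continue from the values accumulated before the stop; only the single in-flight task is lost.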
According to the present invention, a computer-readable storage medium storing a computer program is provided, wherein the computer program is configured to implement the steps of the method described above when executed by a processor.
According to the invention, the processing device for data blocking in stream processing comprises: a controller;
the controller includes the computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of a method of handling data blocking in the stream processing; alternatively, the controller comprises a processing system for data blocking in the stream processing.
Compared with the prior art, the invention has the following beneficial effects:
(1) the invention solves the problem that the system is unavailable due to the blocking of the stream processing program caused by program coding error or illegal data input in the production environment;
(2) the invention recovers the system from an exception at minimal cost by restarting only the processor module. The conventional recovery method restarts the whole system, and because of the caches and historical state held in the system, a large amount of data may be discarded. The new method discards only the data that caused the blocking, which greatly reduces the cost of exception recovery.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a diagram of a stream processing model;
FIG. 2 is a diagram of a stream processing model with blocking self-healing;
FIG. 3 is a flow chart of blocking recovery.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.
Examples
Referring to fig. 2, the system for processing data blocking in stream processing according to the present invention includes:
a dispatcher module: receives a data stream and distributes the data to the processor modules according to a user-defined distribution algorithm, typically round-robin (polling) distribution or distribution according to a feature value;
a processor module: receives data provided by the dispatcher module, performs the calculation according to user-defined calculation logic, and sends the calculation result to the aggregator module;
an aggregator module: receives the calculation results of all processor modules, sorts them in the order in which the data entered the dispatcher module, and outputs them to the next stage;
a monitor module: monitors the states of all processor modules by periodically querying the ID of the task that each processor module is currently processing, recording the start time of the task, and comparing it with the current system time; if the processing time of a task exceeds the threshold, the processor module is judged to be abnormal and the blocking recovery flow is entered to perform exception recovery;
a state management module: stores the running state of the processor modules and separates the calculation logic from the data state, which were combined in the processor module of the original model. In the new method, the processor module only performs calculation and does not keep the calculation state; the state is handed to the state management module. When a processor module is re-created, the state management module is retained and is loaded by the new processor module once it has been built. In a stream processing system, the state of a processor changes as the incoming data stream is processed, and the output in turn depends on that state. The state therefore has to be restorable after the processor module is rebuilt so that the system keeps producing correct results. Take an accumulator whose input is a sequence of random numbers and whose output is the current running total: the processor must maintain the running total as state, and for each input value it adds the value to the running total and outputs the new total. When an exception occurs, the running total must be recoverable afterwards; otherwise the accumulator would be reset.
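The accumulator example can be written out as a short Python sketch. The class names `StateStore` and `AccumulatorProcessor` and the key `"running_total"` are assumptions made for illustration; the point is only that the running total lives in an object that outlives any single processor instance.

```python
from typing import Dict


class StateStore:
    """Stand-in for the state management module: holds data state only."""

    def __init__(self) -> None:
        self._values: Dict[str, int] = {}

    def get(self, key: str, default: int = 0) -> int:
        return self._values.get(key, default)

    def put(self, key: str, value: int) -> None:
        self._values[key] = value


class AccumulatorProcessor:
    """Calculation logic only; the running total is kept in the StateStore."""

    def __init__(self, state: StateStore) -> None:
        self.state = state

    def process(self, value: int) -> int:
        total = self.state.get("running_total") + value
        self.state.put("running_total", total)
        return total


store = StateStore()
first = AccumulatorProcessor(store)
outputs = [first.process(v) for v in (3, 4)]   # running totals 3, then 7

# The first processor is discarded and a new one is built on the same store.
rebuilt = AccumulatorProcessor(store)
outputs.append(rebuilt.process(5))             # continues from 7 instead of 0
```

Had the total been a field of the processor itself, the rebuilt instance would have started again from zero, which is the reset the description warns about.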
Referring to fig. 3, the method for recovering data blocking in stream processing includes the following steps:
step 1: start the monitor module when the system starts; the monitor module starts a timed task that queries the states of all processor modules at regular intervals and detects whether any processor module is blocked;
step 2: query and record the ID of the task being processed by each processor module;
step 3: determine whether the current task ID differs from the task ID obtained by the previous query: if it has changed, update the ID of the task currently processed by the processor module as recorded in the monitor module, and record the start time of the task; if it is the same as the task ID obtained by the previous query, obtain the start time of the task, calculate the difference between the current system time and the start time of the task as the processing time of the task, and, if the processing time is greater than or equal to the configured task timeout threshold, mark the processor module as timed out;
step 4: determine whether the processor module has timed out: if not, the monitor module sleeps until it is next woken; if the processor module is marked as timed out, enter the processor recovery flow;
step 5: start the processor exception recovery flow and recover the abnormal processor module identified in step 4;
step 5.1: stop the timed-out processor module so that no new data enters it for processing; once it is stopped, subsequent input data is dispatched to the other processor modules for processing;
step 5.2: construct null data for the task ID that caused the timeout and send it to the aggregator module. A stream processor must keep the output data order consistent with the input data order, with no record going missing. When an abnormal processor is recovered, the task it was processing is discarded, so null data carrying that task ID must be constructed and sent to the aggregator module; the data stream can then continue to be processed instead of waiting for the result of the discarded abnormal data;
step 5.3: create a new processor module: re-create the processor module according to the predefined calculation logic and, after it has been created successfully, obtain the state manager from the previously stopped processor module and load it to restore the earlier calculation state, so that the calculation results for subsequent data are correct;
step 5.4: bind the new processor to the dispatcher, replace the stopped abnormal processor module with the newly created processor module, and, once the replacement has succeeded, start the newly created processor to resume processing the data stream.
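The effect of the null data in step 5.2 on the aggregator can be shown with a small sketch. Several details are assumptions rather than statements from the patent: task IDs are taken to be consecutive integers starting at 0, the null data is modelled as a sentinel object named `SKIPPED`, and the aggregator is assumed to drop the placeholder rather than forward it to the consumer.

```python
from typing import Any, Dict, List

SKIPPED = object()   # hypothetical stand-in for the null data of step 5.2


class OrderedAggregator:
    def __init__(self) -> None:
        self._pending: Dict[int, Any] = {}
        self._next = 0                 # ID of the next task to output
        self.output: List[Any] = []

    def submit(self, task_id: int, result: Any) -> None:
        """Buffer a result, then flush everything that is now in order."""
        self._pending[task_id] = result
        while self._next in self._pending:
            item = self._pending.pop(self._next)
            if item is not SKIPPED:    # a placeholder advances the cursor only
                self.output.append(item)
            self._next += 1


agg = OrderedAggregator()
agg.submit(0, "r0")
agg.submit(2, "r2")        # held back: task 1 has not reported yet
blocked_output = list(agg.output)
agg.submit(1, SKIPPED)     # recovery sends the placeholder for the discarded task
```

Before the placeholder arrives only the first result has been released; once it is submitted, the buffered result for task 2 follows immediately and the output order still matches the input order.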
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.
Claims (10)
1. A system for handling data blocking in stream processing, comprising:
a dispatcher module: receives the data stream and distributes the data to the processor modules, either by round-robin (polling) distribution or by distribution according to a feature value;
a processor module: receives data provided by the dispatcher module, performs the calculation according to preset calculation logic, and sends the calculation result to the aggregator module;
an aggregator module: receives the calculation results of all processor modules and sorts them in the order in which the data entered the dispatcher module;
a monitor module: monitors the states of all processor modules by periodically querying the task that each processor module is currently processing, recording the start time of the task, and comparing it with the current system time; if the processing time of a task exceeds a preset threshold, the corresponding processor module is judged to be abnormal and the blocking recovery flow is entered to perform exception recovery;
a state management module: stores the running state of the processor modules, separates the calculation logic of a processor module from its data state, and restores the calculation logic and data state after a processor module is rebuilt.
2. A method for processing data blocking in stream processing, characterized in that it uses the system for handling data blocking in stream processing according to claim 1 and comprises the following steps:
step 1: start the monitor module when the system starts, query the states of all processor modules at a preset time interval, and detect whether any processor module is blocked;
step 2: query and record the ID of the task being processed by each processor module;
step 3: determine whether the current task ID differs from the task ID obtained by the previous query;
step 4: determine whether the processor module is abnormal;
step 5: perform exception recovery on the abnormal processor module.
3. The method for processing data blocking in stream processing according to claim 2, wherein step 3 comprises: if the task ID has changed, updating the task ID recorded in the monitor module as currently processed by the processor module, and recording the start time of the task;
if the task ID is the same as the task ID obtained by the previous query, obtaining the start time of the task and calculating the difference between the current system time and the start time of the task as the processing time of the task.
4. The method for processing data blocking in stream processing according to claim 3, wherein step 4 comprises: if the processing time is greater than or equal to a preset threshold, judging the corresponding processor module to be abnormal; otherwise, the monitor module sleeps until it is next woken.
5. The method for processing data blocking in stream processing according to claim 2, wherein step 5 comprises: step 5.1: when a processor module is judged to be abnormal, stopping the processor module so that no new data enters it for processing, and dispatching the incoming data stream to the other processor modules for processing.
6. The method for processing data blocking in stream processing according to claim 5, wherein step 5 comprises: step 5.2: constructing null data for the task that caused the timeout and sending it to the aggregator module, so that the output data order remains consistent with the input data order after the processor module is recovered.
7. The method for processing data blocking in stream processing according to claim 6, wherein step 5 comprises: step 5.3: re-creating the processor module according to the preset calculation logic and, once it is created, obtaining the state management module from the stopped processor module and loading it, in order to restore the calculation logic and data state that existed before the stop.
8. The method for processing data blocking in stream processing according to claim 7, wherein step 5 comprises: step 5.4: binding the new processor module to the dispatcher module, replacing the stopped abnormal processor module with the newly created processor module, and then starting the newly created processor module to resume processing the data stream.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 2 to 8.
10. A processing device for data blocking in stream processing, comprising: a controller;
the controller comprises the computer-readable storage medium of claim 9 storing a computer program which, when executed by a processor, implements the steps of the method of handling data blocking in stream processing of any one of claims 2 to 8; alternatively, the controller comprises the processing system of data blocking in stream processing of claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110625832.2A CN113360261A (en) | 2021-06-04 | 2021-06-04 | System, method, medium, and apparatus for processing data blocking in stream processing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113360261A (en) | 2021-09-07 |
Family
ID=77532359
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110625832.2A Pending CN113360261A (en) | 2021-06-04 | 2021-06-04 | System, method, medium, and apparatus for processing data blocking in stream processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113360261A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104320382A (en) * | 2014-09-30 | 2015-01-28 | 华为技术有限公司 | Distributive real-time stream processing device, method and unit |
CN105652232A (en) * | 2015-12-30 | 2016-06-08 | 国家电网公司 | Stream processing-based electric energy metering device online abnormality diagnosis method and system |
CN108629016A (en) * | 2018-05-08 | 2018-10-09 | 成都信息工程大学 | Support real-time stream calculation towards big data database control system, computer program |
CN109753385A (en) * | 2019-01-14 | 2019-05-14 | 重庆邮电大学 | A kind of restoration methods and system towards the monitoring of stream calculation system exception |
CN110502576A (en) * | 2019-08-12 | 2019-11-26 | 北京迈格威科技有限公司 | Data integration method, distributed computational nodes and distributed deep learning training system |
CN111158960A (en) * | 2019-12-31 | 2020-05-15 | 北京讯鸟软件有限公司 | Distributed abnormal data processing method and device based on memory |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210907 |