CN110007899B - Storm-based universal window frame system - Google Patents

Publication number
CN110007899B
Authority
CN
China
Prior art keywords
window
event
scheduling
life cycle
windows
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810678763.XA
Other languages
Chinese (zh)
Other versions
CN110007899A (en)
Inventor
尤夕多
张雷
万敏
蔡巍伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Xinzailing Technology Co ltd
Original Assignee
Zhejiang Xinzailing Technology Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/20 Software design
    • G06F8/24 Object-oriented
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/38 Creation or generation of source code for implementing user interfaces

Abstract

The invention discloses a Storm-based universal window framework system comprising a window component, a window scheduler, an event tracker and an event set. The window component receives Storm events and comprises the data set of the window storage layer. The window scheduler is responsible for registering window tasks, scheduling windows and managing the life cycles of the windows. The event set is all the raw data of the window. The event tracker is responsible for triggering the operators of a window; an operator of the window is an event invoked by the window, and events are classified into four phases, namely STOP events, NEW events, PROCESS events and EXPIRE events. The invention thereby provides a Storm-based universal window framework system that saves resources and schedules quickly.

Description

Storm-based universal window frame system
Technical Field
The invention relates to the field of window scheduling, and in particular to a Storm-based universal window framework system.
Background
Storm natively provides a window framework that supports two kinds of window computation. (1) Sliding windows: the data stream is divided into individual windows according to time or events, so a window can be configured as time-based or event-based. (2) Tumbling windows: these are similar to sliding windows but take only a window length parameter; a tumbling window can be regarded as a sliding window whose sliding step equals its window length.
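The relation between the two native window types can be illustrated with a minimal Python sketch. This is not Storm's API: the function names, the half-open interval convention `[start, end)` and the alignment of window starts to multiples of the sliding step are all assumptions made for illustration.

```python
from typing import List, Tuple


def sliding_windows(ts: int, length: int, slide: int) -> List[Tuple[int, int]]:
    """Return every [start, end) window that contains timestamp ts.

    Assumption: window starts are aligned to multiples of `slide`, beginning at 0.
    """
    last_start = (ts // slide) * slide  # latest window that starts at or before ts
    # Walk backwards while the window still covers ts (start + length > ts).
    starts = range(last_start, ts - length, -slide)
    return sorted((s, s + length) for s in starts if s >= 0)


def tumbling_windows(ts: int, length: int) -> List[Tuple[int, int]]:
    """A tumbling window is a sliding window whose step equals its length."""
    return sliding_windows(ts, length, length)
```

With a window length of 20 and a sliding step of 10, an event at time 25 falls into two overlapping sliding windows but into only one tumbling window of length 20.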
This technical scheme has the following defects. It implements only the most basic window model and has two significant disadvantages. (1) When the same data must be processed with different window configurations, two window instances have to be created, so the same data is stored twice. (2) For a time-based window, obtaining the full data of the window on every evaluation wastes resources: the full data of one window length must be cached in memory, which is very costly in an era of exploding data volumes.
Disclosure of Invention
The invention overcomes the defects of the prior art and provides a Storm-based universal window framework system that saves resources and schedules quickly.
To solve the above technical problems, the technical scheme of the invention is as follows:
The Storm-based universal window framework system comprises a window component, a window scheduler, an event tracker and an event set. The window component comprises the data set of the window storage layer. The window scheduler is responsible for registering window tasks, scheduling windows and managing the life cycles of the windows. The event set is all the raw data of the window. The event tracker is responsible for triggering the operators of a window; an operator of the window is an event invoked by the window, and events are classified into four phases, namely STOP events, NEW events, PROCESS events and EXPIRE events:
STOP event: if the time at which the event occurred is later than the life-cycle deadline of the window, the event is judged not to belong to the window; because events are sorted by time, no event after it belongs to the window either;
NEW event: the event is newly input into the new window, i.e. it did not occur in the life cycle of the previous window and occurs only in the life cycle of the new window;
PROCESS event: the event lies entirely within the life cycle of the window;
EXPIRE event: the event is expired data and needs to be cleared, i.e. the time of the event falls within the life-cycle deadline of the previous window;
The window scheduler allocates and schedules the window components, and the event tracker tracks the phase of each event and triggers the corresponding window components. An event set is thus built up over the whole process and provides data resources when the next window component is invoked, which reduces overall resource occupation.
Further, the scheduling invoked by the window component includes the scheduling of aggregated windows and the scheduling of non-aggregated windows.
Further, the scheduling of non-aggregated windows comprises the following steps:
101 Event input step: this step is the phase in which an event enters the window life cycle, and it includes event pre-filtering and event entry; event pre-filtering prevents dirty data from entering, and event entry adds the event to the core cache container of the window life cycle;
102 Window checking step: this step triggers the window components, and one window component has at least one event; when an event is input, the window components are traversed to determine which window component has exceeded its preset window threshold, and the corresponding window component is triggered through the window scheduler;
103 Event phase classification step: a window component state machine model is established from the input events and mapped to corresponding indexes, which are of three kinds: an index recording the event at which the life cycle of the previous window ended, an index recording the event at which the life cycle of the current window ends, and an index recording the first event in the life cycle of the current window whose state is NEW; with these indexes, subsequent events can be located directly; the final window is generated once the indexes have been established.
Furthermore, the scheduling of aggregated windows is based on a window scheduling model encapsulated on top of the non-aggregated window; compared with the scheduling of non-aggregated windows, it additionally maintains aggregated data sets, whose life cycles are also managed by the window scheduler.
Compared with the prior art, the invention has the advantages that:
The invention introduces the scheduling of aggregated windows and the scheduling of non-aggregated windows. Statistical values computed by historical windows replace the raw data; a statistical value usually consists of one or a few fields, so it is several orders of magnitude smaller than the raw data and may even be negligible. The model therefore not only reduces memory usage but also avoids repeated statistics, which saves CPU resources. An aggregated window under this scheme no longer needs to cache the raw data covering (window length - sliding step) of time, which is a substantial saving.
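As a rough worked example of this saving, the figures below are hypothetical and not taken from the invention: a window length W = 60 s, a sliding step S = 10 s, a steady event rate r = 1000 events per second, and one statistical value kept per elapsed sliding step.

```latex
\begin{aligned}
\text{raw events cached by a non-aggregated window} &= rW = 1000 \times 60 = 60\,000,\\
\text{raw events cached by an aggregated window} &= rS = 1000 \times 10 = 10\,000,\\
\text{raw events that no longer need caching} &= r(W - S) = 1000 \times 50 = 50\,000,\\
\text{statistical values kept in their place} &= \frac{W}{S} - 1 = 5.
\end{aligned}
```

Under these assumptions, 50 000 raw events are replaced by 5 statistical values, which is the "several orders of magnitude" difference described above.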
Drawings
FIG. 1 is a block diagram of the system of the present invention;
FIG. 2 is a flowchart of the overall window scheduling of the present invention;
FIG. 3 is a diagram of an aggregated and non-aggregated window scheduling process of the present invention;
FIG. 4 is a schematic diagram of the time and resource scheduling of the aggregated window of the present invention.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
As shown in FIGS. 1 to 4, the Storm-based universal window framework system includes a window component, a window scheduler, an event tracker, and an event set.
The window component includes a data set of a window storage layer.
The window scheduler is responsible for registering window tasks, scheduling windows and managing the life cycles of the windows. Specifically, a single window component supports multiple events, so that several window configurations can, to a certain extent, reuse the same cached data, which reduces data redundancy. The window scheduler manages the life cycle of all events, including operations such as insertion and deletion. It also covers the scheduling of aggregated windows and the scheduling of non-aggregated windows.
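The idea of several window configurations sharing one cache can be sketched as follows. This is only an illustrative Python sketch: `WindowConfig`, `WindowScheduler`, their fields and the `(end - length, end]` interval convention are hypothetical names and choices, not part of the invention.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class WindowConfig:
    name: str
    length: int  # window length in time units (hypothetical field)
    slide: int   # sliding step in time units (hypothetical field)


@dataclass
class WindowScheduler:
    events: List[int] = field(default_factory=list)  # one shared cache of event timestamps
    windows: Dict[str, WindowConfig] = field(default_factory=dict)

    def register(self, config: WindowConfig) -> None:
        """Register a window task; no per-window copy of the data is created."""
        self.windows[config.name] = config

    def insert(self, ts: int) -> None:
        """Each event is stored exactly once, however many windows are registered."""
        self.events.append(ts)

    def view(self, name: str, end: int) -> List[int]:
        """Events seen by window `name` when its life cycle ends at `end`."""
        cfg = self.windows[name]
        return [t for t in self.events if end - cfg.length < t <= end]


scheduler = WindowScheduler()
scheduler.register(WindowConfig("short", length=10, slide=5))
scheduler.register(WindowConfig("long", length=30, slide=10))
for ts in (3, 12, 25, 28):
    scheduler.insert(ts)
```

Both registered windows read from the same four cached events; only the view differs.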
The event set is all the raw data of the window. The scheme uses a red-black tree as the container for the data set. One important reason is ordering: all data entering the window life cycle must be sorted, because indexes can be built only on sorted data, and sorted data greatly improves traversal efficiency. A red-black tree is a well-performing self-balancing binary search tree that also offers good write performance. For a single window component, the aggregated data is cached in a queue on the first-in-first-out (FIFO) principle, and the aggregated data of multiple windows can be cached in a hash table so that it can be retrieved quickly.
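A minimal sketch of these two containers is given below. Python's standard library has no red-black tree, so a sorted list maintained with `bisect` stands in for it here; that substitution, the class names and the `(start, end]` range convention are assumptions for illustration only.

```python
import bisect
from collections import defaultdict, deque
from typing import Deque, Dict, List


class EventSet:
    """Ordered container of event timestamps (stand-in for a red-black tree)."""

    def __init__(self) -> None:
        self._times: List[int] = []

    def add(self, ts: int) -> None:
        bisect.insort(self._times, ts)  # keeps the list sorted on every write

    def between(self, start: int, end: int) -> List[int]:
        """Events with start < ts <= end, located by binary search."""
        lo = bisect.bisect_right(self._times, start)
        hi = bisect.bisect_right(self._times, end)
        return self._times[lo:hi]


class AggregateCache:
    """Hash table mapping each window name to a FIFO queue of aggregated values."""

    def __init__(self) -> None:
        self._by_window: Dict[str, Deque[float]] = defaultdict(deque)

    def push(self, window: str, value: float) -> None:
        self._by_window[window].append(value)

    def pop_oldest(self, window: str) -> float:
        return self._by_window[window].popleft()


events = EventSet()
for ts in (5, 1, 3):
    events.add(ts)

cache = AggregateCache()
cache.push("w1", 10.0)
cache.push("w1", 20.0)
```

Because the container is always sorted, a range query needs no full traversal, which is the property the indexes described later rely on.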
The event tracker is responsible for triggering the window operators. The execution frequency and execution conditions of the tracker differ for each window operator, which gives each window its own threshold and lets the window scheduler select the correct window to invoke. For example, a time window may be triggered by a watermark while an event window is triggered by events, but in both cases the core function is to determine whether the event satisfies the requirements of the window. The event tracker differs between window components, and both the time-based and the event-based event tracker support horizontal extension, which gives good compatibility and provides a sound basis for the subsequent scheduling of aggregated and non-aggregated windows.
An operator of the window is an event invoked by the window, and events are classified into four phases, namely STOP events, NEW events, PROCESS events and EXPIRE events.
STOP event: if the event occurs at a time later than the life-cycle deadline of the window, the event is judged not to belong to the window; because events are ordered by time, no event after it belongs to the window either.
NEW event: the event is newly input into the new window, i.e. it did not occur during the life cycle of the previous window and occurs only during the life cycle of the new window.
PROCESS event: the event lies entirely within the life cycle of this window.
EXPIRE event: the event is expired data and needs to be cleared, i.e. the time of the event falls within the life-cycle deadline of the previous window.
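One possible reading of the four phases is sketched below in Python. The boundary handling is an assumption: each window is taken to cover the interval `(start, end]`, and an event counts as expired once it no longer falls inside the current window. The function and parameter names are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    STOP = "STOP"
    NEW = "NEW"
    PROCESS = "PROCESS"
    EXPIRE = "EXPIRE"


def classify(ts: int, prev_end: int, cur_start: int, cur_end: int) -> Phase:
    """Classify an event time against the previous and current window life cycles.

    Assumed conventions: the current window covers (cur_start, cur_end] and the
    previous window ended at prev_end.
    """
    if ts > cur_end:
        return Phase.STOP      # beyond the deadline; later events are too
    if ts <= cur_start:
        return Phase.EXPIRE    # only the previous window covered it
    if ts > prev_end:
        return Phase.NEW       # not seen by the previous window
    return Phase.PROCESS       # shared by the previous and the current window
```

For a previous window (0, 60] and a current window (10, 70], an event at time 5 is expired, one at 30 is processed, one at 65 is new, and one at 80 stops the scan.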
The window scheduler allocates and schedules the window components, and the event tracker tracks the phase of each event and triggers the corresponding window components. An event set is thus built up over the whole process and provides data resources when the next window component is invoked, which reduces overall resource occupation. The core of the method is the scheduling of aggregated windows and the scheduling of non-aggregated windows. For the scheduling of non-aggregated windows, indexes are introduced so that the raw data set does not have to be traversed repeatedly when the window data set is assembled. The aggregated window model is then obtained by a further abstraction on top of the scheduling of non-aggregated windows, which optimizes window performance.
Specifically, the scheduling invoked by the window component includes the scheduling of aggregated windows and the scheduling of non-aggregated windows.
The non-aggregated window is the basis of the window framework and can satisfy the requirements of any window. The scheme uses a state machine model to track each input event and builds indexes at the key nodes. It comprises the following steps:
101 Event input step: this step is the phase in which an event enters the window life cycle, and it includes event pre-filtering and event entry. Event pre-filtering prevents dirty data from entering: if the time of an event is far later than the normal time, the event will occupy memory resources indefinitely, so pre-processing is required. Event entry adds the event to the core cache container of the window life cycle, i.e. the container chosen for the data set in the event set.
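The event input step might be sketched as follows. The source does not say how "dirty" is decided beyond an event time far from normal, so the `max_future` tolerance, the class name and the sorted-list cache are assumptions for illustration.

```python
import bisect
from typing import List


class EventInput:
    """Pre-filter events, then enter them into a sorted cache."""

    def __init__(self, max_future: int) -> None:
        self.max_future = max_future  # hypothetical tolerance for event times ahead of `now`
        self.cache: List[int] = []    # sorted event timestamps

    def accept(self, ts: int, now: int) -> bool:
        """Return True if the event was entered, False if it was filtered out."""
        if ts > now + self.max_future:
            # Dirty data: an event this far ahead would never expire
            # and would hold memory indefinitely.
            return False
        bisect.insort(self.cache, ts)
        return True


step = EventInput(max_future=60)
results = [step.accept(ts, now=100) for ts in (95, 101, 10_000, 90)]
```

Only the event stamped 10 000 is rejected; the other three are entered in sorted order.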
102 Window checking step: this step triggers the window components. One window component has at least one event, and the trigger mechanism differs from window to window (for example, time-based or event-based window components), so each window component has its own event tracker. The third step is entered only when the threshold of the corresponding window component is reached, which prevents events from being traversed repeatedly. That is, when an event is input, the window components are traversed to determine which window component's preset window threshold has been exceeded, and the corresponding window component is then triggered through the window scheduler.
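A minimal sketch of the window checking step is shown below, with one tracker per window component. The tracker factories `every_n_events` and `every_t_units` and the dictionary of trackers are hypothetical; the time-based tracker simply compares the event time with the last trigger time rather than modelling a full watermark.

```python
from typing import Callable, Dict, List

Tracker = Callable[[int], bool]


def every_n_events(n: int) -> Tracker:
    """Event-based tracker: threshold reached after every n input events."""
    seen = 0

    def reached(ts: int) -> bool:
        nonlocal seen
        seen += 1
        if seen >= n:
            seen = 0
            return True
        return False

    return reached


def every_t_units(t: int) -> Tracker:
    """Time-based tracker: threshold reached once t time units have elapsed."""
    last = 0

    def reached(ts: int) -> bool:
        nonlocal last
        if ts - last >= t:
            last = ts
            return True
        return False

    return reached


def check_windows(trackers: Dict[str, Tracker], ts: int) -> List[str]:
    """Traverse all window components and return those whose threshold is reached."""
    return [name for name, reached in trackers.items() if reached(ts)]


trackers = {"count3": every_n_events(3), "time10": every_t_units(10)}
fired = [check_windows(trackers, ts) for ts in (2, 4, 11, 13)]
```

Only the third event reaches both thresholds, so only then would the scheduler move on to event phase classification.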
103 Event phase classification step: a window component state machine model is established from the input events and mapped to corresponding indexes, which are of three kinds: an index recording the event at which the life cycle of the previous window ended, an index recording the event at which the life cycle of the current window ends, and an index recording the first event in the life cycle of the current window whose state is NEW. With these indexes, subsequent events can be located directly. The final window is generated once the indexes have been established. The following table lists the data sets that are located by means of the indexes:
English name     Description
windowTuples     Data set of the current time window
newTuples        Data newly added in the current time window, i.e. not present in the previous time window
expiredTuples    Expired data set
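The three data sets in the table can be obtained from a sorted cache by index alone, as the following Python sketch shows. It recomputes the indexes with binary search for simplicity, whereas the scheme described here maintains them while tracking events; the function name, the `(start, end]` window convention and the assumption that the cache holds nothing older than the previous window are illustrative only.

```python
import bisect
from typing import Dict, List


def locate_tuples(times: List[int], prev_end: int,
                  cur_start: int, cur_end: int) -> Dict[str, List[int]]:
    """Slice a sorted list of event times into the three data sets.

    Assumptions: windows cover (start, end], and `times` holds no event
    older than the previous window.
    """
    i_cur_start = bisect.bisect_right(times, cur_start)  # first event of the current window
    i_prev_end = bisect.bisect_right(times, prev_end)    # first NEW event
    i_cur_end = bisect.bisect_right(times, cur_end)      # first STOP event
    i_new = max(i_prev_end, i_cur_start)
    return {
        "windowTuples": times[i_cur_start:i_cur_end],
        "newTuples": times[i_new:i_cur_end],
        "expiredTuples": times[:i_cur_start],
    }


cached = [5, 12, 30, 58, 61, 66, 75]
result = locate_tuples(cached, prev_end=60, cur_start=10, cur_end=70)
```

Each data set is a contiguous slice, so no event is compared more than the binary searches require.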
When newTuples and expiredTuples are acquired, the events can be located directly with the indexes according to the definitions above, and windowTuples, being the current window, can be determined directly in the same way. In addition, when managing the life cycle of an event, note that an EXPIRE event cannot be deleted immediately: one window component may have multiple events, and the fact that an event's state is EXPIRE in one task does not mean that its state is also EXPIRE in the other window components. Deleting it directly would leave the event missing from those window components, so the event can be deleted only after all window components have determined that its state is EXPIRE.
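The deferred deletion rule can be sketched with a small ledger that records which windows have already expired an event. The class and method names are hypothetical, and the source does not specify how this bookkeeping is implemented.

```python
from typing import Dict, Iterable, Set


class ExpiryLedger:
    """Allow deletion of an event only after every window has expired it."""

    def __init__(self, window_names: Iterable[str]) -> None:
        self.windows: Set[str] = set(window_names)
        self.expired_by: Dict[str, Set[str]] = {}  # event id -> windows that expired it

    def mark_expired(self, event_id: str, window: str) -> bool:
        """Record an EXPIRE state; return True once the event may be deleted."""
        if window not in self.windows:
            raise ValueError(f"unknown window: {window}")
        marks = self.expired_by.setdefault(event_id, set())
        marks.add(window)
        if marks == self.windows:
            del self.expired_by[event_id]  # all windows agree, safe to delete
            return True
        return False


ledger = ExpiryLedger(["w1", "w2"])
first = ledger.mark_expired("e1", "w1")
second = ledger.mark_expired("e1", "w2")
```

Here `first` is False because window `w2` may still need the event, and `second` is True because both windows have then expired it.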
The scheduling of aggregated windows is based on a window scheduling model encapsulated on top of the non-aggregated window, and it differs from the scheduling of non-aggregated windows in the event phase classification step. Compared with the scheduling of non-aggregated windows, it additionally maintains aggregated data sets, whose life cycles are also managed by the window scheduler. The scheduling of aggregated windows is therefore equivalent to processing the aggregated data sets as events, and invocation is accelerated by aggregating the data. The advantages of the aggregated window are as follows:
In the scheduling of aggregated windows, the raw data is replaced by statistical values computed by historical window components. A statistical value consists of one or a few fields, so it is several orders of magnitude smaller than the raw data and may even be negligible. The model therefore reduces memory usage, avoids repeated statistics and saves CPU resources. As shown in the aggregated window model of FIG. 4, scheduling with aggregated windows removes the need to cache the raw data covering (window length - sliding step) of time, which is a substantial saving. In FIG. 4, white rectangles represent raw data within the life cycle of the current window, white circles represent aggregated data within the life cycle of the current window, black rectangles represent raw data outside the life cycle of the current window, and black circles represent aggregated data that has expired.
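The replacement of raw data by historical statistics can be illustrated with a sliding sum. The choice of a sum as the statistic, the class name and the requirement that the window length be a multiple of the sliding step are assumptions; the source does not prescribe a particular statistic.

```python
from collections import deque
from typing import Deque, List


class AggregatedSumWindow:
    """Sliding sum that keeps raw values only for the newest sliding step."""

    def __init__(self, length: int, slide: int) -> None:
        if length % slide != 0:
            raise ValueError("length must be a multiple of slide in this sketch")
        # One statistic per historical sliding step; the oldest is dropped automatically.
        self.history: Deque[float] = deque(maxlen=length // slide - 1)
        self.current: List[float] = []  # raw data of the newest sliding step only

    def add(self, value: float) -> None:
        self.current.append(value)

    def fire(self) -> float:
        """Close the current sliding step and return the sum over the whole window."""
        step_sum = sum(self.current)
        total = step_sum + sum(self.history)
        self.history.append(step_sum)  # raw data is replaced by a single statistic
        self.current = []
        return total


window = AggregatedSumWindow(length=30, slide=10)
outputs = []
for batch in ([1, 2], [4], [5], [6]):
    for value in batch:
        window.add(value)
    outputs.append(window.fire())
```

After each firing, only one number per historical sliding step remains in memory, and the window total is produced without re-reading any expired raw data.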
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and refinements without departing from the concept of the present invention, and these modifications and refinements shall also be regarded as falling within the scope of protection of the present invention.

Claims (2)

1. A Storm-based universal window framework system, characterized by comprising a window component, a window scheduler, an event tracker and an event set; the window component comprises the data set of the window storage layer; the window scheduler is responsible for registering window tasks, scheduling windows and managing the life cycles of the windows; the event set is all the raw data of the window; the event tracker is responsible for triggering the operators of a window, an operator of the window being an event invoked by the window, and events are classified into four phases, namely STOP events, NEW events, PROCESS events and EXPIRE events; STOP event: if the time at which the event occurred is later than the life-cycle deadline of the window, the event is judged not to belong to the window, and because events are sorted by time, no event after it belongs to the window either; NEW event: the event is newly input into the new window, i.e. it did not occur in the life cycle of the previous window and occurs only in the life cycle of the new window; PROCESS event: the event lies entirely within the life cycle of the window; EXPIRE event: the event is expired data and needs to be cleared, i.e. the time of the event falls within the life-cycle deadline of the previous window; the window scheduler allocates and schedules the window components, and the event tracker tracks the phase of each event and triggers the corresponding window components, so that an event set is built up over the whole process and provides data resources when the next window component is invoked, reducing overall resource occupation;
the scheduling invoked by the window component comprises the scheduling of aggregated windows and the scheduling of non-aggregated windows;
the scheduling of non-aggregated windows comprises the following steps: 101 Event input step: this step is the phase in which an event enters the window life cycle, and it includes event pre-filtering and event entry; event pre-filtering prevents dirty data from entering, and event entry adds the event to the core cache container of the window life cycle; 102 Window checking step: this step triggers the window components, and one window component has at least one event; when an event is input, the window components are traversed to determine which window component has exceeded its preset window threshold, and the corresponding window component is triggered through the window scheduler; 103 Event phase classification step: a window component state machine model is established from the input events and mapped to corresponding indexes, which are of three kinds: an index recording the event at which the life cycle of the previous window ended, an index recording the event at which the life cycle of the current window ends, and an index recording the first event in the life cycle of the current window whose state is NEW, so that subsequent events can be located directly with the indexes; the final window is generated once the indexes have been established.
2. The Storm-based universal window framework system according to claim 1, characterized in that the scheduling of aggregated windows is based on a window scheduling model encapsulated on top of the non-aggregated window; compared with the scheduling of non-aggregated windows, the scheduling of aggregated windows additionally maintains aggregated data sets, whose life cycles are also managed by the window scheduler.
CN201810678763.XA 2018-06-27 2018-06-27 Storm-based universal window frame system Active CN110007899B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810678763.XA CN110007899B (en) 2018-06-27 2018-06-27 Storm-based universal window frame system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810678763.XA CN110007899B (en) 2018-06-27 2018-06-27 Storm-based universal window frame system

Publications (2)

Publication Number Publication Date
CN110007899A (en) 2019-07-12
CN110007899B (en) 2022-11-18

Family

ID=67164744

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810678763.XA Active CN110007899B (en) 2018-06-27 2018-06-27 Storm-based universal window frame system

Country Status (1)

Country Link
CN (1) CN110007899B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106293889A (en) * 2015-06-05 2017-01-04 北京国双科技有限公司 Method and device for controlling movement of a sliding window

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101930473A (en) * 2010-09-14 2010-12-29 何吴迪 Method for constructing cloud computing window search system with executable structure
US9541432B2 (en) * 2013-05-17 2017-01-10 The United States Of America As Represented By The Administrator Of The U.S. Environmental Protection Agency Flow imaging and monitoring for synchronized management of wide area drainage
CN104021194A (en) * 2014-06-13 2014-09-03 浪潮(北京)电子信息产业有限公司 Mixed type processing system and method oriented to industry big data diversity application
CN104639466B (en) * 2015-03-05 2018-04-10 北京航空航天大学 A kind of application network Bandwidth Dynamic priority support method based on Storm real-time streams Computational frames
CN106067096B (en) * 2016-06-24 2019-09-17 北京邮电大学 A kind of data processing method, apparatus and system
CN107545014A (en) * 2016-06-28 2018-01-05 国网天津市电力公司 Storm-based real-time stream computing processing system
US20180075163A1 (en) * 2016-09-15 2018-03-15 Oracle International Corporation Clustering event processing engines
CN107678852B (en) * 2017-10-26 2021-06-22 携程旅游网络技术(上海)有限公司 Method, system, equipment and storage medium based on stream data real-time calculation

Also Published As

Publication number Publication date
CN110007899A (en) 2019-07-12

Similar Documents

Publication Publication Date Title
US6832227B2 (en) Database management program, a database managing method and an apparatus therefor
US20180189350A1 (en) Streaming data processing method, streaming data processing device and memory medium
US9298774B2 (en) Changing the compression level of query plans
CN109271435B (en) Data extraction method and system supporting breakpoint continuous transmission
CN106354817B (en) Log processing method and device
CN108196959B (en) Resource management method and device of ETL system
CN103838659A (en) Method and device for controlling system logs
CN108958789A (en) Parallel streaming calculation method, electronic equipment, storage medium and system
CN110825598A (en) Log real-time processing method and system
CN109901918B (en) Method and device for processing overtime task
CN109977139B (en) Data processing method and device based on class structured query statement
CN112363812B (en) Database connection queue management method based on task classification and storage medium
CN110196868A (en) Based on distributed work order flow monitoring method
CN114185885A (en) Streaming data processing method and system based on column storage database
CN110007899B (en) Storm-based universal window frame system
CN106407636B (en) Integration result statistical method and device
CN107577809A (en) Offline small documents processing method and processing device
CN110633302A (en) Processing method and device for massive structured data
US9767180B2 (en) Floating time dimension design
CN109299132A (en) SQL data processing method, system and electronic equipment
CN114090409A (en) Message processing method and device
CN113986942A (en) Message queue management method and device based on man-machine conversation
CN113901141A (en) Distributed data synchronization method and system
CN113342758B (en) Metadata management method, device, equipment and medium of file system
CN111125161A (en) Real-time data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant