KR101867220B1 - Device and method for realtime stream processing to enable supporting both streaming model and automatic selection depending on stream data - Google Patents

Info

Publication number
KR101867220B1
Authority
KR
South Korea
Prior art keywords
stream data
real
data
processing method
time
Prior art date
Application number
KR1020170023893A
Other languages
Korean (ko)
Inventor
안재훈
손재기
박창원
Original Assignee
전자부품연구원 (Korea Electronics Technology Institute)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date 2017-02-23
Publication date 2018-06-12
Application filed by 전자부품연구원
Priority to KR1020170023893A
Priority to US15/464,798
Application granted
Publication of KR101867220B1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/466 Transaction processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

A real-time stream processing method, an apparatus, and a recording medium to which the method is applied are provided. According to the method, a real-time stream data processing method is selected according to the type of the stream data, and the input stream data is processed according to the selected method. Automatically selecting the streaming model in this way can increase the throughput of the real-time processing service.

Description

Technical Field: The present invention relates to a real-time stream processing method and apparatus capable of simultaneously supporting two streaming models and automatically selecting between them based on the stream data.

More specifically, the present invention relates to a real-time stream processing technique, and in particular to a method and apparatus that process stream data in real time by selecting the processing method according to the type of the stream data.

A real-time processing system must guarantee a service level with a delay time of less than one second, and must provide a constant response speed and predictable performance.

The event stream processing method guarantees low latency, can handle almost any processing logic, and makes state management easy to implement. However, it has two drawbacks: a bottleneck may occur when all data are concentrated on a specific key, and because every event must be handled individually, the cost of handling failures increases.
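
The key-concentration bottleneck can be illustrated with a minimal Python sketch. It assumes that events are routed to parallel workers by hashing their key, a common partitioning scheme in event stream frameworks that this patent does not itself specify; `partition_of`, `partition_load`, and the worker count are hypothetical names chosen for illustration.

```python
import zlib
from collections import Counter


def partition_of(key: str, num_workers: int) -> int:
    """Map a key to a worker index with a stable hash (CRC32), standing in for key-based grouping."""
    return zlib.crc32(key.encode("utf-8")) % num_workers


def partition_load(keys: list[str], num_workers: int) -> list[int]:
    """Return how many events each worker receives when events are routed by key."""
    counts = Counter(partition_of(k, num_workers) for k in keys)
    return [counts.get(w, 0) for w in range(num_workers)]


if __name__ == "__main__":
    distinct_keys = [f"sensor-{i}" for i in range(1000)]  # many keys: load is spread across workers
    hot_key = ["sensor-0"] * 1000                          # one key: every event lands on one worker
    print(partition_load(distinct_keys, 4))
    print(partition_load(hot_key, 4))
```

With many distinct keys every worker receives a share of the events, whereas a single hot key sends all 1000 events to one worker while the other three stay idle.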

The micro-batching method processes data in batches, which lowers the failure-handling cost and increases throughput. However, the micro-batching method limits the processing logic that can be expressed and incurs a large delay time.

Currently, real-time processing frameworks such as Storm, Samza, and Flink use the event stream processing method, while Spark uses the micro-batching method. However, none of them supports the event stream processing method and the micro-batching method simultaneously. In addition, when a stream is processed with the micro-batching method on an existing framework, the delay time increases sharply when data are concentrated at a specific time.

As a result, the batch interval must be reset for the specific time periods during which data are concentrated in order to keep the delay time consistently low. Furthermore, depending on the service characteristics or the format of the stream data (structured, semi-structured, unstructured), performance issues must be solved by changing the processing framework or integrating it with other platforms.

SUMMARY OF THE INVENTION The present invention has been made to solve the above problems, and an object of the present invention is to provide a real-time stream processing method that selects a real-time stream data processing method according to the type of stream data and processes the input stream data according to the selected method, and an apparatus and a recording medium to which the method is applied.

Another object of the present invention is to provide a real-time stream processing method that selects either a micro-batching method or an event stream processing method according to the type of stream data and uses a fixed-size buffer for the micro-batching method, and an apparatus and a recording medium to which the method is applied.

According to an aspect of the present invention, there is provided a real-time stream processing method including: receiving stream data; selecting a real-time stream data processing method according to a type of the stream data; and processing the input stream data according to the selected real-time stream data processing method.

In the selection step, either the first real-time stream data processing method or the second real-time stream data processing method can be selected according to the type of the stream data.

In addition, the types of stream data may include structured data, semi-structured data, and unstructured data.

In the selection step, when the stream data is unstructured data or semi-structured data, the first real-time stream data processing method may be selected.

Also, the first real-time stream data processing method may be a micro-batching method.

A buffer of a fixed size may be used for the micro-batching method.

In the selection step, when the stream data is structured data, the second real-time stream data processing method may be selected.

The second real-time stream data processing method may be an event stream processing method.

In addition, the method may further include setting an automatic mode or a manual mode, and when the manual mode is set, the selection step may select a real-time stream data processing method set by a user.

According to another aspect of the present invention, there is provided a real-time stream processing apparatus including: an input unit for receiving stream data; a selection unit for selecting a real-time stream data processing method according to the type of the stream data; and a processing unit for processing the input stream data according to the selected real-time stream data processing method.

According to another aspect of the present invention, there is provided a computer-readable recording medium having recorded thereon a program for causing a computer to execute a real-time stream processing method comprising: receiving stream data; selecting a real-time stream data processing method according to a type of the stream data; and processing the input stream data according to the selected real-time stream data processing method.

According to another aspect of the present invention, there is provided a real-time stream processing method comprising: receiving stream data; selecting either a micro-batching method or an event stream processing method as the real-time stream data processing method according to a type of the stream data; and processing the input stream data according to the selected method, wherein a fixed-size buffer is used in the micro-batching method.

According to various embodiments of the present invention, a real-time stream processing method selects a real-time stream data processing method according to the type of stream data and processes the input stream data according to the selected method, so the throughput of the real-time processing service can be increased by automatically selecting the streaming model according to the type of the input stream data.

In addition, two streaming models (event stream and micro-batching) are supported simultaneously, so an existing real-time processing system can be used without modification. When a bottleneck occurs in stream data processing based on the event stream method, the micro-batching method can be selected automatically as an alternative streaming model. Furthermore, the load-balancing performance problem that arises with the micro-batching method can be mitigated, and a lower delay time can be guaranteed.

FIG. 1 schematically shows the configuration of a real-time stream processing apparatus according to an embodiment of the present invention;
FIG. 2 is a diagram for describing a real-time stream processing method according to another embodiment of the present invention;
FIG. 3 is a diagram for describing a real-time stream processing method according to another embodiment of the present invention;
FIG. 4 is a diagram schematically illustrating a real-time stream processing method according to another embodiment of the present invention;
FIG. 5 is a diagram schematically illustrating a conventional micro-batching system; and
FIG. 6 is a diagram schematically illustrating a micro-batching method according to another embodiment of the present invention.

Hereinafter, the present invention will be described in detail with reference to the drawings.

1. Real-time stream processing device

FIG. 1 is a diagram schematically showing the configuration of a real-time stream processing apparatus 100 according to an embodiment of the present invention.

As shown in FIG. 1, the real-time stream processing apparatus 100 includes an input unit 110, a selection unit 120, and a processing unit 130.

The input unit 110 receives stream data from a stream data source. It can receive various types of stream data from various sources such as the Internet, social networking services (SNS), and databases.

Specifically, the stream data may include structured data, semi-structured data, and unstructured data.

Structured data is data stored in fixed fields, as in a database. Examples of structured data include relational databases and spreadsheets.

Semi-structured data is data that does not conform to the formal structure of the data models associated with relational databases or other data tables. Although it lacks a rigid structure, it contains tags, schemas, or other markers that separate semantic elements and express the hierarchy of records and fields within the data. Examples of semi-structured data include Extensible Markup Language (XML), JavaScript Object Notation (JSON), and NoSQL databases.

Unstructured data is data, such as pictures, images, and documents, whose form and structure do not follow a fixed standard or format. Examples of unstructured data include traditional data such as books, magazines, medical records, voice, and video, as well as data generated online and on mobile devices, such as email, Twitter posts, and blogs.

In this way, the input unit 110 receives various types of stream data.

The selection unit 120 selects a real-time stream data processing method according to the type of the stream data.

The selection unit 120 may classify the type of the stream data using data type information recorded in the input stream data. For example, pre-stored data type information may include information identifying structured data (for example, tag information or ID information indicating structured data) and information identifying semi-structured data (for example, tag information or ID information indicating semi-structured data). The selection unit 120 then uses the data type information of the input stream data to determine whether the stream data is structured data or semi-structured data; if it is neither, the selection unit 120 classifies it as unstructured data. However, this classification method is only one embodiment, and the selection unit 120 may apply various other methods for classifying the type of stream data.
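
The tag-based classification above might look like the following sketch. The `type_tag` field, the tag sets, and the function name `classify` are assumptions for illustration; the patent only says that tag or ID information recorded in the stream data is compared with pre-stored data type information.

```python
from enum import Enum


class DataType(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


# Hypothetical pre-stored data type information: tag values that identify each kind of data.
STRUCTURED_TAGS = {"rdbms", "spreadsheet"}
SEMI_STRUCTURED_TAGS = {"json", "xml", "nosql"}


def classify(record: dict) -> DataType:
    """Classify a stream record from the type tag it carries (the 'type_tag' field is an assumption)."""
    tag = str(record.get("type_tag", "")).lower()
    if tag in STRUCTURED_TAGS:
        return DataType.STRUCTURED
    if tag in SEMI_STRUCTURED_TAGS:
        return DataType.SEMI_STRUCTURED
    # Neither structured nor semi-structured: classified as unstructured data.
    return DataType.UNSTRUCTURED
```

Any record whose tag matches neither set, including a record with no tag at all, falls through to unstructured data, mirroring the "neither structured nor semi-structured" rule.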

If the stream data is unstructured data or semi-structured data, the selection unit 120 selects the micro-batching method as the real-time stream data processing method. In the micro-batching method, all events arriving within a predetermined time interval are collected in a buffer and processed together as one batch.

A fixed-size buffer is used for the micro-batching method, as described in detail later with reference to FIGS. 5 and 6.

If the stream data is structured data, the selection unit 120 selects the event stream processing method as the real-time stream data processing method. The event stream processing method processes each event of the stream data immediately as it arrives.

In addition, the real-time stream processing apparatus 100 can be set to an automatic mode or a manual mode. In the automatic mode, the selection unit 120 automatically determines the stream data processing method according to the type of the stream data, as described above. In the manual mode, the selection unit 120 selects the real-time stream data processing method set by the user.
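
The selection rule, including the automatic and manual modes, can be sketched as a single function. The enum names and the choice to raise an error when manual mode has no user setting are assumptions, not details from the patent.

```python
from enum import Enum
from typing import Optional


class DataType(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


class Model(Enum):
    MICRO_BATCHING = "micro-batching"
    EVENT_STREAM = "event stream processing"


class Mode(Enum):
    AUTOMATIC = "automatic"
    MANUAL = "manual"


def select_model(data_type: DataType, mode: Mode = Mode.AUTOMATIC,
                 user_model: Optional[Model] = None) -> Model:
    """Sketch of selection unit 120: manual mode returns the user's choice, automatic mode maps by data type."""
    if mode is Mode.MANUAL:
        # Manual mode: use the model set by the user (treating a missing choice as an error is an assumption).
        if user_model is None:
            raise ValueError("manual mode requires a user-selected model")
        return user_model
    # Automatic mode: unstructured and semi-structured data -> micro-batching (first method).
    if data_type in (DataType.UNSTRUCTURED, DataType.SEMI_STRUCTURED):
        return Model.MICRO_BATCHING
    # Structured data -> event stream processing (second method).
    return Model.EVENT_STREAM
```

Under this mapping, a text file (unstructured) or JSON (semi-structured) source yields micro-batching, and an RDBMS (structured) source yields event stream processing, matching the cases shown in FIG. 4.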

The processing unit 130 processes the input stream data according to the selected real-time stream data processing method, and then performs subsequent operations such as sending the processed stream data to a stream sink.

The real-time stream processing apparatus 100 having such a configuration can automatically select and process the real-time stream data processing method according to the type of the stream data, thereby increasing the throughput of the real-time stream processing service.

The real-time stream processing apparatus 100 shown in FIG. 1 may be implemented as a physically independent device, as part of another apparatus or system, or in software such as a program, framework, or application installed on a computer, server, or the like. In addition, each component of the real-time stream processing apparatus 100 may be implemented as a physical component or as a software function.
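
As one possible software form of apparatus 100, the three units can be arranged as a small class. This is a hypothetical sketch: the class name, the `select` callback, the `batch_size` of the micro-batch buffer, and the use of a Python list as the stream sink are all assumptions. For brevity the buffer is flushed when it fills or when `flush()` is called, rather than on an interval timer.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

EVENT_STREAM = "event stream processing"
MICRO_BATCHING = "micro-batching"


@dataclass
class StreamProcessingApparatus:
    """Hypothetical software form of apparatus 100 (input unit 110, selection unit 120, processing unit 130)."""
    select: Callable[[dict], str]              # selection unit: record -> EVENT_STREAM or MICRO_BATCHING
    batch_size: int = 4                        # assumed micro-batch buffer size
    sink: list = field(default_factory=list)   # stand-in for the stream sink; receives lists of records
    _buffer: list = field(default_factory=list, init=False, repr=False)

    def receive(self, records: Iterable[dict]) -> None:
        """Input unit: take records from a stream source and pass each one to the processing unit."""
        for record in records:
            self._process(record, self.select(record))

    def _process(self, record: dict, model: str) -> None:
        """Processing unit: handle the record with the selected model."""
        if model == EVENT_STREAM:
            self.sink.append([record])         # event stream: forward each event immediately
        else:
            self._buffer.append(record)        # micro-batching: hold the event until the buffer fills
            if len(self._buffer) == self.batch_size:
                self.flush()

    def flush(self) -> None:
        """Send any buffered events to the sink as one batch (e.g. when an interval ends)."""
        if self._buffer:
            self.sink.append(self._buffer)
            self._buffer = []
```

For example, with a selector that sends records tagged `rdbms` to event stream processing and everything else to micro-batching, and `batch_size=2`, records with ids 1 (rdbms), 2 and 3 (json), and 4 (text), followed by a final `flush()`, reach the sink as the batches [1], [2, 3], and [4].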

2. Real-time stream processing method

FIG. 2 is a diagram for describing a real-time stream processing method according to another embodiment of the present invention.

First, the real-time stream processing apparatus 100 receives stream data from a stream data source (S210). It can receive various types of stream data from various sources such as the Internet, SNS, and databases.

The real-time stream processing apparatus 100 selects a real-time stream data processing method according to the type of the input stream data.

Specifically, if the stream data is unstructured data or semi-structured data (S220-Y), the real-time stream processing apparatus 100 selects the micro-batching method as the real-time stream data processing method (S230). A fixed-size buffer is used for the micro-batching method, as described in detail later with reference to FIGS. 5 and 6.

If the stream data is structured data (S220-N, S240), the real-time stream processing apparatus 100 selects the event stream processing method as the real-time stream data processing method (S250).

The real-time stream processing apparatus 100 then processes the input stream data according to the selected real-time stream data processing method (S260) and performs subsequent operations, such as sending the processed stream data to a stream sink.

The real-time stream processing apparatus 100 may also be set to an automatic mode or a manual mode, as described with reference to FIG. 3. FIG. 3 is a diagram for describing a real-time stream processing method according to another embodiment of the present invention.

First, the real-time stream processing apparatus 100 receives stream data from a stream data source (S210).

The real-time stream processing apparatus 100 then checks whether the currently set mode is the automatic mode or the manual mode. If the automatic mode is set (S310-Y), the apparatus 100 automatically determines the stream data processing method based on the type of the stream data, following steps S220 to S250 of FIG. 2 (S320). If the manual mode is set (S310-N, S330), the apparatus 100 selects the real-time stream data processing method set by the user (S340). The apparatus 100 then processes the input stream data according to the selected real-time stream data processing method (S260).

Through this process, the real-time stream processing apparatus 100 can automatically select and process the real-time stream data processing method according to the type of the stream data, thereby increasing the throughput of the real-time stream processing service.

3. Simultaneous streaming model support and automatic selection of streaming model

FIG. 4 is a diagram schematically illustrating a real-time stream processing method according to another embodiment of the present invention. As FIG. 4 shows, in this embodiment both streaming models, the micro-batching method and the event stream processing method, are supported simultaneously, and one of them is selected automatically.

In FIG. 4, when the stream data source is a text file, the data is unstructured, so application 1 is processed by the micro-batching method.

When the stream data source is JSON, the data is semi-structured, so application 2 is also processed by the micro-batching method.

When the stream data source is an RDBMS, the data is structured, so application 3 is processed by the event stream processing method.

As described above, the real-time stream processing apparatus 100 supports the micro-batching and event stream processing models simultaneously and selects between them automatically according to the type of the stream data.

4. New micro-batching method

FIG. 5 is a diagram schematically illustrating a conventional micro-batching method.

As shown in FIG. 5, in the conventional micro-batching method, all stream data input during interval N are bundled and transmitted as one batch. However, because all data input during interval N are processed as a single batch, the delay time increases sharply when stream data are concentrated at a specific time.
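
The conventional scheme of FIG. 5 can be approximated as bucketing events by arrival time. This sketch assumes timestamps in seconds measured from the start of the stream and a fixed interval length; the function name is hypothetical. Its point is that batch size simply tracks the arrival rate, so a burst produces one oversized batch.

```python
from collections import defaultdict


def interval_batches(events: list[tuple[float, str]], interval: float) -> list[list[str]]:
    """Conventional micro-batching: all events in interval N form batch N, however many there are."""
    if not events:
        return []
    buckets: dict[int, list[str]] = defaultdict(list)
    for timestamp, payload in events:
        # Assumes timestamps >= 0, measured from the start of the stream.
        buckets[int(timestamp // interval)].append(payload)
    last = max(buckets)
    # One batch per interval in time order; an interval with no events yields an empty batch.
    return [buckets.get(n, []) for n in range(last + 1)]
```

For instance, two events in the first second, a burst of ten in the next second, and one event after that produce batches of sizes 2, 10, and 1.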

The new micro-batching method shown in FIG. 6 solves this problem. FIG. 6 is a diagram schematically illustrating a micro-batching method according to another embodiment of the present invention.

As shown in FIG. 6, the micro-batching method according to this embodiment uses a fixed-size buffer. That is, when stream data are processed by the micro-batching method, the real-time stream processing apparatus 100 divides the stream data input during interval N into fixed-size buffer units to generate batches.

Specifically, when the stream data input during interval N are larger than the buffer unit, the real-time stream processing apparatus 100 divides them into two or more batches. When the stream data input during interval N are the same size as the buffer unit, the apparatus 100 processes them as one batch. When the stream data input during interval N are smaller than the buffer unit, the apparatus 100 processes them as one batch that includes empty space.
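
The fixed-size buffer rule can be sketched as splitting each interval's events into buffer-sized batches. The names are hypothetical and `None` is assumed to mark empty slots. The patent does not state how the remainder is handled when the input exceeds the buffer; padding the last partial batch, as in the smaller-than-buffer case, is an assumption, as is emitting no batch for an interval with no events.

```python
from typing import Optional, TypeVar

T = TypeVar("T")
EMPTY = None  # marks an unused slot; assumes None is never a real event


def fixed_size_batches(interval_events: list[T], buffer_size: int) -> list[list[Optional[T]]]:
    """Split the events received during one interval into batches of exactly buffer_size slots."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    batches: list[list[Optional[T]]] = []
    for start in range(0, len(interval_events), buffer_size):
        batch: list[Optional[T]] = list(interval_events[start:start + buffer_size])
        # A partially filled buffer is kept as one batch with empty space (padding).
        batch.extend([EMPTY] * (buffer_size - len(batch)))
        batches.append(batch)
    return batches  # an interval with no events yields no batch (assumption)
```

With `buffer_size=3`, seven events become `[[1, 2, 3], [4, 5, 6], [7, None, None]]`, three events become one full batch, and a single event becomes `[[1, None, None]]`. Every batch therefore has the same size regardless of how bursty the interval was.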

By applying the micro-batching method with a fixed-size buffer unit, the real-time stream processing apparatus 100 can mitigate the load-balancing performance problem of the conventional method and guarantee a low delay time.

The technical idea of the present invention can also be applied to a computer-readable recording medium storing a computer program that performs the functions of the real-time stream processing apparatus 100 and the real-time stream processing method according to the present embodiment. The technical idea according to various embodiments of the present invention may also be realized as computer-readable program code recorded on a computer-readable recording medium. A computer-readable recording medium is any data storage device that can store data readable by a computer, such as a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical disk, a hard disk drive, a flash memory, or a solid state disk (SSD). Computer-readable code or programs stored on such a medium may also be transmitted over a network connecting computers.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, the invention is not limited to the disclosed embodiments. It will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the present invention.

100: Real-time stream processing device
110: input unit
120: selection unit
130: processing unit

Claims (12)

1. A real-time stream processing method, comprising: receiving, by a real-time stream processing apparatus, stream data;
selecting, by the real-time stream processing apparatus, a real-time stream data processing method according to a type of the stream data; and
processing the input stream data according to the selected real-time stream data processing method,
wherein, in the selecting step,
one of a first real-time stream data processing method and a second real-time stream data processing method is selected depending on the type of the stream data, and
the type of the stream data includes structured data, semi-structured data, and unstructured data.
2. (Deleted)
3. (Deleted)
4. The method according to claim 1,
wherein, in the selecting step,
when the stream data is unstructured data or semi-structured data, the first real-time stream data processing method is selected.
5. The method according to claim 4,
wherein the first real-time stream data processing method
is a micro-batching method.
6. The method according to claim 5,
wherein, in the micro-batching method,
a fixed-size buffer is used.
7. The method according to claim 5,
wherein, in the selecting step,
when the stream data is structured data, the second real-time stream data processing method is selected.
8. The method according to claim 7,
wherein the second real-time stream data processing method
is an event stream processing method.
9. The method according to claim 1,
wherein, in the selecting step,
when the real-time stream processing apparatus is set to a manual mode, a real-time stream data processing method set by a user is selected.
10. A real-time stream processing apparatus, comprising: an input unit for receiving stream data;
a selection unit for selecting a real-time stream data processing method according to a type of the stream data; and
a processing unit for processing the input stream data according to the selected real-time stream data processing method,
wherein the selection unit
selects one of a first real-time stream data processing method and a second real-time stream data processing method depending on the type of the stream data, and
the type of the stream data
includes structured data, semi-structured data, and unstructured data.
11. A computer-readable recording medium having recorded thereon a program for executing a real-time stream processing method, the method comprising: receiving stream data;
selecting a real-time stream data processing method according to a type of the stream data; and
processing the input stream data according to the selected real-time stream data processing method,
wherein, in the selecting step,
one of a first real-time stream data processing method and a second real-time stream data processing method is selected depending on the type of the stream data, and
the type of the stream data
includes structured data, semi-structured data, and unstructured data.
12. A real-time stream processing method, comprising: receiving, by a real-time stream processing apparatus, stream data;
selecting either a micro-batching method or an event stream processing method as the real-time stream data processing method according to a type of the stream data; and
processing the input stream data according to the selected real-time stream data processing method,
wherein, in the micro-batching method,
a fixed-size buffer is used, and
the type of the stream data
includes structured data, semi-structured data, and unstructured data.

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020170023893A KR101867220B1 (en) 2017-02-23 2017-02-23 Device and method for realtime stream processing to enable supporting both streaming model and automatic selection depending on stream data
US15/464,798 US10671636B2 (en) 2016-05-18 2017-03-21 In-memory DB connection support type scheduling method and system for real-time big data analysis in distributed computing environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020170023893A KR101867220B1 (en) 2017-02-23 2017-02-23 Device and method for realtime stream processing to enable supporting both streaming model and automatic selection depending on stream data

Publications (1)

Publication Number Publication Date
KR101867220B1 2018-06-12

Family

ID=62622482

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020170023893A KR101867220B1 (en) 2016-05-18 2017-02-23 Device and method for realtime stream processing to enable supporting both streaming model and automatic selection depending on stream data

Country Status (1)

Country Link
KR (1) KR101867220B1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11640402B2 (en) 2020-07-22 2023-05-02 International Business Machines Corporation Load balancing in streams parallel regions

Citations (4)

Publication number Priority date Publication date Assignee Title
CN103761309A (en) * 2014-01-23 2014-04-30 中国移动(深圳)有限公司 Operation data processing method and system
US8978034B1 (en) * 2013-03-15 2015-03-10 Natero, Inc. System for dynamic batching at varying granularities using micro-batching to achieve both near real-time and batch processing characteristics
KR20150084098A (en) * 2014-01-13 2015-07-22 한국전자통신연구원 System for distributed processing of stream data and method thereof
JP2016539427A (en) * 2013-12-05 2016-12-15 オラクル・インターナショナル・コーポレイション Pattern matching across multiple input data streams

Non-Patent Citations (1)

Title
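손재기 and 1 co-author. "Squall: A TMO Model-Based Real-Time Big Data Processing Framework Supporting Simultaneous Processing of Real-Time Events and Micro-Batches". Journal of KIISE (정보과학회논문지), Vol. 44, No. 1, Jan. 2017, pp. 84-94. *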

Similar Documents

Publication Publication Date Title
US20170169061A1 (en) NoSQL RELATIONAL DATABASE (RDB) DATA MOVEMENT
US10007656B2 (en) DOM snapshot capture
US10922288B2 (en) Method for storing data elements in a database
US11798208B2 (en) Computerized systems and methods for graph data modeling
US20170220647A1 (en) Pluggable architecture for embedding analytics in clustered in-memory databases
US20110022643A1 (en) Dynamic media content previews
US10353874B2 (en) Method and apparatus for associating information
US20190361607A1 (en) Providing combined data from a cache and a storage device
US10755091B2 (en) Method and apparatus for retrieving image-text block from web page
US20190266024A1 (en) Selective and piecemeal data loading for computing efficiency
US20170032052A1 (en) Graph data processing system that supports automatic data model conversion from resource description framework to property graph
US10671636B2 (en) In-memory DB connection support type scheduling method and system for real-time big data analysis in distributed computing environment
KR101867220B1 (en) Device and method for realtime stream processing to enable supporting both streaming model and automatic selection depending on stream data
US11622164B2 (en) System and method for streaming video/s
US20170163555A1 (en) Video file buffering method and system
CN111078697B (en) Data storage method and device, storage medium and electronic equipment
KR101830504B1 (en) In-Memory DB Connection Support Type Scheduling Method and System for Real-Time Big Data Analysis in Distributed Computing Environment
US20140067766A1 (en) Propagating per-custodian preservation and collection requests between ediscovery management applications and content archives
WO2017071210A1 (en) Contact creation method and device
US9767191B2 (en) Group based document retrieval
US9679015B2 (en) Script converter
US9705833B2 (en) Event driven dynamic multi-purpose internet mail extensions (MIME) parser
US10482077B2 (en) System and method for asynchronous update of a search index
US10037155B2 (en) Preventing write amplification during frequent data updates
US8787972B2 (en) Electronic device and method for managing commands

Legal Events

Date Code Title Description
E701 Decision to grant or registration of patent right
GRNT Written decision to grant