KR20170089067A - Bigdata processing system and method - Google Patents
Bigdata processing system and method
- Publication number
- KR20170089067A (application KR1020160008634A)
- Authority
- KR
- South Korea
- Prior art keywords
- unit
- data
- preprocessing
- processing system
- real time
Classifications
- G06F17/30194
- G06F17/30318
- G06F17/3061
- G06F17/30908
Abstract
According to one aspect of the present invention, there is provided a big data processing system comprising: a collecting unit for collecting data through various paths; a preprocessing unit for performing preprocessing on data transmitted from the collecting unit; a storage unit for storing input data in a distributed manner; an analysis unit for analyzing data transmitted from the preprocessing unit or data stored in the storage unit and generating an analysis result; a display unit for receiving and displaying the analysis result; and an interworking process execution unit for outputting an interworking process message so that the collecting unit, the preprocessing unit, the analysis unit, and the display unit operate in real time and process the data.
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a big data processing technique, and more particularly, to a big data processing system and method capable of collecting, storing, and analyzing various types of big data, such as structured, semi-structured, and unstructured data, in real time.
Recently, interest has been growing in big data technology that extracts meaningful value from massive data, whether structured or unstructured. Many application services are required to produce accurate and fast results through big data.
The term big data refers to data sets too large for common software tools and computer systems to collect, manage, store, search, share, analyze, and visualize within a tolerable amount of time. The size of big data may range from terabytes to exabytes or zettabytes.
Big data exists in various fields. Examples include web logs, radio frequency identification (RFID), social networks, social data, Internet text and documents, Internet search indexing, astronomy, meteorology, genomics, biogeochemistry, biology, military surveillance, medical records, photographic records, video recordings, and electronic commerce.
Big data is generally based on an ecosystem called Hadoop. Hadoop collects large amounts of structured or unstructured data, stores the data redundantly in distributed storage, and processes it in parallel on distributed network clusters.
Hadoop gives big data its technical meaning: processing information in a short period of time and extracting valuable information from it. The Hadoop Distributed File System (HDFS) is an open-source technology that stores large amounts of collected data reliably in a distributed manner.
However, because Hadoop is a batch processing system, it cannot process collected data in real time. In other words, Hadoop stores the collected data for a certain period of time and then analyzes the accumulated data when an external request for data analysis arrives.
A recent alternative is in-memory data processing technology from the Hadoop ecosystem, such as Storm and Spark.
Storm can process events in parallel without storing them, handling data in a manner similar to the MapReduce model. Under Storm's mechanism, a spout generates data in units of tuples, bolts process the data in units of tuples, and the processing results are stored.
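The spout and bolt roles described above can be illustrated with a minimal sketch. It is plain Python rather than the Apache Storm API, and the function names (`sentence_spout`, `split_bolt`, `count_bolt`, `run_topology`) are hypothetical; it only shows tuples flowing one at a time from a source through successive processing stages.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical stand-ins for the spout and bolt roles; this is not Storm code.


def sentence_spout(lines: Iterable[str]) -> Iterator[Tuple[str]]:
    """Emit each incoming line as a one-field tuple."""
    for line in lines:
        yield (line,)


def split_bolt(tuples: Iterable[Tuple[str]]) -> Iterator[Tuple[str, int]]:
    """Consume sentence tuples and emit one (word, 1) tuple per word."""
    for (sentence,) in tuples:
        for word in sentence.split():
            yield (word.lower(), 1)


def count_bolt(tuples: Iterable[Tuple[str, int]]) -> Counter:
    """Aggregate (word, count) tuples into running totals."""
    totals: Counter = Counter()
    for word, n in tuples:
        totals[word] += n
    return totals


def run_topology(lines: Iterable[str]) -> Counter:
    """Wire spout -> split bolt -> count bolt; tuples are never stored in between."""
    return count_bolt(split_bolt(sentence_spout(lines)))
```

Because each stage is a generator, a tuple is handed to the next stage as soon as it is produced, which loosely mirrors the event-at-a-time processing the text attributes to Storm.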
Spark introduces a dataset abstraction called the Resilient Distributed Dataset (RDD) to perform data processing.
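As a rough illustration of the dataset-abstraction idea, the sketch below models a dataset that records how it was derived and is only computed when a result is requested. `ToyRDD` is a hypothetical single-machine class, not the Spark API; a real RDD is additionally partitioned across a cluster and uses its lineage to recompute lost partitions.

```python
from typing import Any, Callable, List, Optional


class ToyRDD:
    """Hypothetical, single-machine illustration of a lazily evaluated dataset
    that remembers its lineage (parent dataset plus transformation)."""

    def __init__(
        self,
        source: Optional[List[Any]] = None,
        parent: Optional["ToyRDD"] = None,
        transform: Optional[Callable[[List[Any]], List[Any]]] = None,
    ) -> None:
        self._source = source
        self._parent = parent
        self._transform = transform

    def map(self, fn: Callable[[Any], Any]) -> "ToyRDD":
        # Record the transformation; nothing is computed yet.
        return ToyRDD(parent=self, transform=lambda rows: [fn(r) for r in rows])

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(parent=self, transform=lambda rows: [r for r in rows if pred(r)])

    def collect(self) -> List[Any]:
        """Action: walk the lineage back to the source and compute the result."""
        if self._parent is None:
            return list(self._source or [])
        return self._transform(self._parent.collect())
```

For example, `ToyRDD([1, 2, 3, 4]).map(lambda x: x * 10).filter(lambda x: x > 15)` builds a lineage of two transformations, and only `collect()` triggers the computation.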
However, these conventional techniques require a Hadoop-based mechanism, and additional effort is needed to learn them. In addition, the prior art is mainly useful for applications that perform many repetitive tasks on large amounts of data, for example, scientific applications involving repetitive data operations.
However, most recent applications are data analysis applications that find value by integrating various kinds of data (sensor data, social data, system data, accumulated data, weather data, public environmental data, etc.), rather than by quickly performing repetitive numerical operations in memory.
For example, there is a need for a real-time processing method for applications based on various big data, such as an application that collects and analyzes various data relating to environmental disasters at once.
SUMMARY OF THE INVENTION
Accordingly, the present invention has been made in view of the above problems, and it is an object of the present invention to provide a big data processing system and method capable of collecting, storing, and analyzing various types of big data in real time.
According to an aspect of the present invention, there is provided a big data processing system including: a collecting unit for collecting data through various paths; a preprocessing unit for performing preprocessing on data transmitted from the collecting unit; a storage unit for storing input data in a distributed manner; an analysis unit for analyzing data transmitted from the preprocessing unit or data stored in the storage unit and generating an analysis result; a display unit for receiving and displaying the analysis result; and an interworking process execution unit for outputting an interworking process message so that the collecting unit, the preprocessing unit, the analysis unit, and the display unit operate in real time and process the data.
The collecting unit collects structured, semi-structured, and unstructured data using a data collection tool.
The preprocessing unit includes a plurality of preprocessing modules for performing preprocessing that varies depending on the application service.
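One way to picture a preprocessing unit with several service-dependent modules is a registry that maps each application service to an ordered list of preprocessing functions. The sketch below is only an assumption about how such a unit might be organized; the service names (`sensor_monitoring`, `social_analysis`) and the module functions are hypothetical and do not come from the specification.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
PreprocessingModule = Callable[[List[Record]], List[Record]]


def drop_incomplete(records: List[Record]) -> List[Record]:
    """Discard records that contain a missing (None) field."""
    return [r for r in records if all(v is not None for v in r.values())]


def normalize_text(records: List[Record]) -> List[Record]:
    """Trim and lower-case every string field."""
    return [
        {k: (v.strip().lower() if isinstance(v, str) else v) for k, v in r.items()}
        for r in records
    ]


# Hypothetical mapping from application service to its preprocessing modules.
PREPROCESSING_MODULES: Dict[str, List[PreprocessingModule]] = {
    "sensor_monitoring": [drop_incomplete],
    "social_analysis": [drop_incomplete, normalize_text],
}


def preprocess(service: str, records: List[Record]) -> List[Record]:
    """Run the modules registered for the given service, in order."""
    for module in PREPROCESSING_MODULES.get(service, []):
        records = module(records)
    return records
```

A service with no registered modules passes its records through unchanged, so new services can be added by extending the registry rather than changing `preprocess`.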
The storage unit is a Hadoop Distributed File System (HDFS).
The collecting unit transmits data corresponding to the data type included in the interworking process message to the preprocessing unit in real time.
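The selection by data type described above can be sketched as a simple filter. The message layout is an assumption: the specification states only that the interworking process message includes a data type, so the `InterworkingMessage` class, its `data_type` field, and the `"type"` key on each record are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class InterworkingMessage:
    """Hypothetical message layout; only the data-type field is modeled."""
    data_type: str


def select_for_realtime(
    collected: Iterable[Dict[str, object]], message: InterworkingMessage
) -> Iterator[Dict[str, object]]:
    """Yield only the collected records whose type matches the message,
    one at a time, so each can be forwarded as soon as it is seen."""
    for record in collected:
        if record.get("type") == message.data_type:
            yield record
```

Records of other types are simply not forwarded by this function; how the system handles them is outside this sketch.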
When the interworking process execution unit outputs the interworking process message, the collecting unit transfers the collected data to the preprocessing unit in real time, the preprocessing unit delivers the preprocessed data to the analysis unit in real time, and the analysis unit analyzes the delivered data in real time and transmits the analysis result to the display unit.
According to another aspect of the present invention, there is provided a method of operating a big data processing system for storing and analyzing input data, the method comprising: setting the system to operate in an interworking processing mode; transmitting data collected by the collecting unit to the preprocessing unit in real time; preprocessing, by the preprocessing unit, the transferred data in real time and transferring the preprocessed data to the analysis unit; analyzing the data transmitted from the preprocessing unit in real time and generating an analysis result; and displaying the analysis result generated by the analysis unit in real time.
The step of setting the system to operate in the interworking processing mode includes transmitting an interworking process message to the collecting unit, the preprocessing unit, the analysis unit, and the display unit.
The transmitting of the collected data to the preprocessing unit in real time may include transmitting data corresponding to the data type included in the interworking process message to the preprocessing unit in real time.
The method may further include storing the analysis result generated by the analysis unit in a storage unit.
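The method steps above can be sketched end to end under some simplifying assumptions: each unit is modeled as an in-process object, the interworking process message is reduced to a mode switch, and the display and storage units are stand-in lists. The names `Unit` and `run_interworking_pipeline` are hypothetical.

```python
from typing import Any, Callable, Iterable, List, Tuple


class Unit:
    """Hypothetical functional unit that enters real-time mode when it
    receives the interworking process message."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.realtime = False

    def receive_interworking_message(self) -> None:
        self.realtime = True


def run_interworking_pipeline(
    collected: Iterable[Any],
    preprocess: Callable[[Any], Any],
    analyze: Callable[[Any], Any],
) -> Tuple[List[Any], List[Any]]:
    units = [Unit(n) for n in ("collecting", "preprocessing", "analysis", "display")]

    # Step 1: set the interworking processing mode by messaging every unit.
    for unit in units:
        unit.receive_interworking_message()
    if not all(unit.realtime for unit in units):
        raise RuntimeError("not every unit entered real-time mode")

    displayed: List[Any] = []  # stand-in for the display unit
    stored: List[Any] = []     # stand-in for the storage unit

    # Steps 2 to 5: each record moves through every stage as soon as it arrives.
    for record in collected:
        cleaned = preprocess(record)  # preprocessing unit
        result = analyze(cleaned)     # analysis unit
        displayed.append(result)      # display in real time
        stored.append(result)         # additionally store the analysis result
    return displayed, stored
```

For instance, `run_interworking_pipeline([" 3 ", "4"], lambda s: int(s.strip()), lambda n: n * n)` pushes each reading through all stages before the next one is read, in contrast to a batch flow that would store everything first and analyze later.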
With the big data processing system and method of the present invention, for real-time processing of various kinds of big data, data can be collected regardless of its type and stored in association with the Hadoop system; in addition to the analysis being performed, the analysis result can be visualized automatically.
In addition, since the data transfer between the functional modules is performed in real time under the control of the streaming interworking adaptation module, it is possible to provide real-time services to applications based on various big data.
FIG. 1 is a diagram showing an example of a configuration of a big data processing system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of a configuration of a storage unit of a big data processing system according to an embodiment of the present invention.
FIG. 3 is a flowchart showing a procedure according to a first operation of a big data processing system according to an embodiment of the present invention.
FIG. 4 is a flowchart showing a procedure according to a second operation of the big data processing system according to the embodiment of the present invention.
The advantages and features of the present invention, and the manner of achieving them, will become apparent from the embodiments described in detail below with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art; the invention is defined only by the scope of the claims. Like numbers refer to like elements throughout.
In the following description of the present invention, a detailed description of known functions and configurations will be omitted when it may obscure the subject matter of the present invention. The terms used below are defined in consideration of their functions in the embodiments of the present invention and may vary depending on the intention or custom of the user or operator. Therefore, their definitions should be based on the contents of this specification as a whole.
Hereinafter, a big data processing system and a processing method according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.
FIG. 1 is a diagram illustrating an example of a configuration of a big data processing system according to an embodiment of the present invention. FIG. 2 is a diagram illustrating an example of a configuration of a storage unit of a big data processing system according to an embodiment of the present invention.
Referring to FIG. 1, a big
Specifically, the
The
The
The
The data collected by the
Particularly, when the
The preprocessing
The preprocessing
The processed data generated by the preprocessing
In particular, upon receiving the interworking process message transmitted from the interworking
At this time, the
The
Here, the
The
In particular, upon receiving the interworking process message transmitted from the interworking
The
At this time, the
The Hadoop distributed file system is well known in the art, and the structure of the
Referring to FIG. 2, the
The
The
The
In particular, upon receiving the interworking process message from the interworking
The interlocking
The configuration of the big data processing system according to the embodiment of the present invention and the functions of the respective configurations have been described above. Hereinafter, an operation of a big data processing system according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.
FIG. 3 is a flowchart showing a procedure according to a first operation of a big data processing system according to an embodiment of the present invention.
FIG. 3 shows a process in which the big data processing system processes data in batch. The interworking
First, the collecting
After step S320, the
The analyzing
FIG. 4 is a flowchart showing a procedure according to a second operation of the big data processing system according to the embodiment of the present invention.
FIG. 4 shows a process in which the big data processing system processes data in real time. The linked
That is, the big
If the
Thereafter, the
When the preprocessing according to step S430 is completed, the
When the real-time analysis is completed according to the step S450, the
If the analysis result is provided according to the step S460, the
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments; various modifications and alterations can be made within the scope of the present invention.
Therefore, the embodiments described in the present invention and the accompanying drawings are intended to illustrate rather than limit the technical spirit of the present invention, and the scope of the technical idea of the present invention is not limited by these embodiments and accompanying drawings. The scope of protection of the present invention should be construed according to the claims, and all technical ideas within the scope of equivalents should be interpreted as being included in the scope of the present invention.
100: Big data processing system
110: Collecting unit
120:
130:
140:
150:
160: Interworking process execution unit
Claims (10)
a preprocessing unit for performing preprocessing on data transmitted from the collecting unit;
a storage unit for storing input data in a distributed manner;
an analysis unit for analyzing data transmitted from the preprocessing unit or data stored in the storage unit and generating an analysis result;
a display unit for receiving and displaying the analysis result; and
an interworking process execution unit for outputting an interworking process message so that the collecting unit, the preprocessing unit, the analysis unit, and the display unit operate in real time and process data,
Big data processing system.
The collecting unit collects structured, semi-structured, and unstructured data using a data collection tool
Big data processing system.
The preprocessing unit may include a plurality of preprocessing modules for performing preprocessing that varies depending on an application service
Big data processing system.
The storage unit may be a Hadoop Distributed File System (HDFS)
Big data processing system.
The collecting unit transmits data corresponding to the data type included in the interworking process message to the preprocessing unit in real time
Big data processing system.
When the interworking process execution unit outputs the interworking process message, the collecting unit transfers the collected data to the preprocessing unit in real time, the preprocessing unit delivers the preprocessed data to the analysis unit in real time, and the analysis unit analyzes the delivered data in real time and transmits the analysis result to the display unit
Big data processing system.
setting the system to operate in an interworking processing mode;
transmitting data collected by the collecting unit to the preprocessing unit in real time;
preprocessing, by the preprocessing unit, the transferred data in real time and transferring the preprocessed data to the analysis unit;
analyzing the data transmitted from the preprocessing unit in real time and generating an analysis result; and
displaying the analysis result generated by the analysis unit in real time on the display unit
A method of operating a big data processing system.
The step of setting the system to operate in the interworking processing mode includes transmitting an interworking process message to the collecting unit, the preprocessing unit, the analysis unit, and the display unit,
A method of operating a big data processing system.
The step in which the collecting unit transmits the collected data to the preprocessing unit in real time includes transmitting, among the collected data, data corresponding to the data type included in the interworking process message to the preprocessing unit in real time
A method of operating a big data processing system.
Further comprising storing the analysis result generated by the analysis unit in a storage unit
A method of operating a big data processing system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020160008634A KR20170089067A (en) | 2016-01-25 | 2016-01-25 | Bigdata processing system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020160008634A KR20170089067A (en) | 2016-01-25 | 2016-01-25 | Bigdata processing system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20170089067A (en) | 2017-08-03 |
Family
ID=59655463
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020160008634A KR20170089067A (en) | 2016-01-25 | 2016-01-25 | Bigdata processing system and method |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20170089067A (en) |
-
2016
- 2016-01-25 KR KR1020160008634A patent/KR20170089067A/en unknown
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101826827B1 (en) * | 2017-10-11 | 2018-02-09 | (주)데이타뱅크시스템 | Integrated replication system in real time |
KR101859094B1 (en) * | 2017-10-11 | 2018-05-18 | (주)데이타뱅크시스템즈 | Integrated replication system considering change of replication method |
KR101859090B1 (en) * | 2017-10-11 | 2018-05-18 | (주)데이타뱅크시스템즈 | Integrated replication system |
WO2019074154A1 (en) * | 2017-10-11 | 2019-04-18 | (주) 데이타뱅크시스템즈 | Integrated replication system |
WO2019074155A1 (en) * | 2017-10-11 | 2019-04-18 | (주) 데이타뱅크시스템즈 | Inter-database real time integrated replication system |
KR20190076352A (en) | 2017-12-22 | 2019-07-02 | 인천대학교 산학협력단 | Hadoop-Based Intelligent Care System and method thereof |
KR20190134982A (en) * | 2018-05-18 | 2019-12-05 | 박병훈 | Big data-based artificial intelligence integration platform |
KR20190134983A (en) * | 2018-05-18 | 2019-12-05 | 박병훈 | Method for providing big data-based artificial intelligence integration platform service |
US11429622B2 (en) | 2018-07-02 | 2022-08-30 | Electronics And Telecommunications Research Institute | Method of supporting big data analysis based on provenance information and apparatuses performing the same |
KR20200095593A (en) | 2019-01-25 | 2020-08-11 | (주)비아이매트릭스 | A system of pretreatment and storage of heterogenous big-data for deep-learning of big-data |
KR20210015527A (en) * | 2019-08-02 | 2021-02-10 | 사회복지법인 삼성생명공익재단 | Medical data warehouse real-time automatic update system, method and recording medium therefor |
KR20210039654A (en) * | 2019-10-02 | 2021-04-12 | (주)디지탈쉽 | Data processing for detecting outlier method and device thereof |
KR20210060830A (en) * | 2019-11-19 | 2021-05-27 | 주식회사 피씨엔 | Big data intelligent collecting method and device |
KR20230138109A (en) | 2022-03-23 | 2023-10-05 | 주식회사에이테크 | Big Data Material Collection and Analysis System Using Hadub |
KR20230138099A (en) | 2022-03-23 | 2023-10-05 | 주식회사에이테크 | Method of manage big data materials based on Hadub |
KR102529547B1 (en) * | 2022-08-18 | 2023-05-10 | 주식회사 에이데이타 | Big-data collection device and method for data linkage automation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR20170089067A (en) | Bigdata processing system and method | |
US11550829B2 (en) | Systems and methods for load balancing in a system providing dynamic indexer discovery | |
US11836579B2 (en) | Data analytics in edge devices | |
US10735536B2 (en) | Scalable data enrichment for cloud streaming analytics | |
US10423469B2 (en) | Router management by an event stream processing cluster manager | |
US10747592B2 (en) | Router management by an event stream processing cluster manager | |
US20200117757A1 (en) | Real-time monitoring and reporting systems and methods for information access platform | |
CN109344170B (en) | Stream data processing method, system, electronic device and readable storage medium | |
US10567557B2 (en) | Automatically adjusting timestamps from remote systems based on time zone differences | |
US9128994B2 (en) | Visually representing queries of multi-source data | |
KR101687239B1 (en) | System and Method for Big Data Stream Modeling | |
EP3813005A1 (en) | Predicting potential incident event data structures based on multi-modal analysis | |
US11416367B2 (en) | Linking computing metrics data and computing inventory data | |
Ferry et al. | Towards a big data platform for managing machine generated data in the cloud | |
CN114372084A (en) | Real-time processing system for sensing stream data | |
CN102055620B (en) | Method and system for monitoring user experience | |
JP2007323143A (en) | Business management system, information system, and business management method | |
EP3945386A1 (en) | System and method for determining manufacturing plant topology and fault propagation information | |
KR101878291B1 (en) | Big data management system and management method thereof | |
US10747812B1 (en) | Video analytics | |
WO2017051518A1 (en) | Communication information calculation apparatus, communication information calculation method, recording medium, and communication management system | |
CN112564984A (en) | Distributed safe operation and maintenance method of Internet of things based on big data | |
KR101865317B1 (en) | Preprocessing device and method of big data for distributed file system of data | |
Aung et al. | Comparative analysis of real-time messages in big data pipeline architecture | |
Zehnder | Automating Industrial Event Stream Analytics: Methods, Models, and Tools |