KR101269428B1 - System and method for data distribution - Google Patents
- Publication number
- KR101269428B1
- Authority
- KR
- South Korea
- Prior art keywords
- data
- node
- nodes
- data node
- capacity
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1095—Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a data distribution system and method. The system comprises a plurality of data nodes that store data, and a management node that analyzes input data to identify a type pattern, determines a data node to store the data based on state information of the data nodes in which that type pattern is set, and distributes the data to the determined data node.
Description
The present invention relates to a data distribution system and method, and more particularly, to a data distribution system and method that analyze input data to identify a type pattern, determine a data node to store the data based on state information of the data nodes in which that type pattern is set, and distribute and store the data accordingly.
As the Internet has developed, netizens generate and distribute a large amount of data every day, and many companies, especially search engine companies and web portals, now collect and accumulate as much data as possible. Extracting meaningful information from that data as quickly as possible has become a competitive advantage for companies.
As a result, many companies are investigating large-scale distributed management and distributed workload processing technology by building large clusters at low cost.
In other words, the value of large data that is difficult to process in the existing single-machine system is highlighted, and distributed parallel-based systems have been introduced / used in various fields as an alternative for processing them.
However, in a distributed parallel system that stores and processes data across multiple nodes, processing a single task incurs network IO load and join operations between nodes, so a drop in the processing speed of the entire system is inevitable. This is an inherent obstacle to processing large amounts of data at high speed.
The present invention has been made to solve the above problems. An object of the present invention is to provide a data distribution system and method that can reduce the response time of the overall system by minimizing network IO time and join operations between the nodes of a distributed parallel system.
Another object of the present invention is to provide a data distribution system and method capable of improving query processing speed by distributing and storing data in a data node, and generating a data replica to ensure fault tolerance.
It is still another object of the present invention to provide a data distribution system and method capable of minimizing network IO between data nodes to shorten the execution time of an entire task.
According to an aspect of the present invention to achieve the above objects, a data distribution system is provided that includes a plurality of data nodes for storing data, and a management node that analyzes input data to confirm a type pattern, determines a data node to store the data based on the state information of the data nodes in which the type pattern is set, and distributes the data.
The state information of the data node may include overlapping storage information, the number of data nodes, a storage capacity of each data node, and a type pattern.
The management node may allocate the data to one data node when the data includes a plurality of type patterns, and to an empty data node when the data includes a type pattern that has not yet been distributed. It may also create a replica in a neighboring data node, repeating the replica creation until the replica setting is satisfied so that the replicas are distributed to neighboring data nodes.
According to another aspect of the present invention, a management node is provided that includes: a data node information database in which information about connected data nodes is stored; a data analyzer that analyzes input data and checks its type pattern and capacity; a data node selector that searches the data node information database to identify data nodes in which the type pattern is set and selects a data node to store the data based on state information of the identified data nodes; and a data distributor that distributes the data to the selected data nodes.
The data node information database may store at least one of overlapping storage information, the number of data nodes, a pattern type of each data node, and a storage capacity.
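As a rough illustration of the kind of record such a database might hold, the following Python sketch models the listed items. The class names, field names, and the byte unit are assumptions made for illustration; the source does not specify a schema.

```python
from dataclasses import dataclass, field


@dataclass
class DataNodeState:
    """State kept for one data node (field names are illustrative)."""
    node_id: int
    type_patterns: set[int] = field(default_factory=set)  # patterns set on this node
    free_capacity: int = 0  # remaining storage; bytes are an assumed unit


@dataclass
class DataNodeInfoDB:
    """Hypothetical in-memory stand-in for the data node information database."""
    replica_count: int = 1  # the "overlapping storage information"
    nodes: dict[int, DataNodeState] = field(default_factory=dict)

    @property
    def node_count(self) -> int:
        # The number of data nodes is derived from the registered entries.
        return len(self.nodes)

    def register(self, state: DataNodeState) -> None:
        self.nodes[state.node_id] = state


db = DataNodeInfoDB(replica_count=3)
db.register(DataNodeState(node_id=1, type_patterns={1, 2}, free_capacity=1_000))
db.register(DataNodeState(node_id=2, type_patterns={10, 11}, free_capacity=500))
```

Deriving the node count from the registered entries, rather than storing it separately, is a design choice of this sketch only.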
The data node selector may select, from among the identified data nodes, data nodes having a storage capacity greater than or equal to the capacity of the data. If no such data node exists, it may divide the data into pieces of a predetermined size and select, from among the identified data nodes, data nodes whose storage capacity is greater than or equal to the capacity of the divided data.
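The capacity rule above can be sketched as follows. This is a minimal sketch: the function name, the dictionary representation of node capacities, and the `chunk_size` parameter (standing in for the "predetermined size") are assumptions, not part of the source.

```python
def select_by_capacity(free_capacity, data_size, chunk_size):
    """Return (node_ids, piece_size) for the identified data nodes.

    free_capacity: dict mapping node id -> free storage of each identified node.
    chunk_size: the "predetermined size" used when the data must be split.
    """
    fits_whole = [n for n, free in free_capacity.items() if free >= data_size]
    if fits_whole:
        return fits_whole, data_size
    # No node can hold the whole item: fall back to fixed-size pieces.
    piece = min(chunk_size, data_size)
    fits_piece = [n for n, free in free_capacity.items() if free >= piece]
    return fits_piece, piece


nodes = {1: 300, 2: 120, 3: 40}
print(select_by_capacity(nodes, 100, 64))  # ([1, 2], 100)
print(select_by_capacity(nodes, 500, 64))  # ([1, 2], 64)
```

In the second call no node can hold 500 units, so the selection is repeated against the 64-unit piece size instead.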
The data node selector may allocate the data to one data node when the data includes a plurality of type patterns, or to an empty data node when the data includes a type pattern that has not been distributed. It may then generate a replica in a neighboring data node and repeat the replica generation until the replica setting is satisfied, so that the replicas are distributed to neighboring data nodes.
The management node may further include an updater configured to check state information of each data node in real time and update state information of each data node stored in the data node information database.
According to another aspect of the present invention, a data distribution method is provided in which a management node distributes and stores data among a plurality of data nodes, the method comprising: analyzing input data to identify a type pattern and a capacity; searching a provided data node information database to identify the data nodes in which the identified type pattern is set; selecting a data node to store the data based on state information of the identified data nodes; and distributing the data to the selected data nodes.
The data node information database may store at least one of overlapping storage information, the number of data nodes, a pattern type of each data node, and a storage capacity.
The selecting of a data node to store the data based on the state information of the identified data nodes may include selecting, from among the identified data nodes, data nodes having a storage capacity greater than or equal to the capacity of the data, or, if no such data node exists, dividing the data into pieces of a predetermined size and selecting, from among the identified data nodes, data nodes whose storage capacity is greater than or equal to the capacity of the divided data.
The selecting of a data node to store the data based on the state information of the identified data nodes may include allocating the data to one data node when the data includes a plurality of type patterns, or to an empty data node when the data includes a type pattern that has not been distributed. A replica may then be generated in a neighboring data node according to the preset overlapping storage information, and the replica generation may be repeated until the replica setting is satisfied so that the replicas are distributed to neighboring data nodes.
According to the present invention, network IO time and join operations between nodes of a distributed parallel system can be minimized to reduce the response time of the entire system.
In addition, by distributing and storing data in data nodes, query processing speed can be improved, and data replicas can be created to ensure fault tolerance.
In addition, network IO between data nodes can be minimized to speed up the overall task.
FIG. 1 illustrates a data distribution system in accordance with the present invention.
FIG. 2 is a block diagram schematically showing the configuration of a management node according to the present invention.
FIG. 3 is a flowchart illustrating a method for a management node to distribute data to a plurality of data nodes in accordance with the present invention.
The foregoing and other objects, features, and advantages of the present invention will become more apparent from the following detailed description taken in conjunction with the accompanying drawings.
FIG. 1 is a diagram illustrating a data distribution system according to the present invention.
Referring to FIG. 1, the data distribution system includes a plurality of data nodes 200 and a management node 100.
Each data node 200 is preset with a type pattern of data to be stored according to a preset distribution rule. Therefore, the data node 200 stores data corresponding to the type pattern set to the data node 200.
The
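When the input data is data including a plurality of type patterns, the management node 100 allocates the data to a single data node.
A detailed description of the management node 100 is given with reference to FIG. 2.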
FIG. 2 is a block diagram schematically illustrating a configuration of a management node according to the present invention.
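Referring to FIG. 2, the management node 100 includes a data analysis unit 110, a data node selection unit 120, a data node information DB 130, a distribution rule DB 140, and a data distribution unit 150.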
The data
The
The distribution rule stored in the
The
The
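If there is no data node that is greater than or equal to the capacity of the data, the data node selection unit 120 divides the data into pieces of a predetermined size and selects, from among the identified data nodes, data nodes whose storage capacity is greater than or equal to the capacity of the divided data.
In addition, the data node selection unit 120 allocates the data to one data node when the data includes a plurality of type patterns, and to an empty data node when the data includes a type pattern that has not been distributed.
In addition, when the overlapping storage information is set in the data node information DB 130, the data node selection unit 120 generates a replica in a neighboring data node according to that information and repeats the replica generation until the replica setting is satisfied.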
The
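Although not shown in the drawing, the management node 100 may further include an updating unit that checks the state information of each data node in real time and updates the state information stored in the data node information DB 130.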
The
FIG. 3 is a flowchart illustrating a method for distributing data to a plurality of data nodes by a management node according to the present invention.
Referring to FIG. 3, when data is input (S302), the management node analyzes the input data and checks its type pattern and capacity (S304). That is, the management node analyzes the input data line by line or in units of a predetermined size to check the type pattern and the capacity.
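A line-by-line version of step S304 might look like the sketch below. The source does not say how a type pattern is recognized, so the `<pattern>|<payload>` line layout, the function name `analyze`, and the use of UTF-8 byte length as "capacity" are all assumptions chosen only to make the step concrete.

```python
def analyze(data: str):
    """Yield (type_pattern, size_in_bytes) for each non-empty line.

    Assumption: each line looks like "<pattern>|<payload>"; the source does
    not specify how a type pattern is recognized.
    """
    for line in data.splitlines():
        if not line.strip():
            continue
        pattern, _, _payload = line.partition("|")
        yield pattern.strip(), len(line.encode("utf-8"))


sample = "1|alice,30\n10|order-77\n1|bob,41\n"
print(list(analyze(sample)))  # [('1', 10), ('10', 11), ('1', 8)]
```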
After performing step S304, the management node searches the provided data node information database and identifies the data nodes in which the identified type pattern is set (S306). That is, the management node searches the data node information database and identifies data nodes in which the same type pattern as that of the data is set.
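The lookup in step S306 amounts to filtering nodes by their configured type patterns. In this sketch the database is reduced to a plain dictionary from node id to the set of patterns set on that node; that representation and the function name are hypothetical.

```python
def nodes_with_pattern(node_patterns, pattern):
    """Return ids of nodes on which `pattern` is set, in ascending id order."""
    return sorted(n for n, patterns in node_patterns.items() if pattern in patterns)


# Hypothetical contents of the data node information database.
node_patterns = {1: {1, 2, 5}, 2: {10, 11}, 3: {1, 12}}
print(nodes_with_pattern(node_patterns, 1))   # [1, 3]
print(nodes_with_pattern(node_patterns, 99))  # []
```

An empty result corresponds to a type pattern that has not been distributed yet, which the description handles by allocating the data to an empty data node.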
After performing S306, the management node selects a data node to store the data based on the confirmed state information of the data nodes (S308).
In this case, the management node compares the capacity of the data with the storage capacity of the identified data nodes, and selects data nodes having a storage capacity greater than or equal to the capacity of the data. The management node may then allocate the data to one data node when the data includes a plurality of type patterns, and to an empty data node when the data includes a type pattern that has not been distributed.
When overlapping storage information is set in the data node information database, the management node creates a replica in a neighboring data node according to that information, and repeats the replica generation until the replica setting is satisfied, so that the data is duplicated across neighboring data nodes.
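The repeat-until-satisfied replica step can be sketched as a small loop. Two assumptions are made here and are not stated in the source: "neighboring" is taken to mean the next node id, wrapping around after the last node, and the replica setting is taken to count the total number of copies including the first one.

```python
def place_replicas(primary, node_count, replica_count):
    """Return the 1-based node ids holding the data, primary first.

    Assumption: "neighboring" means the next node id, wrapping around after
    the last node. The source does not define the neighbor relation.
    """
    copies = min(replica_count, node_count)  # cannot exceed the number of nodes
    holders = [primary]
    while len(holders) < copies:  # repeat until the replica setting is met
        holders.append(holders[-1] % node_count + 1)
    return holders


print(place_replicas(primary=4, node_count=5, replica_count=3))  # [4, 5, 1]
```

Other neighbor definitions (for example rack-aware or capacity-aware choices) would fit the same loop by replacing the line that computes the next holder.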
After performing the step S308, the management node distributes and stores data to the selected data nodes (S310).
The management node determines whether all input data has been stored (S312), and if the storage is not completed, repeats from step S302.
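Steps S302 to S312 can be read as a single loop over the analyzed input. The sketch below is a simplified illustration under stated assumptions: items arrive as `(pattern, size)` pairs, the first matching node with enough room is chosen (the source does not give a tie-breaking rule), and data splitting and replica creation are left out.

```python
def distribute(items, node_patterns, free_capacity):
    """Store each (pattern, size) item on the first matching node with room.

    Returns a list of (pattern, node_id); node_id is None when nothing fits.
    """
    placements = []
    for pattern, size in items:                        # S302/S304: next analyzed item
        candidates = [n for n, p in sorted(node_patterns.items())
                      if pattern in p]                 # S306: nodes with this pattern
        fitting = [n for n in candidates
                   if free_capacity[n] >= size]        # S308: capacity check
        target = fitting[0] if fitting else None
        if target is not None:
            free_capacity[target] -= size              # S310: store, update state
        placements.append((pattern, target))
    return placements                                  # S312: all input handled


node_patterns = {1: {1}, 2: {1, 10}}
free = {1: 10, 2: 50}
result = distribute([(1, 8), (1, 8), (10, 20), (10, 40)], node_patterns, free)
print(result)  # [(1, 1), (1, 2), (10, 2), (10, None)]
```

The last item is not placed because node 2 has only 22 units left, which is the situation in which the description divides the data into pieces of a predetermined size.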
Hereinafter, a method of distributing and storing data including a plurality of type patterns in a data node will be described as an example.
For example, suppose that analyzing the type patterns of the input data gives the following result:
ID # 1 => Typepattern # 1,2
ID # 2 = (ID # 1 + ID # 3 + Typepattern # 5,6) = Typepattern # 1,2 + Typepattern # 8,9 + Typepattern # 5,6
ID # 3 => Typepattern # 8,9
ID # 4 => Typepattern # 10,11
ID # 5 => Typepattern # 12,13,14
ID # 6 => Typepattern # 15
ID # 7 => Typepattern # 16
The case where there are 5 data nodes and the replica is set to 3 will be described using Table 1.
10,11
12,13,14
15
16
1,2,8,9,5,6
10,11
15
12,13,14
16
15
10,11
16
1,2,8,9,5,6
1,2,8,9,5,6
12,13,14
Referring to Table 1, when data includes a plurality of type patterns, the management node distributes it to a single data node to reduce the number of joins. That is, Typepattern # 1,2,8,9,5,6 of ID # 2 is distributed to data node 1,
In addition, when data includes a type pattern that has not been distributed, the management node distributes the data to an empty data node.
That is, the management node distributes the undistributed Typepattern # 15 of ID # 6 to data node 4 and distributes Typepattern # 16 of ID # 7 to data node 5.
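The primary placement in this example can be reproduced with a short sketch: IDs whose type patterns overlap are merged into one group so that they land on a single data node, and each group is then given the next empty node. The function name and the "next empty node in id order" rule are assumptions. The placements on data nodes 1, 4, and 5 match the text above; those on data nodes 2 and 3 are simply what this rule produces.

```python
def group_patterns(ids):
    """Merge pattern sets that share at least one type pattern."""
    groups = []
    for patterns in ids.values():
        merged = set(patterns)
        rest = []
        for g in groups:
            if g & merged:
                merged |= g  # an overlapping group joins the new one
            else:
                rest.append(g)
        groups = rest + [merged]
    return groups


ids = {
    1: {1, 2},
    2: {1, 2, 8, 9, 5, 6},
    3: {8, 9},
    4: {10, 11},
    5: {12, 13, 14},
    6: {15},
    7: {16},
}
# Assign each group to the next empty data node, starting from node 1.
placement = {node: sorted(g) for node, g in enumerate(group_patterns(ids), start=1)}
print(placement)
# {1: [1, 2, 5, 6, 8, 9], 2: [10, 11], 3: [12, 13, 14], 4: [15], 5: [16]}
```

Keeping ID # 1, ID # 2, and ID # 3 together on one node is what avoids a cross-node join when ID # 2 is queried.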
In addition, since the replica is set to 3, the management node generates a replica in the neighboring data node, and repeats the replica generation until the replica configuration is satisfied, and distributes the replica to the neighboring data node.
That is, the management node duplicates
Then, the management node duplicates Typepattern # 15 in data node 1, duplicates Typepattern # 16 in data node 2, duplicates
The method for data distribution can be written programmatically, and the codes and code segments constituting the program can be easily inferred by a programmer in the art.
Those skilled in the art will appreciate that the present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. It is therefore to be understood that the embodiments described above are to be considered in all respects only as illustrative and not restrictive. The scope of the present invention is defined by the appended claims rather than the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents are to be construed as being included within the scope of the present invention.
100: management node 110: data analysis unit
120: data node selection unit 130: data node information DB
140: distribution rule DB 150: data distribution unit
200: data node
Claims (12)
a management node that analyzes the input data to confirm a type pattern, determines a data node to store the data based on state information of data nodes in which the same type pattern as the identified type pattern is set, and distributes the data to the determined data node,
wherein the management node allocates the data to one data node when the data includes a plurality of type patterns, and allocates the data to an empty data node when the data includes a type pattern that has not been distributed, and
generates a replica in an adjacent data node according to preset overlapping storage information, repeating the replica generation until the replica setting is satisfied so that the data is distributed to adjacent data nodes.
The state information of the data node includes overlapping storage information, the number of data nodes, storage capacity and type pattern of each data node.
A data analyzer which analyzes the input data and checks a type pattern and a capacity;
A data node selector configured to search the data node information database to identify data nodes having the same type pattern as the identified type pattern and to select a data node to store the data based on state information of the identified data nodes; And
And a data distribution unit for distributing data to the selected data nodes.
The data node selecting unit allocates the data node to one data node when the data includes a plurality of type patterns, and allocates the data node to an empty data node when the data includes a non-distributed type pattern.
And a replica is generated in a neighboring data node according to preset overlapping storage information, and the replica is distributed to neighboring data nodes by repeating the replica generation until the replica setting is satisfied.
And at least one of overlapping storage information, a number of data nodes, a pattern type of each data node, and a storage capacity in the data node information database.
The data node selector selects data nodes having a storage capacity greater than or equal to the data capacity among the identified data nodes,
And when there are no data nodes above the capacity, splitting the data into a predetermined size and selecting data nodes that are larger than or equal to the capacity of the divided data among the identified data nodes.
And an updating unit which checks the state information of each data node in real time and updates the state information of each data node stored in the data node information database.
(a) analyzing the input data to identify a type pattern and a capacity;
(b) searching the provided data node information database to identify data nodes having the same type pattern as the identified type pattern; And
(c) selecting a data node to store the data based on the identified state information of the data nodes and distributing the data to the selected data nodes;
In the step (c), the data is allocated to one data node when the data includes a plurality of type patterns, and the data is allocated to an empty data node when the data includes an undistributed type pattern.
A replica is generated in a neighboring data node according to preset overlapping storage information, and the replica is repeatedly distributed to a neighboring data node until the replica setting is satisfied.
And at least one of overlapping storage information, a number of data nodes, a pattern type of each data node, and a storage capacity in the data node information database.
The step (c)
Selecting data nodes having a storage capacity greater than or equal to the capacity of the data from among the identified data nodes, or dividing the data into a predetermined size if there are no data nodes greater than or equal to the data capacity, and among the identified data nodes Selecting data nodes that are greater than or equal to the capacity of the divided data; And
Distributing data to the selected data nodes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020120083209A KR101269428B1 (en) | 2012-07-30 | 2012-07-30 | System and method for data distribution |
Publications (1)
Publication Number | Publication Date |
---|---|
KR101269428B1 true KR101269428B1 (en) | 2013-05-30 |
Family
ID=48667188
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020120083209A KR101269428B1 (en) | 2012-07-30 | 2012-07-30 | System and method for data distribution |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR101269428B1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004252663A (en) | 2003-02-19 | 2004-09-09 | Toshiba Corp | Storage system, sharing range deciding method and program |
JP2012123544A (en) | 2010-12-07 | 2012-06-28 | Nippon Hoso Kyokai <Nhk> | Load distribution device and program |
- 2012-07-30: KR application KR1020120083209A, patent KR101269428B1, active (IP Right Grant)
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9934325B2 (en) | 2014-10-20 | 2018-04-03 | Korean Institute Of Science And Technology Information | Method and apparatus for distributing graph data in distributed computing environment |
KR20170040995A (en) * | 2015-10-06 | 2017-04-14 | 삼성전자주식회사 | Method and apparatus for analyzing interaction network |
KR102183089B1 (en) * | 2015-10-06 | 2020-11-25 | 삼성전자주식회사 | Method and apparatus for analyzing interaction network |
KR101927658B1 (en) * | 2018-05-16 | 2019-03-12 | 양동국 | A System of Water Treatment Management Using PLC Data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10002148B2 (en) | Memory-aware joins based in a database cluster | |
CN104598376B (en) | The layering automatization test system and method for a kind of data-driven | |
US8140625B2 (en) | Method for operating a fixed prefix peer to peer network | |
CN112163048A (en) | Method and device for realizing OLAP analysis based on ClickHouse | |
CN110032549B (en) | Partition splitting method, partition splitting device, electronic equipment and readable storage medium | |
CN103678609A (en) | Large data inquiring method based on distribution relation-object mapping processing | |
CN102932415A (en) | Method and device for storing mirror image document | |
CN104423960A (en) | Continuous project integration method and continuous project integration system | |
CN105683940A (en) | Processing a data flow graph of a hybrid flow | |
CN107239468B (en) | Task node management method and device | |
Wang et al. | BENU: Distributed subgraph enumeration with backtracking-based framework | |
CN103902544A (en) | Data processing method and system | |
CN104871153A (en) | System and method for flexible distributed massively parallel processing (mpp) database | |
KR101269428B1 (en) | System and method for data distribution | |
US10452685B2 (en) | Method and apparatus for replicating data | |
CN105045917A (en) | Example-based distributed data recovery method and device | |
CN103971036A (en) | Page field access control system and method | |
CN105556474A (en) | Managing memory and storage space for a data operation | |
CN108062314B (en) | Dynamic sub-table data processing method and device | |
CN102385588A (en) | Method and system for improving performance of data parallel insertion | |
CN101673374A (en) | Bill processing method and device | |
CN102207935A (en) | Method and system for establishing index | |
Lwin et al. | Non-redundant dynamic fragment allocation with horizontal partition in Distributed Database System | |
CN107239568A (en) | Distributed index implementation method and device | |
CN111858739A (en) | Mapreduce-based data aggregation method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
A302 | Request for accelerated examination | ||
E902 | Notification of reason for refusal | ||
AMND | Amendment | ||
E601 | Decision to refuse application | ||
AMND | Amendment | ||
X701 | Decision to grant (after re-examination) | ||
GRNT | Written decision to grant | ||
FPAY | Annual fee payment | Payment date: 20160406; Year of fee payment: 4 |
FPAY | Annual fee payment | Payment date: 20170327; Year of fee payment: 5 |
FPAY | Annual fee payment | Payment date: 20181030; Year of fee payment: 6 |