CN113626098B - Data node dynamic configuration method based on information interaction - Google Patents

Publication number
CN113626098B
Authority
CN
China
Prior art keywords
node
heartbeat
task
nodes
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110823675.6A
Other languages
Chinese (zh)
Other versions
CN113626098A (en)
Inventor
张经宇
舒政文
王菲菲
王进
李文军
何施茗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha University of Science and Technology
Original Assignee
Changsha University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha University of Science and Technology
Priority to CN202110823675.6A
Publication of CN113626098A
Application granted
Publication of CN113626098B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/547 Remote procedure calls [RPC]; Web services

Abstract

The invention discloses a data node dynamic configuration method based on information interaction, comprising the following steps. Step one: set the related parameters in the HDFS configuration file. Step two: data node n sends heartbeat information at the initial heartbeat interval. Step three: obtain the number of data blocks and the node access probability of data node n over the k consecutive periods up to and including the current period i. Step four: calculate the average number of data blocks of the cluster nodes over those k periods. Step five: calculate the average access heat of the cluster nodes over those k periods. Step six: classify the nodes in the cluster. Step seven: each node dynamically adjusts its heartbeat interval, and bandwidth services with different weights are provided for its heartbeat transmission. Step eight: repeat steps three to seven. The invention improves the real-time performance of cluster data updates and the utilization of cluster resources, and makes full use of network bandwidth resources.

Description

Data node dynamic configuration method based on information interaction
Technical Field
The invention applies to communication among the distributed nodes of big data systems, belongs to the technical field of distributed systems, and particularly relates to a data node dynamic configuration method based on information interaction.
Background
To process ever-growing data volumes, multiple computers are used to process data jointly, forming a distributed architecture. Because the nodes of a distributed system may be far apart, information interaction and data transmission between them rely on remote communication based on an RPC framework, and the heartbeat mechanism in the underlying RPC framework is essential to information interaction among the nodes of the whole system.
An HDFS distributed file system comprises a master node (NameNode) and slave nodes (DataNodes). The NameNode manages and maintains all metadata of the distributed file system and is responsible for the mapping between data files and DataNodes and for access operations; the DataNodes, located at different geographic positions in the cluster, manage and store the actual data.
However, in the prior art the heartbeat interval between the master node and the slave nodes is fixed in the HDFS configuration file at 3 s, and the heartbeat transmission bandwidth between nodes is not dynamically divided. This has the following drawbacks. ① The heartbeat sending time is preset in the HDFS configuration file, so the interval cannot be adjusted dynamically according to the data blocks stored by each node and how frequently the node is accessed. A node that needs frequent information interaction should use a shorter heartbeat interval to improve the real-time performance of data updates; otherwise the interval should be lengthened to relieve pressure on the master node and reduce cluster consumption. ② Network bandwidth cannot be allocated dynamically to a node's heartbeat transmission task. A node that interacts frequently should be given higher-weight bandwidth to speed up its heartbeat transmission, improving the timeliness with which the master node dispatches tasks and the utilization of resources. A node with little information interaction should be given lower-weight bandwidth to slow its heartbeat transmission, so that excessive heartbeat information is not processed repeatedly and cluster resource consumption is reduced.
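For reference, the fixed interval described above corresponds in stock Apache Hadoop to the `dfs.heartbeat.interval` property (in seconds, default 3), normally set in `hdfs-site.xml`. The sketch below is a minimal illustration, not part of the claimed method: it reads that property from a configuration snippet with Python's standard XML parser. The helper name `read_heartbeat_interval` and the inline snippet are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Illustrative hdfs-site.xml fragment; dfs.heartbeat.interval is the stock
# Hadoop property for the DataNode heartbeat period (seconds, default 3).
HDFS_SITE_XML = """
<configuration>
  <property>
    <name>dfs.heartbeat.interval</name>
    <value>3</value>
  </property>
</configuration>
"""

DEFAULT_HEARTBEAT_SECONDS = 3.0


def read_heartbeat_interval(xml_text: str) -> float:
    """Return dfs.heartbeat.interval in seconds, or the default if unset."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == "dfs.heartbeat.interval":
            return float(prop.findtext("value"))
    return DEFAULT_HEARTBEAT_SECONDS


print(read_heartbeat_interval(HDFS_SITE_XML))
```

Because the value is read once from a static file, every DataNode uses the same period regardless of its load, which is the limitation the method sets out to remove.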
Disclosure of Invention
The prior art does not consider the data blocks stored by a node or the probability of the node being accessed: the heartbeat interval is fixed at 3 s and the heartbeat transmission bandwidth is not dynamically allocated, which harms the performance of the HDFS system. To address these defects, a new heartbeat interval mechanism is provided. Nodes are classified by the data blocks they store and how frequently they are accessed, and the heartbeat interval is adjusted dynamically. The HDFS system is deployed in an SDN architecture, and the QoS service provided there gives hot nodes higher-weight network bandwidth, avoiding network congestion and ensuring real-time cluster data updates, and gives cold nodes lower-weight network bandwidth, reducing cluster resource consumption and avoiding excessive useless information.
The invention aims to provide a data node dynamic configuration method based on information interaction that solves the problems of real-time data updating and resource utilization in a cluster.
The invention is realized as follows. A data node dynamic configuration method based on information interaction comprises the following steps:
Step one: setting related parameters in the HDFS configuration file;
Step two: data node n sends heartbeat information at the initial heartbeat interval;
Step three: obtaining the number of data blocks and the node access probability of DataNode n over the k consecutive periods up to and including the current period i, the node having performed its heartbeat tasks in the previous i-1 periods with the system's default heartbeat interval and bandwidth;
Step four: calculating the average number of data blocks of the cluster nodes over the k consecutive periods up to and including the current period i;
Step five: calculating the average access heat of the cluster nodes over the k consecutive periods up to and including the current period i;
Step six: classifying the nodes in the cluster into cold nodes, hot nodes and general nodes;
The heartbeat transmission of a hot node is regarded as an important task, and $task_{ni}$ is placed in QoS queue q0, where $task_{ni}$ denotes the heartbeat transmission task of node n in the i-th period; queue q0 sets a lower limit on link bandwidth resources to speed up the transmission of heartbeat information;
The heartbeat information of a cold node is regarded as an unimportant task, and $task_{ni}$ is placed in QoS queue q1; queue q1 sets an upper limit on link bandwidth resources, reducing cluster resource consumption and system power consumption;
The heartbeat information of a general node is regarded as a general task, and $task_{ni}$ is placed in QoS queue q2, with both an upper and a lower limit on bandwidth resources set for the task;
Step seven: the node dynamically adjusts to the new heartbeat interval, and bandwidth services with different weights are provided for its heartbeat transmission;
Step eight: repeating steps three to seven.
In a further technical scheme of the invention, step four calculates the average number of data blocks of the cluster nodes, where $Node_{avg}$ denotes the average number of data blocks of the cluster nodes over the k consecutive periods up to and including the current period i, and N denotes the total number of nodes in the cluster.
In a further technical scheme of the invention, step five calculates the average access heat of the cluster nodes, where $Node_{hot}$ denotes the average access heat of the cluster nodes over the k periods up to and including the current period i, and N denotes the total number of nodes in the cluster.
In a further technical scheme of the invention, step six classifies the nodes in the cluster: a node that satisfies the hot-node condition is a hot node, and a node that satisfies the cold-node condition is a cold node;
Wherein: $Listnodecount_{ni}$ denotes the number of data blocks of node n in the i-th period, $Listnodehot_{ni}$ denotes the access frequency of node n in the i-th period, $T'_{ni}$ denotes the heartbeat interval of node n in the i-th period, $\alpha$ denotes the default heartbeat factor, $\delta$ denotes the data block impact factor, and $\beta$ denotes the frequency impact factor.
In a further technical scheme of the invention, in step seven, after s periods of dynamic heartbeat adjustment and bandwidth weight allocation, the node is a hot node in period m and was a cold node in period m-1, where m > s. When a node changes from a cold node to a hot node, the variance $\omega$ and the mean $\rho$ of the heartbeat intervals of the hot nodes in queue q0 are calculated, and $Node_{threshold}$ is the minimum of the heartbeat intervals of the nodes in queue q0 and the heartbeat interval of the current node. If $|\rho - t'_{nm}| \le \omega \cdot Node_{threshold}$, the heartbeat information of $Node_{nm}$ is an important task served by queue q0, and higher-weight bandwidth is provided for the node: $task_{nm} \in q0$ and $Node_{nm} \in Node_{hot}$, where $task_{nm} \in q0$ indicates that the heartbeat transmission task of node n in the m-th period belongs to queue q0, and $Node_{nm} \in Node_{hot}$ indicates that node n is a hot node in the m-th period. If $|\rho - t'_{nm}| > \omega \cdot Node_{threshold}$, the heartbeat information of $Node_{nm}$ is a general task served by queue q2, general-weight bandwidth is provided for the node, and $task_{nm} \in q2$.
In a further technical scheme of the invention, in step seven, after s periods of dynamic heartbeat adjustment and bandwidth weight allocation, the node is a cold node in period m and was a hot node in period m-1, where m > s. When a node changes from a hot node to a cold node, the variance $\omega$ and the mean $\rho$ of the heartbeat intervals of the general nodes in queue q2 are calculated, and $Node_{threshold}$ is the maximum of the heartbeat intervals of the nodes in queue q2 and the heartbeat interval of the current node. If $|\rho - t'_{nm}| \le \omega \cdot Node_{threshold}$, the heartbeat information of $Node_{nm}$ is an important task served by queue q0, and higher-weight bandwidth is provided for the node: $task_{nm} \in q0$ and $Node_{nm} \in Node_{hot}$. If $|\rho - t'_{nm}| > \omega \cdot Node_{threshold}$ and both $task_{n(m-1)} \in q2$ and $task_{n(m-2)} \in q2$, then $Node_{nm} \in Node_{cold}$: the heartbeat information of $Node_{nm}$ is an unimportant task served by queue q1, lower-weight bandwidth is provided for the node, and $task_{nm} \in q1$. If neither condition holds, the heartbeat information of $Node_{nm}$ is a general task served by queue q2, general-weight bandwidth is provided for the node, and $task_{nm} \in q2$.
In a further technical scheme of the invention, the related parameter set in step one is the fixed heartbeat sending interval.
The beneficial effects of the invention are as follows. The invention dynamically adjusts the heartbeat interval according to the data blocks stored by each node and how frequently the node is accessed, classifies the nodes, and dynamically adjusts bandwidth resources in the network: hot nodes are given higher-weight bandwidth to shorten heartbeat transmission time, and cold nodes are given lower-weight bandwidth to reduce the cost of heartbeat transmission.
The heartbeat interval is adjusted dynamically according to the information interaction conditions, and network bandwidth is allocated dynamically to the nodes, which improves the real-time performance of cluster data updates and the utilization of cluster resources. The heartbeat mechanism thus adapts dynamically and network bandwidth resources are fully utilized.
Drawings
FIG. 1 is a flow chart provided by the present invention;
FIG. 2 is a flow chart of the present invention at the time of a cold-hot node transition;
FIG. 3 is a schematic diagram of the interaction method in the first embodiment of the present invention.
Detailed Description
Drawback ① in the Background arises because the existing heartbeat interval is 3 s and is statically configured in the HDFS file, without regard to differences in the data blocks stored by the nodes or in their access frequency. Drawback ② arises because nodes are not classified and network resources depend on the congestion control of the underlying TCP; current cluster resource management focuses mainly on computing and storage resources, pays little attention to network resources, and cannot provide effective network bandwidth allocation for different nodes. Although related research considers node heat and bandwidth when setting the heartbeat interval, it does not divide the nodes dynamically. Higher-weight bandwidth should be provided only when a node interacts frequently; if the heartbeat interval is reduced simply because bandwidth is sufficient, the node may in fact be a cold node, and frequent heartbeat information then produces excessive useless messages and increases cluster resource consumption. To address these problems, the invention dynamically allocates bandwidth resources in the network according to the classification of cold and hot nodes, so that network bandwidth is allocated more reasonably.
The invention provides a data node dynamic configuration method based on information interaction, which comprises the following steps:
1. Set the related initial parameters, including the heartbeat interval, in the configuration file of the HDFS system.
2. The data node sends heartbeat information to the NameNode at the default heartbeat interval, reporting the node's storage capacity and state information. The master node receives the heartbeat packets from the slave nodes, copies and deletes data blocks according to the task information submitted by users, and updates the state information of the slave nodes.
3. The node performs its heartbeat tasks with the default heartbeat interval and bandwidth in the first i-1 periods.
4. The number of data blocks and the node access frequency of each DataNode slave node over the k consecutive periods up to and including the current period i are obtained. These give the information interaction condition of the slave node and the number of times it is accessed by users per unit time. The value of k can be adjusted dynamically according to the information interaction condition.
5. The heartbeat interval of the current period is adjusted dynamically according to the data blocks stored by the slave node in the previous periods and its access probability, and the network bandwidth for heartbeat transmission is allocated dynamically.
6. The average number of data blocks of the cluster nodes over the k consecutive periods up to and including the current period i is calculated, where $Node_{avg}$ denotes the average number of data blocks of the cluster nodes over those k periods, and N denotes the total number of nodes in the cluster.
7. The average access heat of the cluster nodes over the k consecutive periods up to and including the current period i is calculated, where $Node_{hot}$ denotes the average access heat of the cluster nodes over those k periods, and N denotes the total number of nodes in the cluster.
8. The nodes in the cluster are classified: a node that satisfies the hot-node condition is a hot node, and a node that satisfies the cold-node condition is a cold node;
Wherein: $Listnodecount_{ni}$ denotes the number of data blocks of node n in the i-th period, $Listnodehot_{ni}$ denotes the access frequency of node n in the i-th period, $T'_{ni}$ denotes the heartbeat interval of node n in the i-th period, and $\alpha$ denotes the default heartbeat factor. If the node is a hot node, it needs frequent information interaction, so the heartbeat sending interval is reduced and more network bandwidth is given to heartbeat transmission, improving the real-time performance of cluster data updates. If the node is a cold node, it does not need frequent information interaction, so the heartbeat sending interval is increased, preventing excessive useless heartbeat information and reducing the consumption of cluster resources.
9. With the HDFS system deployed in an SDN architecture, the heartbeat transmission of a hot node is regarded as an important task, and $task_{ni}$ is placed in QoS queue q0, where $task_{ni}$ denotes the heartbeat transmission task of node n in the i-th period. Queue q0 sets a lower limit on link bandwidth resources to speed up the transmission of heartbeat information, improving resource utilization and the real-time performance of data updates.
10. The heartbeat information of a cold node is regarded as an unimportant task, and $task_{ni}$ is placed in QoS queue q1. Queue q1 sets an upper limit on link bandwidth resources, reducing cluster resource consumption and system power consumption.
11. The heartbeat information of a general node is regarded as a general task, and $task_{ni}$ is placed in QoS queue q2, with both an upper and a lower limit on bandwidth resources set for the task. This avoids a node appearing dead, or resources being wasted, because heartbeat transmission takes too long or is too slow.
12. After s periods of dynamic heartbeat adjustment and bandwidth weight allocation, the node is a hot node in period m and was a cold node in period m-1, where m > s. When a node changes from a cold node to a hot node, the variance $\omega$ and the mean $\rho$ of the heartbeat intervals of the hot nodes in queue q0 are calculated, and $Node_{threshold}$ is the minimum of the heartbeat intervals of the nodes in queue q0 and the heartbeat interval of the current node. If $|\rho - t'_{nm}| \le \omega \cdot Node_{threshold}$, the heartbeat information of $Node_{nm}$ is an important task served by queue q0, and higher-weight bandwidth is provided for the node: $task_{nm} \in q0$ and $Node_{nm} \in Node_{hot}$, where $task_{nm} \in q0$ indicates that the heartbeat transmission task of node n in the m-th period belongs to queue q0, and $Node_{nm} \in Node_{hot}$ indicates that node n is a hot node in the m-th period. If $|\rho - t'_{nm}| > \omega \cdot Node_{threshold}$, the heartbeat information of $Node_{nm}$ is a general task served by queue q2, general-weight bandwidth is provided for the node, and $task_{nm} \in q2$.
13. After s periods of dynamic heartbeat adjustment and bandwidth weight allocation, the node is a cold node in period m and was a hot node in period m-1, where m > s. When a node changes from a hot node to a cold node, the variance $\omega$ and the mean $\rho$ of the heartbeat intervals of the general nodes in queue q2 are calculated, and $Node_{threshold}$ is the maximum of the heartbeat intervals of the nodes in queue q2 and the heartbeat interval of the current node. If $|\rho - t'_{nm}| \le \omega \cdot Node_{threshold}$, the heartbeat information of $Node_{nm}$ is an important task served by queue q0, and higher-weight bandwidth is provided for the node: $task_{nm} \in q0$ and $Node_{nm} \in Node_{hot}$. If $|\rho - t'_{nm}| > \omega \cdot Node_{threshold}$ and both $task_{n(m-1)} \in q2$ and $task_{n(m-2)} \in q2$, then $Node_{nm} \in Node_{cold}$: the heartbeat information of $Node_{nm}$ is an unimportant task served by queue q1, lower-weight bandwidth is provided for the node, and $task_{nm} \in q1$. If neither condition holds, the heartbeat information of $Node_{nm}$ is a general task served by queue q2, general-weight bandwidth is provided for the node, and $task_{nm} \in q2$.
To make the above embodiments of the invention clearer to those skilled in the art, specific examples are described below.
Embodiment one:
Assume there are four DataNode slave nodes and the initial heartbeat interval is α = 3 s. The data blocks stored by DataNode 1 in the previous periods $t_i$ are 22, 17, 28, and its access frequencies $f_{1i}$ are 0.6, 0.9, 0.7, 0.8. The data blocks stored by DataNode 2 are 7, 6, 12, 10, and its access frequencies $f_{2i}$ are 0.3, 0.5, 0.4, 0.3. The data blocks stored by DataNode 3 are 21, 16, 24, 20, and its access frequencies $f_{3i}$ are 0.3, 0.5, 0.8, 0.9. The data blocks stored by DataNode 4 are 3, 6, 11, 20, and its access frequencies $f_{4i}$ are 0.4, 0.5, 0.4. δ denotes the data block influence factor, the node influence factor is 0.4, and β = 0.4.
The total number of data blocks of the nodes over the previous i-1 periods is w. With i = 4 (three periods), w = 173, and the number of data blocks of each node in the third period is $Listnodecount = \{Listnodecount_1, Listnodecount_2, Listnodecount_3, Listnodecount_4\} = \{28, 12, 24, 11\}$.
The average number of data blocks of the nodes, $Node_{avg}$, is then calculated.
The average access heat of the nodes over the first three periods is $Node_{hot} = 0.55$, and the access frequency of each node in the third period is $Listnodehot = \{nodehot_1, nodehot_2, nodehot_3, nodehot_4\} = \{0.7, 0.4, 0.8, 0.5\}$.
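The block totals in this embodiment can be checked with a few lines of Python. The sketch below sums the block counts listed for the first three periods and divides by N·k; that normalization is an assumption, since the averaging formula itself is not legible in this text, and the names `BLOCKS`, `window_total` and `window_average` are illustrative.

```python
from __future__ import annotations

# Data blocks stored by each DataNode in periods 1..3 (values from the embodiment).
BLOCKS = {
    1: [22, 17, 28],
    2: [7, 6, 12],
    3: [21, 16, 24],
    4: [3, 6, 11],
}


def window_total(series_by_node: dict[int, list[float]], k: int) -> float:
    """Sum the last k per-period values over all nodes."""
    return sum(sum(values[-k:]) for values in series_by_node.values())


def window_average(series_by_node: dict[int, list[float]], k: int) -> float:
    """Assumed averaging rule: total over the window divided by N * k."""
    n_nodes = len(series_by_node)
    return window_total(series_by_node, k) / (n_nodes * k)


w = window_total(BLOCKS, k=3)            # total over the first three periods
node_avg = window_average(BLOCKS, k=3)   # w / (4 * 3) under the assumed rule
third_period = [values[-1] for values in BLOCKS.values()]
print(w, round(node_avg, 2), third_period)
```

The total reproduces w = 173 and the third-period list reproduces {28, 12, 24, 11}. The same helper could be applied to the access-frequency series to obtain an average access heat under the same assumption.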
For node 1, the hot-node condition is satisfied, so node 1 is a hot node, and its heartbeat interval in the third period is $t' = 1.95$ s.
For node 2, the cold-node condition is satisfied, so node 2 is a cold node, and its heartbeat interval in the third period is then calculated.
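The classification conditions themselves are not legible in this text. As a hedged sketch, the code below assumes a node is hot when both its block count and its access frequency in the period exceed the cluster averages, cold when both fall below them, and general otherwise. Under that assumed rule, and taking $Node_{avg}$ = 173/12 (also an assumption) with $Node_{hot}$ = 0.55, the third-period figures of the embodiment give node 1 as hot and node 2 as cold, in agreement with the text. The names `NodeClass` and `classify_node` are hypothetical.

```python
from enum import Enum


class NodeClass(Enum):
    HOT = "hot"
    COLD = "cold"
    GENERAL = "general"


def classify_node(block_count: float, access_freq: float,
                  node_avg: float, node_hot: float) -> NodeClass:
    """Assumed rule: compare a node's period values with the cluster averages."""
    if block_count > node_avg and access_freq > node_hot:
        return NodeClass.HOT
    if block_count < node_avg and access_freq < node_hot:
        return NodeClass.COLD
    return NodeClass.GENERAL


NODE_AVG = 173 / 12   # assumed average block count (w = 173 over 4 nodes x 3 periods)
NODE_HOT = 0.55       # average access heat stated in the embodiment

# Third-period block counts and access frequencies from the embodiment.
third_period_stats = {1: (28, 0.7), 2: (12, 0.4), 3: (24, 0.8), 4: (11, 0.5)}
classes = {
    node: classify_node(count, freq, NODE_AVG, NODE_HOT)
    for node, (count, freq) in third_period_stats.items()
}
print(classes[1].value, classes[2].value)
```

The sketch stops at classification. It does not compute the adjusted heartbeat interval, because the interval formula involving α, δ and β cannot be recovered from this text.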
In the QoS service, the heartbeat transmission of a hot node is an important task and is given a lower limit on bandwidth resources: this is the minimum bandwidth the task can occupy, and when network resources are otherwise idle the task can obtain more. The heartbeat transmission of a cold node is an unimportant task and is given an upper limit on bandwidth resources, ensuring that it does not occupy so much bandwidth that other tasks are affected. The heartbeat transmission of a general node is given both an upper and a lower limit on bandwidth resources, preventing heartbeat transmission from taking too long or being too slow. Bandwidth is thus allocated dynamically to the heartbeat transmission tasks of the nodes, and network performance isolation among different nodes is achieved. The bandwidth in the network is 100 Mbps, and three queues are set to represent the bandwidth resources of the heartbeat transmission tasks of hot nodes, cold nodes, and nodes that are neither hot nor cold: q0: min = 60 Mbps; q1: max = 40 Mbps; q2: between 40 Mbps and 60 Mbps. Queue q0 serves the heartbeat tasks of hot nodes, q1 serves the heartbeat tasks of cold nodes, and q2 serves general heartbeat tasks. That is, the heartbeat task of a hot node has a minimum bandwidth of 60 Mbps, the heartbeat task of a cold node has a maximum bandwidth of 40 Mbps, and the heartbeat task of a general node has between 40 Mbps and 60 Mbps. This is shown in FIG. 3.
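The queue limits in this example can be written down as a small table. The sketch below is only a data model, assuming a `QueuePolicy` record and the `queue_for` and `clamp_rate` helpers (all hypothetical names); it does not program a switch. In an SDN deployment the same minimum and maximum rates would be installed through the QoS interface of the controller or switch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QueuePolicy:
    name: str
    min_mbps: Optional[float]  # guaranteed lower bound, None if unset
    max_mbps: Optional[float]  # cap, None if unset


# Limits taken from the 100 Mbps example in the text.
POLICIES = {
    "hot": QueuePolicy("q0", min_mbps=60.0, max_mbps=None),
    "cold": QueuePolicy("q1", min_mbps=None, max_mbps=40.0),
    "general": QueuePolicy("q2", min_mbps=40.0, max_mbps=60.0),
}


def queue_for(node_class: str) -> QueuePolicy:
    """Map a node class ('hot', 'cold', 'general') to its heartbeat queue."""
    return POLICIES[node_class]


def clamp_rate(policy: QueuePolicy, requested_mbps: float) -> float:
    """Clamp a requested rate to the queue's configured bounds."""
    rate = requested_mbps
    if policy.min_mbps is not None:
        rate = max(rate, policy.min_mbps)
    if policy.max_mbps is not None:
        rate = min(rate, policy.max_mbps)
    return rate


print(queue_for("hot").name, clamp_rate(queue_for("cold"), 55.0))
```

Leaving a bound as `None` mirrors the text: q0 has only a floor, so a hot node can use spare capacity, while q1 has only a ceiling.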
Whenever a cold node changes into a hot node, its heartbeat information at that time is an important task and requires higher-weight bandwidth. A node that has been accessed during one period is likely to keep being accessed in the following period. However, to prevent a sudden burst of data access from wasting network bandwidth and increasing cluster resource consumption, the variance $\omega$ and mean $\rho$ of the heartbeat intervals of the hot nodes in queue q0 are calculated. If $|\rho - t'| \le \omega \cdot Node_{threshold}$, the node's heartbeat task is placed in queue q0, regarded as an important task, and given higher-weight bandwidth, shortening heartbeat transmission time. Otherwise the node's heartbeat task is placed in queue q2; if the node was a cold node in the previous period, its heartbeat task is placed directly in queue q1. $Node_{threshold}$ is the minimum of the heartbeat intervals of the nodes in queue q0 and the heartbeat interval of the current node.
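The cold-to-hot test described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the function name `queue_on_cold_to_hot` is hypothetical, population variance is assumed because the text does not say which variance is meant, and the sample intervals in the usage lines are invented for the example.

```python
from __future__ import annotations

from statistics import mean, pvariance


def queue_on_cold_to_hot(q0_intervals: list[float], t_new: float) -> str:
    """Pick the queue for a node that has just turned from cold to hot.

    q0_intervals: heartbeat intervals (s) of hot nodes currently in q0;
                  assumed non-empty.
    t_new: heartbeat interval (s) computed for the converting node.
    """
    rho = mean(q0_intervals)                    # average interval in q0
    omega = pvariance(q0_intervals)             # population variance (assumed)
    threshold = min(min(q0_intervals), t_new)   # Node_threshold
    if abs(rho - t_new) <= omega * threshold:
        return "q0"   # important task, higher-weight bandwidth
    return "q2"       # general task, general-weight bandwidth


# Invented sample: three hot nodes already in q0.
hot_intervals = [1.5, 2.0, 2.5]
print(queue_on_cold_to_hot(hot_intervals, 1.95))  # close to the q0 mean
print(queue_on_cold_to_hot(hot_intervals, 2.9))   # far from the q0 mean
```

The effect of the test is that a newly hot node joins q0 only if its interval resembles those already in q0; otherwise it is held in q2, which guards against a one-off burst of accesses.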
Whenever a hot node changes into a cold node, the node has not been accessed frequently for a period of time, but it may suddenly be accessed frequently again later. When a hot node is about to become a cold node, it is therefore given a buffer period. The variance $\omega$ and mean $\rho$ of the heartbeat intervals of the general nodes in queue q2 are calculated. If $|\rho - t'| \le \omega \cdot Node_{threshold}$, the node's heartbeat task is placed in queue q0 and the node remains a hot node. If the node's heartbeat tasks in the previous two periods both belonged to queue q2, its heartbeat task is placed in queue q1 and the node becomes a cold node. If neither condition holds, the node's heartbeat task is placed in queue q2. $Node_{threshold}$ is the maximum of the heartbeat intervals of the nodes in queue q2 and the heartbeat interval of the current node.
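The hot-to-cold buffer rule can be sketched in the same way. Again this is only an illustration: `queue_on_hot_to_cold` and its `previous_queues` argument are hypothetical names, population variance is an assumption, and the sample values are invented.

```python
from __future__ import annotations

from statistics import mean, pvariance


def queue_on_hot_to_cold(q2_intervals: list[float], t_new: float,
                         previous_queues: list[str]) -> str:
    """Pick the queue for a node that has just turned from hot to cold.

    q2_intervals: heartbeat intervals (s) of general nodes in q2; assumed non-empty.
    t_new: heartbeat interval (s) computed for the converting node.
    previous_queues: queues used by this node in earlier periods, oldest first.
    """
    rho = mean(q2_intervals)
    omega = pvariance(q2_intervals)             # population variance (assumed)
    threshold = max(max(q2_intervals), t_new)   # Node_threshold
    if abs(rho - t_new) <= omega * threshold:
        return "q0"   # still treated as a hot node
    if list(previous_queues[-2:]) == ["q2", "q2"]:
        return "q1"   # two buffer periods already spent in q2: now cold
    return "q2"       # buffer period


general_intervals = [2.8, 3.0, 3.2]   # invented sample
print(queue_on_hot_to_cold(general_intervals, 3.05, ["q0", "q0"]))
print(queue_on_hot_to_cold(general_intervals, 4.0, ["q2", "q2"]))
print(queue_on_hot_to_cold(general_intervals, 4.0, ["q0", "q2"]))
```

Requiring two consecutive periods in q2 before moving a node to q1 is what gives the buffer period: a single quiet period is not enough to demote a hot node.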
According to the information interaction condition, the invention dynamically adjusts the heartbeat interval time and efficiently distributes network bandwidth for the nodes so as to enhance the real-time performance of cluster data update and the utilization rate of cluster resources.
Comparative examples:
Assume a cluster has two DataNode slave nodes. The data blocks stored by DataNode 1 in the previous periods $t_i$ are 12, 23, 34 and 20, and its access frequencies $f_{1i}$ are 0.6, 0.5, 0.7 and 0.8. The data blocks stored by DataNode 2 in the previous periods $t_i$ are 7, 6, 12 and 14, and its access frequencies $f_{2i}$ are 0.3, 0.5, 0.4 and 0.6.
Case: because the heartbeat that a DataNode sends to the NameNode is fixed in the HDFS configuration file, node 1 and node 2 both send heartbeats every 3 s and are allocated the same heartbeat transmission bandwidth. Calculation shows, however, that node 1 stores more data blocks and is accessed more frequently, so it needs frequent information interaction with the NameNode: its heartbeat sending interval should be reduced and its network bandwidth increased, improving the real-time performance of data updates and the utilization of resources. Node 2 stores fewer data blocks and is accessed less frequently, so it interacts little with the NameNode: its heartbeat sending interval should be increased to avoid repeated heartbeat messages, and its bandwidth in the network reduced, lowering the consumption of cluster resources.
The root cause of the above result is that the existing heartbeat interval is fixed in the HDFS configuration file, so bandwidth resources cannot be allocated effectively among the nodes. The invention solves these problems by dynamically allocating bandwidth resources in the network according to the classification of cold and hot nodes, allocating network bandwidth more reasonably, improving the utilization of cluster resources and reducing their consumption.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (7)

1. The data node dynamic configuration method based on information interaction is characterized by comprising the following steps:
Step one: setting related parameters in the HDFS configuration file;
Step two: data node n sends heartbeat information at the initial heartbeat interval;
Step three: obtaining the number of data blocks and the node access probability of DataNode n over the k consecutive periods up to and including the current period i, the node having performed its heartbeat tasks in the previous i-1 periods with the system's default heartbeat interval and bandwidth;
Step four: calculating the average number of data blocks of the cluster nodes over the k consecutive periods up to and including the current period i;
Step five: calculating the average access heat of the cluster nodes over the k consecutive periods up to and including the current period i;
Step six: classifying the nodes in the cluster into cold nodes, hot nodes and general nodes;
The heartbeat transmission of a hot node is regarded as an important task, and $task_{ni}$ is placed in QoS queue q0, where $task_{ni}$ denotes the heartbeat transmission task of node n in the i-th period; queue q0 sets a lower limit on link bandwidth resources to speed up the transmission of heartbeat information;
The heartbeat information of a cold node is regarded as an unimportant task, and $task_{ni}$ is placed in QoS queue q1; queue q1 sets an upper limit on link bandwidth resources, reducing cluster resource consumption and system power consumption;
The heartbeat information of a general node is regarded as a general task, and $task_{ni}$ is placed in QoS queue q2, with both an upper and a lower limit on bandwidth resources set for the task;
Step seven: the node dynamically adjusts to the new heartbeat interval, and bandwidth services with different weights are provided for its heartbeat transmission;
Step eight: repeating steps three to seven;
In step four, the average number of data blocks of the cluster nodes is calculated:
wherein Node_avg represents the average number of data blocks of the cluster nodes over the k consecutive periods up to and including the current period i, N represents the total number of nodes in the cluster, and j represents the number of data blocks of the node in i-k periods;
In step five, the average access heat of the cluster nodes is calculated: wherein Node_hot represents the average access heat of the cluster nodes over the k consecutive periods up to and including the current period i, N represents the total number of nodes in the cluster, and j represents the data block number of the nodes in i-k periods;
In the sixth step, the nodes in the cluster are classified, if Indicating that the node is a hot node,
If it isIndicating that the node is a cold node;
wherein listnode_count_ni denotes the number of data blocks of node n in the i-th period, listnode_hot_ni denotes the access frequency of node n in the i-th period, T'_ni denotes the heartbeat interval of node n in the i-th period, α denotes the default heartbeat factor, δ denotes the data block impact factor, and β denotes the frequency impact factor.
2. The method for dynamically configuring data nodes based on information interaction according to claim 1, wherein in step seven, after the heartbeat has been dynamically adjusted and bandwidth resource weights have been allocated for s periods, the node is a hot node in the m-th period and was a cold node in the (m-1)-th period, where m is greater than s; when the node changes from a cold node to a hot node, the variance ω and the average value ρ of the heartbeat intervals of the hot nodes in queue q0 are calculated, and Node_threshold is the minimum of the heartbeat intervals of the nodes in queue q0 and the heartbeat interval of the current node; if |ρ - t'_nm| ≤ ω*Node_threshold, where t'_nm denotes the heartbeat interval of node n in the m-th period, the heartbeat information of Node_nm is an important task, the task is served by queue q0, and a higher-weight bandwidth is provided for the node, that is, task_nm ∈ q0 and Node_nm ∈ Node_hot; task_nm ∈ q0 denotes that the heartbeat transmission task of node n in the m-th period belongs to queue q0, and Node_nm ∈ Node_hot denotes that node n is a hot node in the m-th period.
3. The method for dynamically configuring data nodes based on information interaction according to claim 2, wherein if |ρ - t'_nm| > ω*Node_threshold, the heartbeat information of Node_nm is a general task, the task is served by queue q2, and a general-weight bandwidth is provided for the node, that is, task_nm ∈ q2.
4. The method for dynamically configuring data nodes based on information interaction according to claim 1, wherein in step seven, after the heartbeat has been dynamically adjusted and bandwidth resource weights have been allocated for s periods, the node is a cold node in the m-th period and was a hot node in the (m-1)-th period, where m is greater than s; when the node changes from a hot node to a cold node, the variance ω and the average value ρ of the heartbeat intervals of the general nodes in queue q2 are calculated, and Node_threshold is the maximum of the heartbeat intervals of the nodes in queue q2 and the heartbeat interval of the current node; if |ρ - t'_nm| ≤ ω*Node_threshold, the heartbeat information of Node_nm is an important task, the task is served by queue q0, and a higher-weight bandwidth is provided for the node, that is, task_nm ∈ q0 and Node_nm ∈ Node_hot.
5. The method for dynamically configuring data nodes based on information interaction according to claim 4, wherein if |ρ - t'_nm| > ω*Node_threshold and task_n(m-1) ∈ q2 and task_n(m-2) ∈ q2, then Node_nm ∈ Node_cold, the heartbeat information of Node_nm is an unimportant task, the task is served by queue q1, and a lower-weight bandwidth is provided for the node, that is, task_nm ∈ q1.
6. The method for dynamically configuring data nodes based on information interaction according to claim 5, wherein if none of the above conditions is satisfied, the heartbeat information of Node_nm is a general task, the task is served by queue q2, and a general-weight bandwidth is provided for the node, that is, task_nm ∈ q2.
7. The method for dynamically configuring data nodes based on information interaction according to any one of claims 1 to 6, wherein setting the related parameters in step one comprises setting a fixed heartbeat transmission interval.
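The queue decisions of claims 2 to 6 for a node that has just changed class can be sketched as below. This is one reading of the claim wording, not a definitive implementation: it assumes the variance ω is the population variance, that Node_threshold is taken over the intervals already in the queue together with the current node's interval, and that the queues of the two preceding periods are supplied by the caller. The function and parameter names are hypothetical.

```python
from statistics import mean, pvariance


def queue_on_cold_to_hot(t_nm, q0_intervals):
    """Claims 2 and 3: the node was cold in period m-1 and is hot in period m."""
    rho = mean(q0_intervals)         # average heartbeat interval in q0
    omega = pvariance(q0_intervals)  # variance (population variance assumed)
    threshold = min(min(q0_intervals), t_nm)
    if abs(rho - t_nm) <= omega * threshold:
        return "q0"                  # important task, higher-weight bandwidth
    return "q2"                      # general task, general-weight bandwidth


def queue_on_hot_to_cold(t_nm, q2_intervals, previous_two_queues):
    """Claims 4 to 6: the node was hot in period m-1 and is cold in period m."""
    rho = mean(q2_intervals)
    omega = pvariance(q2_intervals)
    threshold = max(max(q2_intervals), t_nm)
    if abs(rho - t_nm) <= omega * threshold:
        return "q0"                  # claim 4
    if all(queue == "q2" for queue in previous_two_queues):
        return "q1"                  # claim 5: unimportant task, lower weight
    return "q2"                      # claim 6: none of the above


# Hypothetical heartbeat intervals in seconds.
promoted = queue_on_cold_to_hot(2.5, [1.0, 2.0, 3.0])
held_back = queue_on_cold_to_hot(4.0, [1.0, 2.0, 3.0])
demoted = queue_on_hot_to_cold(5.0, [3.0, 3.0, 3.0], ("q2", "q2"))
```

The check against the queue's mean and variance acts as a damping step: a node whose interval is far from what the queue already contains is first parked in q2 rather than moved directly between q0 and q1.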
CN202110823675.6A 2021-07-21 2021-07-21 Data node dynamic configuration method based on information interaction Active CN113626098B (en)

Publications (2)

Publication Number Publication Date
CN113626098A 2021-11-09
CN113626098B 2024-05-03

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant