CN112463754B - Data node switching method and device in HDFS (Hadoop distributed File System) and computer equipment - Google Patents

Info

Publication number
CN112463754B
Authority
CN
China
Prior art keywords
data
time
data node
switching
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011339567.3A
Other languages
Chinese (zh)
Other versions
CN112463754A (en
Inventor
丁顺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bilibili Technology Co Ltd
Original Assignee
Shanghai Bilibili Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Bilibili Technology Co Ltd filed Critical Shanghai Bilibili Technology Co Ltd
Priority to CN202011339567.3A priority Critical patent/CN112463754B/en
Publication of CN112463754A publication Critical patent/CN112463754A/en
Application granted granted Critical
Publication of CN112463754B publication Critical patent/CN112463754B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/08Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004Server selection for load balancing
    • H04L67/1008Server selection for load balancing based on parameters of servers, e.g. available memory or workload

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application discloses a method and a device for switching data nodes in HDFS (Hadoop Distributed File System), and computer equipment. The method comprises: monitoring the amount of time a client takes to access each data packet in a target data block from a first data node; accumulating the sum of the time amounts for the client to access n consecutive data packets from the first data node as a segment time amount; when the segment time amount is greater than or equal to a time threshold, calculating the data output efficiency of the first data node within the segment time amount; and when the data output efficiency is less than an efficiency threshold, switching the client to a second data node to continue accessing the data packets in the target data block that have not yet been accessed. The application also provides a computer-readable storage medium. The method and device make data node switching in HDFS more flexible and improve data access efficiency.

Description

Data node switching method and device in HDFS (Hadoop distributed File System) and computer equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for switching data nodes in an HDFS, and a computer device.
Background
HDFS (Hadoop Distributed File System) is a distributed file system suited to storing massive amounts of data, and it is widely used in large-scale online services and large-scale storage systems. HDFS stores files in a distributed manner using a block mechanism and improves system reliability through a data block redundancy policy: each data block has multiple replicas at the same time, distributed across multiple nodes in multiple racks, so that the failure of a single node does not cause data blocks to be lost. To implement this redundancy policy, HDFS must write multiple replicas when writing data; the number of replicas written is called the replication factor of the data block.
HDFS generally consists of a name node and a plurality of data nodes. The name node manages the HDFS namespace and the data block mapping information, configures replica policies, and processes client requests. A data node stores the actual data, performs read and write operations on data blocks, and periodically reports information about the data blocks it stores to the name node.
After receiving a data access request from a client, HDFS queries, through the name node, all data nodes that hold the data to be accessed, and then assigns the client a data node that is lightly loaded and close by. When that data node suffers a sudden load increase or a network disconnection, another data node storing the data to be accessed is assigned to the client. However, in the prior art HDFS switches data nodes inflexibly, which results in low data access efficiency.
Disclosure of Invention
The application provides a method and a device for switching data nodes in an HDFS (Hadoop distributed File System) and computer equipment, which can solve the problems that the flexibility of the switching process of the data nodes in the HDFS system is poor and the data access efficiency is low in the prior art.
First, to achieve the above object, the present application provides a method for switching data nodes in an HDFS, where the method includes:
monitoring the amount of time a client takes to access each data packet in a target data block from a first data node; accumulating the sum of the time amounts for the client to access n consecutive data packets from the first data node as a segment time amount, where n is an integer; determining whether the segment time amount is greater than or equal to a preset time threshold; when the segment time amount is greater than or equal to the time threshold, calculating the data output efficiency of the first data node within the segment time amount; determining whether the data output efficiency is less than a preset efficiency threshold; and when the data output efficiency is less than the efficiency threshold, switching the client to a second data node to continue accessing the data packets in the target data block that have not yet been accessed.
In one example, the data output efficiency is the data throughput of the first data node in responding to the client's data packet accesses within the segment time amount.
In one example, when the segment time amount is less than the time threshold, the total time for the client to access n+1 data packets in the target data block from the first data node is accumulated as the segment time amount, and it is determined again whether the segment time amount is greater than or equal to the preset time threshold.
In one example, when the data output efficiency is greater than or equal to the efficiency threshold, the client continues to access the not-yet-accessed data packets in the target data block from the first data node; the segment time amount is cleared and accumulated anew, and it is again determined whether the segment time amount is greater than or equal to the time threshold.
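The accumulate, evaluate and reset cycle described in these examples can be sketched as a small state machine. This is a minimal illustration only, not the claimed implementation: the class name `SegmentMonitor`, its field names, and the string return values are assumptions introduced here, and throughput in bytes per millisecond is used as the data output efficiency.

```python
from dataclasses import dataclass


@dataclass
class SegmentMonitor:
    """Tracks one segment of consecutive packet reads from a data node.

    Hypothetical helper; assumes a positive time threshold.
    """
    time_threshold_ms: float      # preset time threshold
    efficiency_threshold: float   # preset efficiency threshold, bytes per ms
    segment_ms: float = 0.0       # accumulated segment time amount
    segment_bytes: int = 0        # bytes read within the current segment

    def record_packet(self, size_bytes: int, elapsed_ms: float) -> str:
        """Add one packet and return 'accumulate', 'keep', or 'switch'."""
        self.segment_ms += elapsed_ms
        self.segment_bytes += size_bytes
        if self.segment_ms < self.time_threshold_ms:
            return "accumulate"   # segment too short: wait for packet n+1
        throughput = self.segment_bytes / self.segment_ms
        # The segment is complete either way, so clear it for the next one.
        self.segment_ms, self.segment_bytes = 0.0, 0
        return "switch" if throughput < self.efficiency_threshold else "keep"
```

With hypothetical thresholds of 50 ms and 1000 bytes/ms, a 100000-byte packet read in 20 ms returns `accumulate`, a second one read in 30 ms completes the segment and returns `keep`, and a following 1000-byte packet that takes 60 ms returns `switch`.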
In one example, switching the client to the second data node to continue accessing the not-yet-accessed data packets in the target data block comprises: selecting, as the second data node, a first candidate data node that stores the target data block and whose load is less than a preset load threshold, and switching the client's connection to the second data node; determining whether the connection time between the client and the second data node is less than a preset first rated time; when the connection time is less than the first rated time, determining whether the request time for the client to request access to a data packet on the second data node is less than a preset second rated time; and when the connection time is less than the first rated time and the request time is less than the second rated time, having the client continue accessing the not-yet-accessed data packets in the target data block from the second data node.
In one example, when the connection time is greater than or equal to the first rated time or the request time is greater than or equal to the second rated time, a first timeout point is recorded, and a second candidate data node that stores the target data block and whose load is less than the load threshold is selected as the second data node.
In one example, when the switch is attempted with the second candidate data node as the second data node and the connection time is greater than or equal to the first rated time or the request time is greater than or equal to the second rated time, a second timeout point is recorded; it is determined whether the time interval from the first timeout point to the second timeout point is less than a preset timeout threshold; when the time interval is less than the timeout threshold, the first rated time and/or the second rated time, and the timeout threshold, are adjusted; and a third candidate data node that stores the target data block and whose load is less than the load threshold is selected as the second data node.
In one example, adjusting the first rated time and/or the second rated time and the timeout threshold comprises: increasing the first rated time and/or the second rated time and the timeout threshold in the same proportion, while keeping the timeout threshold less than or equal to a preset maximum threshold.
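One way to read "increase in the same proportion, subject to a maximum threshold" is sketched below. The function name `scale_timeouts`, its parameters, and the capping rule are assumptions made for illustration. In particular, the text does not say what happens when the scaled timeout threshold would exceed the maximum; this sketch shrinks the common factor so that the proportions are preserved. It also scales both rated times, although the text allows adjusting only one of them.

```python
def scale_timeouts(connect_ms, request_ms, timeout_threshold_ms,
                   factor, max_threshold_ms):
    """Scale all three values by one common factor, capping the timeout threshold.

    Returns (connect_ms, request_ms, timeout_threshold_ms) after scaling.
    """
    if factor <= 1:
        raise ValueError("factor must be greater than 1 to increase the values")
    # Assumption: if the scaled threshold would exceed the cap, shrink the common
    # factor so the threshold lands exactly on the cap and proportions are kept.
    effective = min(factor, max_threshold_ms / timeout_threshold_ms)
    effective = max(effective, 1.0)  # never decrease the values
    return (connect_ms * effective, request_ms * effective,
            timeout_threshold_ms * effective)
```

For example, `scale_timeouts(128, 128, 256, 2, 1024)` doubles every value, while `scale_timeouts(128, 128, 256, 2, 384)` applies a factor of only 1.5 because the threshold is capped at 384 ms.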
In addition, to achieve the above object, the present application further provides an apparatus for switching a data node in an HDFS, including:
a monitoring module, configured to monitor the amount of time a client takes to access each data packet in a target data block from a first data node; a counting module, configured to accumulate the sum of the time amounts for the client to access n consecutive data packets from the first data node as a segment time amount, where n is an integer; a judging module, configured to determine whether the segment time amount is greater than or equal to a preset time threshold; a calculation module, configured to calculate the data output efficiency of the first data node within the segment time amount when the segment time amount is greater than or equal to the time threshold; the judging module being further configured to determine whether the data output efficiency is less than a preset efficiency threshold; and a switching module, configured to switch the client to a second data node to continue accessing the not-yet-accessed data packets in the target data block when the data output efficiency is less than the efficiency threshold.
Further, the present application also provides a computer device, which includes a memory and a processor, where the memory stores a computer program that can run on the processor, and the computer program, when executed by the processor, implements the steps of the method for switching data nodes in the HDFS.
Further, to achieve the above object, the present application also provides a computer-readable storage medium storing a computer program, which is executable by at least one processor to cause the at least one processor to execute the steps of the data node switching method in the HDFS as described above.
Compared with the prior art, the method, the device, the computer equipment and the computer-readable storage medium for switching data nodes in HDFS provided by the application monitor the amount of time a client takes to access each data packet in a target data block from a first data node; accumulate the sum of the time amounts for the client to access n consecutive data packets from the first data node as a segment time amount; when the segment time amount is greater than or equal to the time threshold, calculate the data output efficiency of the first data node within the segment time amount; and when the data output efficiency is less than the efficiency threshold, switch the client to a second data node to continue accessing the not-yet-accessed data packets in the target data block. Because the switch is triggered by whether the data output efficiency of the first data node within the segment time amount falls below the preset efficiency threshold, data node switching in HDFS becomes more flexible and data access efficiency is improved.
Drawings
FIG. 1 is a schematic diagram of an application environment according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a method for switching data nodes in an HDFS of the present application according to an embodiment;
FIG. 3 is a diagram illustrating the effect of switching data nodes in an exemplary embodiment of the present application;
FIG. 4 is a graph illustrating the effect of dynamically adjusting the first nominal time and/or the second nominal time when a data node is switched according to an exemplary embodiment of the present disclosure;
FIG. 5 is a block diagram of an embodiment of a data node switching apparatus in the HDFS of the present application;
FIG. 6 is a diagram of an alternative hardware architecture of the computer device of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the descriptions in this application referring to "first", "second", etc. are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present application.
Fig. 1 is a schematic diagram of an application environment according to an embodiment of the present application. Referring to Fig. 1, a computer device 1 is connected to a client, and HDFS runs on the computer device 1. HDFS includes a name node and data nodes; the data nodes store various data resources, and the name node manages the storage mapping information of those data resources and processes client requests. The client accesses HDFS on the computer device 1 through a preset command-line interface, browser interface, and code API, and can thereby connect to HDFS on the computer device 1 and request and access data resources on the data nodes in HDFS.
In this embodiment, after the client connects to HDFS, it sends a data resource request to the name node. The name node queries all data nodes that hold the target data block corresponding to the request, including at least a first data node, a second data node, and a third data node, selects a lightly loaded data node as the first data node, and returns it to the client, which can then connect to the first data node to access the target data block. The computer device 1 monitors the amount of time the client takes to access each data packet in the target data block from the first data node; accumulates the sum of the time amounts for the client to access n consecutive data packets from the first data node as a segment time amount; when the segment time amount is greater than or equal to the time threshold, calculates the data output efficiency of the first data node within the segment time amount; and when the data output efficiency is less than the efficiency threshold, switches the client to a second data node to continue accessing the not-yet-accessed data packets in the target data block. Because the computer device 1 switches data nodes according to whether the data output efficiency of the first data node within the segment time amount is less than a preset efficiency threshold, data node switching in HDFS becomes more flexible and data access efficiency is improved.
In this embodiment, the user terminal may be an electronic device such as a mobile phone, a tablet, a portable device, or a PC; the computer device 1 may be used as an electronic device such as a mobile phone, a tablet, a portable device, a PC, or a server, or may be used as an independent functional module attached to the electronic device to implement a data node switching function in the HDFS.
Example one
Fig. 2 is a flowchart illustrating an embodiment of a data node switching method in the HDFS of the present application. It is to be understood that the flow charts in the embodiments of the present method are not intended to limit the order in which the steps are performed. The following description is made by way of example with the computer apparatus 1 as the execution subject.
As shown in fig. 2, the method for switching data nodes in the HDFS may include steps S200 to S210.
Step S200, monitoring an amount of time that the user terminal accesses each data packet in the target data block from the first data node.
Step S202, accumulating a sum of time amounts that the user side continuously accesses n data packets from the first data node as a segment time amount, where n is an integer.
Specifically, the client connects to the computer device 1 and sends a data resource request to the name node in HDFS on the computer device 1. The name node queries all data nodes that store the target data block corresponding to the request, including at least a first data node, a second data node, a third data node, and a fourth data node, selects a lightly loaded data node as the first data node, and returns it to the client, which can then connect to the first data node to access the target data block. The computer device 1 can then monitor the amount of time the client takes to access each data packet in the target data block from the first data node.
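The name node's choice of the first data node can be pictured as picking the least-loaded holder of the target block. The sketch below is illustrative only: the function name `choose_first_data_node`, the dictionary representation, and the load values are hypothetical, and load is the sole criterion modelled here.

```python
def choose_first_data_node(replica_loads):
    """Return the name of the least-loaded data node that stores the target block.

    `replica_loads` maps a data node name to its current load (hypothetical unit).
    """
    if not replica_loads:
        raise ValueError("no data node stores the target block")
    return min(replica_loads, key=replica_loads.get)


# Four data nodes hold a replica of the target block; dn2 is the lightest.
first = choose_first_data_node({"dn1": 0.7, "dn2": 0.2, "dn3": 0.5, "dn4": 0.9})
```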
In this embodiment, HDFS stores data resources in the form of data blocks, and a data block is processed or transmitted with the data packet as its basic unit. Therefore, when the client accesses the target data block on the first data node of HDFS on the computer device 1, the computer device 1 can time each individual packet access, that is, measure the amount of time the client takes to access each data packet in the target data block from the first data node. Next, the computer device 1 accumulates the sum of the time amounts for the client to access n consecutive data packets from the first data node as the segment time amount. In this embodiment, the computer device 1 starts accumulating from the first data packet the client accesses on the first data node, that is, it accumulates the total time for the client to access the 1st to nth data packets from the first data node as the segment time amount.
Step S204, judging whether the section time quantum is larger than or equal to a preset time threshold value.
Step S206, when the section time quantum is greater than or equal to the time threshold, counting the data output efficiency of the first data node in the section time quantum.
Specifically, after accumulating the segment time amount, the computer device 1 compares it with a preset time threshold and determines whether the segment time amount is greater than or equal to the time threshold; when it is, the computer device 1 calculates the data output efficiency of the first data node within the segment time amount. In this embodiment, because the time the client takes to access each data packet of the target data block from the first data node varies, the computer device 1 cannot simply cut out a fixed time period. Instead, the time for the client to access a whole number of data packets from the first data node is counted as the segment time amount, and once the segment time amount reaches the time threshold, the data output efficiency of the first data node within the segment time amount is calculated. Accordingly, when the segment time amount is less than the time threshold, the computer device 1 accumulates the time for the client to access n+1 data packets in the target data block from the first data node as the segment time amount and determines again whether it is greater than or equal to the preset time threshold.
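Because each segment must contain a whole number of packets, segment boundaries fall wherever the running total first reaches the threshold. The sketch below shows this grouping; the function name `split_into_segments` is hypothetical, and for simplicity it restarts accumulation after every completed segment. The sample call uses the per-packet timings (20, 30, 18, 10 and 80 ms) and the 64 ms threshold from the worked example in this description.

```python
def split_into_segments(packet_times_ms, time_threshold_ms):
    """Group consecutive packets into segments whose total time reaches the threshold.

    Returns (start, end) index pairs with `end` exclusive; a trailing run of
    packets that never reaches the threshold is left out.
    """
    segments, start, total = [], 0, 0.0
    for i, t in enumerate(packet_times_ms):
        total += t
        if total >= time_threshold_ms:
            segments.append((start, i + 1))
            start, total = i + 1, 0.0   # begin a fresh segment
    return segments


segments = split_into_segments([20, 30, 18, 10, 80], 64)
```

Here the first segment covers packets 1 to 3 (68 ms) and the second covers packets 4 and 5 (90 ms), so the segment length in packets varies with how slowly the data node responds.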
In this embodiment, the data output efficiency is the data throughput of the first data node in responding to the client's data packet accesses within the segment time amount. Specifically, the computer device 1 calculates the data throughput produced by the first data node while the client accesses all the data packets that fall within the segment time amount. For example, if the client accesses L data packets within the segment time amount, the total size of the L data packets is M, and the segment time amount is X, then the data throughput of the first data node within the segment time amount is M/X.
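Written out, with $s_i$ denoting the size of the i-th data packet in the segment and $t_i$ the time taken to access it (both symbols are introduced here for illustration and do not appear in the original text), the throughput over a segment of L packets is:

```latex
\[
\text{throughput} = \frac{M}{X},
\qquad M = \sum_{i=1}^{L} s_i,
\qquad X = \sum_{i=1}^{L} t_i
\]
```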
Step S208, determining whether the data output efficiency is smaller than a preset efficiency threshold.
Step S210, when the data output efficiency is smaller than the efficiency threshold, switching the user side to a second data node to access the unaccessed data packet in the target data block.
Specifically, after calculating the data output efficiency of the first data node within the segment time amount, the computer device 1 determines whether the data output efficiency is less than the preset efficiency threshold. When the data output efficiency is the data throughput of the first data node within the segment time amount, the computer device 1 compares the calculated data throughput with a preset data throughput threshold, and when the data throughput is less than the data throughput threshold, it switches the client to a second data node to continue accessing the not-yet-accessed data packets in the target data block. When the data output efficiency is greater than or equal to the efficiency threshold, the computer device 1 keeps the client accessing the not-yet-accessed data packets in the target data block from the first data node, clears the segment time amount, accumulates it anew, and again determines whether it is greater than or equal to the time threshold.
Fig. 3 is a diagram illustrating the effect of switching data nodes according to an exemplary embodiment of the present application. Fig. 3 is read together with the following table:
[Table provided as images in the original publication: Figure GDA0003635681380000091, Figure GDA0003635681380000101]
The target data block that the client needs to access is block-1. Block-1 has multiple replicas stored on multiple data nodes, and each replica contains all the data packets of block-1. When the client requests access to block-1 from HDFS on the computer device 1, HDFS assigns a first data node to the client; the client then connects to the first data node and accesses the data packets in order, starting from packet-1 of block-1. The computer device 1 monitors the time the client takes to access each data packet in block-1 from the first data node and accumulates the sum of the times for the client to access n consecutive data packets from the first data node as the segment time amount. When the segment time amount is greater than or equal to a preset time threshold, the segment time amount is taken as a window, and the data throughput of the first data node within the window is evaluated. The computer device 1 then determines whether the data throughput in the window is less than a preset throughput threshold; if it is, the client is switched to a second data node to continue accessing the remaining data packets in block-1. For example, the client connects to the first data node and accesses packet-1, reading 131072 bytes in 20 ms; packet-2, reading 131072 bytes in 30 ms; and packet-3, reading 124 bytes in 18 ms. The time for accessing packet-1 to packet-3 is 20+30+18 = 68 ms, which is greater than the preset time threshold of 64 ms. Therefore, the computer device 1 takes the segment time amount as a window and calculates the data throughput of the first data node in the window as (131072+131072+124)/(20+30+18) = 5767 bytes/ms, which is greater than the preset data throughput threshold of 4474 bytes/ms.
Therefore, the computer device 1 keeps the client accessing the not-yet-accessed data packets in block-1 from the first data node; that is, the client stays connected to the first data node and goes on to access packet-4 and the subsequent data packets.
Then, the computer device 1 repeats the above steps, calculates the data throughput of the first data node in another window, and determines whether the data throughput is greater than or equal to the data throughput threshold. For example, the client accesses packet-4, reading 131072 bytes in 10 ms, and packet-5, reading 12 bytes in 80 ms. The time for accessing packet-4 and packet-5 is 10+80 = 90 ms, which is greater than the preset time threshold of 64 ms. Therefore, the computer device 1 takes the segment time amount as a window and calculates the data throughput of the first data node in the window as (131072+12)/(10+80) ≈ 1456 bytes/ms, which is less than the preset data throughput threshold of 4474 bytes/ms. Therefore, the computer device 1 switches the client to the second data node to continue accessing the not-yet-accessed data packets in block-1, that is, packet-6 and the subsequent data packets. The computer device 1 then continues to monitor the time the client takes to access each data packet in block-1 from the second data node and accumulates the sum of the times for the client to access n consecutive data packets from the second data node as the segment time amount. When the segment time amount is greater than or equal to the preset time threshold, the segment time amount is taken as a window, and the computer device 1 determines whether the data throughput of the second data node in the window is greater than or equal to the data throughput threshold; if it is less, the client is switched on to the next data node.
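The arithmetic for the packet-4 and packet-5 window can be checked directly. The short script below uses only the figures quoted for that window (131072 bytes in 10 ms, 12 bytes in 80 ms, and the 4474 bytes/ms threshold); the variable names are illustrative.

```python
THROUGHPUT_THRESHOLD = 4474  # bytes/ms, the preset threshold quoted in the example

# (bytes read, elapsed ms) for packet-4 and packet-5
window = [(131072, 10), (12, 80)]

total_bytes = sum(size for size, _ in window)   # 131084 bytes
total_ms = sum(ms for _, ms in window)          # 90 ms
throughput = total_bytes / total_ms             # about 1456.5 bytes/ms

# Below the threshold, so the client is moved to another data node.
should_switch = throughput < THROUGHPUT_THRESHOLD
print(f"{total_bytes} bytes / {total_ms} ms = {throughput:.1f} bytes/ms, "
      f"switch={should_switch}")
```

A single slow packet (12 bytes in 80 ms) is enough to pull the window's throughput well under the threshold, even though packet-4 was served quickly.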
Compared with the prior art, for a client reading the data packets of a target data block in HDFS, the computer device 1 can shorten the p99 longest read time from 3.1 minutes to 60 seconds and the p99 average read time from 4.7 seconds to 2.5 seconds, so that read speed is markedly improved and the data reading process does not stall.
It should be noted that, in this embodiment, the computer device 1 establishes whether the first data node is too heavily loaded to support high-speed data access by determining whether the data output efficiency of the first data node within the segment time amount is greater than or equal to the efficiency threshold. In other embodiments, the computer device 1 may also decide whether the first data node supports high-speed data access from other factors that affect data access or data transmission within the segment time amount, such as the packet loss rate and network stability.
In an embodiment, the switching, by the computer device 1, the user end to the second data node to access the unaccessed data packets in the target data block sequentially includes: selecting a first data node to be selected, which stores the target data block and has a load smaller than a preset load threshold value, as the second data node, and switching and connecting the user side to the second data node; judging whether the connection time of the user side and the second data node is less than a preset first rated time or not; when the connection time is less than the first rated time, judging whether the request time of the user side for requesting to access the data packet on the second data node is less than a preset second rated time; and when the connection time is less than the first rated time and when the request time is less than the second rated time, switching the second data node to access the unaccessed data packets in the target data block continuously by the user side. When the connection time is greater than or equal to the first rated time or the request time is greater than or equal to the second rated time, recording as a first timeout point; and reselecting a second data node to be selected, which stores the target data block and has a load smaller than the load threshold value, as the second data node. And recording as a second timeout point when the connection time is greater than or equal to the first rated time or the request time is greater than or equal to the second rated time when the second data node to be selected is used as a second data node to perform switching. 
Then, the computer device 1 determines whether the time interval from the first timeout point to the second timeout point is smaller than a preset timeout threshold. When the time interval is smaller than the timeout threshold, it adjusts the first rated time and/or the second rated time together with the timeout threshold, by increasing them in the same proportion while keeping the timeout threshold smaller than or equal to a preset maximum threshold. The computer device 1 then selects a third candidate data node, which stores the target data block and has a load smaller than the load threshold, as the second data node.
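The proportional adjustment described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `adjust_timeouts`, the default factor of 2, and the way the cap is enforced (shrinking the common factor so that the timeout threshold stops at the maximum) are all assumptions made for the example.

```python
def adjust_timeouts(first_rated_ms, second_rated_ms, timeout_threshold_ms,
                    max_threshold_ms, factor=2.0):
    """Scale both rated times and the timeout threshold by one common factor.

    Assumption: when the requested factor would push the timeout threshold
    past max_threshold_ms, the factor is reduced so the threshold lands
    exactly on the cap; the values are never scaled down.
    """
    if timeout_threshold_ms <= 0 or max_threshold_ms <= 0:
        raise ValueError("thresholds must be positive")
    # Largest factor that keeps the threshold within the cap (at least 1.0).
    effective = max(1.0, min(factor, max_threshold_ms / timeout_threshold_ms))
    return (first_rated_ms * effective,
            second_rated_ms * effective,
            timeout_threshold_ms * effective)


if __name__ == "__main__":
    # Rated times of 128 ms with a 256 ms threshold and a 30 s cap.
    print(adjust_timeouts(128, 128, 256, 30_000))
```

Using a single factor for all three values keeps the ratio between the rated times and the timeout threshold unchanged, which is one reading of "in the same proportion".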
Specifically, the computer device 1 uses the first rated time as the timeout for the user side connecting to a data node, and the second rated time as the timeout for the user side requesting access to a data node. While the user side is connecting to a data node or requesting access to it, the computer device 1 judges whether the connection time is greater than or equal to the first rated time or whether the request time is greater than or equal to the second rated time; if so, it selects the next data node storing the target data block as the second data node and switches to it. This effectively avoids the situation in which a switch to an unresponsive data node stalls for a long time. In addition, the timeout points of two successive timeouts (namely, occasions on which the connection time is greater than or equal to the first rated time or the request time is greater than or equal to the second rated time) are recorded, and the interval between the first and second timeout points is compared with the timeout threshold so that the first and second rated times can be adjusted dynamically. This prevents the user side from switching data nodes endlessly during the switching process.
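The switching procedure with timeout points can be sketched as a loop over candidate data nodes. The sketch below is illustrative only. The names `switch_data_node` and `probe` are hypothetical, time is simulated rather than measured, an attempt is assumed to be abandoned exactly when its rated time expires, and timeout points are assumed to be compared in pairs (a new pair starts after each adjustment).

```python
def switch_data_node(candidates, load_threshold, probe,
                     first_rated_ms=128, second_rated_ms=128,
                     timeout_threshold_ms=256, max_threshold_ms=30_000):
    """Pick the first candidate that connects and answers within the rated times.

    candidates: iterable of (node, load) pairs that store the target block.
    probe(node): hypothetical callback returning (connect_ms, request_ms).
    Returns (node or None, first_rated_ms, second_rated_ms).
    """
    clock_ms = 0            # simulated time spent switching so far
    last_timeout_ms = None  # previous timeout point of the current pair
    for node, load in candidates:
        if load >= load_threshold:
            continue  # only nodes below the load threshold are eligible
        connect_ms, request_ms = probe(node)
        if connect_ms < first_rated_ms and request_ms < second_rated_ms:
            return node, first_rated_ms, second_rated_ms
        # A timeout occurred; assume we give up as soon as the rated time expires.
        if connect_ms >= first_rated_ms:
            clock_ms += first_rated_ms
        else:
            clock_ms += connect_ms + second_rated_ms
        if last_timeout_ms is None:
            last_timeout_ms = clock_ms  # first timeout point of a pair
            continue
        interval_ms = clock_ms - last_timeout_ms
        if interval_ms < timeout_threshold_ms:
            # Timeouts are arriving quickly: double everything, respecting the cap.
            factor = 2 if timeout_threshold_ms * 2 <= max_threshold_ms else 1
            first_rated_ms *= factor
            second_rated_ms *= factor
            timeout_threshold_ms *= factor
            last_timeout_ms = None  # start a new pair of timeout points
        else:
            last_timeout_ms = clock_ms
    return None, first_rated_ms, second_rated_ms
```

With five eligible candidates whose simulated timings are (500, 0), (500, 0), (50, 900), (50, 900) and (300, 400) milliseconds, the rated times grow from 128 ms to 256 ms and then to 512 ms, and the fifth node is accepted.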
Fig. 4 is a diagram illustrating the effect of dynamically adjusting the first rated time and/or the second rated time when switching data nodes according to an exemplary embodiment of the present invention. As shown in Fig. 4, the first rated time and the second rated time are both initially 128 ms. When the user side connects to a data node (not shown in the figure) and the connection times out, the moment is recorded as a first timeout point and the computer device 1 switches the user side to data node 1. That connection also times out, and the moment is recorded as a second timeout point. The computer device 1 then determines whether the interval from the first timeout point to the second timeout point is less than 256 ms, that is, twice the rated time (2 × 128 ms). When the interval is less than 256 ms, the first rated time and the second rated time are doubled to 256 ms. Therefore, when the computer device 1 switches the user side to data node 2, it determines whether the connection time is longer than the first rated time of 256 ms. When the connection time is less than 256 ms, the user side can go on to send a request to access data on data node 2.
The computer device 1 then determines whether the request time of the user side for requesting access to data node 2 is longer than 256 ms; if it is, the moment is recorded as a third timeout point and the user side is switched to data node 3. It is then determined whether the connection of the user side to data node 3 times out (i.e. takes longer than 256 ms), and if it does not, whether the request of the user side to access data node 3 times out (i.e. takes longer than 256 ms). When the request times out, the moment is recorded as a fourth timeout point, and the computer device 1 determines whether the interval from the third timeout point to the fourth timeout point is smaller than the timeout threshold of 512 ms (still twice the rated time, which has itself doubled). When the interval is smaller than the timeout threshold, the computer device 1 adjusts the rated times again, setting the first rated time and the second rated time to 512 ms; the timeout threshold is correspondingly doubled to 1024 ms. Therefore, when the computer device 1 switches the user side to data node 4, connection and request timeouts are judged against the adjusted first and second rated times, namely 512 ms.
In this way, the computer device 1 starts from a low timeout, i.e. the first rated time and/or the second rated time, and increases it progressively as timeouts occur, while limiting the adjusted timeout to a maximum such as 30 s. This improves the flexibility of data node switching and reduces frequent data node switching.
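To make the growth of the timeout concrete, the sketch below lists the values a rated time would take under repeated doubling. It assumes, for illustration only, a starting value of 128 ms, a doubling factor, and that the 30 s maximum applies directly to the rated time; the function name `timeout_schedule` is hypothetical.

```python
def timeout_schedule(initial_ms, cap_ms, factor=2):
    """Return the successive timeout values, stopping once the cap is reached."""
    if initial_ms <= 0 or cap_ms <= 0 or factor <= 1:
        raise ValueError("initial_ms and cap_ms must be positive and factor > 1")
    schedule = [min(initial_ms, cap_ms)]
    while schedule[-1] < cap_ms:
        # Each adjustment multiplies the timeout, but never beyond the cap.
        schedule.append(min(schedule[-1] * factor, cap_ms))
    return schedule


if __name__ == "__main__":
    print(timeout_schedule(128, 30_000))
```

Under these assumptions the timeout passes through 128, 256, 512, 1024, 2048, 4096, 8192 and 16384 ms before settling at the 30 000 ms cap, so at most eight adjustments are needed to reach the maximum.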
In summary, the data node switching method in the HDFS provided in this embodiment monitors the amount of time the user side takes to access each data packet in the target data block from the first data node; accumulates the sum of the time amounts of the user side continuously accessing n data packets from the first data node as the section time amount; computes the data output efficiency of the first data node within the section time amount when the section time amount is greater than or equal to the time threshold; and switches the user side to a second data node to continue accessing the unaccessed data packets in the target data block when the data output efficiency is smaller than the efficiency threshold. Because the switch is performed according to whether the data output efficiency of the first data node within the section time amount is smaller than the preset efficiency threshold, the flexibility of data node switching in the HDFS is improved, and data access efficiency is effectively improved as well.
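The monitoring and decision steps summarized above can be sketched as a small stateful helper. This is a hedged illustration, not the patented code: the class name `SegmentMonitor`, the choice of bytes per millisecond as the unit of data output efficiency, and resetting the accumulator after every evaluation are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SegmentMonitor:
    time_threshold_ms: float
    efficiency_threshold: float  # assumed unit: bytes per millisecond
    section_ms: float = 0.0      # accumulated section time amount
    section_bytes: int = 0       # bytes delivered within the section

    def __post_init__(self):
        if self.time_threshold_ms <= 0:
            raise ValueError("time_threshold_ms must be positive")

    def record_packet(self, elapsed_ms, size_bytes):
        """Account for one packet; return True if a node switch is advised."""
        self.section_ms += elapsed_ms
        self.section_bytes += size_bytes
        if self.section_ms < self.time_threshold_ms:
            return False  # section too short: keep accumulating (n becomes n + 1)
        efficiency = self.section_bytes / self.section_ms
        # Clear the section so the next evaluation starts from zero.
        self.section_ms = 0.0
        self.section_bytes = 0
        return efficiency < self.efficiency_threshold


if __name__ == "__main__":
    monitor = SegmentMonitor(time_threshold_ms=100, efficiency_threshold=500)
    for elapsed in (40, 40, 40, 200):
        print(elapsed, monitor.record_packet(elapsed, 65_536))
```

Evaluating throughput only after the section time amount reaches the threshold means a single slow packet does not trigger a switch unless it is slow enough to fill a whole section by itself.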
Example two
Fig. 5 is a block diagram schematically illustrating a data node switching apparatus in an HDFS according to a second embodiment of the present application, where the data node switching apparatus may be divided into one or more program modules, and the one or more program modules are stored in a storage medium and executed by one or more processors to implement the second embodiment of the present application. The program modules referred to in the embodiments of the present application refer to a series of computer program instruction segments that can perform specific functions, and the following description will specifically describe the functions of the program modules in the embodiments.
As shown in fig. 5, the data node switching apparatus 400 in the HDFS may include a monitoring module 410, a counting module 420, a determining module 430, a calculating module 440, and a switching module 450, wherein:
a monitoring module 410, configured to monitor an amount of time that a user terminal accesses each data packet in the target data block from the first data node.
A counting module 420, configured to accumulate the sum of the time amounts of the user side continuously accessing n data packets from the first data node as a section time amount, where n is an integer.
The determining module 430 is configured to determine whether the segment time amount is greater than or equal to a preset time threshold.
A calculating module 440, configured to count a data output efficiency of the first data node in the section time amount when the section time amount is greater than or equal to the time threshold.
In an exemplary embodiment, the data output efficiency is a data throughput of the first data node in response to the packet access procedure of the user terminal in the section time amount.
The determining module 430 is further configured to determine whether the data output efficiency is smaller than a preset efficiency threshold.
A switching module 450, configured to switch the user end to a second data node to continue accessing the unaccessed data packets in the target data block when the data output efficiency is smaller than the efficiency threshold.
In an exemplary embodiment, when the section time amount is less than the time threshold, the counting module 420 accumulates the time amounts of the user side accessing n+1 data packets in the target data block from the first data node as the section time amount, and it is determined again whether the section time amount is greater than or equal to the preset time threshold.
In an exemplary embodiment, the switching module 450 is further configured to: select, as the second data node, a first candidate data node that stores the target data block and has a load smaller than a preset load threshold, and connect the user side to the second data node; judge whether the connection time between the user side and the second data node is less than a preset first rated time; when the connection time is less than the first rated time, judge whether the request time taken by the user side to request access to a data packet on the second data node is less than a preset second rated time; and when the connection time is less than the first rated time and the request time is less than the second rated time, have the user side continue to access the unaccessed data packets in the target data block from the second data node.
In an exemplary embodiment, when the data output efficiency is greater than or equal to the efficiency threshold, the switching module 450 is further configured to: keeping the user side to access the data packets which are not accessed in the target data block from the first data node; and clearing the section time quantum, and accumulating the section time quantum again and judging whether the section time quantum is larger than or equal to the time threshold value.
In an exemplary embodiment, the switching module 450 is further configured to: when the connection time is greater than or equal to the first rated time or the request time is greater than or equal to the second rated time, recording as a first timeout point; and reselecting a second data node to be selected, which stores the target data block and has a load smaller than the load threshold value, as the second data node. When the connection time is greater than or equal to the first rated time or the request time is greater than or equal to the second rated time when the second data node to be selected is used as a second data node to execute switching, recording the connection time as a second timeout point; judging whether the time interval from the first timeout point to the second timeout point is smaller than a preset timeout threshold or not; when the time interval is smaller than the overtime threshold, adjusting the first rated time and/or the second rated time and the overtime threshold; and reselecting a third data node to be selected, which stores the target data block and has a load smaller than the load threshold value, as the second data node.
In an exemplary embodiment, the switching module 450 performing the adjusting the first rated time and/or the second rated time and the timeout threshold includes: and increasing the first rated time and/or the second rated time and the timeout threshold value in the same proportion, and enabling the timeout threshold value to be smaller than or equal to a preset maximum threshold value.
Example three
Fig. 6 schematically shows a hardware architecture diagram of a computer device 1 suitable for implementing a data node switching method in HDFS according to a third embodiment of the present application. In the present embodiment, the computer device 1 is a device capable of automatically performing numerical calculation and/or information processing according to preset or pre-stored instructions. For example, it may be a rack server, a blade server, a tower server or a cabinet server (including an independent server or a server cluster composed of a plurality of servers) with a gateway function. As shown in Fig. 6, the computer device 1 includes at least, but is not limited to, a memory 510, a processor 520, and a network interface 530, which may be communicatively linked to each other by a system bus. Wherein:
the memory 510 includes at least one type of computer-readable storage medium including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 510 may be an internal storage module of the computer device 1, such as a hard disk or a memory of the computer device 1. In other embodiments, the memory 510 may also be an external storage device of the computer device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 1. Of course, the memory 510 may also comprise both an internal storage module of the computer device 1 and an external storage device thereof. In this embodiment, the memory 510 is generally used for storing an operating system installed in the computer device 1 and various application software, such as program codes of a data node switching method in the HDFS. In addition, the memory 510 may also be used to temporarily store various types of data that have been output or are to be output.
Processor 520 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 520 is generally used for controlling the overall operation of the computer device 1, such as performing control and processing related to data interaction or communication with the computer device 1. In this embodiment, processor 520 is configured to execute program codes stored in memory 510 or process data.
Network interface 530 may include a wireless network interface or a wired network interface, and network interface 530 is typically used to establish communication links between computer device 1 and other computer devices. For example, the network interface 530 is used to connect the computer apparatus 1 with an external terminal through a network, establish a data transmission channel and a communication link between the computer apparatus 1 and the external terminal, and the like. The network may be a wireless or wired network such as an Intranet (Intranet), the Internet (Internet), a Global System of Mobile communication (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth (Bluetooth), or Wi-Fi.
It should be noted that Fig. 6 only shows a computer device having components 510 to 530, but it should be understood that not all of the shown components are required, and that more or fewer components may be implemented instead.
In this embodiment, the program code of the data node switching method in the HDFS stored in the memory 510 may also be divided into one or more program modules and executed by one or more processors (in this embodiment, the processor 520) to complete the embodiment of the present application.
Example four
The present embodiments also provide a computer-readable storage medium having a computer program stored thereon, the computer program, when executed by a processor, implementing the steps of:
monitoring the amount of time that a user terminal accesses each data packet in a target data block from a first data node; accumulating the sum of the time quantum of continuously accessing n data packets from the first data node by the user side as section time quantum, wherein n is an integer; judging whether the section time quantity is greater than or equal to a preset time threshold value; when the section time quantum is larger than or equal to the time threshold, counting the data output efficiency of the first data node in the section time quantum; judging whether the data output efficiency is smaller than a preset efficiency threshold value or not; and when the data output efficiency is smaller than the efficiency threshold, switching the user side to a second data node to access the data packet which is not accessed in the target data block continuously.
In this embodiment, the computer-readable storage medium includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the computer readable storage medium may be an internal storage unit of the computer device, such as a hard disk or a memory of the computer device. In other embodiments, the computer readable storage medium may be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the computer device. Of course, the computer-readable storage medium may also include both internal and external storage devices of the computer device. In this embodiment, the computer-readable storage medium is generally used for storing an operating system and various types of application software installed in a computer device, for example, a program code of the data node switching method in the HDFS in the embodiment, and the like. Further, the computer-readable storage medium may also be used to temporarily store various types of data that have been output or are to be output.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the present application described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different from that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications that can be made by the use of the equivalent structures or equivalent processes in the specification and drawings of the present application or that can be directly or indirectly applied to other related technologies are also included in the scope of the present application.

Claims (11)

1. A method for switching data nodes in an HDFS (Hadoop distributed File System), which is characterized by comprising the following steps:
monitoring the amount of time that a user terminal accesses each data packet in a target data block from a first data node;
accumulating the sum of the time amounts of the user side continuously accessing n data packets from the first data node as a section time amount, wherein n is an integer;
judging whether the section time quantity is greater than or equal to a preset time threshold value;
when the section time quantum is larger than or equal to the time threshold, counting the data output efficiency of the first data node in the section time quantum;
judging whether the data output efficiency is smaller than a preset efficiency threshold value or not;
and when the data output efficiency is smaller than the efficiency threshold, switching the user side to a second data node to access the data packet which is not accessed in the target data block continuously.
2. The method for switching data nodes in the HDFS according to claim 1, wherein the data output efficiency is a data throughput of the first data node in responding to the data packet access of the user side within the section time amount.
3. The method as claimed in claim 1, wherein when the segment time quantum is smaller than the time threshold, accumulating the time quantum of the client accessing n +1 packets in the target data block from the first data node as the segment time quantum, and determining whether the segment time quantum is greater than or equal to a predetermined time threshold again.
4. The method for switching data nodes in an HDFS according to claim 1, wherein when the data output efficiency is greater than or equal to the efficiency threshold, the user side is kept accessing unaccessed data packets in the target data block from the first data node; and
the section time amount is cleared, accumulated again, and judged again as to whether it is greater than or equal to the time threshold.
5. The method as claimed in any of claims 1-4, wherein the switching the user end to the second data node to access the unaccessed data packets in the target data block sequentially comprises:
selecting a first data node to be selected, which stores the target data block and has a load smaller than a preset load threshold value, as the second data node, and switching and connecting the user side to the second data node;
judging whether the connection time of the user side and the second data node is less than a preset first rated time or not;
when the connection time is less than the first rated time, judging whether the request time of the user side for requesting to access the data packet on the second data node is less than a preset second rated time;
and when the connection time is less than the first rated time and the request time is less than the second rated time, having the user side continue to access the unaccessed data packets in the target data block from the second data node.
6. The method for data node switching in HDFS according to claim 5,
when the connection time is greater than or equal to the first rated time or the request time is greater than or equal to the second rated time, recording as a first timeout point;
and reselecting a second data node to be selected, which stores the target data block and has a load smaller than the load threshold value, as the second data node.
7. The method for data node switching in HDFS according to claim 6,
when the connection time is greater than or equal to the first rated time or the request time is greater than or equal to the second rated time when the second data node to be selected is used as a second data node to execute switching, recording as a second timeout point;
judging whether the time interval from the first timeout point to the second timeout point is smaller than a preset timeout threshold value or not;
when the time interval is smaller than the overtime threshold, adjusting the first rated time and/or the second rated time and the overtime threshold; and
and reselecting a third data node to be selected, which stores the target data block and has a load smaller than the load threshold value, as the second data node.
8. The method for data node switching in HDFS according to claim 7, wherein said adjusting said first and/or second nominal time and said timeout threshold comprises:
and increasing the first rated time and/or the second rated time and the timeout threshold value in the same proportion, and enabling the timeout threshold value to be smaller than or equal to a preset maximum threshold value.
9. An apparatus for switching data nodes in an HDFS, the apparatus comprising:
the monitoring module is used for monitoring the time quantum of each data packet in the target data block accessed from the first data node by the user terminal;
a counting module, configured to accumulate a sum of time amounts for the user side to continuously access n data packets from the first data node as a segment time amount, where n is an integer;
the judging module is used for judging whether the section time quantum is greater than or equal to a preset time threshold value;
the calculation module is used for counting the data output efficiency of the first data node in the section time quantum when the section time quantum is larger than or equal to the time threshold;
the judging module is further used for judging whether the data output efficiency is smaller than a preset efficiency threshold value;
and the switching module is used for switching the user side to a second data node to continuously access the data packet which is not accessed in the target data block when the data output efficiency is smaller than the efficiency threshold value.
10. Computer arrangement, characterized in that the computer arrangement comprises a memory, a processor, a computer program stored on the memory being executable on the processor, the computer program, when being executed by the processor, realizing the steps of the method for data node switching in an HDFS according to any of the claims 1-8.
11. A computer-readable storage medium, characterized in that it stores a computer program which is executable by at least one processor to cause the at least one processor to perform the steps of the method for data node switching in HDFS according to any of claims 1-8.
CN202011339567.3A 2020-11-25 2020-11-25 Data node switching method and device in HDFS (Hadoop distributed File System) and computer equipment Active CN112463754B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011339567.3A CN112463754B (en) 2020-11-25 2020-11-25 Data node switching method and device in HDFS (Hadoop distributed File System) and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011339567.3A CN112463754B (en) 2020-11-25 2020-11-25 Data node switching method and device in HDFS (Hadoop distributed File System) and computer equipment

Publications (2)

Publication Number Publication Date
CN112463754A CN112463754A (en) 2021-03-09
CN112463754B true CN112463754B (en) 2022-08-02

Family

ID=74808202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011339567.3A Active CN112463754B (en) 2020-11-25 2020-11-25 Data node switching method and device in HDFS (Hadoop distributed File System) and computer equipment

Country Status (1)

Country Link
CN (1) CN112463754B (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016101115A1 (en) * 2014-12-23 2016-06-30 华为技术有限公司 Resource scheduling method and related apparatus
CN105657770B (en) * 2016-03-29 2019-05-03 Oppo广东移动通信有限公司 The switching method and device of data network
CN105760556B (en) * 2016-04-19 2019-05-24 江苏物联网研究发展中心 More wave files of low delay high-throughput read and write optimization method
CN107846715A (en) * 2016-09-20 2018-03-27 深圳市盛路物联通讯技术有限公司 Access point switching method and device of the Internet of Things based on transmission rate
TWI740885B (en) * 2017-01-23 2021-10-01 香港商阿里巴巴集團服務有限公司 Service node switching method and device of distributed storage system
CN109033298A (en) * 2018-07-14 2018-12-18 北方工业大学 Data distribution method under heterogeneous HDFS cluster
CN111275415A (en) * 2020-01-13 2020-06-12 北京三快在线科技有限公司 Resource channel switching method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112463754A (en) 2021-03-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant