CN114598708A - Information processing method, device, system, equipment and readable storage medium - Google Patents


Info

Publication number
CN114598708A
Authority
CN
China
Prior art keywords
datanode
network topology
request
client
data processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011308062.0A
Other languages
Chinese (zh)
Other versions
CN114598708B (en)
Inventor
杨光
冯仕炳
蒋宁
刘德华
吴海英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mashang Xiaofei Finance Co Ltd
Original Assignee
Mashang Xiaofei Finance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mashang Xiaofei Finance Co Ltd
Priority to CN202011308062.0A
Publication of CN114598708A
Application granted
Publication of CN114598708B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H04L67/1044 Group management mechanisms
    • H04L67/1053 Group management mechanisms with pre-configuration of logical or physical connections with a determined number of other peers
    • H04L67/1057 Group management mechanisms with pre-configuration of logical or physical connections with a determined number of other peers involving pre-assessment of levels of reputation of peers
    • H04L67/1061 Peer-to-peer [P2P] networks using node-based peer discovery mechanisms

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses an information processing method, device, system, equipment and readable storage medium, relating to the field of Internet technology and intended to ensure the stability of an HDFS cluster. The method comprises the following steps: receiving a first resource request sent by a DataNode in an HDFS cluster, where the first resource request includes target network topology information; performing current limiting analysis according to the first resource request to obtain a current limiting analysis result; and sending a first resource response to the DataNode according to the current limiting analysis result. The embodiments of the invention can thereby ensure the stability of the HDFS cluster.

Description

Information processing method, device, system, equipment and readable storage medium
Technical Field
The present invention relates to the field of internet technologies, and in particular, to an information processing method, apparatus, system, device, and readable storage medium.
Background
In Internet technology, users generally adopt Hadoop as their basic big data platform and use a Hadoop Distributed File System (HDFS) cluster as the storage system.
An HDFS cluster is a distributed file system with a fault-tolerance mechanism. It consists of one NameNode and multiple DataNodes. The NameNode stores metadata such as the file system namespace, and the DataNodes store the actual file data. When a client reads or writes a file, it first communicates with the NameNode to obtain the DataNodes corresponding to the file, and then reads or writes the file contents on those DataNodes.
In the prior art, at startup the DataNode initializes a balance throttler (a current limiter for data balancing) from a configuration item that specifies how much bandwidth per second may be used for data balancing. However, this scheme can only control the traffic of a single DataNode: no component sees the combined traffic of multiple DataNodes, which can affect the stability of the HDFS cluster as a whole.
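To make the per-node limitation concrete, here is a minimal Python sketch of a throttler that each DataNode would run on its own. It is not Hadoop's throttler class; it assumes a fixed one-second window, and `SingleNodeThrottler` and `try_acquire` are hypothetical names used only for illustration.

```python
import time
from typing import Callable


class SingleNodeThrottler:
    """Hypothetical per-DataNode bandwidth limiter with a fixed 1-second window.

    Each instance only sees the bytes requested on its own node, so it cannot
    account for traffic that other DataNodes put on shared links.
    """

    def __init__(self, bytes_per_sec: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.bytes_per_sec = bytes_per_sec
        self.clock = clock
        self.window_start = clock()
        self.used = 0

    def try_acquire(self, num_bytes: int) -> bool:
        """Return True if num_bytes fit in the current window's budget."""
        now = self.clock()
        if now - self.window_start >= 1.0:
            # A new one-second window starts with a fresh budget.
            self.window_start = now
            self.used = 0
        if self.used + num_bytes > self.bytes_per_sec:
            return False
        self.used += num_bytes
        return True
```

For example, two DataNodes that share one inter-rack uplink could each be configured for 100 MB/s and each grant its full budget, so the uplink may carry 200 MB/s even if it cannot sustain that; nothing inside a per-node throttler can detect the overload.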
Disclosure of Invention
The embodiment of the invention provides an information processing method, device, system, equipment and a readable storage medium, which are used for ensuring the stability of an HDFS cluster.
In a first aspect, an embodiment of the present invention provides an information processing method, applied to a distributed current limiter, including:
receiving a first resource request sent by a DataNode in an HDFS cluster, wherein the first resource request comprises target network topology information;
performing current-limiting analysis according to the first resource request to obtain a current-limiting analysis result;
sending a first resource response to the DataNode according to the current limiting analysis result;
the target network topology information is sent to the DataNode by a client, and a NameNode in an HDFS cluster sends the target network topology information to the client according to a first data processing request of the client; the target network topology information is network topology information of a DataNode used for processing a second data processing request of the client.
In a second aspect, an embodiment of the present invention further provides an information processing method, which is applied to a NameNode in an HDFS cluster, and includes:
receiving a first data processing request of a client;
and sending first information to the client according to the first data processing request, wherein the first information comprises information of a target DataNode for processing a second data processing request of the client and target network topology information of the target DataNode.
In a third aspect, an embodiment of the present invention further provides an information processing method, which is applied to a DataNode in an HDFS cluster, and includes:
receiving a second data processing request of the client, wherein the second data processing request comprises target network topology information;
sending a first resource request to a distributed current limiter according to the second data processing request, wherein the first resource request comprises the target network topology information;
receiving a first resource response of the distributed current limiter;
processing according to the first resource response;
the target network topology information is sent to the DataNode by a client, and a NameNode in an HDFS cluster sends the target network topology information to the client according to a first data processing request of the client; the target network topology information is network topology information of a DataNode used for processing a second data processing request of the client.
In a fourth aspect, an embodiment of the present invention further provides an information processing method, including:
receiving a first data processing request of a client;
sending first information to the client according to the first data processing request, wherein the first information comprises information of a target DataNode for processing a second data processing request of the client and target network topology information of the target DataNode;
receiving a second data processing request of a client, wherein the second data processing request comprises the target network topology information;
performing current-limiting analysis according to the second data processing request to obtain a current-limiting analysis result;
and processing according to the current limiting analysis result.
In a fifth aspect, an embodiment of the present invention further provides an information processing apparatus, which is applied to a distributed current limiter, and includes:
a first receiving module, configured to receive a first resource request sent by a DataNode in an HDFS cluster, where the first resource request includes target network topology information;
the first processing module is used for carrying out current limiting analysis according to the first resource request to obtain a current limiting analysis result;
a first sending module, configured to send a first resource response to the DataNode according to the current limiting analysis result;
the target network topology information is sent to the DataNode by a client, and a NameNode in an HDFS cluster sends the target network topology information to the client according to a first data processing request of the client; the target network topology information is network topology information of a DataNode used for processing a second data processing request of the client.
In a sixth aspect, an embodiment of the present invention further provides an information processing apparatus, which is applied to a NameNode in an HDFS cluster, and includes:
the first receiving module is used for receiving a first data processing request of a client;
a first sending module, configured to send first information to the client according to the first data processing request, where the first information includes information of a target DataNode used for processing a second data processing request of the client and target network topology information corresponding to the target DataNode.
In a seventh aspect, an embodiment of the present invention further provides an information processing apparatus, which is applied to a DataNode in an HDFS cluster, and includes:
the first receiving module is used for receiving a second data processing request of the client, wherein the second data processing request comprises target network topology information;
a first sending module, configured to send a first resource request to a distributed current limiter according to the second data processing request, where the first resource request includes the target network topology information;
a second receiving module for receiving a first resource response of the distributed current limiter;
the first processing module is used for processing according to the first resource response;
the target network topology information is sent to the DataNode by a client, and a NameNode in an HDFS cluster sends the target network topology information to the client according to a first data processing request of the client; the target network topology information is network topology information of a DataNode used for processing a second data processing request of the client.
In an eighth aspect, an embodiment of the present invention further provides an information processing system, including: NameNode, at least one DataNode and distributed current limiter in the HDFS cluster;
the NameNode is used for receiving a first data processing request of a client and sending first information to the client according to the first data processing request, wherein the first information comprises information of a target DataNode for processing a second data processing request of the client and target network topology information of the target DataNode;
the DataNode is configured to receive a second data processing request of the client, where the second data processing request includes the target network topology information; sending a first resource request to a distributed current limiter according to the second data processing request, wherein the first resource request comprises the target network topology information; receiving a first resource response of the distributed current limiter; processing according to the first resource response;
the distributed current limiter is used for receiving a first resource request sent by the DataNode and carrying out current limiting analysis according to the first resource request to obtain a current limiting analysis result; and sending a first resource response to the DataNode according to the current limiting analysis result.
In a ninth aspect, an embodiment of the present invention further provides an electronic device, including a memory, a processor, and a program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps of the information processing method described above.
In a tenth aspect, an embodiment of the present invention further provides a readable storage medium storing a program which, when executed by a processor, implements the steps of the information processing method described above.
In the embodiments of the present invention, the distributed current limiter performs current limiting analysis according to a first resource request from a DataNode that carries network topology information, and sends a first resource response to the DataNode. Because the target network topology information describes the DataNodes that process the client's second data processing request, the distributed current limiter can take the resource usage of multiple DataNodes into account when deciding whether current limiting is required. The scheme of the embodiments of the present invention can therefore ensure the stability of the HDFS cluster.
Drawings
FIG. 1 is a schematic diagram of an information processing system according to an embodiment of the present invention;
FIG. 2 is a first flowchart of an information processing method according to an embodiment of the present invention;
FIG. 3 is a second flowchart of an information processing method according to an embodiment of the present invention;
FIG. 4 is a third flowchart of an information processing method according to an embodiment of the present invention;
FIG. 5 is a fourth flowchart of an information processing method according to an embodiment of the present invention;
FIG. 6 is a process diagram of an information processing method according to an embodiment of the present invention;
FIG. 7 is a first block diagram of an information processing apparatus according to an embodiment of the present invention;
FIG. 8 is a second block diagram of an information processing apparatus according to an embodiment of the present invention;
FIG. 9 is a third block diagram of an information processing apparatus according to an embodiment of the present invention;
FIG. 10 is a fourth block diagram of an information processing apparatus according to an embodiment of the present invention.
Detailed Description
In the embodiments of the present invention, the term "and/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
In the embodiments of the present application, the term "plurality" means two or more, and other terms are similar thereto.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
A brief introduction to HDFS clusters follows.
As described earlier, an HDFS cluster consists of one NameNode and multiple DataNodes. When storing data or files, the HDFS cluster splits them into data blocks and stores those data blocks on the DataNodes in the cluster.
By storing copies of each data block on selected nodes in the cluster, the HDFS cluster ensures that, if a node's storage medium is damaged or an entire node goes offline abnormally, all data block copies stored on that node can still be read from other nodes in the cluster.
In a large HDFS cluster, the nodes are deployed across multiple racks in the computer room. To achieve rack-level fault tolerance, the HDFS cluster uses rack awareness to store copies of data blocks on multiple racks. An HDFS cluster administrator can enable rack awareness by defining, in a configuration file, a network topology with any number of levels. For example, in a large computer room, an administrator may define the following racks according to the actual network topology of the room:
/root-switch/core-switch1/rack-switch1;
/root-switch/core-switch1/rack-switch2;
/root-switch/core-switch2/rack-switch1.
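Because the rack paths above form a tree, a network "distance" between two nodes can be derived from their paths. The Python sketch below counts the edges from each node up to their closest common ancestor, in the spirit of Hadoop's topology distance; it is an illustration, not Hadoop code, and it assumes each DataNode's location is its rack path followed by a host name (`dn1` to `dn4` are hypothetical hosts).

```python
from __future__ import annotations


def split_path(location: str) -> list[str]:
    """'/root-switch/core-switch1/rack-switch1/dn1' -> its path components."""
    return [part for part in location.split("/") if part]


def distance(loc_a: str, loc_b: str) -> int:
    """Edges from each node up to their closest common ancestor, summed."""
    a, b = split_path(loc_a), split_path(loc_b)
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


dn1 = "/root-switch/core-switch1/rack-switch1/dn1"
dn2 = "/root-switch/core-switch1/rack-switch1/dn2"
dn3 = "/root-switch/core-switch1/rack-switch2/dn3"
dn4 = "/root-switch/core-switch2/rack-switch1/dn4"
print(distance(dn1, dn2), distance(dn1, dn3), distance(dn1, dn4))  # 2 4 6
```

Note that `rack-switch1` appears under both core switches; the full path, not the rack name alone, identifies a rack.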
The HDFS cluster selects the nodes that store copies based on the configured number of file copies and on rack awareness. In the ideal case, the first copy is stored on the node where the writing client is located, the second copy on a node in a different rack from the client, and the third copy on a different node in the same rack as the second copy. If the number of copies is greater than three, the HDFS cluster selects random nodes in the cluster for the copies after the third.
To speed up file writes, the HDFS cluster writes files through a data pipeline. Taking three copies of a file as an example, when data written by the client reaches the first storage node, that node stores the data on its local storage medium and simultaneously writes it to the second storage node; the second storage node likewise stores the data locally while writing it to the third storage node.
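The pipelined write described above can be sketched in Python with a synchronous, in-memory model: each node stores a packet locally and forwards it to the next node. `PipelineNode` and `build_pipeline` are hypothetical names; real HDFS streams packets asynchronously and passes acknowledgements back upstream, which this sketch omits.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PipelineNode:
    name: str
    stored: list[bytes] = field(default_factory=list)
    downstream: PipelineNode | None = None

    def receive(self, packet: bytes) -> None:
        # Store on the local medium, then forward to the next node, if any.
        self.stored.append(packet)
        if self.downstream is not None:
            self.downstream.receive(packet)


def build_pipeline(names: list[str]) -> PipelineNode:
    """Chain the nodes in order and return the pipeline's entry node."""
    nodes = [PipelineNode(n) for n in names]
    for upstream, downstream in zip(nodes, nodes[1:]):
        upstream.downstream = downstream
    return nodes[0]


head = build_pipeline(["dn1", "dn2", "dn3"])
head.receive(b"packet-0")  # the client only talks to dn1
```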
On the basis of the above, the information processing system according to the embodiment of the present invention introduces a distributed current limiter for performing distributed global flow control and current limiting rule definition.
Referring to FIG. 1, FIG. 1 is a block diagram of an information processing system according to an embodiment of the present invention. The system may include a NameNode 101, at least one DataNode 102, and a distributed current limiter 103 in the HDFS cluster.
The NameNode 101 is configured to receive a first data processing request of a client, and send first information to the client according to the first data processing request, where the first information includes information of a target DataNode for processing a second data processing request of the client and target network topology information of the target DataNode;
the DataNode 102 is configured to receive a second data processing request of the client, where the second data processing request includes the target network topology information; send a first resource request to the distributed current limiter according to the second data processing request, where the first resource request includes the target network topology information; receive a first resource response of the distributed current limiter; and process according to the first resource response;
the distributed current limiter (distributed throttler) 103 is configured to receive the first resource request sent by the DataNode, perform current limiting analysis according to the first resource request to obtain a current limiting analysis result, and send a first resource response to the DataNode according to the current limiting analysis result.
The first data processing request may be an ADD BLOCK request for requesting a new data block; the second data processing request may be a read data request or a write data request.
The operation principle of each component in the system according to the embodiment of the present invention will be described in detail with reference to specific embodiments.
Referring to fig. 2, fig. 2 is a flowchart of an information processing method provided by an embodiment of the present invention, and is applied to a distributed current limiter, as shown in fig. 2, including the following steps:
step 201, receiving a first resource request sent by a DataNode in an HDFS cluster, where the first resource request includes target network topology information.
The target network topology information is sent to the DataNode by the client, and the NameNode in the HDFS cluster sends the target network topology information to the client according to the client's first data processing request; the target network topology information is the network topology information of the DataNodes used to process the client's second data processing request. It may comprise the network topology information of every DataNode that processes the client's second data processing request; that is, each such DataNode has its own network topology information.
The first data processing request may be an ADD BLOCK request for requesting a new data block; the second data processing request may be a read data request or a write data request; and the first resource request may be a TRY_ACCQUIRE request, used to request resources for reading or writing data from the distributed current limiter. The first resource request includes the resource requested by the DataNode, such as the size of the requested resource.
In the distributed current limiter, information such as the bandwidth between network topologies and the current limiting rules can be configured through a configuration file (such as dfs.…). In this way, the resource usage of multiple DataNodes can be taken into account, so that accurate current limiting decisions can be made. A current limiting rule may specify, for example, which network topologies need current limiting and which do not.
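The source does not give the configuration format, so the Python sketch below assumes a hypothetical one-rule-per-line format: two topology identifiers, the bandwidth between them in bytes per second, and whether that link is throttled. `LinkRule` and `parse_rules` are hypothetical names; the point is only that links are undirected, so each rule is keyed by an unordered pair of topologies.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LinkRule:
    bandwidth_bytes_per_sec: int
    throttled: bool


def parse_rules(text: str) -> dict[frozenset[str], LinkRule]:
    """Parse lines of the hypothetical form
        <topology-a> <topology-b> <bytes-per-sec> <throttle|nothrottle>
    Blank lines and lines starting with '#' are ignored."""
    rules: dict[frozenset[str], LinkRule] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        topo_a, topo_b, bandwidth, mode = line.split()
        if mode not in ("throttle", "nothrottle"):
            raise ValueError(f"unknown mode {mode!r} in line {raw!r}")
        # Links are undirected, so key each rule by an unordered pair.
        rules[frozenset((topo_a, topo_b))] = LinkRule(int(bandwidth), mode == "throttle")
    return rules
```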
And step 202, performing current limiting analysis according to the first resource request to obtain a current limiting analysis result.
Specifically, in this step, in order to improve the accuracy of the current limiting analysis result, the distributed current limiter performs current limiting analysis according to the resource requested by the DataNode, the target network topology information, and a preset current limiting rule, so as to obtain a current limiting analysis result.
The current limiting analysis result may take one of three forms. If current limiting is not required, a response message can be sent directly to the DataNode to indicate this. If current limiting is required and the resources available to the distributed current limiter are greater than the resources requested by the DataNode, the result is that current limiting applies but the requested resources can be granted to the DataNode. If current limiting is required and the available resources are less than the resources requested by the DataNode, the result is that the request must be limited.
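The three outcomes described above can be sketched in Python under several assumptions that the source does not state: the limiter keeps a remaining byte budget for each throttled link between adjacent pipeline nodes, a request is charged once per throttled hop it crosses, per-period budget replenishment is omitted, and a budget exactly equal to the request counts as sufficient (the text only says "larger than"). `DistributedLimiter`, `analyse`, and `Result` are hypothetical names.

```python
from __future__ import annotations

from collections import Counter
from enum import Enum


class Result(Enum):
    NO_LIMIT = "current limiting not required"
    GRANT = "limited, but enough budget: grant the request"
    DENY = "limited and not enough budget: deny the request"


class DistributedLimiter:
    def __init__(self, throttled_links: dict[frozenset[str], int]) -> None:
        # Remaining budget (bytes in the current period) for each throttled link.
        self.available = dict(throttled_links)

    def analyse(self, requested: int, topology: list[str]) -> Result:
        """topology: network locations of the pipeline's DataNodes, in order."""
        hops = [frozenset(pair) for pair in zip(topology, topology[1:])]
        # How many times the pipeline crosses each throttled link.
        usage = Counter(h for h in hops if h in self.available)
        if not usage:
            return Result.NO_LIMIT
        if all(self.available[h] >= requested * n for h, n in usage.items()):
            for h, n in usage.items():
                self.available[h] -= requested * n
            return Result.GRANT
        return Result.DENY
```

Because the limiter sees every pipeline's topology, two DataNodes that each look idle locally can still be denied when their combined traffic would exceed a shared link's budget.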
Step 203, sending a first resource response to the DataNode according to the result of the current limiting analysis.
In this step, if the result of the current limit analysis indicates that current limit is not required, a first resource response is sent to the DataNode, where the first resource response includes a first identifier, and the first identifier is used to indicate that current limit is not required. For example, the first flag may be long.
And if the current limiting analysis result shows that current limiting is needed but the available resources of the distributed current limiter are larger than the resources requested by the DataNode, sending a first resource response to the DataNode, wherein the first resource response comprises transmission resources determined according to the resources requested by the DataNode. The transmission resource is used for processing a second data processing request of the client.
And if the current limiting analysis result shows that current limiting is needed but the available resources of the distributed current limiter are smaller than the resources requested by the DataNode, sending a first resource response to the DataNode, wherein the first resource response comprises a second identifier which is used for showing that the DataNode is not allowed to process the current data processing request. For example, the second flag may be 0.
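A Python sketch of how the three cases above could be encoded in the first resource response. The second identifier uses the example value 0 from the text; the example value for the first identifier is unclear in the text, so the sketch uses -1 as a hypothetical sentinel. `ResourceResponse` and `build_response` are hypothetical names, and the granted transmission resource is simply taken to be the requested size.

```python
from __future__ import annotations

from dataclasses import dataclass

DENIED = 0      # second identifier (example value given in the text)
NO_LIMIT = -1   # first identifier: hypothetical sentinel chosen for this sketch


@dataclass(frozen=True)
class ResourceResponse:
    value: int  # NO_LIMIT, DENIED, or a positive number of granted bytes


def build_response(result: str, requested: int) -> ResourceResponse:
    """Map an analysis result ('no_limit', 'grant', 'deny') to a response."""
    if requested <= 0:
        # A zero grant would be indistinguishable from DENIED.
        raise ValueError("requested size must be positive")
    if result == "no_limit":
        return ResourceResponse(NO_LIMIT)
    if result == "grant":
        # Transmission resource determined from the DataNode's request.
        return ResourceResponse(requested)
    if result == "deny":
        return ResourceResponse(DENIED)
    raise ValueError(f"unknown analysis result: {result!r}")
```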
Referring to fig. 3, fig. 3 is a flowchart of an information processing method provided in an embodiment of the present invention, and is applied to a NameNode in an HDFS cluster, as shown in fig. 3, including the following steps:
step 301, receiving a first data processing request of a client.
Wherein, the meaning of the first data processing request can refer to the description of the foregoing embodiments.
Step 302, according to the first data processing request, sending first information to the client, where the first information includes information of a target DataNode for processing a second data processing request of the client and target network topology information of the target DataNode.
In the embodiment of the invention, the protocol between the NameNode and the client is extended: a throttleDefine field carrying the target network topology information is added to the response message that the NameNode returns to the client. Based on the target network topology information obtained, the target DataNodes can form a data pipeline.
The NameNode may store information of each DataNode and network topology information of the DataNode. The network topology information of each DataNode may be obtained through a registration procedure of the DataNode.
In practical applications, the NameNode may select the target DataNodes for processing the second data processing request from the DataNodes in the system according to information such as the current load of each DataNode; for example, the NameNode may select DataNodes with a lighter load as target DataNodes. The NameNode then obtains the network topology information of each selected target DataNode from the information it has stored.
Optionally, on the basis of the foregoing embodiment, the method may further include: the NameNode receives a registration request of at least one DataNode, and acquires network topology information of the at least one DataNode, such as an IP address and a network address of the DataNode, information (such as an identifier) of the network topology to which the DataNode belongs, and the like according to the registration request of the at least one DataNode.
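A Python sketch of the NameNode side under stated assumptions: each registration carries the DataNode's network location, load is a single integer, and targets are chosen purely least-loaded first. Real HDFS placement also applies rack awareness, which is left out here. `NameNodeRegistry`, `register`, and `add_block` are hypothetical names; the returned dictionary only mimics a reply that carries the throttleDefine field.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DataNodeInfo:
    address: str    # hypothetical host name, e.g. "dn1"
    topology: str   # network location taken from the registration request
    load: int = 0   # current load; the concrete metric is an assumption


class NameNodeRegistry:
    def __init__(self) -> None:
        self.nodes: dict[str, DataNodeInfo] = {}

    def register(self, address: str, topology: str) -> None:
        # Record the DataNode and its network topology at registration time.
        self.nodes[address] = DataNodeInfo(address, topology)

    def add_block(self, replication: int) -> dict[str, list[str]]:
        """Answer an ADD BLOCK request with targets plus a throttleDefine list."""
        if replication > len(self.nodes):
            raise ValueError("not enough registered DataNodes")
        # Least-loaded first; ties broken by address so the result is stable.
        ranked = sorted(self.nodes.values(), key=lambda n: (n.load, n.address))
        chosen = ranked[:replication]
        return {
            "targets": [n.address for n in chosen],
            "throttleDefine": [n.topology for n in chosen],
        }
```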
Referring to fig. 4, fig. 4 is a flowchart of an information processing method provided in an embodiment of the present invention, and is applied to a DataNode in an HDFS cluster, as shown in fig. 4, including the following steps:
step 401, receiving a second data processing request of the client, where the second data processing request includes target network topology information.
The target network topology information is sent to the DataNode by a client, and a NameNode in an HDFS cluster sends the target network topology information to the client according to a first data processing request of the client; the target network topology information is network topology information of a DataNode used for processing a second data processing request of the client.
In the embodiment of the invention, the protocol between the client and the DataNode is extended: a throttleDefine field carrying the target network topology information is added to the second data processing request sent by the client to the DataNode. The second data processing request may be WRITE_BLOCK (for writing data), READ_BLOCK (for reading data), or the like.
In the embodiment of the invention, the target DataNodes can form a data pipeline according to the obtained target network topology information. The DataNode performing the method of the embodiment shown in FIG. 4 may be the first DataNode of the data pipeline, that is, the DataNode at the entry of the pipeline. Other DataNodes may likewise follow the processing method of the first DataNode.
Step 402: according to the second data processing request, send a first resource request to the distributed current limiter, where the first resource request includes the target network topology information.
A throttleDefine field, which carries the target network topology information, is added to the first resource request that the DataNode sends to the distributed current limiter. The first resource request may be a TRY_ACCQUIRE request, and it may also include the resource requested by the DataNode, such as the size of the requested resource.
Step 403: receive a first resource response from the distributed current limiter.
Step 404: process according to the first resource response.
Specifically, in this step, if the first resource response includes a first identifier, or includes a transmission resource determined according to the resource requested by the DataNode, the DataNode writes first data corresponding to the second data processing request locally and sends the first data to the downstream DataNode, or sends second data corresponding to the second data processing request to the client. The first identifier indicates that current limiting is not required; for example, the first identifier may be long.
For example, if the second data processing request is a write request, the first DataNode may obtain a data block whose size matches the transmission resource, write the data block locally, and then send it to the downstream DataNode, that is, the DataNode that follows the first DataNode in the data pipeline. If the second data processing request is a read request, the first DataNode may send its own data to the client within the transmission resource, according to the second data processing request.
If the first resource response includes a second identifier, the DataNode sends the first resource request to the distributed current limiter again after a preset time. The second identifier indicates that the DataNode is not allowed to process the current data processing request; for example, the second identifier may be 0. The preset time can be set freely, for example in a configuration file, such as dfs.
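Step 404 can be sketched as a small retry loop on the DataNode. This is a minimal Python sketch under stated assumptions, not the patent's implementation: `try_acquire` stands in for one TRY_ACCQUIRE round trip, `NO_LIMIT = -1` is a placeholder we chose for the first identifier (the text's example value is unclear), `DENIED = 0` follows the text's example for the second identifier, `retry_interval_s` plays the role of the preset time, and the `max_attempts` cap is our own addition so the loop cannot wait forever.

```python
import time
from typing import Callable

NO_LIMIT = -1  # placeholder for the "first identifier" (assumed value)
DENIED = 0     # "second identifier"; the text gives 0 as an example


def acquire_with_retry(try_acquire: Callable[[int], int], requested: int,
                       retry_interval_s: float, max_attempts: int = 10) -> int:
    """Return how many bytes the DataNode may transfer now.

    try_acquire(requested) is a hypothetical stand-in for a TRY_ACCQUIRE
    round trip; it returns NO_LIMIT, DENIED, or a granted byte count.
    """
    for _ in range(max_attempts):
        resp = try_acquire(requested)
        if resp == NO_LIMIT:
            return requested          # no throttling: transfer the whole request
        if resp > 0:
            return resp               # throttled but granted: transfer this much
        time.sleep(retry_interval_s)  # DENIED: wait the preset time, then ask again
    raise TimeoutError("resource not granted after %d attempts" % max_attempts)


# Usage: a fake limiter that refuses twice, then grants 4096 bytes.
responses = iter([DENIED, DENIED, 4096])
granted = acquire_with_retry(lambda n: next(responses), 8192, retry_interval_s=0.01)
```

Returning the granted size (rather than a boolean) lets the caller size the next data chunk to exactly what the limiter allowed, which matches the "transmission resource" wording above.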
In the embodiment of the present invention, the DataNode forwards the network topology information to the distributed current limiter in the first resource request, and the distributed current limiter performs current-limiting analysis on that request and returns a first resource response. Because the target network topology information describes the DataNodes that will process the client's second data processing request, the distributed current limiter can take the resource usage of multiple DataNodes into account when deciding whether current limiting is required. The scheme of the embodiment of the invention therefore helps keep the HDFS cluster stable.
Referring to fig. 5, fig. 5 is a flowchart of an information processing method according to an embodiment of the present invention. As shown in fig. 5, taking the example of writing data, the method may include the following steps:
Step 501: the client sends an ADD_BLOCK request to the NameNode to request a new data block for writing data.
Step 502: the NameNode sends the client the information of the DataNodes that will process the ADD_BLOCK request, together with the network topology information of those DataNodes.
Step 503: according to the network topology information, the client sends a WRITE_BLOCK request to the first DataNode in the data pipeline and starts writing data. The WRITE_BLOCK request carries the network topology information received in step 502.
Step 504: the first DataNode sends a TRY_ACCQUIRE request to the distributed current limiter to request resources. The TRY_ACCQUIRE request carries the network topology information received in step 503.
Step 505: the distributed current limiter determines whether to limit the data pipeline according to the current-limiting rule defined by the configuration dfs.
Step 506, the distributed current limiter sends a response message to the first DataNode.
If current limiting is not needed, the distributed current limiter sends the first DataNode a response message that includes the first identifier, which indicates that current limiting is not needed. For example, the first identifier may be long.
If current limiting is needed and the available resource of the distributed current limiter is larger than the resource requested by the first DataNode, the distributed current limiter sends the first DataNode a response message that includes the resource requested by the first DataNode.
If current limiting is needed and the available resource of the distributed current limiter is smaller than the resource requested by the first DataNode, the distributed current limiter sends the first DataNode a response message that includes a second identifier, which indicates that the first DataNode is not allowed to process the current data processing request. For example, the second identifier may be 0.
Correspondingly, if the response message includes the first identifier or the resource requested by the first DataNode, the first DataNode writes the received data locally and sends the data to be written to the downstream DataNode. If the response message includes the second identifier, the first DataNode sends the TRY_ACCQUIRE request to the distributed current limiter again after the preset time.
Taking as an example the case where current limiting is needed and the available resource of the distributed current limiter is larger than the resource requested by the DataNode, the method may further include:
Step 507: according to the resource returned by the distributed current limiter, the first DataNode receives block data of the same number of bytes and writes it locally.
Step 508: the first DataNode sends a WRITE_BLOCK request carrying the data to be written to the downstream DataNode.
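Steps 507 and 508, together with the point that only the first DataNode talks to the distributed current limiter, can be illustrated with an in-memory sketch. Everything here is hypothetical: `acquire` is a stand-in for the TRY_ACCQUIRE round trip that is assumed to always grant a positive byte count (retrying on the second identifier is left out), and each DataNode's storage is modeled as a `bytearray`.

```python
from typing import Callable, Dict, List


def pipeline_write(data: bytes, targets: List[str], topology: List[str],
                   acquire: Callable[[List[str], int], int],
                   stores: Dict[str, bytearray]) -> int:
    """Push `data` through the pipeline; return how many limiter calls were made."""
    calls = 0
    offset = 0
    while offset < len(data):
        calls += 1
        # Only the head DataNode (targets[0]) asks the limiter for resources,
        # passing along the pipeline topology it received from the client.
        granted = acquire(topology, len(data) - offset)
        if granted <= 0:
            raise RuntimeError("this sketch expects a positive grant")
        chunk = data[offset:offset + granted]
        # Step 507: the head writes the granted bytes locally.
        # Step 508: each hop forwards the same bytes downstream
        # without making its own limiter call.
        for dn in targets:
            stores[dn].extend(chunk)
        offset += len(chunk)
    return calls


targets = ["dn1", "dn2", "dn3"]
stores = {dn: bytearray() for dn in targets}
calls = pipeline_write(b"abcdefghij", targets, ["/cq/lj", "/cq/lj", "/cq/xy"],
                       lambda topo, n: min(n, 4), stores)
```

With a limiter that grants at most 4 bytes per call, 10 bytes take three calls (4, 4, 2), and all three DataNodes end up with the full data while only the head ever contacted the limiter.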
In practical applications, for example in the financial industry, regulatory requirements on data storage security oblige financial service providers to store the same piece of data in a "two locations, three centers" arrangement. For an HDFS cluster, one solution is to deploy a single HDFS cluster across three data centers, with each center storing two replicas, which meets the data storage security requirement. However, when a file is written to such a cluster, its data pipeline spans all three centers. Because the bandwidth between data centers is typically only 20 Gbps or 100 Gbps, a large number of write requests in the same period can completely saturate the inter-center bandwidth. Once the bandwidth is saturated, many other requests may be delayed, which can affect the stability of the entire HDFS cluster.
Therefore, adding the distributed current limiter to the HDFS cluster addresses the problem that, when there are a large number of concurrent cross-data-center reads and writes, the cross-center traffic occupies the entire bandwidth and other requests cannot be processed promptly. The scheme of the embodiment of the invention thus helps keep the HDFS cluster stable.
An embodiment of the present invention further provides an information processing method, which is applicable to the information processing system shown in fig. 1, and includes:
S1: receive a first data processing request of the client.
Specifically, in this step, the NameNode receives a first data processing request of the client.
S2: send first information to the client according to the first data processing request, where the first information includes information of a target DataNode for processing a second data processing request of the client and target network topology information of the target DataNode.
Specifically, in this step, the NameNode sends the first information to the client.
S3: receive a second data processing request of the client, where the second data processing request includes the target network topology information.
Specifically, in this step, the DataNode receives a second data processing request of the client.
S4: perform current-limiting analysis according to the second data processing request to obtain a current-limiting analysis result.
Specifically, in this step, the distributed current limiter performs current limiting analysis to obtain a current limiting analysis result.
In steps S3 and S4, the DataNode sends a first resource request to the distributed current limiter according to the second data processing request, where the first resource request includes the target network topology information. The distributed current limiter performs current limiting analysis according to the first resource request to obtain a current limiting analysis result; and sending a first resource response to the DataNode according to the current limiting analysis result.
S5: process according to the current-limiting analysis result.
Specifically, in this step, the DataNode performs processing according to the first resource response.
For the specific working principle of each component of the system in the above steps, refer to the description of the foregoing method embodiments.
In the embodiment of the invention, the resource use conditions of a plurality of DataNodes can be considered according to the reported network topology information so as to judge whether the current limitation is needed. Therefore, the stability of the HDFS cluster can be ensured by utilizing the scheme of the embodiment of the invention.
As shown in fig. 6, the HDFS cluster is deployed across a first data center (/cq/lj) and a second data center (/cq/xy). The rule dfs, triple, cq/lj-/cq/xj, 10GB is defined in the distributed current limiter, meaning that the bandwidth between the two data centers is limited to 10 GB.
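One way to decide whether a pipeline falls under such a rule is to look at each adjacent pair of DataNodes in the reported topology and check whether that hop crosses a throttled data-center pair. The sketch below is an illustration, not the patent's rule engine: it assumes each topology entry is already a data-center path such as /cq/lj, treats links as undirected, interprets the 10 GB figure as a byte budget, and uses a rule representation of our own, since the text does not give the configuration syntax in full.

```python
from typing import Dict, FrozenSet, List

GB = 1024 ** 3


def link(a: str, b: str) -> FrozenSet[str]:
    """Undirected data-center pair (direction of the hop does not matter)."""
    return frozenset((a, b))


def applicable_rules(pipeline: List[str],
                     rules: Dict[FrozenSet[str], int]) -> Dict[FrozenSet[str], int]:
    """Rules whose data-center pair is crossed by at least one pipeline hop."""
    hit: Dict[FrozenSet[str], int] = {}
    for a, b in zip(pipeline, pipeline[1:]):
        key = link(a, b)
        if a != b and key in rules:
            hit[key] = rules[key]
    return hit


# Hypothetical encoding of the fig. 6 rule: traffic between /cq/lj and /cq/xy.
rules = {link("/cq/lj", "/cq/xy"): 10 * GB}

same_center = applicable_rules(["/cq/lj", "/cq/lj", "/cq/lj"], rules)
cross_center = applicable_rules(["/cq/lj", "/cq/lj", "/cq/xy"], rules)
needs_limit = bool(cross_center)
```

An empty result corresponds to "current limiting is not needed"; a non-empty one tells the limiter which link budgets to check before granting resources.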
Referring to fig. 6, each DataNode sends a REGISTER request to the NameNode, from which the NameNode obtains, among other things, the network topology information of that DataNode. When the client is ready to write a file, it sends an ADD_BLOCK request to the NameNode to apply for a new data block. The NameNode returns to the client the list of DataNodes to be written and the network topology information of the data pipeline. The client sends the data, together with the network topology information, to the first DataNode of the data pipeline. The first DataNode sends a TRY_ACCQUIRE request to the distributed current limiter to acquire traffic resources. If the distributed current limiter returns the traffic resources requested by the first DataNode, the first DataNode continues writing into the data pipeline and passes the data on to the next-hop DataNode. If the first DataNode does not obtain the specified amount of resources, it suspends writing to the data pipeline and calls TRY_ACCQUIRE again to request traffic resources after the time set by the configuration item dfs.
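The NameNode side of this flow (REGISTER, then ADD_BLOCK returning the DataNode list plus topology) reduces to a lookup table. In this hypothetical sketch the class and method names are ours, topology locations are plain data-center paths, and replica placement is out of scope: `add_block` receives the already-chosen pipeline and only attaches each DataNode's registered location, in pipeline order.

```python
from typing import Dict, List, Tuple


class NameNodeTopology:
    """Hypothetical model of the NameNode's DataNode-to-topology map."""

    def __init__(self) -> None:
        self._location: Dict[str, str] = {}

    def register(self, datanode: str, location: str) -> None:
        # REGISTER: remember where this DataNode sits in the network topology.
        self._location[datanode] = location

    def add_block(self, targets: List[str]) -> Tuple[List[str], List[str]]:
        # ADD_BLOCK: return the pipeline plus the topology of each target, in order.
        # An unregistered DataNode raises KeyError in this sketch.
        topology = [self._location[dn] for dn in targets]
        return list(targets), topology


nn = NameNodeTopology()
nn.register("dn1", "/cq/lj")
nn.register("dn2", "/cq/lj")
nn.register("dn3", "/cq/xy")
targets, topo = nn.add_block(["dn1", "dn2", "dn3"])
```

Keeping the topology list aligned index-for-index with the target list lets the DataNode forward it unchanged, and lets the limiter reason about each hop of the pipeline.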
As can be seen from the above description, the embodiment of the present invention introduces a distributed current limiter to control the traffic of data reads and writes. This protects other requests that communicate across network topologies from timing out because of network delay, and thereby keeps the entire HDFS stable. Because the distributed current limiter can consider the resource usage of multiple DataNodes according to the network topology information when deciding whether current limiting is needed, the scheme of the embodiment of the invention achieves global traffic awareness and control.
In the above embodiment, a configuration item dfs. is introduced so that flow control can be applied to data reads and writes, which enriches the usage scenarios of the HDFS cluster. In addition, in the embodiment of the invention, the network topology used for communication between the client and the DataNodes is transmitted along the data pipeline, and only the first DataNode in the data pipeline communicates with the distributed current limiter to acquire traffic resources. This avoids having every DataNode communicate with the distributed current limiter and reduces the load on the distributed current limiter.
The embodiment of the invention also provides an information processing apparatus applied to the distributed current limiter. Referring to fig. 7, fig. 7 is a block diagram of an information processing apparatus according to an embodiment of the present invention. Because the information processing apparatus solves the problem on a principle similar to that of the information processing method in the embodiment of the invention, its implementation can refer to the implementation of the method, and the details are not repeated here.
As shown in fig. 7, the information processing apparatus 700 includes:
a first receiving module 701, configured to receive a first resource request sent by a DataNode in an HDFS cluster, where the first resource request includes target network topology information; a first processing module 702, configured to perform a current limiting analysis according to the first resource request to obtain a current limiting analysis result; a first sending module 703, configured to send a first resource response to the DataNode according to the current limiting analysis result; the target network topology information is sent to the DataNode by a client, and a NameNode in an HDFS cluster sends the target network topology information to the client according to a first data processing request of the client; the target network topology information is network topology information of a DataNode used for processing a second data processing request of the client.
Optionally, the first resource request includes the resource requested by the DataNode. The apparatus further comprises a first acquisition module, configured to acquire a preset current-limiting rule; the first processing module 702 is configured to perform current-limiting analysis according to the resource requested by the DataNode, the target network topology information, and the preset current-limiting rule, to obtain the current-limiting analysis result.
Optionally, the first sending module 703 is configured to:
if the current limit analysis result shows that the current limit is not needed, sending a first resource response to the DataNode, wherein the first resource response comprises a first identifier, and the first identifier is used for showing that the current limit is not needed;
if the current limiting analysis result indicates that current limiting is needed but the available resources of the distributed current limiter are larger than the resources requested by the DataNode, sending a first resource response to the DataNode, wherein the first resource response comprises transmission resources determined according to the resources requested by the DataNode;
and if the current limiting analysis result shows that current limiting is needed but the available resources of the distributed current limiter are smaller than the resources requested by the DataNode, sending a first resource response to the DataNode, wherein the first resource response comprises a second identifier which is used for showing that the DataNode is not allowed to process the current data processing request.
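The three branches of the first sending module can be written as a pure decision function plus a budget that is drawn down on each grant. This is a sketch, not the patent's algorithm: `NO_LIMIT = -1` is a placeholder for the first identifier, `DENIED = 0` follows the text's example for the second identifier, the `LinkBudget` class is hypothetical, equality between available and requested resources is treated as sufficient (the text only covers the strictly larger and smaller cases), and refilling the budget over time is not modeled.

```python
NO_LIMIT = -1  # placeholder for the first identifier (assumed value)
DENIED = 0     # second identifier (the text gives 0 as an example)


def decide(needs_limit: bool, available: int, requested: int) -> int:
    """Three-way first resource response of the distributed current limiter."""
    if not needs_limit:
        return NO_LIMIT         # no rule applies: current limiting not needed
    if available >= requested:  # enough budget: grant the requested resource
        return requested
    return DENIED               # not enough budget: the DataNode must retry later


class LinkBudget:
    """Hypothetical per-link budget that shrinks as grants are issued."""

    def __init__(self, capacity: int) -> None:
        self.available = capacity

    def try_acquire(self, needs_limit: bool, requested: int) -> int:
        resp = decide(needs_limit, self.available, requested)
        if resp > 0:
            self.available -= resp  # only real grants consume budget
        return resp


budget = LinkBudget(100)
first = budget.try_acquire(True, 60)    # granted: 60 left -> 40
second = budget.try_acquire(True, 60)   # 40 < 60: denied, budget unchanged
free = budget.try_acquire(False, 60)    # no rule applies: NO_LIMIT
```

Separating the pure `decide` function from the budget state keeps the three response cases easy to test on their own, independent of how the limiter later distributes or replenishes its budget.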
The apparatus provided in the embodiment of the present invention may implement the method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
The embodiment of the invention also provides an information processing apparatus applied to the NameNode in the HDFS cluster. Referring to fig. 8, fig. 8 is a block diagram of an information processing apparatus according to an embodiment of the present invention. Because the information processing apparatus solves the problem on a principle similar to that of the information processing method in the embodiment of the invention, its implementation can refer to the implementation of the method, and the details are not repeated here.
As shown in fig. 8, the information processing apparatus 800 includes:
a first receiving module 801, configured to receive a first data processing request of a client; a first sending module 802, configured to send, according to the first data processing request, first information to the client, where the first information includes information of a target DataNode used for processing a second data processing request of the client and target network topology information corresponding to the target DataNode.
Wherein the target network topology information is carried in a flow limit definition throttleDefine field.
Wherein the apparatus may further comprise:
a second receiving module, configured to receive a registration request of at least one DataNode;
a first obtaining module, configured to obtain network topology information of the at least one DataNode according to the registration request of the at least one DataNode.
The apparatus provided in the embodiment of the present invention may implement the method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
The embodiment of the invention also provides an information processing apparatus applied to the DataNode in the HDFS cluster. Referring to fig. 9, fig. 9 is a block diagram of an information processing apparatus according to an embodiment of the present invention. Because the information processing apparatus solves the problem on a principle similar to that of the information processing method in the embodiment of the invention, its implementation can refer to the implementation of the method, and the details are not repeated here.
As shown in fig. 9, the information processing apparatus 900 includes:
a first receiving module 901, configured to receive a second data processing request of a client, where the second data processing request includes target network topology information; a first sending module 902, configured to send a first resource request to a distributed current limiter according to the second data processing request, where the first resource request includes the target network topology information; a second receiving module 903, configured to receive a first resource response of the distributed current limiter; a first processing module 904, configured to perform processing according to the first resource response;
the target network topology information is sent to the DataNode by a client, and a NameNode in an HDFS cluster sends the target network topology information to the client according to a first data processing request of the client; the target network topology information is the network topology information of the DataNode for processing the second data processing request of the client.
The target network topology information is carried in the flow-limit definition (throttleDefine) field of the second data processing request, and also in the throttleDefine field of the first resource request.
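The throttleDefine extension amounts to one extra field that the client fills in and the DataNode copies into its TRY_ACCQUIRE request. The Python dataclasses below are a hypothetical model of those two messages; the class names, field names, the per-DataNode list of data-center paths, and `requested_bytes` are all assumptions, since the text does not specify the wire format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataOpRequest:
    """Hypothetical client -> DataNode request (WRITE_BLOCK or READ_BLOCK)."""
    op: str
    block_id: int
    targets: List[str]          # DataNodes in pipeline order
    throttle_define: List[str]  # topology location of each target, same order


@dataclass
class TryAcquireRequest:
    """Hypothetical DataNode -> distributed current limiter request (TRY_ACCQUIRE)."""
    datanode: str
    requested_bytes: int
    throttle_define: List[str]


def to_try_acquire(req: DataOpRequest, requested_bytes: int) -> TryAcquireRequest:
    # The head DataNode forwards the client's topology unchanged
    # instead of computing its own view of the pipeline.
    if len(req.throttle_define) != len(req.targets):
        raise ValueError("throttleDefine must describe every target DataNode")
    if requested_bytes <= 0:
        raise ValueError("requested resource must be positive")
    return TryAcquireRequest(req.targets[0], requested_bytes, list(req.throttle_define))


write = DataOpRequest("WRITE_BLOCK", 1001, ["dn1", "dn2", "dn3"],
                      ["/cq/lj", "/cq/lj", "/cq/xy"])
acq = to_try_acquire(write, 64 * 1024)
```

Copying the list (rather than sharing the client's object) keeps the forwarded request independent of any later changes to the incoming one.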
Wherein, the first processing module 904 is configured to:
if the first resource response comprises a first identifier or the first resource response comprises transmission resources determined according to the resources requested by the DataNode, writing first data corresponding to the second data processing request into the DataNode and sending the first data to a downstream DataNode, or sending second data corresponding to the second data processing request to a client; wherein the first identifier is used for indicating that no current limiting is required;
and if the first resource response comprises a second identifier, sending the first resource request to the distributed current limiter again after a preset time, wherein the second identifier is used for indicating that the DataNode is not allowed to process the current data processing request.
The apparatus provided in the embodiment of the present invention may implement the method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
The embodiment of the invention also provides an information processing apparatus. Referring to fig. 10, fig. 10 is a block diagram of an information processing apparatus according to an embodiment of the present invention. Because the information processing apparatus solves the problem on a principle similar to that of the information processing method in the embodiment of the invention, its implementation can refer to the implementation of the method, and the details are not repeated here.
As shown in fig. 10, the information processing apparatus 1000 includes:
a first receiving module 1001, configured to receive a first data processing request of a client; a first sending module 1002, configured to send first information to the client according to the first data processing request, where the first information includes information of a target DataNode for processing a second data processing request of the client and target network topology information of the target DataNode; a second receiving module 1003, configured to receive a second data processing request of the client, where the second data processing request includes the target network topology information; the first processing module 1004 is configured to perform a current limiting analysis according to the second data processing request to obtain a current limiting analysis result; a second processing module 1005, configured to perform processing according to the current limiting analysis result.
The apparatus provided in the embodiment of the present invention may implement the method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
An embodiment of the present invention further provides an electronic device, including: a memory, a processor, and a program stored on the memory and executable on the processor; the processor is configured to read a program in the memory to implement each process of the information processing method embodiment, and can achieve the same technical effect, and for avoiding repetition, details are not described here again.
The embodiment of the present invention further provides a readable storage medium, where a program is stored on the readable storage medium, and when the program is executed by a processor, the program implements each process of the above-mentioned information processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, the detailed description is omitted here. The readable storage medium may be any available medium or data storage device that can be accessed by a processor, including but not limited to magnetic memory (e.g., floppy disk, hard disk, magnetic tape, magneto-optical disk (MO), etc.), optical memory (e.g., CD, DVD, BD, HVD, etc.), and semiconductor memory (e.g., ROM, EPROM, EEPROM, nonvolatile memory (NAND FLASH), Solid State Disk (SSD)), etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. With such an understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (13)

1. An information processing method applied to a distributed current limiter, comprising:
receiving a first resource request sent by a data node DataNode in a Hadoop Distributed File System (HDFS) cluster, wherein the first resource request comprises target network topology information;
performing current-limiting analysis according to the first resource request to obtain a current-limiting analysis result;
sending a first resource response to the DataNode according to the current limiting analysis result;
the target network topology information is sent to the DataNode by a client, and a name node NameNode in an HDFS cluster sends the target network topology information to the client according to a first data processing request of the client; the target network topology information is network topology information of a DataNode used for processing a second data processing request of the client.
2. The method of claim 1, wherein the first resource request comprises a resource requested by the DataNode;
before performing a current limiting analysis according to the first resource request to obtain a current limiting analysis result, the method further includes:
acquiring a preset current limiting rule;
the current limiting analysis is performed according to the first resource request to obtain a current limiting analysis result, and the method comprises the following steps:
and carrying out current limiting analysis according to the resources requested by the DataNode, the target network topology information and the preset current limiting rule to obtain a current limiting analysis result.
3. The method of claim 1, wherein sending a first resource response to the DataNode based on the current limit analysis comprises:
if the current limit analysis result shows that current limit is not needed, sending a first resource response to the DataNode, wherein the first resource response comprises a first identifier, and the first identifier is used for showing that current limit is not needed;
if the current limiting analysis result indicates that current limiting is needed but the available resources of the distributed current limiter are larger than the resources requested by the DataNode, sending a first resource response to the DataNode, wherein the first resource response comprises transmission resources determined according to the resources requested by the DataNode;
and if the current limiting analysis result shows that current limiting is needed but the available resources of the distributed current limiter are smaller than the resources requested by the DataNode, sending a first resource response to the DataNode, wherein the first resource response comprises a second identifier which is used for showing that the DataNode is not allowed to process the current data processing request.
4. An information processing method applied to a NameNode in an HDFS cluster, characterized by comprising the following steps:
receiving a first data processing request of a client;
and sending first information to the client according to the first data processing request, wherein the first information comprises information of a target DataNode for processing a second data processing request of the client and target network topology information of the target DataNode.
5. The method of claim 4, wherein the target network topology information is carried in a flow restriction definition throttleDefine field; the method further comprises the following steps:
receiving a registration request of at least one DataNode;
and acquiring the network topology information of the at least one DataNode according to the registration request of the at least one DataNode.
6. An information processing method applied to a DataNode in an HDFS cluster, characterized by comprising the following steps:
receiving a second data processing request of the client, wherein the second data processing request comprises target network topology information;
sending a first resource request to a distributed current limiter according to the second data processing request, wherein the first resource request comprises the target network topology information;
receiving a first resource response of the distributed current limiter;
processing according to the first resource response;
the target network topology information is sent to the DataNode by a client, and a NameNode in an HDFS cluster sends the target network topology information to the client according to a first data processing request of the client; the target network topology information is network topology information of a DataNode used for processing a second data processing request of the client.
7. The method of claim 6,
the target network topology information is carried in a flow limiting definition throttleDefine field of the second data processing request;
and the target network topology information is carried in the flow limiting definition (throttleDefine) field of the first resource request.
8. The method of claim 6, wherein the processing according to the first resource response comprises:
if the first resource response comprises a first identifier or the first resource response comprises transmission resources determined according to the resources requested by the DataNode, writing first data corresponding to the second data processing request into the DataNode and sending the first data to a downstream DataNode, or sending second data corresponding to the second data processing request to a client; wherein the first identifier is used for indicating that no current limiting is required;
and if the first resource response comprises a second identifier, sending the first resource request to the distributed current limiter again after a preset time, wherein the second identifier is used for indicating that the DataNode is not allowed to process the current data processing request.
9. An information processing method characterized by comprising:
receiving a first data processing request of a client;
sending first information to the client according to the first data processing request, wherein the first information comprises information of a target DataNode for processing a second data processing request of the client and target network topology information of the target DataNode;
receiving a second data processing request of a client, wherein the second data processing request comprises the target network topology information;
performing current-limiting analysis according to the second data processing request to obtain a current-limiting analysis result;
and processing according to the current limiting analysis result.
10. An information processing apparatus applied to a distributed current limiter, comprising:
a first receiving module, configured to receive a first resource request sent by a DataNode in an HDFS cluster, wherein the first resource request comprises target network topology information;
a first processing module, configured to perform current limiting analysis according to the first resource request to obtain a current limiting analysis result;
a first sending module, configured to send a first resource response to the DataNode according to the current limiting analysis result;
wherein the target network topology information is sent to the DataNode by the client, and a NameNode in the HDFS cluster sends the target network topology information to the client according to a first data processing request of the client; the target network topology information is network topology information of a DataNode used for processing a second data processing request of the client.
11. An information processing system, comprising: a NameNode, at least one DataNode and a distributed current limiter in an HDFS cluster;
the NameNode is configured to receive a first data processing request of a client and send first information to the client according to the first data processing request, wherein the first information comprises information of a target DataNode for processing a second data processing request of the client and target network topology information of the target DataNode;
the DataNode is configured to receive the second data processing request of the client, wherein the second data processing request comprises the target network topology information; send a first resource request to the distributed current limiter according to the second data processing request, wherein the first resource request comprises the target network topology information; receive a first resource response from the distributed current limiter; and perform processing according to the first resource response;
the distributed current limiter is configured to receive the first resource request sent by the DataNode, perform current limiting analysis according to the first resource request to obtain a current limiting analysis result, and send the first resource response to the DataNode according to the current limiting analysis result.
12. An electronic device, comprising: a memory, a processor, and a program stored on the memory and executable on the processor; characterized in that the processor is configured to read the program in the memory to implement the steps of the information processing method according to any one of claims 1 to 9.
13. A readable storage medium storing a program, wherein the program, when executed by a processor, implements the steps of the information processing method according to any one of claims 1 to 9.
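Claims 7, 10 and 11 describe the distributed current limiter in three steps:

- It receives a resource request whose throttleDefine field carries the DataNode's network topology information.
- It performs current limiting analysis, that is, rate limiting or throttling.
- It replies with one of three responses: the first identifier (no limiting required), granted transmission resources, or the second identifier (not allowed).

The claims in this section do not specify the analysis algorithm. The Python sketch below therefore assumes one token bucket per topology path, such as `/dc1/rack1`. All of the following are hypothetical illustrations, not the patented implementation:

- the class names;
- the `throttle_define` string format;
- the `Verdict` values;
- the token-bucket policy.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class Verdict(Enum):
    NO_LIMIT = "first identifier"   # no current limiting required
    GRANTED = "granted resources"   # transmission resources granted (possibly partial)
    REJECT = "second identifier"    # DataNode may not process the request now


@dataclass
class ResourceRequest:
    datanode_id: str
    requested_bytes: int
    # Hypothetical encoding: topology path copied from the client's throttleDefine field.
    throttle_define: Optional[str] = None


@dataclass
class ResourceResponse:
    verdict: Verdict
    granted_bytes: int = 0


class TokenBucket:
    """Refills at `rate` bytes per second, up to `capacity` bytes."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def take(self, amount: int, now: float) -> int:
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        granted = int(min(amount, self.tokens))
        self.tokens -= granted
        return granted


class DistributedCurrentLimiter:
    def __init__(self, limits: Dict[str, float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        now = clock()
        # One bucket per limited topology path; burst size = one second of traffic.
        self.buckets = {path: TokenBucket(rate, rate, now) for path, rate in limits.items()}

    def handle(self, req: ResourceRequest) -> ResourceResponse:
        bucket = self.buckets.get(req.throttle_define or "")
        if bucket is None:
            # No limit configured for this topology: reply with the first identifier.
            return ResourceResponse(Verdict.NO_LIMIT)
        granted = bucket.take(req.requested_bytes, self.clock())
        if granted == 0:
            # Bucket is empty: reply with the second identifier.
            return ResourceResponse(Verdict.REJECT)
        return ResourceResponse(Verdict.GRANTED, granted)
```

For example, with `limits={"/dc1/rack1": 100.0}`:

1. A first 60-byte request is granted in full.
2. A second 60-byte request receives only the remaining 40 bytes.
3. A third request is rejected until the bucket refills.

Keying the buckets by topology path is one way to let a single limiter protect a whole rack or data center rather than individual DataNodes.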
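Claim 8 describes how the DataNode reacts to the first resource response:

- With the first identifier or granted resources, it proceeds: it writes the data locally and forwards it to the downstream DataNode, or it sends the data to the client.
- With the second identifier, it resends the request after a preset time.

The Python sketch below rests on several illustrative assumptions, none of which come from the claims:

- the response is an `(identifier, granted_bytes)` tuple;
- a partial grant lets the DataNode move that many bytes and then ask again for the rest;
- retries are capped;
- the function name and identifier strings are made up for the example.

```python
import time
from typing import Callable, Tuple

# Hypothetical encodings of the identifiers carried in the first resource response.
FIRST_ID = "NO_LIMIT"   # first identifier: no current limiting required
GRANTED = "GRANTED"     # transmission resources granted (granted_bytes > 0)

Response = Tuple[str, int]  # (identifier, granted_bytes)


def process_with_limiter(
    request_resources: Callable[[str, int], Response],
    topology: str,
    nbytes: int,
    do_io: Callable[[int], None],
    retry_delay_s: float = 0.05,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Move `nbytes` through `do_io` under limiter control.

    Returns True once all bytes are processed, or False after `max_attempts`
    rejections.
    """
    remaining = nbytes
    rejections = 0
    while remaining > 0:
        ident, granted = request_resources(topology, remaining)
        if ident == FIRST_ID:
            do_io(remaining)          # unthrottled: process everything at once
            return True
        if ident == GRANTED and granted > 0:
            chunk = min(granted, remaining)
            do_io(chunk)              # e.g. write locally and forward downstream
            remaining -= chunk
            continue
        # Any other response (the second identifier, or a zero grant) is a rejection.
        rejections += 1
        if rejections >= max_attempts:
            return False
        sleep(retry_delay_s)          # wait the preset time, then resend the request
    return True
```

Passing `sleep` as a parameter keeps the retry loop testable without real delays. In a real DataNode, the preset delay and the retry cap would presumably come from configuration.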
CN202011308062.0A 2020-11-20 2020-11-20 Information processing method, device, system, equipment and readable storage medium Active CN114598708B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011308062.0A CN114598708B (en) 2020-11-20 2020-11-20 Information processing method, device, system, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011308062.0A CN114598708B (en) 2020-11-20 2020-11-20 Information processing method, device, system, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN114598708A (en) 2022-06-07
CN114598708B (en) 2024-04-26

Family

ID=81802265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011308062.0A Active CN114598708B (en) 2020-11-20 2020-11-20 Information processing method, device, system, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN114598708B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140059310A1 (en) * 2012-08-24 2014-02-27 Vmware, Inc. Virtualization-Aware Data Locality in Distributed Data Processing
KR20150091843A (en) * 2014-02-04 2015-08-12 삼성전자주식회사 Distributed processing system and method of operating the same
CN107896175A (en) * 2017-11-30 2018-04-10 北京小度信息科技有限公司 Collecting method and device
KR20190109638A (en) * 2018-03-05 2019-09-26 울산과학기술원 Method for scheduling task in big data analysis platform based on distributed file system, program and computer readable storage medium therefor
CN111309612A (en) * 2020-02-16 2020-06-19 苏州浪潮智能科技有限公司 Distributed file system based data current limiting test method and system

Non-Patent Citations (1)

Title
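Quan Hengxing; Wei Xuecai; Wang Man: "Design of a Distributed File System Based on Software-Defined Networking" (基于软件定义网络的分布式文件系统设计), Computer Engineering (计算机工程), no. 05, pp. 3-5 *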

Also Published As

Publication number Publication date
CN114598708B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
US11586673B2 (en) Data writing and reading method and apparatus, and cloud storage system
US9971823B2 (en) Dynamic replica failure detection and healing
US10637916B2 (en) Method and device for storage resource allocation for video cloud storage
CN111200657B (en) Method for managing resource state information and resource downloading system
CN111880936B (en) Resource scheduling method, device, container cluster, computer equipment and storage medium
CN107105050B (en) Storage and downloading method and system for service objects
BR112017005646B1 (en) COMPOSITE PARTITION FUNCTIONS
CN107172214B (en) Service node discovery method and device with load balancing function
IL278825A (en) Data migration methods and system
US20170153909A1 (en) Methods and Devices for Acquiring Data Using Virtual Machine and Host Machine
CN105224244A (en) File storage method and apparatus
CN110417741B (en) Method and device for filtering security group
CN111694639A (en) Method and device for updating address of process container and electronic equipment
CN107493309B (en) File writing method and device in distributed system
CN113411363A (en) Uploading method of image file, related equipment and computer storage medium
CN101483668A (en) Network storage and access method, device and system for hot spot data
CN111147226B (en) Data storage method, device and storage medium
CN101146107B (en) A method and device for data download
CN114598708A (en) Information processing method, device, system, equipment and readable storage medium
CN115914404A (en) Cluster flow management method and device, computer equipment and storage medium
CN111131497B (en) File transmission method and device, electronic equipment and storage medium
CN111857548B (en) Data reading method, device and system
CN108306859B (en) Method, apparatus and computer-readable storage medium for limiting server access volume
CN113873052B (en) Domain name resolution method, device and equipment of Kubernetes cluster
CN114826919B (en) SDN-based load balancing software nanotube method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant