CN114327909A - Data processing method, data processing device, computer equipment and storage medium - Google Patents

Info

Publication number
CN114327909A
Authority
CN
China
Prior art keywords
data
node
storage
transferred
received
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210006923.2A
Other languages
Chinese (zh)
Inventor
梁海昆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202210006923.2A
Publication of CN114327909A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a data processing method, a data processing device, computer equipment and a storage medium. The method comprises the following steps: when a data balance command is received, acquiring the data storage amount of each storage node in the data storage system; determining a node to be transferred and a node to be received according to the data storage amount of each storage node; starting a corresponding number of data transmission threads in a storage server according to a balance thread parameter in the data balance command; and transferring data through the plurality of data transmission threads, which improves data balance efficiency. Because the number of data transmission threads used during data balancing is adjusted through the data balance command rather than by modifying the configuration items of the HDFS cluster, the HDFS cluster does not need to be restarted. The number of data transmission threads started by the storage server during data balancing can thus be configured flexibly and dynamically, service interruption caused by restarting the HDFS cluster is avoided, and the data balancing speed can be adjusted dynamically.

Description

Data processing method, data processing device, computer equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method and apparatus, a computer device, and a storage medium.
Background
In big data technology, HDFS (Hadoop Distributed File System) is commonly used as the data storage system. HDFS consists mainly of two basic components, namely the NameNode (NN) and the DataNode (DN): the NN node is mainly responsible for maintaining the metadata of the stored data, and the DN nodes store the actual data files.
An HDFS cluster contains multiple DN nodes, and during actual operation the amount of data stored on each DN node can become unevenly distributed. A user can control the number of data balance threads started on the DN nodes by modifying the related configuration item in the configuration file; generally, the more threads are started, the faster data moves and the sooner the cluster data reaches balance. However, the configuration item is read only once, when HDFS starts. If the parameter needs to be adjusted later, the HDFS cluster must be restarted, and service is interrupted during the restart. In a large-scale cluster a single restart may take more than 1 hour, which is not acceptable for a cluster carrying busy services. As a result, the number of data balance threads started on the DN nodes cannot be adjusted dynamically according to the actual data volume.
Disclosure of Invention
In order to solve the technical problem, the application provides a data processing method, a data processing device, a computer device and a storage medium.
In a first aspect, the present embodiment provides a data processing method, including:
under the condition of receiving a data balance command, acquiring the data storage quantity of each storage node in a data storage system, wherein the data balance command comprises a balance thread parameter;
determining a node to be transferred and a node to be received according to the data storage capacity of each storage node, wherein the node to be transferred and the node to be received are different storage nodes;
and starting a corresponding number of data transmission threads in a storage server according to the balance thread parameters, wherein the storage server is a server containing the nodes to be transferred, and the started data transmission threads are used for transferring part of data in the nodes to be transferred in the storage server to the nodes to be received.
In a second aspect, the present embodiment provides a data processing apparatus, including:
the data acquisition module is used for acquiring the data storage quantity of each storage node in the data storage system under the condition of receiving a data balance command, wherein the data balance command comprises a balance thread parameter;
a node determining module, configured to determine a node to be transferred and a node to be received according to the data storage amount of each storage node, where the node to be transferred and the node to be received are different storage nodes;
and the data balance module is used for starting a corresponding number of data transmission threads in a storage server according to the balance thread parameters, wherein the storage server is a server containing the nodes to be transferred, and the started data transmission threads are used for transferring part of data in the nodes to be transferred in the storage server to the nodes to be received.
In a third aspect, the present embodiment provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
under the condition of receiving a data balance command, acquiring the data storage quantity of each storage node in a data storage system, wherein the data balance command comprises a balance thread parameter;
determining a node to be transferred and a node to be received according to the data storage capacity of each storage node, wherein the node to be transferred and the node to be received are different storage nodes;
and starting a corresponding number of data transmission threads in a storage server according to the balance thread parameters, wherein the storage server is a server containing the nodes to be transferred, and the started data transmission threads are used for transferring part of data in the nodes to be transferred in the storage server to the nodes to be received.
In a fourth aspect, the present embodiments provide a computer-readable storage medium having a computer program stored thereon, the computer program, when executed by a processor, implementing the steps of:
under the condition of receiving a data balance command, acquiring the data storage quantity of each storage node in a data storage system, wherein the data balance command comprises a balance thread parameter;
determining a node to be transferred and a node to be received according to the data storage capacity of each storage node, wherein the node to be transferred and the node to be received are different storage nodes;
and starting a corresponding number of data transmission threads in a storage server according to the balance thread parameters, wherein the storage server is a server containing the nodes to be transferred, and the started data transmission threads are used for transferring part of data in the nodes to be transferred in the storage server to the nodes to be received.
Based on the above data processing method, when a data balance command is received, the data storage amount of each storage node in the data storage system is acquired, and a node to be transferred and a node to be received are determined according to those data storage amounts. The node to be transferred and the node to be received are different storage nodes: the node to be transferred is a storage node from which part of the data is to be moved out, and the node to be received is a storage node that is to receive that data. The data balance command includes a balance thread parameter; a corresponding number of data transmission threads are started in the storage server according to this parameter, and data is transferred by multiple data transmission threads at the same time, which improves data balance efficiency. Because the number of data transmission threads used during data balancing is adjusted through the data balance command rather than by modifying the configuration items of the HDFS cluster, the HDFS cluster does not need to be restarted. The number of data transmission threads started by the storage server during data balancing can thus be configured flexibly and dynamically, service interruption caused by restarting the HDFS cluster is avoided, and the data balancing speed can be adjusted dynamically.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
FIG. 1 is a diagram of an application environment of a data processing method in one embodiment;
FIG. 2 is a flow diagram illustrating a data processing method according to one embodiment;
FIG. 3 is a block diagram of a data processing apparatus according to an embodiment;
FIG. 4 is a diagram illustrating an internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
FIG. 1 is a diagram of an application environment of a data processing method in one embodiment. Referring to FIG. 1, the data processing method is applied to a data processing system. The data processing system includes a client 110 and a data storage system 120, where the data storage system 120 is an HDFS cluster. The client 110 and the HDFS cluster are connected via a network. The client 110 may be a desktop terminal or a mobile terminal, and the mobile terminal may be at least one of a mobile phone, a tablet computer, a notebook computer, and the like.
The HDFS cluster includes an NN node 121 and DN nodes 122. The NN node maintains the metadata of all files and directories in the HDFS cluster, that is, it manages the two most important relationships in the cluster: (1) the directory file tree, i.e. the correspondence between files and data blocks; and (2) the correspondence between DN nodes and data blocks, i.e. which DN nodes each data block is stored on. When a DN node starts, it reports the correspondence between itself and the data blocks it stores to the NN node, which is why starting the HDFS cluster may take a relatively long time. The NN node also coordinates client access to files: on receiving a client's access request for a target file, the NN node looks up the DN node and data block where the target file is located, generates a response identifying them, and returns the response to the client. The client then accesses that DN node and data block according to the response and downloads the target file.
The DN nodes store the data blocks and execute commands issued by the NN node and/or the terminal, such as copy and delete operations. In other words, the HDFS cluster is a file system that stores files and locates them through a directory file tree; it is distributed, its functions are realized by multiple servers working together, and each server in the HDFS cluster has its own role.
The HDFS cluster further includes a balance node 123. The balance node 123 denotes the server that hosts the HDFS balance tool and is used to balance the data stored on the DN nodes in the HDFS cluster, with the goal that data is eventually distributed evenly across all DN nodes. Hereinafter this server is referred to simply as the balance node.
In one embodiment, referring to FIG. 2, a data processing method is provided. This embodiment is mainly exemplified by the method applied to the balancing node in the HDFS cluster in fig. 1, where the data processing method specifically includes the following steps:
Step S210, in case of receiving the data balance command, acquiring the data storage amount of each storage node in the data storage system.
In this embodiment, the data balance command includes a balance thread parameter. The data balance command is initiated by a user, through a client, to the NN node in the HDFS cluster. On receiving the command, the NN node triggers the balance node; once started, the balance node acquires the data storage amount of each storage node through the NN node. Here a storage node is a DN node, and the data storage amount is the amount of data stored on that storage node.
Step S220, determining a node to be transferred and a node to be received according to the data storage amount of each storage node.
In this embodiment, the node to be transferred and the node to be received are different storage nodes. The node to be transferred is a storage node from which part of the data is to be moved out, and the node to be received is a storage node that is to store that data. That is, part of the data in the node to be transferred is transferred to the node to be received, so that data is distributed evenly across all DN nodes in the HDFS cluster.
Step S230, starting a corresponding number of data transmission threads in the storage server according to the balanced thread parameter.
In this embodiment, the NN node forwards the balance thread parameter in the data balance command to the HDFS balance tool, so that the balance thread number of the balance node is updated to the value given by the balance thread parameter. The balance thread number indicates how many data transmission threads the balance node starts on the node to be transferred. Whenever the balance thread number needs to be modified, the balance thread parameter in the data balance command is assigned to it when the balance node starts, so the number of data transmission threads can be adjusted dynamically without restarting the HDFS cluster.
The node to be transferred and the node to be received are located on different server hosts, and the storage server is the server containing the node to be transferred. The balance node starts a corresponding number of data transmission threads in the storage server according to the balance thread parameter, and part of the data in the node to be transferred is transferred concurrently to the node to be received through the started data transmission threads, which improves data balance efficiency. Because the number of data transmission threads used during data balancing is adjusted through the data balance command rather than by modifying the configuration items of the HDFS cluster, the HDFS cluster does not need to be restarted. The number of data transmission threads started by the storage server during data balancing can thus be configured flexibly and dynamically, service interruption caused by restarting the HDFS cluster is avoided, and the data balancing speed can be adjusted dynamically.
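As a rough illustration of step S230, the following Python sketch starts as many worker threads as the balance thread parameter specifies and lets them move blocks concurrently. It is a minimal sketch under stated assumptions, not the HDFS implementation: the names `run_transfer`, `blocks`, and `send_block` are hypothetical, and an in-memory queue stands in for the data held by the node to be transferred.

```python
import threading
from queue import Empty, Queue


def run_transfer(blocks, balance_threads, send_block):
    """Move every item in `blocks` using `balance_threads` worker threads.

    `send_block` is a hypothetical callable that delivers one block to the
    node to be received; it is an assumption made for this sketch.
    """
    if balance_threads < 1:
        raise ValueError("balance_threads must be at least 1")

    pending = Queue()
    for block in blocks:
        pending.put(block)

    moved = []
    lock = threading.Lock()

    def worker():
        # Each thread keeps taking blocks until the queue is empty.
        while True:
            try:
                block = pending.get_nowait()
            except Empty:
                return
            send_block(block)
            with lock:
                moved.append(block)

    threads = [threading.Thread(target=worker) for _ in range(balance_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return moved
```

Calling `run_transfer(blocks, 3, send_block)` and later `run_transfer(blocks, 8, send_block)` changes the degree of concurrency per call, which mirrors the idea of taking the thread count from the command rather than from a configuration file read at startup.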
In one embodiment, the determining a node to be transferred and a node to be received according to the data storage amount of each storage node, that is, step S220, further includes: determining an average storage capacity according to the data storage capacity of each storage node; taking the storage node with the data storage capacity larger than the average storage capacity as the node to be transferred; and taking the storage node with the data storage capacity smaller than the average storage capacity as the node to be received.
In this embodiment, the balance node computes the average storage amount of all DN nodes from the data storage amounts that the DN nodes have reported to the NN node. The average storage amount serves as the data balance standard: data balance is achieved when the data storage amounts of all DN nodes in the HDFS cluster reach the average storage amount. A DN node whose data storage amount is greater than the average storage amount holds too much data and must transfer part of it out to reach the average, so it is taken as a node to be transferred. A DN node whose data storage amount is smaller than the average holds too little data and must receive some data to reach the average, so it is taken as a node to be received. A DN node whose data storage amount equals the average storage amount does not need to take part in data transfer.
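The classification described above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_nodes` and the dictionary layout (node name mapped to data storage amount) are assumptions, not part of the application.

```python
def classify_nodes(storage):
    """Split nodes into senders and receivers around the average.

    `storage` maps a (hypothetical) node name to its data storage amount.
    Nodes exactly at the average appear in neither list.
    """
    if not storage:
        raise ValueError("storage must contain at least one node")
    average = sum(storage.values()) / len(storage)
    to_transfer = [node for node, used in storage.items() if used > average]
    to_receive = [node for node, used in storage.items() if used < average]
    return average, to_transfer, to_receive
```

For example, with amounts of 90, 30, and 60 on three nodes, the average is 60, the first node is a node to be transferred, the second is a node to be received, and the third takes no part in the transfer.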
In one embodiment, before starting the corresponding number of data transmission threads according to the balanced thread parameter, i.e. before step S230, the method further includes: obtaining the transfer data volume according to the difference value between the data storage volume and the average storage volume; starting a corresponding number of data transmission threads in the storage server according to the balanced thread parameter, that is, step S230 includes: starting a corresponding number of data transmission threads in the storage server according to the balance thread parameter, wherein the number of the data transmission threads corresponding to the balance thread parameter is at least 1; and transmitting the data corresponding to the transferred data volume in the node to be transferred to the node to be received through the started data transmission thread.
In this embodiment, the transfer data amount is obtained by subtracting the average storage amount from the data storage amount. When the data storage amount is greater than the average storage amount, the transfer data amount is positive, which indicates that the corresponding DN node is a node to be transferred. The data transmission threads specified by the balance thread parameter are started on the node to be transferred at the same time; their number is N, where N is greater than or equal to 1. The node to be transferred then transmits the data corresponding to the transfer data amount to the node to be received through the N data transmission threads concurrently, which improves data transfer efficiency and therefore data balance efficiency.
When the data storage amount is less than the average storage amount, the transfer data amount is negative, which indicates that the corresponding DN node is a node to be received; that node needs to receive an amount of data equal to the absolute value of the transfer data amount.
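The sign convention for the transfer data amount can be illustrated with a small Python sketch. The function name `transfer_amounts` and the input layout are hypothetical; the computation itself is just the difference between each node's data storage amount and the average.

```python
def transfer_amounts(storage):
    """Return the signed transfer data amount for every node.

    Positive: the node must send that much data (node to be transferred).
    Negative: the node must receive the absolute value (node to be received).
    Zero: the node is already at the average.
    """
    if not storage:
        raise ValueError("storage must contain at least one node")
    average = sum(storage.values()) / len(storage)
    return {node: used - average for node, used in storage.items()}
```

With amounts of 90, 30, and 60, the average is 60 and the resulting transfer data amounts are +30, -30, and 0 respectively.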
In one embodiment, after the started data transmission thread transmits data corresponding to the transfer data volume in the node to be transferred to the node to be received, the method further includes: and stopping transmitting data to the node to be received when the data storage amount of the node to be received is detected to reach the average storage amount.
In this embodiment, during data balancing, each DN node whose contents change reports update information to the NN node in real time, and the NN node forwards the update information to the balance node. The update information includes the data storage amount, the data blocks, and the file information stored in each data block. When the balance node determines from this update information that the data storage amount of a node to be received has reached the average storage amount, it stops transferring data to that node and transfers data instead to other nodes to be received whose data storage amounts have not yet reached the average. Similarly, when the balance node determines from the update information that the data storage amount of a node to be transferred has reached the average storage amount, it stops that node from transferring further data to the nodes to be received.
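One way to picture the stop conditions is a planning loop that pairs senders with receivers and moves on as soon as either side reaches the average. The Python sketch below is an assumption-laden illustration: the greedy pairing order and the name `plan_transfers` are not taken from the application, which only states when transfers stop.

```python
def plan_transfers(storage):
    """Return a list of (sender, receiver, amount) moves.

    A receiver is dropped once it reaches the average, and a sender is
    dropped once it has shed its surplus. The greedy pairing order is an
    assumption made for this sketch.
    """
    if not storage:
        return []
    average = sum(storage.values()) / len(storage)
    senders = [[n, u - average] for n, u in storage.items() if u > average]
    receivers = [[n, average - u] for n, u in storage.items() if u < average]

    plan = []
    si = ri = 0
    while si < len(senders) and ri < len(receivers):
        amount = min(senders[si][1], receivers[ri][1])
        plan.append((senders[si][0], receivers[ri][0], amount))
        senders[si][1] -= amount
        receivers[ri][1] -= amount
        if senders[si][1] == 0:
            si += 1  # this sender has reached the average
        if receivers[ri][1] == 0:
            ri += 1  # this receiver has reached the average
    return plan
```

In a running cluster the stop decision would be driven by the update information reported by the DN nodes rather than by a precomputed plan; the sketch only shows the bookkeeping.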
In one embodiment, in the event that the data balancing command is not received, the method further comprises: the method comprises the steps of periodically obtaining the data storage capacity of each storage node in a data storage system according to a preset period; acquiring configuration thread parameters under the condition that the data storage system is determined to be in a data storage unbalance state according to the data storage quantity of each storage node; determining a node to be transferred and a node to be received according to the data storage capacity of each storage node; and starting a corresponding number of data transmission threads in the storage server according to the configuration thread parameters, wherein the started data transmission threads are used for transferring part of data in the nodes to be transferred in the storage server to the nodes to be received.
In this embodiment, when no data balance command is received from a client, the HDFS cluster must check for itself whether data is distributed evenly across all DN nodes. Every DN node in the HDFS cluster actively uploads update information to the NN node after it changes. Even when nothing has changed, each DN node uploads node information to the NN node periodically, according to a preset period. The node information has the same content as the update information, that is, it includes the data storage amount, the data blocks, and the file information stored in each data block. The preset period can be customized by modifying the configuration file. For example, if the preset period is set to 20 s, the NN node obtains the node information of all DN nodes once every 20 s, and can then determine from the node information or update information uploaded by each DN node whether data is distributed evenly across all DN nodes.
When the data storage system is determined to be in a data storage unbalanced state according to the data storage amount of each DN node, the balance node is triggered to start and acquires a configuration thread parameter from the NN node. The configuration thread parameter indicates the number of data transmission threads, as set in the configuration file, to be started for data balancing; it is in effect the default number of data transmission threads the data storage system starts when balancing data. The data storage system being in a data storage unbalanced state means that data is unevenly distributed across the DN nodes in the HDFS cluster and data balance processing is required. The node to be transferred and the node to be received are determined according to the data storage amount of each storage node, as in the above embodiment. Because data balancing is triggered here by the self-check rather than by a data balance command from a client, the value of the configuration thread parameter is assigned to the balance thread number of the balance node. The balance node then starts that number of data transmission threads on the node to be transferred, and part of the data in the node to be transferred is transferred to the node to be received through those threads.
In one embodiment, the determining that the data storage system is in a data storage imbalance state according to the data storage amount of each storage node includes: determining the difference of data storage quantity between each storage node to obtain a corresponding data quantity difference value; determining that the data storage system is in a data storage imbalance state if there is at least one of the data volume differences that is greater than or equal to a threshold.
In this embodiment, the balance node calculates, from the data storage amount of each storage node, the difference in data storage amount between each DN node and every other DN node, obtaining a plurality of data amount differences. If any data amount difference is greater than or equal to the threshold, the two corresponding DN nodes differ greatly in data storage amount, that is, data storage is unevenly distributed in the data storage system, and the data storage system is determined to be in a data storage unbalanced state. If every data amount difference is below the threshold, data storage in the data storage system is evenly distributed, and the data storage system is determined to be in a data storage balanced state.
The threshold can be customized according to how frequently data balancing is actually required. The larger the threshold, the less likely the self-check is to find the data storage system unbalanced, and the less often the data storage system balances data on its own when no data balance command is received. The smaller the threshold, the stricter the criterion for data storage imbalance, the more likely the self-check is to find the system unbalanced, and the more often the data storage system balances data on its own when no data balance command is received.
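The pairwise threshold test can be sketched as follows. The function name `is_unbalanced` and the input layout are hypothetical; the check itself follows the rule stated above, namely that the system is unbalanced if at least one pairwise difference is greater than or equal to the threshold.

```python
from itertools import combinations


def is_unbalanced(storage, threshold):
    """Return True if any two nodes differ by at least `threshold`.

    `storage` maps a (hypothetical) node name to its data storage amount.
    """
    return any(
        abs(a - b) >= threshold
        for a, b in combinations(storage.values(), 2)
    )
```

Because only the largest pairwise difference matters, comparing the maximum and minimum data storage amounts against the threshold gives the same answer with less work; the pairwise form is kept here because it matches the description.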
In one embodiment, the obtaining the configuration thread parameter includes: reading system configuration information under the condition of starting the data storage system, wherein the system configuration information comprises configuration thread parameters.
In this embodiment, the user may also modify the configuration thread parameter in the configuration file. However, the configuration file is read only when the HDFS cluster starts, so a modified configuration thread parameter takes effect only after the HDFS cluster is restarted. Restarting the HDFS cluster interrupts service; in a large-scale cluster a single restart may take a long time, and no data service can be provided during the restart. It is therefore generally not acceptable to adjust the number of data transmission threads started for data balancing by modifying the configuration thread parameter in the configuration file. In the above embodiment, by contrast, the balance thread number of the balance node is modified through the data balance command, so the number of data transmission threads started can be adjusted dynamically according to data changes in the DN nodes without restarting the HDFS cluster.
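The relationship between the two parameters can be summarized in a short Python sketch: the value carried by a data balance command takes precedence, and the value read from the configuration file at startup is used only when no command supplies one. The function and argument names are hypothetical.

```python
from typing import Optional


def effective_thread_count(
    configured_threads: int, command_threads: Optional[int] = None
) -> int:
    """Pick the balance thread number for one balancing run.

    `configured_threads` stands for the configuration thread parameter read
    at startup; `command_threads` stands for the balance thread parameter
    of a data balance command, or None when balancing was triggered by the
    self-check. Both names are assumptions made for this sketch.
    """
    chosen = configured_threads if command_threads is None else command_threads
    if chosen < 1:
        raise ValueError("at least one data transmission thread is required")
    return chosen
```

Since the choice is made per balancing run, a later command with a different value changes the thread count without touching the stored configuration.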
In one embodiment, before determining that the data storage amount of each storage node in the data storage system is in an unbalanced state, the method comprises: and under the condition that an operation instruction is received, determining that the data storage quantity of each storage node in the data storage system is in an unbalanced state.
In this embodiment, the operation instruction is used to add or remove a storage node in the data storage system. During actual operation, the HDFS cluster may receive an operation instruction to add or remove a DN node. A newly added DN node stores no data, or much less data than the existing DN nodes in the HDFS cluster, so its data storage amount differs greatly from theirs. The data stored on a removed DN node is transferred to other DN nodes, so the DN nodes that receive this data then differ greatly in data storage amount from the remaining DN nodes. In other words, once an operation instruction is received, some DN nodes in the HDFS cluster hold much more data than others and the data is unevenly distributed. The balance node then needs to balance the data across the DN nodes, that is, migrate part of the data from DN nodes holding more data to DN nodes holding less, so that data is distributed evenly across all DN nodes in the HDFS cluster.
FIG. 2 is a flow diagram illustrating a data processing method according to an embodiment. It should be understood that, although the steps in the flowchart of FIG. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in FIG. 2 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 3, there is provided a data processing apparatus applied to a balancing node in an HDFS cluster, including:
a data acquisition module 310, configured to obtain the data storage amount of each storage node in a data storage system when a data balancing command is received, where the data balancing command includes a balancing thread parameter;
a node determining module 320, configured to determine a node to be transferred and a node to be received according to the data storage amount of each storage node, where the node to be transferred and the node to be received are different storage nodes;
a data balancing module 330, configured to start a corresponding number of data transmission threads in a storage server according to the balancing thread parameter, where the storage server is a server including the node to be transferred, and the started data transmission threads are used to transfer part of data in the node to be transferred in the storage server to the node to be received.
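As an illustration of a data balancing command that carries a balancing thread parameter, the following minimal sketch parses a command line and validates the thread count. The command syntax (`balance --threads N`), the `BalanceCommand` type, and the function name are hypothetical and chosen only for this example; the application does not specify a concrete command format.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class BalanceCommand:
    """Hypothetical in-memory form of a data balancing command."""
    threads: int  # balancing thread parameter carried by the command


def parse_balance_command(line: str) -> BalanceCommand:
    """Parse a hypothetical command such as 'balance --threads 8'."""
    tokens = shlex.split(line)
    if not tokens or tokens[0] != "balance":
        raise ValueError("not a data balancing command")
    if "--threads" not in tokens:
        raise ValueError("missing balancing thread parameter")
    idx = tokens.index("--threads")
    if idx + 1 >= len(tokens):
        raise ValueError("missing value for balancing thread parameter")
    threads = int(tokens[idx + 1])  # raises ValueError for non-numeric input
    if threads < 1:
        raise ValueError("at least one data transmission thread is required")
    return BalanceCommand(threads=threads)
```

Because the thread count travels with the command itself, it can differ from one balancing run to the next without touching any cluster configuration item.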
In one embodiment, the node determining module 320 is further configured to:
determining an average storage amount according to the data storage amount of each storage node;
taking each storage node whose data storage amount is larger than the average storage amount as a node to be transferred;
and taking each storage node whose data storage amount is smaller than the average storage amount as a node to be received.
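The node determination described above can be sketched as follows. This is a minimal illustration, assuming the storage amounts are available as a dictionary keyed by node name; the function name `classify_nodes` and the node names are hypothetical.

```python
from statistics import mean


def classify_nodes(storage: dict[str, float]) -> tuple[float, list[str], list[str]]:
    """Split storage nodes into nodes to be transferred and nodes to be received.

    `storage` maps a node name to its current data storage amount.
    """
    average = mean(storage.values())
    # Nodes above the average give data away; nodes below it take data in.
    to_transfer = [name for name, used in storage.items() if used > average]
    to_receive = [name for name, used in storage.items() if used < average]
    return average, to_transfer, to_receive
```

For example, with storage amounts of 900, 600 and 0 for `dn1`, `dn2` and `dn3`, the average is 500, so `dn1` and `dn2` are nodes to be transferred and `dn3` is a node to be received. A node that sits exactly at the average falls into neither group.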
In one embodiment, before starting the corresponding number of data transmission threads according to the balancing thread parameter, the apparatus further includes a difference determining module configured to:
obtaining the transfer data volume according to the difference value between the data storage amount and the average storage amount;
the data balancing module 330 is further configured to:
starting a corresponding number of data transmission threads in the storage server according to the balancing thread parameter, wherein the number of the data transmission threads corresponding to the balancing thread parameter is at least 1;
and transmitting the data corresponding to the transfer data volume in the node to be transferred to the node to be received through the started data transmission threads.
In an embodiment, after the started data transmission thread transmits the data corresponding to the transfer data amount in the node to be transferred to the node to be received, the data balancing module 330 is further configured to:
and stopping transmitting data to the node to be received when the data storage amount of the node to be received is detected to reach the average storage amount.
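The transfer and stop conditions above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the actual transfer logic: data is modelled as a list of block sizes, the "transfer" only updates counters instead of copying data over a network, and the function name `balance` and its parameters are hypothetical.

```python
import threading
from queue import Empty, Queue


def balance(block_sizes, source_used, target_used, average, threads):
    """Simulate moving blocks from a node to be transferred to a node to be received.

    Returns the (source, target) storage amounts after the transfer.
    """
    if threads < 1:
        raise ValueError("at least one data transmission thread is required")
    # Transfer data volume: difference between the stored amount and the average.
    transfer_volume = source_used - average
    pending = Queue()
    for size in block_sizes:
        pending.put(size)
    state = {"source": source_used, "target": target_used, "moved": 0}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                size = pending.get_nowait()
            except Empty:
                return  # nothing left to move
            with lock:
                # Stop once the receiver reaches the average storage amount
                # or the planned transfer data volume has been moved.
                if state["target"] >= average or state["moved"] >= transfer_volume:
                    return
                state["source"] -= size
                state["target"] += size
                state["moved"] += size

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return state["source"], state["target"]
```

Calling `balance([100] * 9, source_used=900, target_used=100, average=500, threads=4)` moves four blocks and leaves both nodes at 500. With blocks of unequal size the receiver may overshoot the average by up to one block, because the check happens before each move; a real implementation would need to decide how to handle that case.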
In one embodiment, in the case that the data balancing command is not received, the data acquisition module 310 is further configured to:
periodically obtaining the data storage amount of each storage node in the data storage system according to a preset period;
acquiring configuration thread parameters under the condition that the data storage system is determined to be in a data storage unbalance state according to the data storage quantity of each storage node;
determining a node to be transferred and a node to be received according to the data storage capacity of each storage node;
and starting a corresponding number of data transmission threads in the storage server according to the configuration thread parameters, wherein the started data transmission threads are used for transferring part of data in the nodes to be transferred in the storage server to the nodes to be received.
In one embodiment, the data acquisition module 310 is further configured to:
determining the difference of data storage quantity between each storage node to obtain a corresponding data quantity difference value;
determining that the data storage system is in a data storage imbalance state if there is at least one of the data volume differences that is greater than or equal to a threshold.
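The imbalance check above compares the storage amounts of the nodes with one another. A minimal sketch, assuming the storage amounts are held in a dictionary and that the threshold uses the same unit; the function name `is_unbalanced` is hypothetical.

```python
from itertools import combinations


def is_unbalanced(storage: dict[str, float], threshold: float) -> bool:
    """Return True if any two nodes differ in storage amount by at least `threshold`."""
    return any(
        abs(a - b) >= threshold
        for a, b in combinations(storage.values(), 2)
    )
```

Since the largest pairwise difference is always the one between the fullest and the emptiest node, comparing `max(storage.values()) - min(storage.values())` against the threshold gives the same answer with less work on large clusters.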
In one embodiment, the data acquisition module 310 is further configured to:
reading system configuration information under the condition of starting the data storage system, wherein the system configuration information comprises configuration thread parameters.
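The following sketch shows one way the configuration thread parameter could be read at startup and combined with a balancing thread parameter from a command. The JSON format, the key name `balance_threads`, and both function names are assumptions made for this example; the application does not specify the configuration format.

```python
import json
from typing import Optional


def load_configured_threads(config_text: str, default: int = 1) -> int:
    """Read the configuration thread parameter from hypothetical JSON configuration text."""
    config = json.loads(config_text)
    threads = int(config.get("balance_threads", default))
    return max(1, threads)  # always keep at least one data transmission thread


def effective_threads(command_threads: Optional[int], configured_threads: int) -> int:
    """Use the command's balancing thread parameter if present, else the configured one."""
    return command_threads if command_threads is not None else configured_threads
```

In this sketch the configured value is read once, when the system starts, and serves as the fallback for periodic balancing, while a value supplied by a data balancing command takes precedence for that run.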
In one embodiment, before determining that the data storage amount of each storage node in the data storage system is in an unbalanced state, the data acquisition module 310 is further configured to:
determining that the data storage quantity of each storage node in the data storage system is in an unbalanced state under the condition that an operation instruction is received, wherein the operation instruction is used for adding or removing the storage nodes in the data storage system.
FIG. 4 is a diagram illustrating an internal structure of a computer device in one embodiment. The computer device may specifically be the HDFS cluster in fig. 1. As shown in fig. 4, the computer device includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the data processing method. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the data processing method. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen; a key, a trackball, or a touch pad arranged on the housing of the computer device; or an external keyboard, touch pad, or mouse.
Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, the data processing apparatus provided herein may be implemented in the form of a computer program that is executable on a computer device such as that shown in fig. 4. The memory of the computer device may store various program modules constituting the data processing apparatus, such as the data acquisition module 310, the node determination module 320, and the data balancing module 330 shown in fig. 3. The computer program constituted by the respective program modules causes the processor to execute the steps in the data processing method of the respective embodiments of the present application described in the present specification.
The computer device shown in fig. 4 may, through the data acquisition module 310 of the data processing apparatus shown in fig. 3, obtain the data storage amount of each storage node in the data storage system when a data balancing command is received, where the data balancing command includes a balancing thread parameter. The computer device may, through the node determining module 320, determine a node to be transferred and a node to be received according to the data storage amount of each storage node, where the node to be transferred and the node to be received are different storage nodes. The computer device may, through the data balancing module 330, start a corresponding number of data transmission threads in a storage server according to the balancing thread parameter, where the storage server is the server containing the node to be transferred, and the started data transmission threads are used to transfer part of the data in the node to be transferred in the storage server to the node to be received.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the method of any of the above embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the method of any of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A method of data processing, the method comprising:
under the condition of receiving a data balance command, acquiring the data storage quantity of each storage node in a data storage system, wherein the data balance command comprises a balance thread parameter;
determining a node to be transferred and a node to be received according to the data storage capacity of each storage node, wherein the node to be transferred and the node to be received are different storage nodes;
and starting a corresponding number of data transmission threads in a storage server according to the balance thread parameters, wherein the storage server is a server containing the nodes to be transferred, and the started data transmission threads are used for transferring part of data in the nodes to be transferred in the storage server to the nodes to be received.
2. The method according to claim 1, wherein the determining the nodes to be transferred and the nodes to be received according to the data storage amount of each storage node comprises:
determining an average storage capacity according to the data storage capacity of each storage node;
taking the storage node with the data storage capacity larger than the average storage capacity as the node to be transferred;
and taking the storage node with the data storage capacity smaller than the average storage capacity as the node to be received.
3. The method of claim 2, wherein before starting a corresponding number of data transfer threads according to the balanced thread parameter, the method further comprises:
obtaining the transfer data volume according to the difference value between the data storage volume and the average storage volume;
starting a corresponding number of data transmission threads in a storage server according to the balance thread parameters, comprising:
starting a corresponding number of data transmission threads in the storage server according to the balance thread parameter, wherein the number of the data transmission threads corresponding to the balance thread parameter is at least 1;
and transmitting the data corresponding to the transferred data volume in the node to be transferred to the node to be received through the started data transmission thread.
4. The method according to claim 3, wherein after transmitting the data corresponding to the transfer data amount in the node to be transferred to the node to be received through the started data transmission thread, the method further comprises:
and stopping transmitting data to the node to be received when the data storage amount of the node to be received is detected to reach the average storage amount.
5. The method of claim 1, wherein in a case where the data balancing command is not received, the method further comprises:
periodically obtaining the data storage amount of each storage node in the data storage system according to a preset period;
acquiring configuration thread parameters under the condition that the data storage system is determined to be in a data storage unbalance state according to the data storage quantity of each storage node;
determining a node to be transferred and a node to be received according to the data storage capacity of each storage node;
and starting a corresponding number of data transmission threads in the storage server according to the configuration thread parameters, wherein the started data transmission threads are used for transferring part of data in the nodes to be transferred in the storage server to the nodes to be received.
6. The method of claim 5, wherein determining that the data storage system is in a data storage imbalance state according to the data storage amount of each storage node comprises:
determining the difference of data storage quantity between each storage node to obtain a corresponding data quantity difference value;
determining that the data storage system is in a data storage imbalance state if there is at least one of the data volume differences that is greater than or equal to a threshold.
7. The method of claim 5, wherein obtaining configuration thread parameters comprises:
reading system configuration information under the condition of starting the data storage system, wherein the system configuration information comprises configuration thread parameters.
8. The method of claim 4, wherein prior to determining that the amount of data storage at each storage node in the data storage system is in an unbalanced state, the method further comprises:
determining that the data storage quantity of each storage node in the data storage system is in an unbalanced state under the condition that an operation instruction is received, wherein the operation instruction is used for adding or removing the storage nodes in the data storage system.
9. A data processing apparatus, characterized in that the apparatus comprises:
the data acquisition module is used for acquiring the data storage quantity of each storage node in the data storage system under the condition of receiving a data balance command, wherein the data balance command comprises a balance thread parameter;
a node determining module, configured to determine a node to be transferred and a node to be received according to the data storage amount of each storage node, where the node to be transferred and the node to be received are different storage nodes;
and the data balance module is used for starting a corresponding number of data transmission threads in a storage server according to the balance thread parameters, wherein the storage server is a server containing the nodes to be transferred, and the started data transmission threads are used for transferring part of data in the nodes to be transferred in the storage server to the nodes to be received.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 8 are implemented when the computer program is executed by the processor.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210006923.2A CN114327909A (en) 2022-01-05 2022-01-05 Data processing method, data processing device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210006923.2A CN114327909A (en) 2022-01-05 2022-01-05 Data processing method, data processing device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114327909A true CN114327909A (en) 2022-04-12

Family

ID=81024089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210006923.2A Pending CN114327909A (en) 2022-01-05 2022-01-05 Data processing method, data processing device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114327909A (en)

Similar Documents

Publication Publication Date Title
US10896102B2 (en) Implementing secure communication in a distributed computing system
US10579364B2 (en) Upgrading bundled applications in a distributed computing system
US11086725B2 (en) Orchestration of heterogeneous multi-role applications
US10642694B2 (en) Monitoring containers in a distributed computing system
US11586374B2 (en) Index lifecycle management
US9489443B1 (en) Scheduling of splits and moves of database partitions
US11113158B2 (en) Rolling back kubernetes applications
JP5798248B2 (en) System and method for implementing a scalable data storage service
CN109992354B (en) Container processing method, device, main body server, system and storage medium
US11347684B2 (en) Rolling back KUBERNETES applications including custom resources
US20150229715A1 (en) Cluster management
US10515228B2 (en) Commit and rollback of data streams provided by partially trusted entities
CN111897558A (en) Kubernets upgrading method and device for container cluster management system
US10620871B1 (en) Storage scheme for a distributed storage system
CN107229649B (en) Data update system and method
CN113094076A (en) Version iteration method, device, equipment and medium based on version control
CN114610680A (en) Method, device and equipment for managing metadata of distributed file system and storage medium
CN114356504A (en) Data migration method and device in cluster, electronic equipment and storage medium
CN112069152A (en) Database cluster upgrading method, device, equipment and storage medium
CN114327909A (en) Data processing method, data processing device, computer equipment and storage medium
CN116594734A (en) Container migration method and device, storage medium and electronic equipment
CN116010446A (en) Database switching method, device, equipment and medium
CN115129789A (en) Bucket index storage method, device and medium of distributed object storage system
CN115033551A (en) Database migration method and device, electronic equipment and storage medium
CN114416689A (en) Data migration method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination