CN111949628A - Data operation method and device and distributed storage system - Google Patents

Data operation method and device and distributed storage system

Info

Publication number
CN111949628A
Authority
CN
China
Prior art keywords
data
erasure code
code operation
node
command
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910406607.2A
Other languages
Chinese (zh)
Other versions
CN111949628B (en
Inventor
刘强
毛宝龙
张�林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201910406607.2A priority Critical patent/CN111949628B/en
Priority claimed from CN201910406607.2A external-priority patent/CN111949628B/en
Publication of CN111949628A publication Critical patent/CN111949628A/en
Application granted granted Critical
Publication of CN111949628B publication Critical patent/CN111949628B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G06F11/1012 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices using codes or arrangements adapted for a specific type of error
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore

Abstract

The invention discloses a data operation method, a data operation device and a distributed storage system, and relates to the technical field of distributed storage. The data operation method comprises the following steps: the name node acquires one or more executable commands corresponding to an erasure code operation; the name node generates an erasure code operation instruction comprising the one or more executable commands, wherein the erasure code operation instruction is an object implemented based on a general instruction base class; and the name node sends the erasure code operation instruction to the data node, so that the data node parses the erasure code operation instruction and executes the executable commands in the parsing result to implement the erasure code operation. The embodiment of the invention decouples the erasure code function from the data node and decouples the name node from the data node, thereby improving the efficiency of bringing the system online.

Description

Data operation method and device and distributed storage system
Technical Field
The present invention relates to the field of distributed storage technologies, and in particular, to a data operation method and system.
Background
Erasure Coding (EC) technology is designed to solve the problem that data backups in the Hadoop Distributed File System (HDFS) occupy too much storage space. By creating redundant check data for the original file, erasure coding can reconstruct the source file information from the remaining data when part of the data is lost.
Disclosure of Invention
After analysis, the inventors found that if an erasure code conversion tool is added to the HDFS system, iterating on large-scale and very-large-scale Hadoop clusters incurs a significant delay.
The technical problem to be solved by the embodiments of the invention is: how to improve the efficiency of bringing iterations of a Hadoop cluster online.
According to a first aspect of some embodiments of the present invention, there is provided a data manipulation method, comprising: the name node acquires one or more executable commands corresponding to an erasure code operation; the name node generates an erasure code operation instruction comprising the one or more executable commands, wherein the erasure code operation instruction is an object implemented based on a general instruction base class; and the name node sends the erasure code operation instruction to the data node, so that the data node parses the erasure code operation instruction and executes the executable commands in the parsing result to implement the erasure code operation.
In some embodiments, the erasure code operation instruction further includes at least one of an environment variable and an external data address.
In some embodiments, the erasure code operation instruction generated by the name node includes an executable command for environment variable configuration and an environment variable.
In some embodiments, the erasure code operation instruction generated by the name node includes an executable command for downloading and a dynamic library address.
In some embodiments, when a new erasure code operation is added, the erasure code operation instruction generated by the name node includes a download command and the address of an execution script corresponding to the new erasure code operation.
In some embodiments, the name node generates one or more executable commands according to the erasure code operation policy corresponding to the acquired erasure code operation.
In some embodiments, the operation policy is an erasure code conversion policy, and parameters of the erasure code conversion policy include a first number indicating the number of data units of the original data and a second number indicating the number of data units of the check data; the one or more executable commands include: a read command for reading the original data; a data dividing command for dividing the read original data into the first number of data units; a check data generation command for generating check data from the read original data; and a data copy command for storing each of the first number of data units on one of the first number of data nodes and storing the check data on the second number of data nodes.
In some embodiments, the operation policy is a periodic copy number reduction policy, and parameters of the periodic copy number reduction policy include a preset target copy number, a processing period, and a step size; the one or more executable commands include a data removal command for deleting target data in the data node or transferring the target data in the data node to the recycle bin; and the name node sends the erasure code operation instruction to data nodes that store the same data, the number of such data nodes being equal to the step size, in response to the time elapsed since the last copy reduction operation reaching the processing period and the copy number of the same data being greater than the target copy number.
According to a second aspect of some embodiments of the present invention, there is provided a data manipulation method, including: the data node acquires an erasure code operation instruction sent by the name node, wherein the erasure code operation instruction is an object implemented based on a general instruction base class and comprises one or more executable commands; the data node parses the erasure code operation instruction to obtain a parsing result, wherein the parsing result comprises the one or more executable commands; and the data node executes the executable commands to implement the erasure code operation.
According to a third aspect of some embodiments of the present invention, there is provided a data manipulation apparatus comprising: an executable command acquisition module configured to acquire one or more executable commands corresponding to an erasure code operation; an erasure code operation instruction generation module configured to generate an erasure code operation instruction comprising the one or more executable commands, wherein the erasure code operation instruction is an object implemented based on a general instruction base class; and a sending module configured to send the erasure code operation instruction to the data node, so that the data node parses the erasure code operation instruction and executes the executable commands in the parsing result to implement the erasure code operation.
According to a fourth aspect of some embodiments of the present invention, there is provided a data manipulation apparatus comprising: a memory; and a processor coupled to the memory, the processor configured to perform any of the foregoing data manipulation methods based on instructions stored in the memory.
According to a fifth aspect of some embodiments of the present invention, there is provided a distributed storage system comprising: a name node comprising any one of the foregoing data manipulation apparatuses; and a data node configured to parse the erasure code operation instruction sent by the name node to obtain a parsing result, and to execute one or more executable commands in the parsing result so as to implement the erasure code operation.
According to a sixth aspect of some embodiments of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements any one of the aforementioned data manipulation methods.
Some embodiments of the above invention have the following advantages or benefits: with this method, the name node can convert the erasure code operation corresponding to a newly launched function into one or more executable commands, and transmit them to the data node through an erasure code operation instruction implemented based on the general instruction base class. Since the data node is able to parse the general instruction base class, it can obtain the executable commands contained in the instruction. Thus, the data node does not need to know which kind of erasure code operation it is performing. Even if the erasure code function in the HDFS system is updated, the data nodes do not need to be updated or upgraded. The embodiment of the invention decouples the erasure code function from the data node and decouples the name node from the data node, thereby improving the efficiency of bringing the system online.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow diagram illustrating a method of data manipulation according to some embodiments of the present invention.
Fig. 2 is a flow diagram illustrating an erasure code operation method in which an environment variable change occurs according to some embodiments of the present invention.
Fig. 3A is a flow chart illustrating a method for enabling new functionality of a system according to some embodiments of the invention.
Fig. 3B is a flowchart illustrating a method for enabling a new function of a system according to another embodiment of the present invention.
Fig. 4 is a flow diagram of an erasure code conversion method according to some embodiments of the invention.
FIG. 5 is a flow diagram illustrating a method for periodically reducing the number of copies, according to some embodiments of the invention.
FIG. 6 is a block diagram of a data manipulation device according to some embodiments of the present invention.
FIG. 7 is a block diagram of a distributed storage system according to some embodiments of the invention.
FIG. 8 is a block diagram of a data manipulation device according to further embodiments of the present invention.
FIG. 9 is a block diagram of a data manipulation device according to further embodiments of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
By adding an erasure code offline conversion tool to the HDFS system, the conversion of cold data from three backups to EC, the verification of data integrity and consistency, and the periodic reduction of the number of copies according to configurable time information can all be carried out automatically.
However, after analysis, the inventors found that when the erasure code offline conversion tool is used and the name node (NameNode, NN) instructs the data node (DataNode, DN), the name node has to send the data node an erasure code command implemented on the basis of an erasure-code-specific abstract class. When new functions are added to the erasure code offline conversion tool, both the name node and the data node must be updated iteratively.
After such an update, however, bringing the iteration online across large-scale and very-large-scale Hadoop clusters incurs a significant delay. The reason is that a cluster can only roll out the new version to a portion of its data nodes at a time, so that the software version can be rolled back promptly if a problem occurs and so that the storage and conversion of data are not affected by data nodes suddenly going online or offline. As a result, upgrading all the data nodes in a cluster once takes 1-2 weeks. This delay not only aggravates compatibility problems among data nodes running different software versions and increases the logical complexity and rollout risk of software development, but also deepens the binding and functional coupling between the data nodes and the name nodes.
To solve this problem, the inventors changed the way the name node and the data node interact when implementing an erasure code operation. Instead of sending the data node an erasure code command implemented on the basis of the erasure-code-specific abstract class, the name node sends an erasure code operation instruction implemented on the basis of a general instruction base class that both the name node and the data node support. An embodiment of the data manipulation method of the present invention is described below with reference to fig. 1. In an embodiment of the invention, the data node and the name node are nodes in the HDFS system.
FIG. 1 is a flow diagram illustrating a method of data manipulation according to some embodiments of the present invention. As shown in fig. 1, the data manipulation method of this embodiment includes steps S102 to S112.
In step S102, the name node obtains one or more executable commands corresponding to the erasure code operation.
The erasure code operation instruction is used for implementing an erasure code related operation, and the erasure code operation includes, for example, conversion of an erasure code, data verification based on the erasure code, periodic reduction of the number of copies after the erasure code is generated, provision of an operating environment for implementation of an erasure code function, and the like.
The executable command is a command which can be supported by the data node without additional upgrading. The executable command may be, for example, a Shell command.
In step S104, the name node generates an erasure code operation instruction including the one or more executable commands, wherein the erasure code operation instruction is an object implemented based on a generic instruction base class.
The generic instruction base class is a class that the data node and name node themselves can support, and may be, for example, the BaseDnCommand class. The BaseDnCommand class is the Java base class of communication commands between name nodes and data nodes.
In some embodiments, the erasure code operation instruction further includes at least one of an environment variable and an external data address. The external data may include, for example, scripts, resource packages, and the like.
In step S106, the name node sends the erasure code operation instruction to the data node.
In some embodiments, the name node may send an erasure code operation instruction to the data node via heartbeat information.
In step S108, the data node obtains the erasure code operation instruction sent by the name node.
In step S110, the data node parses the erasure code operation instruction to obtain a parsing result, where the parsing result includes one or more executable commands.
For example, the name node may generate an erasure code operation instruction by the following function:
new BaseDnCommand(commandline,script,package,envs);
Here, commandline is an executable command; script is a script address; package is the address of a resource package; envs holds the environment variables. The data node checks the value of each parameter. If commandline is not empty, the data node may execute the executable commands as Shell processes; if script is not empty, the data node may download the script at that address and execute the commands in it as a Shell process; if package is not empty, the data node may download the resource package at that address; if envs is not empty, the data node may parse the environment variables in envs as key-value pairs and configure them.
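The parameter checks described above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: `GenericCommand` and `CommandPlanner` are hypothetical names standing in for the generic command object (the source does not show the fields of BaseDnCommand), the `KEY=VALUE;KEY=VALUE` encoding of envs is assumed, and the order in which the actions are applied is a design choice of the sketch. It only lists the actions a data node would take and does not start processes or download anything.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the generic command object. Field names follow the
// constructor arguments in the text; "pkg" replaces "package", a Java keyword.
final class GenericCommand {
    final String commandline;
    final String script;
    final String pkg;
    final String envs;

    GenericCommand(String commandline, String script, String pkg, String envs) {
        this.commandline = commandline;
        this.script = script;
        this.pkg = pkg;
        this.envs = envs;
    }
}

final class CommandPlanner {
    // A parameter counts as "not empty" if it is non-null and not blank.
    static boolean present(String s) {
        return s != null && !s.trim().isEmpty();
    }

    // Assumed encoding: "KEY=VALUE" pairs separated by ';'.
    static Map<String, String> parseEnvs(String envs) {
        Map<String, String> result = new LinkedHashMap<>();
        if (!present(envs)) {
            return result;
        }
        for (String pair : envs.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                result.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return result;
    }

    // Lists the actions for one command in an assumed order:
    // environment first, then downloads, then execution.
    static List<String> plan(GenericCommand cmd) {
        List<String> actions = new ArrayList<>();
        parseEnvs(cmd.envs).forEach((k, v) -> actions.add("setenv " + k + "=" + v));
        if (present(cmd.pkg)) {
            actions.add("download package " + cmd.pkg);
        }
        if (present(cmd.script)) {
            actions.add("download and run script " + cmd.script);
        }
        if (present(cmd.commandline)) {
            actions.add("run shell: " + cmd.commandline);
        }
        return actions;
    }
}
```

Because every field is optional, the same data node code path can serve a pure Shell command, a script download, or an environment change without knowing which erasure code operation is behind it.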
In step S112, the data node executes the executable command to implement the erasure code operation. After execution, the data node may return the execution result to the name node.
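The heartbeat-based delivery mentioned for step S106 can be pictured as a per-data-node queue that is drained whenever that node reports in. The sketch below is an assumption about one possible shape of this mechanism, not the patented implementation: `HeartbeatDispatcher`, `enqueue`, and `onHeartbeat` are hypothetical names, and instructions are represented as plain strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical name-node-side helper: instructions wait in a queue until the
// target data node sends its next heartbeat.
final class HeartbeatDispatcher {
    private final Map<String, Queue<String>> pending = new HashMap<>();

    // Queue an instruction for a given data node.
    synchronized void enqueue(String dataNodeId, String instruction) {
        pending.computeIfAbsent(dataNodeId, id -> new ArrayDeque<>()).add(instruction);
    }

    // Called when a heartbeat arrives; returns the queued instructions in
    // insertion order and clears the queue for that node.
    synchronized List<String> onHeartbeat(String dataNodeId) {
        Queue<String> queue = pending.remove(dataNodeId);
        return queue == null ? new ArrayList<>() : new ArrayList<>(queue);
    }
}
```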
With the method of this embodiment, the name node can convert the erasure code operation corresponding to a newly launched function into one or more executable commands, and transmit them to the data node through an erasure code operation instruction implemented based on the general instruction base class. Since the data node is able to parse the general instruction base class, it can obtain the executable commands contained in the instruction. Thus, the data node does not need to know which kind of erasure code operation it is performing. Even if the erasure code function in the HDFS system is updated, the data nodes do not need to be updated or upgraded. The embodiment of the invention decouples the erasure code function from the data node and decouples the name node from the data node, thereby improving the efficiency of bringing the system online.
In some embodiments, in addition to executable instructions, environment variables may be included in the erasure code operation instructions. Thus, the data node may be configured based on the environment variable. An embodiment of an erasure code operation method of the present invention in which an environment variable change occurs is described below with reference to fig. 2.
Fig. 2 is a flow diagram illustrating an erasure code operation method in which an environment variable change occurs according to some embodiments of the present invention. As shown in fig. 2, the erasure code operation method of this embodiment includes steps S202 to S210.
In step S202, the name node obtains an executable command corresponding to the erasure code operation, where the executable command includes an executable command for configuring an environment variable.
In step S204, the name node generates an erasure code operation instruction including an executable command and an environment variable.
In step S206, the name node sends an erasure code operation instruction to the data node.
In step S208, the data node parses the erasure code operation instruction to obtain an executable command for environment variable configuration and an environment variable.
In step S210, the data node executes an executable command for environment variable configuration based on the environment variable so as to update the environment variable.
In addition, the data node may also execute other executable commands based on the updated environment variables.
For example, the name node needs to create a Shell process for HDFS on a data node, but does not want that process to inherit the environment variables of the HDFS parent process. In this case, the environment variables may be configured by the method of the above embodiment.
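As a rough illustration of this case, the standard Java `ProcessBuilder` API lets a process be prepared with an environment that contains only explicitly supplied variables. `CleanEnvShell` and `build` are hypothetical names, and invoking the command through `sh -c` is an assumption about the platform; the sketch prepares the process but does not start it.

```java
import java.util.Map;

final class CleanEnvShell {
    // Builds (but does not start) a shell process whose environment contains
    // only the supplied variables, not those of the parent process.
    static ProcessBuilder build(String commandLine, Map<String, String> envs) {
        ProcessBuilder pb = new ProcessBuilder("sh", "-c", commandLine);
        Map<String, String> env = pb.environment();
        env.clear();      // drop everything inherited from the parent JVM
        env.putAll(envs); // apply only the variables carried by the instruction
        return pb;
    }
}
```

Calling `start()` on the returned builder would then launch the Shell with exactly the variables delivered in the instruction.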
In some embodiments, in addition to the executable instructions, the parsing result may further include an address of external data, the external data being at least one of a resource package and an execution script. The data node may download the external data according to the external data address so that the data node executes the executable command using the external data. For example, in the case where some functions are added to the system, the data node may implement the new functions by means of external data. An embodiment of the method for on-line of the new function of the system of the present invention is described below with reference to fig. 3A and 3B.
Fig. 3A is a flow chart illustrating a method for enabling new functionality of a system according to some embodiments of the invention. As shown in fig. 3A, the method for enabling the new function of the system of this embodiment includes steps S302 to S310.
In step S302, the name node obtains executable commands corresponding to the new functions, including executable commands for downloading.
In step S304, the name node generates an erasure code operation instruction including an executable command, and a dynamic library address.
In step S306, the name node sends an erasure code operation instruction to the data node.
In step S308, the data node parses the erasure code operation instruction to obtain an executable command for downloading, and a dynamic library address.
In step S310, the data node executes the executable command for downloading based on the dynamic library address to obtain the dynamic library. Thus, the data node may execute other executable commands based on the new dynamic library.
In some embodiments, the dynamic library may be pre-stored on the server.
For example, verifying data on a data node is executed by the CPU, which increases the CPU load, and transmitting data also loads the network. To reduce the load on the data node, a library that integrates assembly instructions can be obtained by downloading external data. The data node downloads the dynamic library by executing the executable command 'package', so that the interfaces in the downloaded dynamic library are used preferentially in subsequent operations.
Fig. 3B is a flowchart illustrating a method for enabling a new function of a system according to another embodiment of the present invention. As shown in fig. 3B, the method for enabling the new function of the system of this embodiment includes steps S312 to S322.
In step S312, the name node obtains an executable command corresponding to the new erasure code operation, where the executable command includes a download command.
In step S314, the name node generates an erasure code operation instruction including the executable command and the address of the execution script corresponding to the new erasure code operation.
In step S316, the name node sends an erasure code operation instruction to the data node.
In step S318, the data node parses the erasure code operation instruction to obtain the executable command for downloading and the address of the execution script.
In step S320, the data node executes the executable command for downloading based on the address of the execution script to obtain the execution script.
In some embodiments, the execution script may be pre-stored on the server.
In step S322, the data node executes the commands in the execution script to implement the new erasure code operation.
Of course, the name node may also, as needed, generate the erasure code operation instruction corresponding to the new erasure code operation from executable commands alone.
With the method of this embodiment, when a new function is added to the system, the execution script can be stored in advance on the server or on other equipment. The data node does not need to be updated and can implement the new function by downloading and executing the execution script. Throughout this process the data node is unaware that a new function has gone online, which improves the efficiency of bringing new functions online.
In some embodiments, the name node may generate the erasure code operation instruction according to the erasure code operation policy corresponding to the acquired erasure code operation. The operation policy may be sent by the user through the client, or may be preset in the system and executed when a preset condition is triggered. The erasure code conversion method and the method of periodically reducing the number of copies will be described below with reference to fig. 4 and 5, respectively.
Fig. 4 is a flow diagram of an erasure code conversion method according to some embodiments of the invention. As shown in fig. 4, the erasure code conversion method of this embodiment includes steps S402 to S416.
In step S402, the name node generates a plurality of executable commands, including a read command, a data dividing command, a check data generation command, and a data copy command, according to the obtained erasure code conversion policy. Parameters of the erasure code conversion policy include a first number representing the number of data units of the original data and a second number representing the number of data units of the check data.
The executable commands may further include, as needed, a command to set a path, a command to set the erasure code policy (Policy), and the like.
In step S404, the name node generates an erasure code operation instruction from the plurality of executable commands.
In step S406, the name node sends an erasure code operation instruction to the data node.
In step S408, the data node parses the erasure code operation instruction to obtain a parsing result.
In step S410, the data node executes a read command to read the original data.
In step S412, the data node executes a data division command to divide the read original data into a first number of data units.
In step S414, the data node executes the check data generation command to generate, from the read original data, check data consisting of the second number of data units.
In step S416, the data node executes the data copy command to store each of the first number of data units of the original data on one of the first number of data nodes, and each of the second number of data units of the check data on one of the second number of data nodes.
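The data dividing step (S412) can be illustrated with a small Java sketch. It is a simplification under stated assumptions: `DataSplitter` is a hypothetical name, the units are made equal in size by zero-padding the tail (the source does not say how a remainder is handled), and check data generation is deliberately left out because it depends on the chosen erasure code.

```java
import java.util.Arrays;

final class DataSplitter {
    // Splits data into exactly k units of equal size. Units that extend past
    // the end of the data are zero-padded (an assumed convention).
    static byte[][] split(byte[] data, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        int unitSize = (data.length + k - 1) / k; // ceiling division
        byte[][] units = new byte[k][];
        for (int i = 0; i < k; i++) {
            int from = Math.min(i * unitSize, data.length);
            int to = Math.min(from + unitSize, data.length);
            // copyOf pads with zeros when the slice is shorter than unitSize
            units[i] = Arrays.copyOf(Arrays.copyOfRange(data, from, to), unitSize);
        }
        return units;
    }
}
```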
For example, suppose the erasure code conversion policy is set to RS-6-3-1024K. This means that Reed-Solomon (RS) coding is used, the original data is divided into 6 data units, the check data occupies 3 data units, and each data unit is 1024 KB, i.e., 1 MB. The executable commands obtained when the data node parses the instruction include: reading 6 MB of original data; dividing the read original data into 6 data units; storing each data unit of the divided original data on one of 6 first data nodes; generating 3 MB of check data, i.e., 3 data units of check data, from the 6 MB of original data; and storing each data unit of the check data on one of 3 second data nodes.
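The arithmetic behind this example can be checked with a short Java sketch that parses a policy name of the form codec-data-parity-cellsize. `EcPolicy` and its methods are hypothetical names introduced for illustration, and the parser assumes the cell size is always given in units of K (1024 bytes).

```java
final class EcPolicy {
    final String codec;
    final int dataUnits;
    final int parityUnits;
    final long cellSizeBytes;

    private EcPolicy(String codec, int dataUnits, int parityUnits, long cellSizeBytes) {
        this.codec = codec;
        this.dataUnits = dataUnits;
        this.parityUnits = parityUnits;
        this.cellSizeBytes = cellSizeBytes;
    }

    // Parses names such as "RS-6-3-1024K" (assumed format: CODEC-DATA-PARITY-<n>K).
    static EcPolicy parse(String name) {
        String[] parts = name.split("-");
        if (parts.length != 4 || !parts[3].toUpperCase().endsWith("K")) {
            throw new IllegalArgumentException("unexpected policy name: " + name);
        }
        long kib = Long.parseLong(parts[3].substring(0, parts[3].length() - 1));
        return new EcPolicy(parts[0], Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]), kib * 1024L);
    }

    // Bytes of original data covered by one group of data units.
    long dataBytes() {
        return dataUnits * cellSizeBytes;
    }

    // Bytes of check data generated for that group.
    long parityBytes() {
        return parityUnits * cellSizeBytes;
    }

    // Stored bytes divided by original bytes.
    double storageOverhead() {
        return (double) (dataUnits + parityUnits) / dataUnits;
    }
}
```

For RS-6-3-1024K this gives 6 MB of original data, 3 MB of check data, and a storage overhead of 1.5 times the original size, compared with 3 times when three full copies are kept.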
With the method of the above embodiment, the data node can complete the erasure code conversion process by sequentially executing the executable commands in the parsing result, without a dedicated function or command being defined for erasure code conversion. Therefore, when the erasure code function is improved, the data node does not need to be changed, which improves the efficiency of bringing new functions online.
An embodiment of the method of the present invention for periodically reducing the number of copies is described below with reference to fig. 5.
FIG. 5 is a flow diagram illustrating a method for periodically reducing the number of copies, according to some embodiments of the invention. As shown in fig. 5, the method for periodically reducing the number of copies of the embodiment includes steps S502 to S510.
In step S502, the name node generates an executable command for deleting a copy according to the acquired periodic copy number reduction policy. The parameters of the periodic copy number reduction policy comprise a preset target number of copies, a processing period, and a step size.
In step S504, the name node generates an erasure code operation instruction from the executable command for deleting the copy.
In step S506, in response to the time elapsed since the last copy reduction operation reaching the processing period at the current check, and the number of copies of the same data being greater than the target number of copies, the name node sends the erasure code operation instruction to data nodes that store the same data, the number of such data nodes being equal to the step size.
In step S508, the data node receiving the erasure code operation instruction parses the erasure code operation instruction to obtain a data removal command.
In step S510, the data node that received the erasure code operation instruction executes a data removal command to delete the copy or transfer the copy to the recycle bin.
For example, 3 copies of the original data are stored. An erasure code conversion policy produces 1 copy of erasure-coded data, so 4 copies of related data are stored in the system at that point. Periodically reducing the number of copies means deleting the 3 copies of the original data, or transferring them to the recycle bin, period by period according to the configured step size. Taking a step size of 1 as an example: after the first reduction, 2 copies of the original data remain in the system; after the second reduction, 1 copy remains; after the last reduction, the number of copies of the original data is 0. The configuration determines whether the last copy is placed in the recycle bin or deleted directly. If it is deleted directly, only the erasure-coded data remains in the system and all copies of the original data are deleted, thereby saving storage space.
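The check made in step S506 and the worked example can be sketched in Python as follows. The function names and the unit-less time values are hypothetical. Clamping the final batch so that the copy count never drops below the target is an added assumption; the source only states that the number of addressed data nodes equals the step size.

```python
from typing import List


def nodes_to_trim(copies: int, target: int, step: int,
                  elapsed: float, period: float) -> int:
    """Return how many data nodes should receive the removal instruction."""
    if elapsed < period or copies <= target:
        return 0
    # Assumption: never remove more copies than needed to reach the target.
    return min(step, copies - target)


def simulate(copies: int, target: int, step: int) -> List[int]:
    """Copy count after each period, assuming every check is on schedule."""
    history = [copies]
    while True:
        n = nodes_to_trim(copies, target, step, elapsed=1.0, period=1.0)
        if n == 0:
            return history
        copies -= n
        history.append(copies)


schedule = simulate(copies=3, target=0, step=1)
```

For 3 copies, a target of 0 and a step size of 1, the schedule is 3, 2, 1, 0, which matches the example.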
With the method of the above embodiment, the data node completes the process of periodically reducing the number of copies by sequentially executing the executable commands in the parsing result, without a dedicated function or command being defined. Therefore, when the erasure code function is enhanced, the data node does not need to be changed, and new functions can be brought online more efficiently.
An embodiment of the data manipulation device of the present invention is described below with reference to fig. 6.
FIG. 6 is a block diagram of a data manipulation device according to some embodiments of the present invention. As shown in fig. 6, the data manipulation device 600 of this embodiment includes: an executable command acquisition module 6100 configured to acquire one or more executable commands corresponding to an erasure code operation; an erasure code operation generation module 6200 configured to generate an erasure code operation instruction including the one or more executable commands, wherein the erasure code operation instruction is an object implemented based on a general instruction base class; and a sending module 6300 configured to send the erasure code operation instruction to the data node, so that the data node parses the erasure code operation instruction and executes the executable commands in the parsing result to implement the erasure code operation.
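One way to picture an instruction that is "an object implemented based on a general instruction base class" is the Python sketch below. The class names, the JSON wire format and the list used as an outbox are assumptions for illustration; the source does not specify a serialization format or transport.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Instruction:
    """Hypothetical general instruction base class."""
    kind: str = "generic"

    def serialize(self) -> str:
        # Any subclass is serialized the same way, so the receiver needs
        # no per-operation decoding logic.
        return json.dumps(asdict(self))


@dataclass
class ErasureCodeInstruction(Instruction):
    """Erasure code operation instruction carrying executable commands."""
    kind: str = "erasure_code"
    commands: List[str] = field(default_factory=list)


def send(instruction: Instruction, outbox: List[str]) -> None:
    # Stand-in for the sending module: append the wire form to an outbox.
    outbox.append(instruction.serialize())


def parse_instruction(wire: str) -> List[str]:
    # Data node side: recover the executable commands from the wire form.
    return list(json.loads(wire)["commands"])


outbox: List[str] = []
instruction = ErasureCodeInstruction(commands=["read /blk_1", "split 6"])
send(instruction, outbox)
parsed = parse_instruction(outbox[0])
```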
In some embodiments, the erasure code operation instruction further includes at least one of an environment variable, an external data address.
In some embodiments, the erasure code operation generation module 6200 is further configured to generate an erasure code operation instruction comprising the one or more executable commands and an environment variable, the executable commands including an executable command for environment variable configuration.
In some embodiments, the erasure code operation generation module 6200 is further configured to generate an erasure code operation instruction comprising the one or more executable commands and a dynamic library address, the executable commands including an executable command for downloading.
In some embodiments, where a new erasure code operation is added, the erasure code operation generation module 6200 is further configured to generate an erasure code operation instruction comprising the one or more executable commands and an address of an execution script corresponding to the new erasure code operation, the executable commands including a download command.
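The three variants above share one pattern: the instruction carries extra material (an environment variable, a dynamic library address, or a script address) together with an executable command that consumes it. A minimal Python sketch of that pattern follows. The build_instruction function, the setenv and download command spellings, the field names and the example URL are all hypothetical and do not come from the source.

```python
from typing import Any, Dict, List, Optional


def build_instruction(commands: List[str],
                      env: Optional[Dict[str, str]] = None,
                      external_address: Optional[str] = None) -> Dict[str, Any]:
    """Assemble an instruction with optional environment and address."""
    prologue: List[str] = []
    if env:
        # One configuration command per environment variable, in a
        # deterministic order.
        prologue += [f"setenv {key}={value}" for key, value in sorted(env.items())]
    if external_address:
        # Fetch the dynamic library or execution script before it is used.
        prologue.append(f"download {external_address}")
    return {
        "commands": prologue + list(commands),
        "env": dict(env or {}),
        "external_address": external_address,
    }


instruction = build_instruction(
    ["run_codec"],
    env={"EC_THREADS": "4"},
    external_address="http://example.invalid/libcodec.so",
)
```

Placing the configuration and download commands ahead of the operation's own commands means the data node still only executes commands in order, with no special handling for new codecs.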
In some embodiments, the executable command acquisition module 6100 is further configured to generate one or more executable commands according to an erasure code operation policy corresponding to the acquired erasure code operation.
In some embodiments, the operation policy is an erasure code conversion policy, and parameters of the erasure code conversion policy include a first number indicating the number of data units of the original data and a second number indicating the number of data units of the check data; the one or more executable commands comprise: a read command for reading original data; a data division command for dividing the read original data into the first number of data units; a check data generation command for generating check data from the read original data; and a data copy command for storing each of the first number of data units on a respective one of the first number of data nodes and storing the check data on the second number of data nodes.
In some embodiments, the operation policy is a periodic copy number reduction policy, and parameters of the periodic copy number reduction policy include a preset target number of copies, a processing period, and a step size; the one or more executable commands include a data removal command for deleting target data in the data node or transferring the target data in the data node to the recycle bin; the sending module 6300 is further configured to, in response to the time elapsed since the last copy reduction operation reaching the processing period at the current check and the number of copies of the same data being greater than the target number of copies, send the erasure code operation instruction to data nodes that store the same data, the number of such data nodes being equal to the step size.
An embodiment of the distributed storage system of the present invention is described below with reference to fig. 7.
FIG. 7 is a block diagram of a distributed storage system according to some embodiments of the invention. As shown in fig. 7, the distributed storage system 70 of this embodiment includes: a name node 710 comprising a data manipulation device 720; and a data node 730 configured to parse the erasure code operation instruction sent by the name node to obtain a parsing result, and to execute one or more executable commands in the parsing result so as to implement the erasure code operation. The data manipulation device 720 may be embodied as the data manipulation device 600 in the embodiment of fig. 6. There may be one or more data nodes 730.
FIG. 8 is a block diagram of a data manipulation device according to further embodiments of the present invention. As shown in fig. 8, the data manipulation device 80 of this embodiment includes: a memory 810 and a processor 820 coupled to the memory 810, the processor 820 being configured to execute a data manipulation method in any of the embodiments described above based on instructions stored in the memory 810.
Memory 810 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs.
FIG. 9 is a block diagram of a data manipulation device according to further embodiments of the present invention. As shown in fig. 9, the data manipulation device 90 of this embodiment includes a memory 910 and a processor 920, and may further include an input/output interface 930, a network interface 940, a storage interface 950, and the like. These interfaces 930, 940, 950, the memory 910, and the processor 920 may be connected, for example, by a bus 960. The input/output interface 930 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 940 provides a connection interface for various networking devices. The storage interface 950 provides a connection interface for external storage devices such as an SD card and a USB flash drive.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program is configured to implement any one of the aforementioned data manipulation methods when executed by a processor.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (13)

1. A method of data manipulation, comprising:
a name node acquires one or more executable commands corresponding to an erasure code operation;
the name node generates an erasure code operation instruction including the one or more executable commands, wherein the erasure code operation instruction is an object implemented based on a general instruction base class;
and the name node sends the erasure code operation instruction to a data node, so that the data node parses the erasure code operation instruction and executes the executable commands in the parsing result to implement the erasure code operation.
2. The data manipulation method of claim 1, wherein the erasure code operation instruction further comprises at least one of an environment variable and an external data address.
3. The data manipulation method of claim 1, wherein the name node generates an erasure code operation instruction including the one or more executable commands and an environment variable, the executable commands including an executable command for environment variable configuration.
4. The data manipulation method of claim 1, wherein the name node generates an erasure code operation instruction including the one or more executable commands and a dynamic library address, the executable commands including an executable command for downloading.
5. The data manipulation method of claim 1, wherein, in a case where a new erasure code operation is added, the name node generates an erasure code operation instruction including the one or more executable commands and an address of an execution script corresponding to the new erasure code operation, the executable commands including a download command.
6. The data manipulation method of claim 1, wherein the name node generates the one or more executable commands according to an erasure code operation policy corresponding to the acquired erasure code operation.
7. The data manipulation method of claim 6, wherein
the operation policy is an erasure code conversion policy, and parameters of the erasure code conversion policy comprise a first number indicating the number of data units of the original data and a second number indicating the number of data units of the check data;
the one or more executable commands comprise:
a read command for reading original data;
a data division command for dividing the read original data into the first number of data units;
a check data generation command for generating check data from the read original data;
and a data copy command for storing each of the first number of data units on a respective one of the first number of data nodes and storing the check data on the second number of data nodes.
8. The data manipulation method of claim 6, wherein
the operation policy is a periodic copy number reduction policy, and parameters of the periodic copy number reduction policy comprise a preset target number of copies, a processing period, and a step size;
the one or more executable commands comprise a data removal command for deleting target data in the data node or transferring the target data in the data node to the recycle bin;
and, in response to the time elapsed since the last copy reduction operation reaching the processing period and the number of copies of the same data being greater than the target number of copies, the name node sends the erasure code operation instruction to data nodes that store the same data, the number of such data nodes being equal to the step size.
9. A method of data manipulation, comprising:
a data node acquires an erasure code operation instruction sent by a name node, wherein the erasure code operation instruction is an object implemented based on a general instruction base class and comprises one or more executable commands;
the data node parses the erasure code operation instruction to obtain a parsing result, wherein the parsing result comprises the one or more executable commands;
and the data node executes the one or more executable commands to implement an erasure code operation.
10. A data manipulation device, comprising:
an executable command acquisition module configured to acquire one or more executable commands corresponding to an erasure code operation;
an erasure code operation instruction generation module configured to generate an erasure code operation instruction including the one or more executable commands, wherein the erasure code operation instruction is an object implemented based on a general instruction base class;
and a sending module configured to send the erasure code operation instruction to a data node, so that the data node parses the erasure code operation instruction and executes the executable commands in the parsing result to implement the erasure code operation.
11. A data manipulation device, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the data manipulation method of any of claims 1-8 based on instructions stored in the memory.
12. A distributed storage system, comprising:
a name node comprising the data manipulation device of claim 10 or 11; and
a data node configured to parse the erasure code operation instruction sent by the name node to obtain a parsing result, and to execute one or more executable commands in the parsing result so as to implement the erasure code operation.
13. A computer-readable storage medium, on which a computer program is stored, which program, when executed by a processor, implements the data manipulation method of any one of claims 1 to 8.
CN201910406607.2A 2019-05-16 Data operation method, device and distributed storage system Active CN111949628B (en)

Publications (2)

Publication Number    Publication Date
CN111949628A          2020-11-17
CN111949628B          2024-05-17

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140380088A1 (en) * 2013-06-25 2014-12-25 Microsoft Corporation Locally generated simple erasure codes
CN105404561A (en) * 2015-11-19 2016-03-16 浙江宇视科技有限公司 Erasure code implementation method and apparatus for distributed storage system
CN106603673A (en) * 2016-12-19 2017-04-26 上海交通大学 Fine-grained cloud storage scheduling method based on erasure codes
WO2017107095A1 (en) * 2015-12-23 2017-06-29 Intel Corporation Technologies for adaptive erasure code

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant