CN111949628B

CN111949628B - Data operation method, device and distributed storage system

Info

Publication number: CN111949628B
Application number: CN201910406607.2A
Authority: CN
Inventors: 刘强; 毛宝龙; 张�林
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2019-05-16
Filing date: 2019-05-16
Publication date: 2024-05-17
Anticipated expiration: 2039-05-16
Also published as: CN111949628A

Abstract

The invention discloses a data operation method, a data operation device and a distributed storage system, and relates to the technical field of distributed storage. The data operation method comprises the following steps: a name node acquires one or more executable commands corresponding to an erasure code operation; the name node generates an erasure code operation instruction comprising the one or more executable commands, wherein the erasure code operation instruction is an object implemented on the basis of a generic instruction base class; and the name node sends the erasure code operation instruction to a data node, so that the data node parses the erasure code operation instruction and executes the executable commands in the parsing result to carry out the erasure code operation. Embodiments of the invention decouple the erasure code function from the data node and decouple the name node from the data node, which makes it more efficient to bring new system functions online.

Description

Data operation method, device and distributed storage system

Technical Field

The present invention relates to the field of distributed storage technologies, and in particular, to a data operation method and system.

Background

Erasure coding (EC) is designed to solve the problem that replicated backups of data in the Hadoop Distributed File System (HDFS) occupy excessive storage space. Erasure coding creates redundant check (parity) data for an original file, so that when part of the data is lost, the source file can be reconstructed from the remaining data.
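To make the reconstruction idea concrete, the following Java sketch uses the simplest possible erasure code: one XOR parity unit computed over k equal-sized data units. This is an illustrative assumption, not the Reed-Solomon codec used by HDFS policies such as RS-6-3; it tolerates the loss of only one unit, and the class and method names are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical single-parity erasure code: k equal-sized data units plus one
 * XOR parity unit. Any one lost data unit can be rebuilt from the survivors.
 */
final class XorParity {
    private XorParity() {}

    /** Parity unit = byte-wise XOR of all data units (all units same length). */
    static byte[] encode(byte[][] dataUnits) {
        byte[] parity = new byte[dataUnits[0].length];
        for (byte[] unit : dataUnits) {
            for (int i = 0; i < parity.length; i++) {
                parity[i] ^= unit[i];
            }
        }
        return parity;
    }

    /**
     * Rebuilds the data unit at index lost by XOR-ing the parity unit with
     * every surviving data unit; the slot at index lost is never read.
     */
    static byte[] reconstruct(byte[][] dataUnits, byte[] parity, int lost) {
        byte[] rebuilt = Arrays.copyOf(parity, parity.length);
        for (int u = 0; u < dataUnits.length; u++) {
            if (u == lost) {
                continue;
            }
            for (int i = 0; i < rebuilt.length; i++) {
                rebuilt[i] ^= dataUnits[u][i];
            }
        }
        return rebuilt;
    }
}
```

Reed-Solomon generalizes this idea: with k data units and m parity units, the data survives the loss of any m units.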

Disclosure of Invention

After analysis, the inventors found that if an erasure code conversion tool is added to the HDFS system, rolling out iterative updates on large-scale and ultra-large-scale Hadoop clusters incurs a long delay.

One technical problem to be solved by the embodiments of the invention is how to make iterative rollouts on Hadoop clusters more efficient.

According to a first aspect of some embodiments of the present invention, there is provided a data operation method comprising: a name node acquires one or more executable commands corresponding to an erasure code operation; the name node generates an erasure code operation instruction comprising the one or more executable commands, wherein the erasure code operation instruction is an object implemented on the basis of a generic instruction base class; and the name node sends the erasure code operation instruction to a data node, so that the data node parses the erasure code operation instruction and executes the executable commands in the parsing result to carry out the erasure code operation.

In some embodiments, the erasure code operation instruction further comprises at least one of an environment variable and an external data address.

In some embodiments, the erasure code operation instruction generated by the name node comprises the one or more executable commands, which include an executable command for configuring environment variables, together with the environment variables.

In some embodiments, the erasure code operation instruction generated by the name node comprises the one or more executable commands, which include an executable command for downloading, together with a dynamic library address.

In some embodiments, when a new erasure code operation is added, the erasure code operation instruction generated by the name node comprises the one or more executable commands, which include a download command, together with the address of an execution script corresponding to the new erasure code operation.

In some embodiments, the name node generates the one or more executable commands according to an erasure code operation policy corresponding to the acquired erasure code operation.

In some embodiments, the operation policy is an erasure code conversion policy whose parameters include a first number, representing the number of original data units, and a second number, representing the number of check (parity) data units; the one or more executable commands include: a read command for reading the original data; a data division command for dividing the read original data into the first number of data units; a check data generation command for generating check data from the read original data; and a data copy command for storing the first number of data units on the first number of data nodes, respectively, and storing the check data on the second number of data nodes.

In some embodiments, the operation policy is a periodic copy number reduction policy whose parameters include a preset target copy number, a processing period and a step size; the one or more executable commands include a data removal command for deleting target data on a data node or moving it to the recycle bin; and when the time elapsed since the last check-and-reduce operation reaches the processing period and the number of copies of the same data is greater than the target copy number, the name node sends the erasure code operation instruction to a number of data nodes storing that data equal to the step size.

According to a second aspect of some embodiments of the present invention, there is provided a data operation method comprising: a data node acquires an erasure code operation instruction sent by a name node, wherein the erasure code operation instruction is an object implemented on the basis of a generic instruction base class and comprises one or more executable commands; the data node parses the erasure code operation instruction to obtain a parsing result comprising the one or more executable commands; and the data node executes the executable commands to carry out the erasure code operation.

According to a third aspect of some embodiments of the present invention, there is provided a data operation device comprising: an executable command acquisition module configured to acquire one or more executable commands corresponding to an erasure code operation; an erasure code operation instruction generation module configured to generate an erasure code operation instruction comprising the one or more executable commands, wherein the erasure code operation instruction is an object implemented on the basis of a generic instruction base class; and a sending module configured to send the erasure code operation instruction to a data node, so that the data node parses the erasure code operation instruction and executes the executable commands in the parsing result to carry out the erasure code operation.

According to a fourth aspect of some embodiments of the present invention, there is provided a data operation device comprising: a memory; and a processor coupled to the memory, the processor being configured to perform any of the foregoing data operation methods based on instructions stored in the memory.

According to a fifth aspect of some embodiments of the present invention, there is provided a distributed storage system comprising: a name node comprising any one of the aforementioned data operation devices; and a data node configured to parse the erasure code operation instruction sent by the name node to obtain a parsing result, and to execute the one or more executable commands in the parsing result to carry out the erasure code operation.

According to a sixth aspect of some embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the aforementioned data operation methods.

Some of the embodiments of the above invention have the following advantages or benefits: with the method provided by the embodiments, when a new function goes online, the name node converts the corresponding erasure code operation into one or more executable commands and delivers them to the data node in an erasure code operation instruction implemented on the basis of the generic instruction base class. Because the data node can already parse objects of the generic instruction base class, it can extract the executable commands from the instruction. The data node can therefore carry out the erasure code operation without any knowledge of what that operation is, and even when the erasure code function of the HDFS system is updated, the data node does not need to be updated or upgraded. The embodiments thus decouple the erasure code function from the data node and the name node from the data node, making it more efficient to bring new functions online.

Other features of the present invention and its advantages will become apparent from the following detailed description of exemplary embodiments of the invention, which proceeds with reference to the accompanying drawings.

Drawings

In order to illustrate more clearly the embodiments of the invention or the technical solutions of the prior art, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and a person skilled in the art could derive other drawings from them without inventive effort.

FIG. 1 is a flow chart of a data operation method according to some embodiments of the present invention.

FIG. 2 is a flow chart of an erasure code operation method involving an environment variable change according to some embodiments of the present invention.

FIG. 3A is a flow chart of a method for bringing a new system function online according to some embodiments of the present invention.

FIG. 3B is a flow chart of a method for bringing a new system function online according to other embodiments of the present invention.

FIG. 4 is a flow chart of an erasure code conversion method according to some embodiments of the present invention.

FIG. 5 is a flow chart of a method for periodically reducing the number of copies according to some embodiments of the present invention.

FIG. 6 is a schematic structural diagram of a data operation device according to some embodiments of the present invention.

FIG. 7 is a schematic structural diagram of a distributed storage system according to some embodiments of the present invention.

FIG. 8 is a schematic structural diagram of a data operation device according to other embodiments of the present invention.

FIG. 9 is a schematic structural diagram of a data operation device according to still other embodiments of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the invention, its application, or uses. All other embodiments that a person skilled in the art can obtain from the embodiments of the invention without inventive effort fall within the scope of protection of the invention.

The relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.

Meanwhile, it should be understood that, for ease of description, the parts shown in the drawings are not drawn to actual scale.

Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the specification where appropriate.

In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of the exemplary embodiments may have different values.

It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.

Adding an offline erasure code conversion tool to the HDFS system makes it possible to automatically convert cold data from three-replica storage to EC, to check data integrity and consistency, and to periodically reduce the number of copies according to configurable time information.

However, after analysis the inventors found that, when the offline erasure code conversion tool is used, a name node (NameNode, NN) that instructs a data node (DataNode, DN) to perform an erasure code operation must send the data node an erasure code command implemented on the basis of an erasure-code-specific abstract class. As a result, whenever a new function is added to the offline erasure code conversion tool, both the name node and the data node must be updated.

After such an update, however, rolling it out across large-scale and ultra-large-scale Hadoop clusters incurs a significant delay. A cluster can bring only a portion of its data nodes online with the new version at a time, so that the software version can be rolled back promptly if a problem occurs and so that data storage and conversion are not disrupted by data nodes suddenly going offline and coming back online. Upgrading every data node in a cluster once therefore takes roughly 1 to 2 weeks. This delay not only raises compatibility problems among data nodes running different software versions, increasing the logical complexity and rollout risk of software development, but also deepens the binding and functional coupling between the data nodes and the name node.

The inventors solve this problem by changing how the name node and the data node interact when an erasure code operation is carried out. Instead of sending the data node erasure code commands implemented on the basis of the erasure code abstract class, the name node sends erasure code operation instructions implemented on the basis of a generic instruction base class that both the name node and the data node already support. An embodiment of the data operation method of the present invention is described below with reference to FIG. 1. In the embodiments of the invention, the data node and the name node are nodes in an HDFS system.

FIG. 1 is a flow chart of a data operation method according to some embodiments of the present invention. As shown in FIG. 1, the data operation method of this embodiment includes steps S102 to S112.

In step S102, the name node obtains one or more executable commands corresponding to the erasure code operation.

An erasure code operation is an operation related to erasure coding, including erasure code conversion, erasure-code-based data verification, periodically reducing the number of copies after erasure-coded data has been generated, and providing a runtime environment for implementing erasure code functions.

Executable commands are commands that a data node can execute without any additional upgrade, for example Shell commands.

In step S104, the name node generates erasure code operation instructions including the one or more executable commands, wherein the erasure code operation instructions are objects implemented based on a generic instruction base class.

The generic instruction base class may be any class that the data node and the name node already support natively, for example the BaseDnCommand class, a Java base class for communication commands between name nodes and data nodes.

In some embodiments, the erasure code operation instruction further comprises at least one of an environment variable and an external data address. The external data may include, for example, scripts and resource packages.

In step S106, the name node sends the erasure code operation instruction to the data node.

In some embodiments, the name node may send erasure code operation instructions to the data node via the heartbeat information.
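In stock HDFS, the name node's reply to each data node heartbeat can already carry commands for that data node, which is what makes this delivery path attractive. The Java sketch below shows one way a name node might buffer instructions per data node and attach a bounded batch to each heartbeat reply; the class, its methods and the per-heartbeat limit are hypothetical and are not taken from HDFS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/**
 * Hypothetical name-node-side mailbox: instructions are queued per data node
 * and handed over when that node's heartbeat is answered. C is whatever
 * instruction type the cluster uses (for example, the generic base class).
 */
final class HeartbeatCommandQueue<C> {
    private final Map<String, Queue<C>> pending = new ConcurrentHashMap<>();

    /** Queues an instruction for delivery to the given data node. */
    void enqueue(String dataNodeId, C instruction) {
        pending.computeIfAbsent(dataNodeId, id -> new ConcurrentLinkedQueue<>()).add(instruction);
    }

    /**
     * Called while building the heartbeat reply for dataNodeId: removes and
     * returns at most maxPerHeartbeat queued instructions, oldest first.
     */
    List<C> drainForHeartbeat(String dataNodeId, int maxPerHeartbeat) {
        List<C> batch = new ArrayList<>();
        Queue<C> queue = pending.get(dataNodeId);
        if (queue == null) {
            return batch;
        }
        C next;
        // Check the limit before polling so no instruction is removed and then dropped.
        while (batch.size() < maxPerHeartbeat && (next = queue.poll()) != null) {
            batch.add(next);
        }
        return batch;
    }
}
```

Bounding the batch keeps each heartbeat reply small; anything left over simply waits for the next heartbeat.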

In step S108, the data node acquires an erasure code operation instruction sent by the name node.

In step S110, the data node parses the erasure code operation instruction to obtain a parsing result, where the parsing result includes one or more executable commands.

For example, the name node may generate an erasure code operation instruction with a constructor call such as:

new BaseDnCommand(commandline, script, package, envs);

Here commandline is the executable command; script is a script address; package is a resource package address; and envs holds environment variables. The data node checks the value of each parameter. If commandline is not null, the data node executes these executable commands in a Shell process; if script is not empty, the data node downloads the script from the script address and executes the commands in it in a Shell process; if package is not empty, the data node downloads the resource package from the package address; and if envs is not empty, the data node parses the environment variables in envs as key-value pairs and applies them.
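The data-node side of this parsing step can be sketched as follows. BaseDnCommand itself is not published here, so GenericDnCommand is a hypothetical stand-in with the same four fields (package is a Java keyword, so that field is called pkg), and the newline-separated KEY=VALUE encoding of envs is an assumption. Rather than running anything, the sketch returns the ordered list of actions the data node would take, which keeps the parsing logic easy to inspect.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for the generic instruction object. The fields mirror
 * new BaseDnCommand(commandline, script, package, envs); "package" is a Java
 * keyword, so that field is named pkg here.
 */
final class GenericDnCommand {
    final List<String> commandline; // executable (Shell) commands, or null
    final String script;            // script address, or null
    final String pkg;               // resource package address, or null
    final String envs;              // assumed encoding: one KEY=VALUE per line, or null

    GenericDnCommand(List<String> commandline, String script, String pkg, String envs) {
        this.commandline = commandline;
        this.script = script;
        this.pkg = pkg;
        this.envs = envs;
    }
}

/** Hypothetical data-node-side parser that turns an instruction into an ordered action plan. */
final class DnCommandParser {
    private DnCommandParser() {}

    /** Parses KEY=VALUE lines; splits on the first '=' and skips malformed lines. */
    static Map<String, String> parseEnvs(String envs) {
        Map<String, String> result = new LinkedHashMap<>();
        if (envs == null || envs.isEmpty()) {
            return result;
        }
        for (String line : envs.split("\n")) {
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue; // no key before '=' (or no '=' at all)
            }
            result.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
        }
        return result;
    }

    /**
     * Checks each field and emits actions. The order (environment, package,
     * script, commands) is an assumption chosen so that later steps can rely
     * on the environment and on downloaded files.
     */
    static List<String> plan(GenericDnCommand cmd) {
        List<String> actions = new ArrayList<>();
        parseEnvs(cmd.envs).forEach((k, v) -> actions.add("SET_ENV " + k + "=" + v));
        if (cmd.pkg != null && !cmd.pkg.isEmpty()) {
            actions.add("DOWNLOAD_PACKAGE " + cmd.pkg);
        }
        if (cmd.script != null && !cmd.script.isEmpty()) {
            actions.add("DOWNLOAD_AND_RUN_SCRIPT " + cmd.script);
        }
        if (cmd.commandline != null) {
            for (String c : cmd.commandline) {
                actions.add("RUN_SHELL " + c);
            }
        }
        return actions;
    }
}
```

A real data node would then run each RUN_SHELL entry in a Shell process (for example via java.lang.ProcessBuilder) and report the results back to the name node, as in step S112.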

In step S112, the data node executes an executable command to implement an erasure code operation. After the execution is finished, the data node can return the execution result to the name node.

With the method of this embodiment, when a new function goes online, the name node converts the corresponding erasure code operation into one or more executable commands and delivers them to the data node in an erasure code operation instruction implemented on the basis of the generic instruction base class. Because the data node can already parse objects of the generic instruction base class, it can extract the executable commands from the instruction and carry out the erasure code operation without any knowledge of what that operation is. Even when the erasure code function in the HDFS system is updated, the data node does not need to be updated or upgraded. This embodiment thus decouples the erasure code function from the data node and the name node from the data node, making it more efficient to bring new functions online.

In some embodiments, in addition to the executable commands, the erasure code operation instruction may include environment variables, which the data node can use to configure itself. An embodiment of an erasure code operation method involving an environment variable change is described below with reference to FIG. 2.

FIG. 2 is a flow chart of an erasure code operation method involving an environment variable change according to some embodiments of the invention. As shown in FIG. 2, the erasure code operation method of this embodiment includes steps S202 to S210.

In step S202, the name node acquires an executable command corresponding to the erasure code operation, where the executable command includes an executable command for performing environment variable configuration.

In step S204, the name node generates erasure code operation instructions including executable commands, as well as environment variables.

In step S206, the name node transmits an erasure code operation instruction to the data node.

In step S208, the data node parses the erasure code operation instruction to obtain an executable command for performing environment variable configuration and environment variables.

In step S210, the data node executes an executable command for performing environment variable configuration based on the environment variable so as to update the environment variable.

In addition, the data node may execute other executable commands based on the updated environment variables.

For example, the name node may need the data node to create an HDFS Shell that does not inherit the environment variables of the HDFS parent process. In that case, the environment variables can be configured with the method of the above embodiment.
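This HDFS example maps naturally onto java.lang.ProcessBuilder, whose environment() map starts as a copy of the current process's environment and can be cleared before the process is started. A minimal sketch, assuming the data node has already parsed the instruction's environment variables into a map; IsolatedShell and its method are hypothetical names.

```java
import java.util.Map;

/**
 * Builds, but does not start, a Shell process whose environment contains only
 * the variables carried in the instruction, instead of inheriting the
 * environment of the HDFS data node process.
 */
final class IsolatedShell {
    private IsolatedShell() {}

    static ProcessBuilder build(String shellCommand, Map<String, String> envs) {
        ProcessBuilder pb = new ProcessBuilder("sh", "-c", shellCommand);
        Map<String, String> env = pb.environment(); // initially a copy of the parent's environment
        env.clear();                                  // drop everything inherited from the HDFS process
        env.putAll(envs);                             // apply only the variables from the instruction
        pb.redirectErrorStream(true);                 // merge stderr into stdout for simpler reporting
        return pb;
    }
}
```

Because the inherited environment is cleared, the instruction would normally have to supply variables such as PATH itself; calling start() on the returned builder then launches the Shell.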

In some embodiments, in addition to the executable commands, the parsing result may further include an external data address, the external data being at least one of a resource package and an execution script. The data node may download the external data from that address and use it when executing the executable commands. For example, when new functions are added to the system, the data node may implement them by means of external data. Embodiments of the method for bringing a new system function online are described below with reference to FIG. 3A and FIG. 3B.

FIG. 3A is a flow chart of a method for bringing a new system function online according to some embodiments of the invention. As shown in FIG. 3A, the method of this embodiment includes steps S302 to S310.

In step S302, the name node acquires the executable commands corresponding to the new function, including an executable command for downloading.

In step S304, the name node generates an erasure code operation instruction including the executable commands and a dynamic library address.

In step S306, the name node transmits an erasure code operation instruction to the data node.

In step S308, the data node parses the erasure code operation instruction to obtain an executable command for downloading, and a dynamic library address.

In step S310, the data node executes the executable command for downloading, based on the dynamic library address, to obtain the dynamic library. The data node can then execute other executable commands using the new dynamic library.

In some embodiments, the dynamic library may be stored on a server in advance.

For example, verifying data on a data node is CPU-intensive and increases CPU load, and transmitting data increases network load. To reduce the load on the data nodes, a library built from optimized assembly instructions can be obtained by downloading external data. The data node downloads the dynamic library by executing the download command together with the package address, so that subsequent operations preferentially use the interfaces in the downloaded dynamic library.
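On a Java data node, preferring a downloaded native library could look like the sketch below. It assumes the download command has already written the library to a local absolute path; the class name, the enum and the fallback policy are hypothetical. System.load throws UnsatisfiedLinkError when the file is missing or cannot be loaded, which the sketch treats as a signal to stay on the pure-Java path.

```java
/**
 * Hypothetical codec selection on a data node: after the download command has
 * placed a native library at libraryPath, try to load it and prefer its
 * interface; otherwise keep using the pure-Java implementation.
 */
final class CodecSelector {
    enum Codec { NATIVE, PURE_JAVA }

    private CodecSelector() {}

    static Codec select(String libraryPath) {
        if (libraryPath == null || libraryPath.isEmpty()) {
            return Codec.PURE_JAVA;
        }
        try {
            System.load(libraryPath); // expects an absolute path to the library file
            return Codec.NATIVE;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return Codec.PURE_JAVA;   // missing or unloadable library: fall back
        }
    }
}
```

Falling back rather than failing means a data node that has not yet received the library keeps working, just without the acceleration.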

FIG. 3B is a flow chart of a method for bringing a new system function online according to other embodiments of the present invention. As shown in FIG. 3B, the method of this embodiment includes steps S312 to S322.

In step S312, the name node obtains executable commands corresponding to the new erasure code operation, and the executable commands include a download command.

In step S314, the name node generates an erasure code operation instruction including the executable commands and the address of an execution script corresponding to the new erasure code operation.

In step S316, the name node transmits an erasure code operation instruction to the data node.

In step S318, the data node parses the erasure code operation instruction to obtain an executable command for downloading, and an address of an executable script.

In step S320, the data node executes an executable command for downloading based on the address of the executable script so as to obtain the executable script.

In some embodiments, the execution script may be stored on a server in advance.

In step S322, the data node executes the command in the executable script to implement the new erasure code operation.
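Steps S320 and S322 can be sketched in Java as a download into a temporary file followed by running that file with sh. The script address is assumed to use a URL scheme the JVM can open (for example http: or file:); ScriptFetcher and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical data-node helper for fetching and running an execution script. */
final class ScriptFetcher {
    private ScriptFetcher() {}

    /** Step S320: downloads the script at scriptAddress into a new temporary file. */
    static Path download(String scriptAddress) throws IOException {
        Path target = Files.createTempFile("ec-op-", ".sh");
        try (InputStream in = URI.create(scriptAddress).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    /** Step S322: runs the downloaded script with sh and returns its exit code. */
    static int run(Path script) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("sh", script.toString())
                .inheritIO() // forward the script's output to the data node's streams
                .start();
        return process.waitFor();
    }
}
```

The data node only needs to know how to download and run a script; what the script does is decided entirely by whoever publishes it on the server.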

Alternatively, as required, the name node may generate the erasure code operation instruction for the new erasure code operation from executable commands alone.

With the method of this embodiment, when a new function goes online, the execution script can be stored in advance on a server or other device. The data node does not need to be updated: it implements the new function by downloading and executing the execution script. Throughout this process the data node is unaware that a new function has gone online, which makes bringing new functions online more efficient.

In some embodiments, the name node may generate the erasure code operation instruction according to the erasure code operation policy corresponding to the acquired erasure code operation. The operation policy may be submitted by a user through a client, or may be preset in the system and executed when a preset condition is triggered. The erasure code conversion method and the method of periodically reducing the number of copies are described below with reference to FIG. 4 and FIG. 5, respectively.

FIG. 4 is a flow chart of an erasure code conversion method according to some embodiments of the present invention. As shown in FIG. 4, the erasure code conversion method of this embodiment includes steps S402 to S416.

In step S402, the name node generates, according to the acquired erasure code conversion policy, a plurality of executable commands including a read command, a data division command, a check data generation command, and a data copy command. The parameters of the erasure code conversion policy include a first number, representing the number of original data units, and a second number, representing the number of check data units.

The executable commands may also include, as needed, a command for setting a path, a command for setting the erasure code scheme (Policy), and so on.
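Step S402 amounts to a small translation from policy parameters to an ordered command list. The Java sketch below assumes the policy has already been reduced to the first number, the second number and a unit size; the class and the command words read, split, encode and copy are hypothetical placeholders for whatever executable commands a deployment actually uses.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical name-node-side translation of an erasure code conversion policy
 * into an ordered list of commands. The command words are placeholders, not
 * real HDFS or Shell commands.
 */
final class ConversionCommandPlanner {
    private ConversionCommandPlanner() {}

    /**
     * @param path        file whose cold data is being converted
     * @param dataUnits   the first number: original data units per stripe
     * @param parityUnits the second number: check data units per stripe
     * @param cellBytes   size of a single data unit in bytes
     */
    static List<String> plan(String path, int dataUnits, int parityUnits, int cellBytes) {
        if (dataUnits <= 0 || parityUnits <= 0 || cellBytes <= 0) {
            throw new IllegalArgumentException("policy parameters must be positive");
        }
        long stripeBytes = (long) dataUnits * cellBytes;
        List<String> commands = new ArrayList<>();
        commands.add("read " + path + " " + stripeBytes);       // read command
        commands.add("split " + dataUnits + " " + cellBytes);   // data division command
        commands.add("encode " + parityUnits);                  // check data generation command
        commands.add("copy " + dataUnits + " " + parityUnits);  // data copy command
        return commands;
    }
}
```

Because the data node just executes the list in order, changing the conversion procedure means changing this translation on the name node only.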

In step S404, the name node generates erasure code operation instructions according to a plurality of executable commands.

In step S406, the name node transmits an erasure code operation instruction to the data node.

In step S408, the data node parses the erasure code operation instruction to obtain a parsing result.

In step S410, the data node executes a read command to read the original data.

In step S412, the data node executes a data division command to divide the read original data into a first number of data units.

In step S414, the data node executes the check data generation command to generate the second number of data units of check data from the read original data.

In step S416, the data node executes the data copy command to store each of the first number of data units of the original data on one of the first number of data nodes, and each of the second number of data units of the check data on one of the second number of data nodes.

For example, suppose the erasure code conversion policy is RS-6-3-1024K. This means Reed-Solomon (RS) coding is used, the original data is divided into 6 data units, the check data consists of 3 data units, and each data unit is 1024 KB, i.e., 1 MB. The executable commands obtained by the data node through parsing are: read 6 MB of original data; divide the read original data into 6 data units; store the 6 data units of the original data on 6 first data nodes, respectively; generate 3 MB of check data, i.e., 3 data units, from the 6 MB of original data; and store the 3 data units of the check data on 3 second data nodes, respectively.
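The arithmetic in this example can be checked with a small parser. The Java sketch below assumes policy names always follow the CODEC-k-m-&lt;size&gt;K pattern of RS-6-3-1024K (with a case-insensitive K suffix); EcPolicy is a hypothetical name. For RS-6-3-1024K it yields 6 MB of original data and 3 MB of check data per stripe (with 1 MB = 1024 × 1024 bytes), a storage overhead of (6 + 3) / 6 = 1.5 times the original size, compared with 3 times for three-replica storage.

```java
import java.util.Locale;

/**
 * Hypothetical parser for policy names of the form CODEC-k-m-<size>K, such as
 * RS-6-3-1024K, plus the per-stripe arithmetic used in the example.
 */
final class EcPolicy {
    final String codec;
    final int dataUnits;    // k: original data units per stripe
    final int parityUnits;  // m: check data units per stripe
    final int cellBytes;    // size of one data unit in bytes

    private EcPolicy(String codec, int dataUnits, int parityUnits, int cellBytes) {
        this.codec = codec;
        this.dataUnits = dataUnits;
        this.parityUnits = parityUnits;
        this.cellBytes = cellBytes;
    }

    static EcPolicy parse(String name) {
        String[] parts = name.split("-");
        String cell = parts.length == 4 ? parts[3].toUpperCase(Locale.ROOT) : "";
        if (!cell.endsWith("K")) {
            throw new IllegalArgumentException("expected CODEC-k-m-<size>K: " + name);
        }
        int cellKib = Integer.parseInt(cell.substring(0, cell.length() - 1));
        return new EcPolicy(parts[0], Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]), cellKib * 1024);
    }

    long dataBytesPerStripe()   { return (long) dataUnits * cellBytes; }
    long parityBytesPerStripe() { return (long) parityUnits * cellBytes; }

    /** Raw bytes stored per byte of original data: (k + m) / k. */
    double storageOverhead() { return (dataUnits + parityUnits) / (double) dataUnits; }
}
```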

With the method of this embodiment, the data node completes the erasure code conversion simply by executing the executable commands in the parsing result in sequence, so no dedicated function or command needs to be defined for the conversion. Therefore, when the erasure code function is improved, the data node does not need to change, which makes bringing new functions online more efficient.

An embodiment of the method of periodically reducing the number of copies of the present invention is described below with reference to fig. 5.

FIG. 5 is a flow chart of a method for periodically reducing the number of copies according to some embodiments of the invention. As shown in fig. 5, the method for periodically reducing the number of copies of this embodiment includes steps S502 to S510.

In step S502, the name node generates an executable command for deleting a copy according to the acquired periodic copy number reduction policy. The parameters of the periodic copy number reduction strategy comprise a preset target copy number, a processing period and a step size.

In step S504, the name node generates an erasure code operation instruction from the executable command for deleting copies.

In step S506, when the time elapsed since the last check-and-reduce operation reaches the processing period and the number of copies of the same data is greater than the target copy number, the name node sends the erasure code operation instruction to a number of data nodes storing that data equal to the step size.

In step S508, the data node that receives the erasure code operation instruction parses the erasure code operation instruction to obtain a data removal command.

In step S510, the data node that received the erasure code operation instruction executes a data removal command to delete the copy or transfer the copy into the recycle bin.

For example, suppose the original data is stored as 3 copies. An erasure code conversion policy generates 1 copy of erasure-coded data, so the system holds 4 related copies of the data. Periodic copy number reduction deletes the 3 copies of the original data, or moves them to the recycle bin, at the configured step size. With a step size of 1, the system holds 2 copies of the original data after the first reduction, 1 copy after the second, and 0 after the last. Depending on the configuration, the last copy can either be moved to the recycle bin or deleted directly. If it is deleted directly, only the erasure-coded data remains in the system and all copies of the original data have been removed, which achieves the goal of saving storage space.
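The trigger condition in step S506 and this worked example can be expressed as a single check. In the Java sketch below, CopyReductionPolicy and nodesToInstruct are hypothetical names, and capping each round at the number of copies above the target is an added assumption. With a target copy number of 0 (matching the example's final state), a period of one day and a step size of 1, successive rounds take the original data from 3 copies to 2, 1 and finally 0.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical name-node check for the periodic copy number reduction policy. */
final class CopyReductionPolicy {
    final int targetCopies;  // preset target copy number
    final Duration period;   // processing period
    final int step;          // step size: data nodes instructed per round

    CopyReductionPolicy(int targetCopies, Duration period, int step) {
        this.targetCopies = targetCopies;
        this.period = period;
        this.step = step;
    }

    /**
     * Returns how many data nodes holding a copy should receive the data
     * removal instruction now: 0 if the period has not yet elapsed since
     * lastCheck or the copy count is already at or below the target.
     * Capping at (currentCopies - targetCopies) is an assumption of this
     * sketch so that a round never goes below the target.
     */
    int nodesToInstruct(Instant lastCheck, Instant now, int currentCopies) {
        boolean periodReached = Duration.between(lastCheck, now).compareTo(period) >= 0;
        if (!periodReached || currentCopies <= targetCopies) {
            return 0;
        }
        return Math.min(step, currentCopies - targetCopies);
    }
}
```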

With the method of this embodiment, the data node completes the periodic copy number reduction simply by executing the executable commands in the parsing result in sequence, with no dedicated function or command needing to be defined. Therefore, when the erasure code function is improved, the data node does not need to change, which makes bringing new functions online more efficient.

An embodiment of the data operation device of the present invention is described below with reference to FIG. 6.

FIG. 6 is a schematic structural diagram of a data operation device according to some embodiments of the present invention. As shown in FIG. 6, the data operation device 600 of this embodiment includes: an executable command acquisition module 6100 configured to acquire one or more executable commands corresponding to an erasure code operation; an erasure code operation instruction generation module 6200 configured to generate an erasure code operation instruction comprising the one or more executable commands, wherein the erasure code operation instruction is an object implemented on the basis of a generic instruction base class; and a sending module 6300 configured to send the erasure code operation instruction to a data node, so that the data node parses the erasure code operation instruction and executes the executable commands in the parsing result to carry out the erasure code operation.
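The module structure of device 600 can be sketched in Java as three small interfaces composed in order. The interface and class names are hypothetical, and the instruction type is left generic because the generic instruction base class is not specified further here.

```java
import java.util.List;

/** Module 6100: acquires the executable commands for an erasure code operation. */
interface CommandAcquisitionModule {
    List<String> acquire(String erasureCodeOperation);
}

/** Module 6200: wraps the commands in an instruction object of type I. */
interface InstructionGenerationModule<I> {
    I generate(List<String> executableCommands);
}

/** Module 6300: sends the instruction to a data node. */
interface SendingModule<I> {
    void send(String dataNodeId, I instruction);
}

/** Hypothetical composition of the three modules of data operation device 600. */
final class DataOperationDevice<I> {
    private final CommandAcquisitionModule acquisition;
    private final InstructionGenerationModule<I> generation;
    private final SendingModule<I> sending;

    DataOperationDevice(CommandAcquisitionModule acquisition,
                        InstructionGenerationModule<I> generation,
                        SendingModule<I> sending) {
        this.acquisition = acquisition;
        this.generation = generation;
        this.sending = sending;
    }

    /** Runs modules 6100, 6200 and 6300 in order and returns the instruction sent. */
    I process(String erasureCodeOperation, String dataNodeId) {
        I instruction = generation.generate(acquisition.acquire(erasureCodeOperation));
        sending.send(dataNodeId, instruction);
        return instruction;
    }
}
```

Keeping the modules behind interfaces lets each one be replaced independently, for example swapping the sending module for one that piggybacks on heartbeats.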

In some embodiments, the erasure code operation instruction generation module 6200 is further configured to generate an erasure code operation instruction in which the one or more executable commands include an executable command for configuring environment variables, and which also carries those environment variables.
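One way to carry environment variables alongside a configuration command is sketched below. The command name `configure_env`, the dictionary layout, and the variable `EC_CODEC` are hypothetical; the data-node side simply overlays the carried variables on a copy of its current environment before running later commands.

```python
from typing import Dict, List


def build_env_instruction(env: Dict[str, str], commands: List[str]) -> Dict:
    """Name node side: put a hypothetical 'configure_env' command first and ship the variables."""
    return {"commands": ["configure_env"] + list(commands), "env": dict(env)}


def configure_env(instruction: Dict, current_env: Dict[str, str]) -> Dict[str, str]:
    """Data node side: return the environment to use for the remaining commands."""
    merged = dict(current_env)  # copy so the node's own environment is untouched
    merged.update(instruction.get("env", {}))
    return merged


if __name__ == "__main__":
    instr = build_env_instruction({"EC_CODEC": "rs-6-3"}, ["encode blk_1001"])
    print(instr["commands"])  # ['configure_env', 'encode blk_1001']
    print(configure_env(instr, {"PATH": "/usr/bin"}))
```

In practice the merged mapping could be passed as the `env` argument of `subprocess.run` when the node launches an external codec, so the variables apply only to that child process.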

In some embodiments, the erasure code operation instruction generation module 6200 is further configured to generate an erasure code operation instruction in which the one or more executable commands include a download command, and which also carries the address of a dynamic library to be downloaded.

In some embodiments, when a new erasure code operation is added, the erasure code operation instruction generation module 6200 is further configured to generate an erasure code operation instruction in which the one or more executable commands include a download command, and which also carries the address of the execution script corresponding to the new erasure code operation.
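The two download variants differ only in what the download command points at: a dynamic library for an improved codec, or an execution script for a newly added operation. A minimal sketch, in which the follow-up command names `load_library` and `run_script`, the field names, and the `repo.example` URLs are all hypothetical:

```python
from typing import Dict, Optional


def build_download_instruction(library_address: Optional[str] = None,
                               script_address: Optional[str] = None) -> Dict:
    """Name node side: a 'download' command followed by what to do with the artifact."""
    if (library_address is None) == (script_address is None):
        raise ValueError("give exactly one of library_address or script_address")
    if library_address is not None:
        return {"commands": ["download", "load_library"], "library_address": library_address}
    return {"commands": ["download", "run_script"], "script_address": script_address}


def download_target(instruction: Dict) -> str:
    """Data node side: the address the generic 'download' command should fetch."""
    if "library_address" in instruction:
        return instruction["library_address"]
    return instruction["script_address"]


if __name__ == "__main__":
    new_op = build_download_instruction(script_address="https://repo.example/ec/new_op.sh")
    print(new_op["commands"], download_target(new_op))
```

In this sketch the data node only needs a generic `download` primitive; which artifact it fetches, and therefore which erasure code behaviour it gains, is decided entirely by the name node.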

In some embodiments, the executable command acquisition module 6100 is further configured to generate the one or more executable commands according to the erasure code operation policy corresponding to the acquired erasure code operation.
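Generating commands from a policy amounts to a dispatch on the policy type. The sketch below assumes two hypothetical policy classes carrying the parameters named in this document and emits command strings in an invented `name args` format; the command vocabulary is an assumption, not the patent's.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ConversionPolicy:
    data_units: int    # "first number": original data units
    check_units: int   # "second number": check data units


@dataclass
class PeriodicReductionPolicy:
    target_copies: int
    period_seconds: float
    step: int


Policy = Union[ConversionPolicy, PeriodicReductionPolicy]


def generate_commands(policy: Policy, block: str) -> List[str]:
    """Map an erasure code operation policy to an ordered list of command strings."""
    if isinstance(policy, ConversionPolicy):
        return [
            f"read {block}",
            f"divide {block} {policy.data_units}",
            f"generate_check {block} {policy.check_units}",
            f"copy {block} {policy.data_units} {policy.check_units}",
        ]
    if isinstance(policy, PeriodicReductionPolicy):
        return [f"remove {block}"]
    raise TypeError(f"unsupported policy: {policy!r}")


if __name__ == "__main__":
    print(generate_commands(ConversionPolicy(6, 3), "blk_1001"))
```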

In some embodiments, the operation policy is an erasure code conversion policy whose parameters include a first number, representing the number of original data units, and a second number, representing the number of check (parity) data units. The one or more executable commands include: a read command for reading the original data; a data dividing command for dividing the read original data into the first number of data units; a check data generation command for generating check data from the read original data; and a data copy command for storing the first number of data units on the first number of data nodes, one unit per node, and storing the check data on the second number of data nodes.
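The four commands can be exercised end to end on a byte string. This sketch is a deliberate simplification: it supports only a second number of 1 and uses single XOR parity in place of the Reed-Solomon codes used by HDFS erasure coding policies such as RS-6-3-1024k, the read command is stubbed with a literal, and the node names are hypothetical.

```python
from typing import Dict, List


def divide(data: bytes, k: int) -> List[bytes]:
    """Data dividing command: split into k equal units, zero-padding the tail."""
    if k <= 0:
        raise ValueError("k must be positive")
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor_check(units: List[bytes]) -> bytes:
    """Check data generation command for a second number of 1: bytewise XOR."""
    check = bytearray(len(units[0]))
    for unit in units:
        for i, byte in enumerate(unit):
            check[i] ^= byte
    return bytes(check)


def place(units: List[bytes], checks: List[bytes],
          data_nodes: List[str], check_nodes: List[str]) -> Dict[str, bytes]:
    """Data copy command: one data unit per data node, one check unit per check node."""
    if len(units) != len(data_nodes) or len(checks) != len(check_nodes):
        raise ValueError("node counts must equal the first and second numbers")
    placement = dict(zip(data_nodes, units))
    placement.update(zip(check_nodes, checks))
    return placement


if __name__ == "__main__":
    original = b"hello world"                     # read command (stubbed)
    units = divide(original, 3)                    # first number = 3
    check = xor_check(units)                       # second number = 1
    placement = place(units, [check], ["dn1", "dn2", "dn3"], ["dn4"])
    # Losing dn2 is recoverable: XOR of the surviving units and the check unit.
    rebuilt = xor_check([placement["dn1"], placement["dn3"], placement["dn4"]])
    print(rebuilt == placement["dn2"])             # True
```

A real implementation would also record the original length so that the zero padding added by `divide` can be stripped on reconstruction.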

In some embodiments, the operation policy is a periodic copy number reduction policy whose parameters include a preset target copy number, a processing period, and a step size. The one or more executable commands include a data removal command for deleting the target data in a data node or transferring it to the recycle bin. The sending module 6300 is further configured so that, when the time elapsed since the last copy-reduction check reaches the processing period and the number of copies of the same data is greater than the target copy number, it sends the erasure code operation instruction to a number of the data nodes storing that data equal to the step size.
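The name node's trigger condition can be written as a small check. In this sketch, `ReductionState` and `nodes_to_instruct` are hypothetical names, time is passed in explicitly so the check is deterministic, and capping the batch at the excess over the target copy number (rather than always sending exactly `step` instructions) is a design assumption that keeps the copy count from dropping below the target.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReductionState:
    last_check: float          # time of the last copy-reduction check, in seconds
    replica_nodes: List[str]   # data nodes that store a copy of the same data


def nodes_to_instruct(state: ReductionState, target_copies: int,
                      period_seconds: float, step: int, now: float) -> List[str]:
    """Return the data nodes that should receive the removal instruction now.

    The caller is expected to drop nodes from replica_nodes once they confirm removal.
    """
    if now - state.last_check < period_seconds:
        return []                      # processing period not reached yet
    state.last_check = now             # a check happens in this round
    excess = len(state.replica_nodes) - target_copies
    if excess <= 0:
        return []                      # already at or below the target copy number
    return state.replica_nodes[:min(step, excess)]


if __name__ == "__main__":
    state = ReductionState(last_check=0.0, replica_nodes=["dn1", "dn2", "dn3"])
    print(nodes_to_instruct(state, target_copies=0, period_seconds=3600, step=1, now=1800))
    print(nodes_to_instruct(state, target_copies=0, period_seconds=3600, step=1, now=3600))
```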

An embodiment of the distributed storage system of the present invention is described below with reference to fig. 7.

Fig. 7 is a schematic diagram of a distributed storage system according to some embodiments of the present invention. As shown in fig. 7, the distributed storage system 70 of this embodiment includes: a name node 710, which includes a data manipulation device 720; and a data node 730 configured to parse the erasure code operation instruction sent by the name node to obtain a parsing result, and to execute the one or more executable commands in the parsing result so as to implement the erasure code operation. For a specific implementation of the data manipulation device 720, refer to the data manipulation device 600 in the example of fig. 6. There may be one or more data nodes 730.
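On the data node side, parsing and execution reduce to decoding the message and dispatching each command to a local primitive. A minimal sketch, assuming a JSON message format and a `name arg...` command syntax chosen only for illustration; the handler names `remove` and `trash` are hypothetical, and a handler registry is used instead of a shell so that only known primitives can run.

```python
import json
from typing import Callable, Dict, List

Handler = Callable[[List[str]], str]


def parse_instruction(message: str) -> List[List[str]]:
    """Decode the instruction and split each non-empty command into name and arguments."""
    payload = json.loads(message)
    return [cmd.split() for cmd in payload["commands"] if cmd.strip()]


def execute_all(message: str, handlers: Dict[str, Handler]) -> List[str]:
    """Run the parsed commands in order; raise on the first unknown command
    (commands before it have already run)."""
    results = []
    for name, *args in parse_instruction(message):
        if name not in handlers:
            raise KeyError(f"unsupported command: {name}")
        results.append(handlers[name](args))
    return results


if __name__ == "__main__":
    handlers: Dict[str, Handler] = {
        "remove": lambda args: f"deleted {args[0]}",
        "trash": lambda args: f"moved {args[0]} to recycle bin",
    }
    msg = json.dumps({"commands": ["trash blk_1001", "remove blk_1002"]})
    print(execute_all(msg, handlers))
```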

Fig. 8 is a schematic structural view of a data manipulation device according to other embodiments of the present invention. As shown in fig. 8, the data operation device 80 of this embodiment includes: a memory 810 and a processor 820 coupled to the memory 810, the processor 820 being configured to perform the data manipulation method of any of the previous embodiments based on instructions stored in the memory 810.

The memory 810 may include, for example, system memory, a fixed non-volatile storage medium, and so forth. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs.

Fig. 9 is a schematic structural view of a data manipulation device according to still further embodiments of the present invention. As shown in fig. 9, the data operation device 90 of this embodiment includes a memory 910 and a processor 920, and may further include an input/output interface 930, a network interface 940, a storage interface 950, and so forth. The interfaces 930, 940, and 950, the memory 910, and the processor 920 may be connected, for example, by a bus 960. The input/output interface 930 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 940 provides a connection interface for various networking devices. The storage interface 950 provides a connection interface for external storage devices such as SD cards and USB flash drives.

An embodiment of the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any one of the aforementioned data manipulation methods.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block in the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The foregoing describes only preferred embodiments of the invention and is not intended to limit the invention; any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims

1. A method of data manipulation, comprising:

The name node obtains one or more executable commands corresponding to an erasure code operation, including: the name node generating the one or more executable commands according to an erasure code operation policy corresponding to the acquired erasure code operation, wherein the operation policy is an erasure code conversion policy or a periodic copy number reduction policy; parameters of the erasure code conversion policy comprise a first number representing the number of original data units and a second number representing the number of check data units; and parameters of the periodic copy number reduction policy comprise at least one of a preset target copy number, a preset processing period, and a preset step size;

The name node generates an erasure code operation instruction comprising the one or more executable commands, wherein the erasure code operation instruction is an object implemented based on a generic instruction base class;

And the name node sends the erasure code operation instruction to a data node, so that the data node parses the erasure code operation instruction and executes the executable commands in the parsing result to implement the erasure code operation.

2. The data manipulation method of claim 1, wherein the erasure code operation instruction further comprises at least one of an environment variable and an external data address.

3. The data manipulation method of claim 1, wherein the name node generates an erasure code operation instruction in which the one or more executable commands include an executable command for configuring environment variables, and which also carries those environment variables.

4. The data manipulation method of claim 1, wherein the name node generates an erasure code operation instruction in which the one or more executable commands include a download command, and which also carries a dynamic library address.

5. The data manipulation method of claim 1, wherein, when a new erasure code operation is added, the name node generates an erasure code operation instruction in which the one or more executable commands include a download command, and which also carries the address of an execution script corresponding to the new erasure code operation.

6. The data manipulation method according to claim 1, wherein,

The operation policy is the erasure code conversion policy, and the one or more executable commands include:

a read command for reading the original data;

a data dividing command for dividing the read original data into a first number of data units;

a check data generation command for generating check data according to the read original data;

a data copy command for storing the first number of data units on the first number of data nodes, one unit per node, and storing the check data on the second number of data nodes.

7. The data manipulation method according to claim 1, wherein,

The operation policy is the periodic copy number reduction policy, and the one or more executable commands include: a data removal command for deleting the target data in the data node or transferring the target data in the data node to the recycle bin;

And when the time elapsed since the last copy-reduction check reaches the processing period and the number of copies of the same data is greater than the target copy number, the name node sends the erasure code operation instruction to a number of the data nodes storing that data equal to the step size.

8. A method of data manipulation, comprising:

A data node obtains an erasure code operation instruction sent by a name node, wherein the erasure code operation instruction is an object implemented based on a generic instruction base class and comprises one or more executable commands; the one or more executable commands are generated according to an erasure code operation policy corresponding to the erasure code operation; the operation policy is an erasure code conversion policy or a periodic copy number reduction policy; parameters of the erasure code conversion policy comprise a first number representing the number of original data units and a second number representing the number of check data units; and parameters of the periodic copy number reduction policy comprise at least one of a preset target copy number, a processing period, and a step size;

the data node parses the erasure code operation instruction to obtain a parsing result, wherein the parsing result comprises the one or more executable commands;

the data node executes the one or more executable commands to implement the erasure code operation.

9. A data manipulation device comprising:

An executable command acquisition module configured to acquire one or more executable commands corresponding to an erasure code operation, including: generating, by the name node, the one or more executable commands according to an erasure code operation policy corresponding to the acquired erasure code operation, wherein the operation policy is an erasure code conversion policy or a periodic copy number reduction policy; parameters of the erasure code conversion policy comprise a first number representing the number of original data units and a second number representing the number of check data units; and parameters of the periodic copy number reduction policy comprise at least one of a preset target copy number, a preset processing period, and a preset step size;

An erasure code operation instruction generation module configured to generate erasure code operation instructions comprising the one or more executable commands, wherein the erasure code operation instructions are objects implemented based on a generic instruction base class;

And a sending module configured to send the erasure code operation instruction to a data node, so that the data node parses the erasure code operation instruction and executes the executable commands in the parsing result to implement the erasure code operation.

10. A data manipulation device comprising:

A memory; and

A processor coupled to the memory, the processor configured to perform the data manipulation method of any of claims 1-7 based on instructions stored in the memory.

11. A distributed storage system, comprising:

a name node comprising the data manipulation apparatus of claim 9 or 10; and

A data node configured to parse the erasure code operation instruction sent by the name node to obtain a parsing result, and to execute the one or more executable commands in the parsing result so as to implement the erasure code operation.

12. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data manipulation method of any one of claims 1 to 7.