CN111782582A

CN111782582A - Data conversion method, system and name node

Info

Publication number: CN111782582A
Application number: CN201910515774.0A
Authority: CN
Inventors: 黄涛
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2019-06-14
Filing date: 2019-06-14
Publication date: 2020-10-16

Abstract

The disclosure provides a data conversion method, a data conversion system, and a name node, and relates to the field of data processing. The method comprises the following steps: the name node acquires a replica file; the name node converts the replica file into a file task to be converted; and the name node sends the file task to be converted to a data node, so that the data node converts the replica file into an erasure code file based on the file task to be converted. With the disclosed method, data conversion can be completed independently without depending on other systems, and data conversion efficiency is improved.

Description

Data conversion method, system and name node

Technical Field

The present disclosure relates to the field of data processing, and in particular, to a data conversion method, system, and name node.

Background

HDFS (Hadoop Distributed File System) generally stores files as replicas and guarantees data integrity by keeping three replicas of each file, but the three-replica storage policy occupies too much storage space. HDFS EC (Erasure Coding) was therefore introduced to reduce storage space while still ensuring data integrity.

In the related art, MapReduce in Hadoop is generally used to move cold data from replica storage to EC storage by copying the data. This data-copying scheme requires a YARN cluster to be built to run the MapReduce application, and a timed scheduling system to be introduced to ensure that data is converted in a timely manner.

Disclosure of Invention

The technical problem to be solved by the present disclosure is to provide a data conversion method, system and name node, which can independently complete data conversion without depending on other systems, thereby improving data conversion efficiency.

According to an aspect of the present disclosure, a data conversion method is provided, including: a name node acquires a replica file; the name node converts the replica file into a file task to be converted; and the name node sends the file task to be converted to a data node, so that the data node converts the replica file into an erasure code file based on the file task to be converted.

In one embodiment, in response to the data node successfully converting the file, the name node acquires the attribute information of the replica file and sets the attribute information of the erasure code file according to the attribute information of the replica file.

In one embodiment, in response to the data node successfully converting the file, the name node judges whether the current attribute information of the replica file is consistent with the attribute information recorded when the file task to be converted was sent, and if not, converts the replica file into a file task to be converted again.

In one embodiment, in response to the data node successfully converting the file, the name node places a write lock on the replica file, moves the replica file to a temporary directory, moves the erasure code file to the original directory of the replica file, and records a log entry in the conversion history; if an exception occurs during the move, the replica file is moved back to its original directory.

In one embodiment, if the name node detects that the data node has not returned response information within a preset time, or determines that the number of times the data node has failed to execute conversion tasks exceeds a threshold, the name node marks the data node as an abnormal data node.

In one embodiment, acquiring the replica file comprises: periodically scanning files in a preset directory to obtain replica files, and/or calling a consumer class to obtain newly added replica files in real time.

In one embodiment, the method further comprises filtering the replica files by at least one of the following steps: the name node compares the replica files with the converted files and filters out the replica files that have already been converted; the name node filters out the replica files that do not meet the path condition; the name node filters out the replica files that do not meet the file attribute condition; and the name node marks the replica files as processed using file extended attributes.

In one embodiment, the name node adds the replica files that meet the conditions to a queue of files to be converted; polls the queue to determine a replica file to be converted; selects a data node according to the conversion target selection policy; encapsulates the replica file to be converted into a file task to be converted based on the file conversion command; and sends the file task to be converted to the file conversion task queue in the data node descriptor corresponding to the data node, so that the data node converts the replica file into an erasure code file according to the file task to be converted.

In one embodiment, after the name node exits abnormally and restarts, the name node reads the log records in the conversion history, compares them with the preset directory to remove files that have already been converted, and converts the remaining replica files into file tasks to be converted.

According to another aspect of the present disclosure, there is also provided a name node, including: a file acquisition unit configured to acquire a replica file; a task conversion unit configured to convert the replica file into a file task to be converted; and a task dispatching unit configured to send the file task to be converted to a data node so that the data node converts the replica file into an erasure code file.

In one embodiment, the name node further includes an attribute setting unit configured to, in response to the data node successfully converting the file, obtain the attribute information of the replica file and set the attribute information of the erasure code file according to the attribute information of the replica file.

In one embodiment, the name node further includes an atomic exchange unit configured to, in response to the data node successfully converting the file, place a write lock on the replica file, move the replica file to a temporary directory, move the erasure code file to the original directory of the replica file, and record a log entry in the conversion history; if an exception occurs during the move, the replica file is moved back to its original directory.

In one embodiment, the name node further includes a file filtering unit configured to perform at least one of the following: comparing the replica files with the converted files and filtering out the replica files that have already been converted and those that do not meet the file size condition; filtering out the replica files that do not meet the path condition; filtering out the replica files that do not meet the file attribute condition; and marking the replica files as processed using file extended attributes.

According to another aspect of the present disclosure, there is also provided a name node, including: a memory; and a processor coupled to the memory, the processor configured to perform the method as described above based on instructions stored in the memory.

According to another aspect of the present disclosure, there is also provided a data conversion system, including: a name node; and the data node is configured to convert the duplicate file into the erasure code file based on the file task to be converted issued by the name node.

According to another aspect of the present disclosure, a computer-readable storage medium is also proposed, on which computer program instructions are stored, which instructions, when executed by a processor, implement the above-described method.

Compared with the prior art, the name node is improved, so that the name node converts the duplicate file into the file task to be converted after acquiring the duplicate file, and sends the file task to be converted to the data node, so that the data node converts the duplicate file into the erasure code file based on the file task to be converted. The embodiment can independently complete data conversion without depending on other systems, and improves the data conversion efficiency.

Other features of the present disclosure and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.

The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:

Fig. 1 is a schematic flow chart of an embodiment of the data conversion method of the present disclosure.

Fig. 2 is a schematic flow chart of another embodiment of the data conversion method of the present disclosure.

Fig. 3 is a schematic structural diagram of an embodiment of a name node according to the present disclosure.

Fig. 4 is a schematic structural diagram of another embodiment of a name node according to the present disclosure.

Fig. 5 is a schematic block diagram of an embodiment of a data conversion system according to the present disclosure.

Fig. 6 is a schematic structural diagram of another embodiment of a name node according to the present disclosure.

Fig. 7 is a schematic structural diagram of another embodiment of a name node according to the present disclosure.

Detailed Description

Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.

Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.

In the related technology, a set of YARN clusters need to be maintained when data conversion is realized, and other timing scheduling systems are introduced, but the timing scheduling systems cannot sense newly added data in real time and cannot realize the real-time conversion of the newly added data. In addition, when the external timing scheduling system performs data conversion, file metadata information needs to be requested from the name node, which results in low conversion efficiency.

At step 110, the name node obtains a replica file. The name node can monitor the replica files in a preset directory in real time, or call a consumer class to obtain newly added replica files in real time.

In one embodiment, replica files may be obtained by having the ConverteManager periodically scan (Scanner) the files in the directory tree specified by the configuration item fs.ttl.dir.configurations. Alternatively, each time a file is added, once the user finishes writing the file and the file state is complete, the FSNamesystem (the file system kernel class) is triggered to call the completeFile method to perform the final metadata update of the file; when the update succeeds, the add method of the Consumer class is called to obtain the new replica file.
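The two acquisition modes described above can be sketched as follows. This is a minimal in-memory illustration, not the patent's implementation: `ReplicaFile` and `ReplicaFileCollector` are hypothetical names, the directory listing is passed in as a plain list rather than read from HDFS, and only the `add` method name is taken from the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical stand-in for a replica file's metadata; not an HDFS class. */
final class ReplicaFile {
    final String path;
    final boolean complete;

    ReplicaFile(String path, boolean complete) {
        this.path = path;
        this.complete = complete;
    }
}

/** Collects candidate replica files from both acquisition modes. */
final class ReplicaFileCollector {
    private final ConcurrentLinkedQueue<ReplicaFile> candidates = new ConcurrentLinkedQueue<>();

    /** Scanning mode: walk a listing of the configured directory tree. */
    void scan(List<ReplicaFile> directoryListing) {
        for (ReplicaFile f : directoryListing) {
            if (f.complete) {          // files still being written are skipped
                candidates.add(f);
            }
        }
    }

    /** Consumer mode: called after the final metadata update of a new file succeeds. */
    void add(ReplicaFile newlyCompleted) {
        candidates.add(newlyCompleted);
    }

    /** Hands all collected candidates to the next stage, in arrival order. */
    List<ReplicaFile> drain() {
        List<ReplicaFile> out = new ArrayList<>();
        ReplicaFile f;
        while ((f = candidates.poll()) != null) {
            out.add(f);
        }
        return out;
    }
}
```

A concurrent queue is used here because, in this sketch, the scanner and the consumer callback are assumed to run on different threads.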

At step 120, the name node converts the replica file into a file task to be converted.

In one embodiment, the duplicate files meeting the conditions are added into a file queue to be converted, the file queue to be converted is polled to determine the duplicate files to be converted, then data nodes are selected according to a conversion selection target strategy, and the duplicate files to be converted are packaged into a file task to be converted based on a file conversion command.

In step 130, the name node sends the task of the file to be converted to the data node, so that the data node converts the duplicate file into the erasure code file based on the task of the file to be converted.

In one embodiment, the file tasks to be converted are sent to a file conversion task queue in a data node descriptor corresponding to the data node, and the data node may retrieve a specified number of file tasks to be converted from the file conversion task queue and then convert the replica file into the erasure code file based on the file tasks to be converted.

In the above embodiment, the name node is improved so that after the name node acquires the duplicate file, the duplicate file is converted into a file task to be converted, and the file task to be converted is sent to the data node, so that the data node converts the duplicate file into the erasure code file based on the file task to be converted. The embodiment can independently complete data conversion without depending on other systems, and the data read by the name node and the conversion task dispatching on the data are all processed based on the metadata without operating on real data, so that the data conversion efficiency is improved.

At step 210, the conversion manager in the name node obtains a replica file. The conversion manager may obtain the replica file using a scanning mode or a consumer mode, or a combination of both, depending on configuration choice.

The scanning mode lacks real-time responsiveness, while the consumer mode cannot process files that already exist, so the two modes may be used together; when they are, lock contention must be handled. Operations that modify a file's metadata, such as changing the owner or an extended attribute, must acquire an exclusive write lock on the file, and reading a file must acquire a shared read lock. Once a file holds a read lock, a write lock cannot be added but further read locks can, that is, the file can be read by multiple users at the same time. In addition, creating or deleting a directory or file requires a write lock, which ensures that only one user can operate on a given HDFS directory or file at a time. In other words, read-write locks are needed on the file system as well as on individual files.
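The shared-read, exclusive-write behaviour described above matches the semantics of `java.util.concurrent.locks.ReentrantReadWriteLock`. The sketch below only illustrates those semantics; `FileLockDemo` is a hypothetical class, and for simplicity the probes run on a single thread, whereas in a name node the readers and writers would be separate threads.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrates read/write lock admission rules on a single file-level lock. */
final class FileLockDemo {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** While a read lock is held, a further read lock can still be acquired. */
    boolean secondReaderAdmitted() {
        lock.readLock().lock();
        try {
            boolean admitted = lock.readLock().tryLock();
            if (admitted) {
                lock.readLock().unlock();
            }
            return admitted;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** While a read lock is held, the write lock cannot be acquired. */
    boolean writerAdmittedWhileReading() {
        lock.readLock().lock();
        try {
            boolean admitted = lock.writeLock().tryLock();
            if (admitted) {
                lock.writeLock().unlock();
            }
            return admitted;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** With no lock held, the write lock is granted. */
    boolean writerAdmittedWhenIdle() {
        boolean admitted = lock.writeLock().tryLock();
        if (admitted) {
            lock.writeLock().unlock();
        }
        return admitted;
    }
}
```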

In step 220, the conversion filter of the name node filters the replica files to obtain those that meet the conditions, and adds them to the queue of files to be converted. The acquired replica files are sent to ConverterFilters for filtering, and one or more of ConverterDefaultFilter, ConverterPathFilter, ConverterAttrFilter, and ConverterXAttrFilter can be configured according to the actual situation.

The ConverterDefaultFilter compares the replica files with the converted files and filters out those already converted. For example, a replica file is compared against the INodeId, timestamp, and file size recorded in the converted file list; if the replica file has not been converted, it is passed to the next filter. This step prevents a file from being processed repeatedly. The ConverterDefaultFilter can also exclude replica files that do not meet the file size condition.

The ConverterPathFilter filters out replica files that do not meet the path condition. Only files under specific paths, such as the files found by the scanner, are converted; files that do not meet the path condition are excluded. The ConverterAttrFilter filters out replica files that do not meet the file attribute condition. The ConverterXAttrFilter uses file extended attributes to mark replica files as processed, which prevents a file from being processed repeatedly; if a file is not processed successfully, the corresponding extended attribute can be removed.

Filtering the replica files through the conversion filters selects the files to be converted that meet the conditions. In this embodiment, by configuring which filter classes are used, more filtering rules can be loaded without restarting the name node, which provides richer policies for filtering the data to be converted.
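A configurable filter chain of this kind could be sketched with `java.util.function.Predicate`. Everything below is an illustrative assumption: `Candidate`, `FilterChain`, the extended-attribute name, and the choice to compare by inode ID only are hypothetical, and the attribute filter is omitted because the text does not say which attributes it checks.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical view of a replica file as seen by the filters. */
final class Candidate {
    final long inodeId;
    final String path;
    final long sizeBytes;
    final Set<String> xattrs = new HashSet<>();

    Candidate(long inodeId, String path, long sizeBytes) {
        this.inodeId = inodeId;
        this.path = path;
        this.sizeBytes = sizeBytes;
    }
}

/** Applies the configured filters in order and marks survivors as processed. */
final class FilterChain {
    /** Assumed extended-attribute name; the patent does not specify one. */
    static final String PROCESSED_XATTR = "user.ec.convert.processed";

    private final List<Predicate<Candidate>> filters = new ArrayList<>();

    FilterChain add(Predicate<Candidate> filter) {
        filters.add(filter);
        return this;
    }

    List<Candidate> apply(List<Candidate> input) {
        List<Candidate> accepted = new ArrayList<>();
        for (Candidate c : input) {
            boolean pass = true;
            for (Predicate<Candidate> f : filters) {
                if (!f.test(c)) {
                    pass = false;
                    break;
                }
            }
            if (pass) {
                c.xattrs.add(PROCESSED_XATTR); // mark so a later pass skips it
                accepted.add(c);
            }
        }
        return accepted;
    }

    /** Default filter: skip already-converted files and files below a size limit. */
    static Predicate<Candidate> notConverted(Set<Long> convertedInodeIds, long minSizeBytes) {
        return c -> !convertedInodeIds.contains(c.inodeId) && c.sizeBytes >= minSizeBytes;
    }

    /** Path filter: keep only files under the given prefix. */
    static Predicate<Candidate> underPath(String prefix) {
        return c -> c.path.startsWith(prefix);
    }

    /** Extended-attribute filter: skip files already marked as processed. */
    static Predicate<Candidate> notMarked() {
        return c -> !c.xattrs.contains(PROCESSED_XATTR);
    }
}
```

Because each filter is just a predicate, the set of active filters can be changed by rebuilding the chain, which is one way to read the text's point about loading rules without a restart.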

At step 230, the ConverteDefaultResolver thread of the name node polls the queue of files to be converted to determine the replica file to be converted. When the ConverteManager is started, the ConverteDefaultResolver thread is started at the same time.

In step 240, a data node is selected according to the conversion target selection policy, and the replica file to be converted is encapsulated into a file task to be converted based on a file conversion command (FileConvertCommand). For example, a data node may be selected according to data node capacity, data node busyness, or a random selection policy. Only one policy can be configured for the name node at any given time, which keeps the selection consistent.
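One way to make the selection policy pluggable is a single-method interface with one implementation per policy. The sketch below assumes hypothetical names (`NodeStat`, `TargetPolicy`, `TargetPolicies`) and hypothetical metrics (free bytes for capacity, queued task count for busyness); the random policy is left out to keep the example deterministic.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical per-node statistics used for target selection. */
final class NodeStat {
    final String id;
    final long freeBytes;
    final int queuedTasks;

    NodeStat(String id, long freeBytes, int queuedTasks) {
        this.id = id;
        this.freeBytes = freeBytes;
        this.queuedTasks = queuedTasks;
    }
}

/** A target selection policy picks one data node from the candidates. */
interface TargetPolicy {
    NodeStat choose(List<NodeStat> nodes);
}

final class TargetPolicies {
    private TargetPolicies() {}

    /** Capacity-based policy: prefer the node with the most free space. */
    static final TargetPolicy MOST_FREE_CAPACITY = nodes -> nodes.stream()
            .max(Comparator.comparingLong((NodeStat n) -> n.freeBytes))
            .orElseThrow(() -> new IllegalArgumentException("no data nodes"));

    /** Busyness-based policy: prefer the node with the fewest queued tasks. */
    static final TargetPolicy LEAST_BUSY = nodes -> nodes.stream()
            .min(Comparator.comparingInt((NodeStat n) -> n.queuedTasks))
            .orElseThrow(() -> new IllegalArgumentException("no data nodes"));
}
```

Holding the active policy in a single field of type `TargetPolicy` would reflect the constraint that only one policy is configured at a time.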

The key fields of FileConvertCommand are private final String jobId, private final String nsId, private final Path srcPath, and private final Path destPath. The data node uses the jobId field when reporting the conversion result to the name node and uses the nsId field to determine which name node to read the file from; the srcPath field identifies the path of the original file, and the destPath field identifies the path where the converted file is stored.
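Based on the fields listed above, the command could look roughly like the immutable class below. Only the four field names come from the text; the constructor, accessors, and the use of `String` in place of the `Path` type (so the sketch compiles without Hadoop on the classpath) are assumptions.

```java
import java.util.Objects;

/** Sketch of the conversion command sent from the name node to a data node. */
final class FileConvertCommand {
    private final String jobId;    // echoed back by the data node in its response
    private final String nsId;     // tells the data node which name node to read from
    private final String srcPath;  // original replica file (String stands in for Path)
    private final String destPath; // where the erasure code file is written

    FileConvertCommand(String jobId, String nsId, String srcPath, String destPath) {
        this.jobId = Objects.requireNonNull(jobId, "jobId");
        this.nsId = Objects.requireNonNull(nsId, "nsId");
        this.srcPath = Objects.requireNonNull(srcPath, "srcPath");
        this.destPath = Objects.requireNonNull(destPath, "destPath");
    }

    String getJobId() { return jobId; }
    String getNsId() { return nsId; }
    String getSrcPath() { return srcPath; }
    String getDestPath() { return destPath; }

    @Override
    public String toString() {
        return "FileConvertCommand{jobId=" + jobId + ", nsId=" + nsId
                + ", srcPath=" + srcPath + ", destPath=" + destPath + "}";
    }
}
```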

In step 250, the file task to be converted is sent to a file conversion task queue in the data node descriptor (DatanodeDescriptor) corresponding to the data node.

In step 260, the data node converts the replica file into an erasure code file according to the file task to be converted. The data node may retrieve a specified number of file tasks to be converted from the file conversion task queue of the DatanodeDescriptor.
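Retrieving "a specified number" of tasks from a per-node queue maps naturally onto `BlockingQueue.drainTo(collection, maxElements)`. The sketch below is an assumption about how such a queue might behave; `ConvertTaskQueue` is a hypothetical class and tasks are represented by their job IDs only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical per-data-node queue of pending conversion tasks. */
final class ConvertTaskQueue {
    private final LinkedBlockingQueue<String> pending = new LinkedBlockingQueue<>();

    /** Name node side: enqueue a task for this data node. */
    void offer(String jobId) {
        pending.offer(jobId);
    }

    /** Data node side: take at most maxTasks tasks, in FIFO order, without blocking. */
    List<String> take(int maxTasks) {
        List<String> batch = new ArrayList<>();
        pending.drainTo(batch, maxTasks);
        return batch;
    }

    int size() {
        return pending.size();
    }
}
```

Bounding the batch size keeps a single data node from pulling more conversion work than it can handle in one round.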

In step 270, the conversion monitor determines whether the data node has returned a successful conversion response; if so, step 280 is performed, otherwise step 290 is performed. The data node responds to the name node with the ConvertTaskInfo class regardless of whether the conversion task succeeded. The key fields of ConvertTaskInfo are private final String jobId, private final String taskId, ConvertTaskStatus status, and private int errorCode. Here jobId identifies the conversion task index ID, taskId identifies the task ID, status identifies the conversion status (for example, success, failure, timeout, or processing), and errorCode identifies the return code of the conversion command.
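The response object and the branch in step 270 might be sketched as follows. The field names and the four example states come from the text; the enum constant names, the accessors, making all fields final, and the `ConvertResponseRouter` helper are assumptions added for illustration.

```java
/** Conversion states named in the text; the enum layout itself is an assumption. */
enum ConvertTaskStatus { SUCCESS, FAILURE, TIMEOUT, PROCESSING }

/** Sketch of the response a data node sends back for a conversion task. */
final class ConvertTaskInfo {
    private final String jobId;
    private final String taskId;
    private final ConvertTaskStatus status;
    private final int errorCode;

    ConvertTaskInfo(String jobId, String taskId, ConvertTaskStatus status, int errorCode) {
        this.jobId = jobId;
        this.taskId = taskId;
        this.status = status;
        this.errorCode = errorCode;
    }

    String getJobId() { return jobId; }
    String getTaskId() { return taskId; }
    ConvertTaskStatus getStatus() { return status; }
    int getErrorCode() { return errorCode; }
}

/** Hypothetical helper mirroring the branch in step 270. */
final class ConvertResponseRouter {
    private ConvertResponseRouter() {}

    /** Returns the number of the next flow-chart step: 280 on success, otherwise 290. */
    static int nextStep(ConvertTaskInfo info) {
        return info.getStatus() == ConvertTaskStatus.SUCCESS ? 280 : 290;
    }
}
```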

A conversion monitor (ConvertMonitor) thread can be started to monitor the dispatch of tasks, the redistribution of failed tasks, and how busy the data nodes are.

In step 280, the conversion manager obtains the attribute information of the replica file and sets the attribute information of the erasure code file accordingly. For example, the user group, timestamps, extended attributes, and the like of the replica file are read, and the creation time, access time, extended attributes, and the like of the converted erasure code file are set, so that the attributes of the replica file and the erasure code file stay consistent and other users' access to the file is not affected.

The HDFS system sets user and group permissions, timestamps, and the like on directories and files. When an existing timed scheduling system is introduced to perform data conversion, the name node has to expose more file attributes to the task scheduling system, that is, grant it superuser permission, which increases the security risk to the files. In this embodiment, file conversion is performed within the name node, so the attributes before and after conversion are kept consistent, the conversion is transparent to users, and no broader permission has to be granted externally, which improves file security.

In step 281, a write lock is placed on the replica file, the replica file is moved to a temporary directory, the erasure code file is moved to the original directory of the replica file, and a log entry is recorded in the conversion history; if an exception occurs during the move, the replica file is moved back to its original directory.

The ConvertHistory class manages the state information of the conversion tasks, and is responsible for adding and deleting the states of the conversion tasks and loading the conversion history records.

In the prior art, when the converted erasure code data is moved to the source directory, the program may throw a file-not-found exception. In this embodiment, the name node first adds a preparation operation log entry to ConvertHistory and then performs an atomic file move with the help of a temporary directory: a write lock is placed on the original replica file, the original replica file is moved to the temporary directory, and the converted erasure code file is moved to the original directory of the replica file. If no exception occurs during this process, an operation-end log entry is added to ConvertHistory; if an exception occurs, the original replica file is moved back to its original directory and the task is added to the to-be-converted list to be redone. Because the original replica file is write-locked, other users cannot modify its metadata, such as its attributes, and cannot write to it. Access by other users may be blocked briefly, but not long enough for them to notice, so uninterrupted service can be provided.

In this process, what is moved is not the actual data file; only metadata information is exchanged within the name node. The metadata exchange holds a write lock during the move, and the metadata in the name node can be modified again only after the lock is released. Uninterrupted service can therefore be provided to users through an atomic operation, and data security is improved.
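The swap-with-rollback sequence can be sketched against an in-memory namespace. This is a simplified model under stated assumptions: `NamespaceSwap` is a hypothetical class, the namespace is a plain map from path to a metadata label, the history is a list of strings with an invented `PREPARE`/`END` format, and a missing path stands in for the "file not found" failure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** In-memory model of swapping a replica file for its erasure code file. */
final class NamespaceSwap {
    final Map<String, String> namespace = new HashMap<>(); // path -> metadata label
    final List<String> history = new ArrayList<>();        // stand-in for the conversion history
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Metadata-only rename; fails if the source path does not exist. */
    private void rename(String from, String to) {
        if (!namespace.containsKey(from)) {
            throw new IllegalStateException("file not found: " + from);
        }
        namespace.put(to, namespace.remove(from));
    }

    /** Returns true if the swap completed, false if it was rolled back. */
    boolean swap(String replicaPath, String ecPath, String tmpPath) {
        history.add("PREPARE " + replicaPath);
        lock.writeLock().lock();
        boolean movedToTmp = false;
        try {
            rename(replicaPath, tmpPath);   // replica -> temporary directory
            movedToTmp = true;
            rename(ecPath, replicaPath);    // erasure code file -> original location
            history.add("END " + replicaPath);
            return true;
        } catch (IllegalStateException e) {
            if (movedToTmp) {
                rename(tmpPath, replicaPath); // roll back: restore the replica
            }
            return false;                     // caller may re-queue the task
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

A `PREPARE` entry without a matching `END` entry is what a restarted process would treat as an unfinished conversion in this model.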

In addition, after the name node dispatches the file task to be converted to the data node, the replica file may be modified before the data node responds to the name node. The name node therefore also judges whether the current attribute information of the replica file is consistent with the attribute information recorded when the file task to be converted was sent, and if not, converts the replica file into a file task to be converted again. If this check were performed by an external scheduling system, that system would need to request file metadata from the name node, which would both burden the name node and lengthen the response time. For example, the response time of an external scheduling system may be on the order of seconds, whereas the name node needs only milliseconds, so the check runs faster within the name node.
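The consistency check amounts to comparing a snapshot taken at dispatch time with the file's current attributes. The text does not say which attributes are compared, so the sketch below assumes modification time, length, and owner; `AttrSnapshot` and `ReconversionCheck` are hypothetical names.

```java
import java.util.Objects;

/** Hypothetical snapshot of the attributes compared before and after conversion. */
final class AttrSnapshot {
    final long modificationTime;
    final long length;
    final String owner;

    AttrSnapshot(long modificationTime, long length, String owner) {
        this.modificationTime = modificationTime;
        this.length = length;
        this.owner = owner;
    }

    boolean sameAs(AttrSnapshot other) {
        return modificationTime == other.modificationTime
                && length == other.length
                && Objects.equals(owner, other.owner);
    }
}

final class ReconversionCheck {
    private ReconversionCheck() {}

    /** True if the replica changed after dispatch, so the task must be issued again. */
    static boolean needsReconversion(AttrSnapshot atDispatch, AttrSnapshot current) {
        return !atDispatch.sameAs(current);
    }
}
```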

At step 290, the conversion manager in the name node reads the log record in the conversion history, compares the log record with the predetermined directory to remove duplicate, and re-executes step 210.

Manual interruption is allowed while conversion is in progress, and if the conversion function behaves abnormally, files are not damaged and the service provided to users is not interrupted. For example, operation logs are recorded by ConvertHistory and stored in the HDFS system. When an exception occurs, for example when the name node exits abnormally on its own or because of external factors, the newly started ConvertManager reads the log recorded by ConvertHistory, compares it with the scanned directory, and starts a new round of conversion after deduplication.

In this embodiment, log records provide a fault tolerance guarantee for the conversion process. When the conversion process is interrupted or the name node is restarted, the directories to be converted are scanned and the files already converted are filtered out according to the logs, which ensures that data is not converted repeatedly.
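Recovery after a restart can be sketched as a set difference between the scanned paths and the paths the log shows as finished. The log format used here (`PREPARE <path>` and `END <path>` lines) is an assumption made for illustration, as is the `ConversionRecovery` class name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Rebuilds the list of files still to be converted after a restart. */
final class ConversionRecovery {
    private ConversionRecovery() {}

    /**
     * @param scannedPaths paths found by re-scanning the configured directories
     * @param historyLog   log lines in the assumed "PREPARE path" / "END path" format
     * @return scanned paths without duplicates and without completed conversions
     */
    static List<String> remaining(List<String> scannedPaths, List<String> historyLog) {
        Set<String> finished = new HashSet<>();
        for (String line : historyLog) {
            if (line.startsWith("END ")) {
                finished.add(line.substring("END ".length()));
            }
        }
        // LinkedHashSet removes duplicate scan results while keeping scan order.
        Set<String> todo = new LinkedHashSet<>(scannedPaths);
        todo.removeAll(finished);
        return new ArrayList<>(todo);
    }
}
```

A path with a `PREPARE` entry but no `END` entry stays in the result, so an interrupted conversion is retried rather than skipped.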

In one embodiment, if the name node detects that the data node has not returned response information within a preset time, or determines that the number of times the data node has failed to execute conversion tasks exceeds a threshold, the name node marks the data node as an abnormal data node. For example, if a data node has not reported a heartbeat to the name node for a long time, the data node may be an abnormal node.
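The two conditions for marking a node abnormal (response timeout and failure count over a threshold) could be tracked as in the sketch below. `DataNodeHealth` is a hypothetical class, timestamps are passed in explicitly so the logic is easy to test, and treating a node that has never responded as abnormal is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Tracks response times and failure counts to flag abnormal data nodes. */
final class DataNodeHealth {
    private final long timeoutMillis;
    private final int maxFailures;
    private final Map<String, Long> lastResponse = new HashMap<>();
    private final Map<String, Integer> failures = new HashMap<>();

    DataNodeHealth(long timeoutMillis, int maxFailures) {
        this.timeoutMillis = timeoutMillis;
        this.maxFailures = maxFailures;
    }

    /** Records a response (heartbeat or task result) received at nowMillis. */
    void recordResponse(String nodeId, long nowMillis, boolean success) {
        lastResponse.put(nodeId, nowMillis);
        if (!success) {
            failures.merge(nodeId, 1, Integer::sum);
        }
    }

    /** Abnormal if silent for longer than the timeout or failed more than maxFailures times. */
    boolean isAbnormal(String nodeId, long nowMillis) {
        Long last = lastResponse.get(nodeId);
        boolean timedOut = last == null || nowMillis - last > timeoutMillis;
        boolean tooManyFailures = failures.getOrDefault(nodeId, 0) > maxFailures;
        return timedOut || tooManyFailures;
    }
}
```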

Fig. 3 is a schematic structural diagram of an embodiment of a name node according to the present disclosure. The name node includes a file acquisition unit 310, a task conversion unit 320, and a task dispatching unit 330.

The file acquisition unit 310 is configured to acquire a replica file. The replica files in a preset directory can be monitored in real time, or a consumer class can be called to obtain newly added replica files in real time.

The task conversion unit 320 is configured to convert the replica file into a file task to be converted.

The task dispatching unit 330 is configured to send the file task to be converted to the data node so that the data node converts the replica file into an erasure code file.

In the above embodiment, the name node is improved so that after the name node acquires the duplicate file, the duplicate file is converted into a file task to be converted, and the file task to be converted is sent to the data node, so that the data node converts the duplicate file into the erasure code file based on the file task to be converted. The embodiment can independently complete data conversion without depending on other systems, and improves the data conversion efficiency.

Fig. 4 is a schematic structural diagram of another embodiment of a name node according to the present disclosure. The name node includes an attribute setting unit 410 in addition to the file acquisition unit 310, the task conversion unit 320, and the task dispatching unit 330.

The attribute setting unit 410 is configured to, in response to the data node successfully converting the file, obtain the attribute information of the replica file and set the attribute information of the erasure code file accordingly. For example, the user group, timestamps, extended attributes, and the like of the replica file are read, and the creation time, access time, extended attributes, and the like of the converted erasure code file are set, so that the attributes of the replica file and the erasure code file stay consistent and other users' access to the file is not affected.

In another embodiment, the name node further comprises an atomic exchange unit 420 configured to, in response to the data node successfully converting the file, place a write lock on the replica file, move the replica file to the temporary directory, move the erasure code file to the original directory of the replica file, and record a log entry in the conversion history; if an exception occurs during the move, the replica file is moved back to its original directory. Because the original replica file is write-locked, other users cannot modify its metadata, such as its attributes, and cannot write to it. Access by other users may be blocked briefly, but not long enough for them to notice, so uninterrupted service can be provided.

In this embodiment, what is moved is not the actual data file; only metadata information is exchanged within the name node. The metadata exchange holds a write lock during the move, and the metadata in the name node can be modified again only after the lock is released. Uninterrupted service can therefore be provided to users through an atomic operation, and data security is improved.

In another embodiment, the name node further comprises a file filtering unit 430 configured to perform at least one of the following: comparing the replica files with the converted files and filtering out the replica files that have already been converted and those that do not meet the file size condition; filtering out the replica files that do not meet the path condition; filtering out the replica files that do not meet the file attribute condition; and marking the replica files as processed using file extended attributes.

The file filtering unit 430 is, for example, a conversion filter including ConverterDefaultFilter, ConverterPathFilter, ConverterAttrFilter, ConverterXAttrFilter, and the like.

In the embodiment, richer strategies can be provided for filtering the data to be converted, so that the file to be converted meeting the conditions is obtained.

The functions of the units in the above embodiments may be implemented by several managers. As shown in Fig. 5, in the name node 510 the functions of the file acquisition unit 310, the attribute setting unit 410, and the atomic exchange unit 420 are implemented by a conversion manager; the functions of the task conversion unit 320 and the task dispatching unit 330 are implemented by a conversion default processor; the function of the file filtering unit 430 is implemented by a conversion filter; and so on.

The conversion monitor thread may monitor the dispatch of tasks, the redistribution of failed tasks, and how busy the data nodes are.

The name node 510 dispatches the task of the file to be converted to the data node 520, and the data node 520 converts the replica file into an erasure code file based on the task of the file to be converted.

In this embodiment, data conversion is completed independently without depending on any other system, and newly added data can be converted in real time. Uninterrupted service is provided to users through atomic operations, and file attributes are left unchanged by the conversion so that it is transparent to users. In addition, a fault tolerance mechanism ensures that data is not converted repeatedly, and richer policies are available for filtering the data to be converted.

Fig. 6 is a schematic structural diagram of another embodiment of a name node according to the present disclosure. The name node includes a memory 610 and a processor 620, where the memory 610 may be a disk, flash memory, or any other non-volatile storage medium. The memory 610 is configured to store the instructions of the embodiments corresponding to fig. 1-2. The processor 620 is coupled to the memory 610 and may be implemented as one or more integrated circuits, such as a microprocessor or microcontroller. The processor 620 is configured to execute the instructions stored in the memory 610.

In one embodiment, as shown in FIG. 7, the name node 700 includes a memory 710 and a processor 720. The processor 720 is coupled to the memory 710 by a bus 730. The name node 700 may also be coupled to an external storage device 750 via a storage interface 740 for invoking external data, and may also be coupled to a network or another computer system (not shown) via a network interface 760, which will not be described in detail herein.

In this embodiment, the instructions are stored in the memory and executed by the processor, so that data conversion can be completed independently without depending on other systems, and the data conversion efficiency is improved.

In another embodiment, a computer-readable storage medium has stored thereon computer program instructions which, when executed by a processor, implement the steps of the method in the corresponding embodiment of fig. 1-2. As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, apparatus, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Thus far, the present disclosure has been described in detail. Some details that are well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.

Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims

1. A method of data conversion, comprising:

the name node acquires a duplicate file;

converting the duplicate file into a file task to be converted; and

sending the file task to be converted to a data node so that the data node converts the duplicate file into an erasure code file based on the file task to be converted.

2. The data conversion method of claim 1, further comprising:

the name node responds to the success of file conversion of the data node, and obtains attribute information of the duplicate file;

and setting the attribute information of the erasure code file according to the attribute information of the duplicate file.

3. The data conversion method of claim 2, further comprising:

and the name node responds to the success of file conversion of the data node, judges whether the current attribute information of the duplicate file is consistent with the corresponding attribute information when the file task to be converted is sent, and if not, converts the duplicate file into the file task to be converted again.

4. The data conversion method of claim 2, further comprising:

the name node responds to the success of file conversion of the data node, writes and locks the copy file, then moves the copy file to a temporary directory, moves the erasure code file to an original copy file directory, and performs log recording in a conversion history;

and if an abnormal condition occurs in the moving process, moving the copy file back to the original copy file directory.

5. The data conversion method of claim 2, further comprising:

and if the name node detects that the data node does not return response information after exceeding the preset time or determines that the failure times of the data node for executing the conversion task exceed a threshold value, marking the data node as an abnormal data node.

6. The data conversion method according to claim 1, wherein obtaining a replica file comprises:

and periodically scanning files in a preset directory to obtain copy files, and/or calling consumption classes to obtain newly added copy files in real time.

7. The data conversion method of claim 1, further comprising the step of filtering the replica file by at least one of:

the name node compares the duplicate files with converted files, and filters the converted files in the duplicate files;

the name node filters files which do not meet the path condition in the duplicate files;

the name node filters files which do not accord with the file attribute condition in the duplicate files;

and the name node carries out processed marking on the duplicate file by using a file extension attribute.

8. The data conversion method according to any one of claims 1 to 7, wherein:

the name node adds the duplicate file meeting the condition into a file queue to be converted;

polling a file queue to be converted to determine a copy file to be converted;

selecting a data node according to the conversion selection target strategy;

packaging the duplicate file to be converted into a file task to be converted based on the file conversion command;

and sending the file task to be converted to a file conversion task queue in a data node descriptor corresponding to the data node, so that the data node converts the duplicate file into an erasure code file according to the file task to be converted.

9. The data conversion method of claim 4, further comprising:

and after the name node exits abnormally and is restarted, reading the log record in the conversion history, comparing the log record with a preset directory to remove the duplicate, and converting the duplicate-removed duplicate file into a file task to be converted.

10. A name node, comprising:

a file acquisition unit configured to acquire a duplicate file;

a task conversion unit configured to convert the duplicate file into a file task to be converted; and

a task dispatching unit configured to send the file task to be converted to a data node so that the data node converts the duplicate file into an erasure code file.

11. The name node of claim 10, further comprising:

and the attribute setting unit is configured to respond to the success of file conversion of the data node, acquire the attribute information of the duplicate file, and set the attribute information of the erasure code file according to the attribute information of the duplicate file.

12. The name node of claim 11, further comprising:

the atomic exchange unit is configured to respond to the success of file conversion of the data nodes, write and lock the copy files, move the copy files to a temporary directory, move the erasure code files to an original copy file directory, and perform log recording in a conversion history; and if an abnormal condition occurs in the moving process, moving the copy file back to the original copy file directory.

13. The name node of any of claims 10-12, further comprising:

the file filtering unit is configured to perform at least one of comparing the duplicate files with the converted files, filtering out the converted files in the duplicate files, filtering out the files which do not accord with the path condition in the duplicate files, filtering out the files which do not accord with the file attribute condition in the duplicate files, and performing processed marking on the duplicate files by using the file extension attributes.

14. A name node, comprising:

a memory; and

a processor coupled to the memory, the processor configured to perform the method of any of claims 1-9 based on instructions stored in the memory.

15. A data conversion system, comprising:

the name node of any of claims 10-14; and

a data node configured to convert the duplicate file into the erasure code file based on the file task to be converted issued by the name node.

16. A computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method of any of claims 1 to 9.