CN115168367A - Data configuration method and system for big data - Google Patents


Info

Publication number
CN115168367A
CN115168367A
Authority
CN
China
Prior art keywords
data
fragment
node
name node
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211091952.XA
Other languages
Chinese (zh)
Other versions
CN115168367B (en)
Inventor
吕灏
韩国权
李庆
胥月
黄海峰
蔡惠民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiji Computer Corp Ltd
CETC Big Data Research Institute Co Ltd
Original Assignee
Taiji Computer Corp Ltd
CETC Big Data Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taiji Computer Corp Ltd, CETC Big Data Research Institute Co Ltd filed Critical Taiji Computer Corp Ltd
Priority to CN202211091952.XA priority Critical patent/CN115168367B/en
Publication of CN115168367A publication Critical patent/CN115168367A/en
Application granted granted Critical
Publication of CN115168367B publication Critical patent/CN115168367B/en
Priority to PCT/CN2022/139829 priority patent/WO2024051027A1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1415Saving, restoring, recovering or retrying at system level
    • G06F11/1435Saving, restoring, recovering or retrying at system level using file system or storage system metadata
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2457Query processing with adaptation to user needs
    • G06F16/24573Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The invention relates to the field of information processing, and in particular discloses a data configuration method and system for big data. The method configures a name node, a data node and a user side, with the name node acting as a central management server. When a data node serves as a storage node, the primary and secondary fragments stored on it complete their data synchronization through an exchange of confirmation messages, and log event information is recorded at the same time. This satisfies the need to transmit and process information whenever the primary and secondary fragment data of a user become inconsistent, and improves data processing efficiency once the fragments are leveled using the log event information.

Description

Data configuration method and system for big data
Technical Field
The invention relates to the field of information processing, and in particular to a data configuration method and system for big data.
Background
With the growth of cloud computing capability, mass data processing in various industries has gradually become a focus of attention and research, and applying data mining methods across fields has become a clear trend.
While valuable information is mined from large volumes of data for management, decision-making and regulatory reference, ensuring the safety and consistency of data in a mass storage system, so that it can be updated promptly and safely as users process and access it, has become an urgent problem to solve.
Disclosure of Invention
To address one of the above problems, the present invention provides a data configuration method and system for big data.
The method comprises the following steps: configuring a name node, a data node and a user side, wherein the name node is configured as a central management server that stores descriptive metadata in memory in list form, responds to the user side's requests to access files, and provides an internal metadata service;
the data node stores the data required by the user side in blocks of a fixed size and keeps backup copies; it receives control information forwarded by the name node, creates, deletes and copies data blocks under the unified scheduling of the name node, and periodically reports back to the name node;
the user side performs data access through the name node. A primary fragment and a secondary fragment of the data are set on the data node; to keep the two consistent, the primary and secondary fragments complete their data synchronization through an exchange of confirmation messages.
Further, the primary and secondary fragments complete their data synchronization through confirmation messages as follows: the relational database operations associated with the two fragments are processed on both fragments simultaneously. When the primary fragment needs to commit, it sends a commit request to the secondary fragment. If the secondary fragment has already finished the task, it returns an ACK message directly to the primary fragment; if it has not, it returns a NACK message, which triggers an indication of whether waiting is needed and a log record.
Further, storing data in blocks of a fixed size with backup copies means that three copies are kept by default: one on the local machine, one on another machine in the same rack, and one on a machine in a different rack.
Further, each name node has at least one backup name node.
Further, the backup name node backs up the name node at regular intervals, and automatic switchover guarantees normal operation of the cluster.
Further, when the user side creates a file, it first caches the file data in a local temporary file; when the accumulated data reaches a threshold, the user side initiates a connection to the name node.
Furthermore, each user side sets valid-flag state information for the metadata in the storage node; when a system fault occurs, the valid-flag state prior to the operation is updated and saved in the log.
Further, triggering an indication of whether waiting is needed, together with logging, comprises: setting a back-off value in the data node for primary/secondary consistency. The back-off value marks the tolerated degree of inconsistency between a copy and the primary; that is, when either the primary or the secondary fragment cannot work, data inconsistency between them is allowed, and the transactions of the whole cluster may be forcibly committed.
Further, when the name node detects that the number of copies in the system has fallen below the system's preset copy-count threshold, it identifies the data blocks contained in the lost copies and, when the system is idle, replicates the missing copies until the preset threshold is reached; the operations that need to be recovered are detected from the event log, and a process is then invoked to complete them.
Further, the data storage nodes are distributed as copies of virtual shards: a single physical data node may host multiple logical shards, and the replica shards of each logical shard are deployed on different physical machines.
According to this scheme, the system comprises the name node, the data node and the user side, with the name node configured as a central management server. When a data node serves as a storage node, the primary and secondary fragments stored on it complete their data synchronization through an exchange of confirmation messages, and log event information is recorded at the same time. This satisfies the need to transmit and process information whenever the primary and secondary fragment data of a user become inconsistent, and improves data processing efficiency once the fragments are leveled using the log event information.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and not to be construed as limiting the invention in any way.
FIG. 1 is a schematic representation of the process flow of the present invention.
Detailed Description
These and other features and characteristics of the present invention, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will be better understood by reference to the following description and drawings, which form a part of this specification. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the invention. It will be understood that the figures are not drawn to scale. Various block diagrams are used in the present invention to illustrate various variations of embodiments according to the present invention.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
It should be noted that "/" herein means "or", for example, A/B may mean A or B; "and/or" herein is merely an association relationship describing an associated object, and means that there may be three relationships, for example, a and/or B, and may mean: a exists alone, A and B exist simultaneously, and B exists alone.
It should be noted that, for the convenience of clearly describing the technical solutions of the embodiments of the present application, in the embodiments of the present application, the terms "first", "second", and the like are used to distinguish the same items or similar items with basically the same functions or actions, and those skilled in the art can understand that the terms "first", "second", and the like do not limit the quantity and execution order. For example, the first information and the second information are for distinguishing different information, not for describing a specific order of information.
It should be noted that, in the embodiments of the present invention, words such as "exemplary" or "for example" are used to indicate examples, illustrations or explanations. Any embodiment or design described as "exemplary" or "such as" in an embodiment of the present invention is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
Example 1
As shown in fig. 1, the present invention provides a data configuration method for big data. The method comprises: configuring a name node, a data node and a user side in the corresponding system, wherein the name node is configured as a central management server that stores descriptive metadata in memory in list form, responds to the user side's requests to access files, and provides an internal metadata service;
the data node stores the data required by the user side in blocks of a fixed size and keeps backup copies; it receives control information forwarded by the name node, creates, deletes and copies data blocks under the unified scheduling of the name node, and periodically reports back to the name node;
the user side performs data access through the name node. A primary fragment and a secondary fragment of the data are set on the data node; to keep the two consistent, the primary and secondary fragments complete their data synchronization through an exchange of confirmation messages.
The data synchronization of the primary and secondary fragments through the interaction of the confirmation message is specifically as follows: processing the relational database operation associated with the two fragments on the two fragments simultaneously, when the main fragment needs to submit, simultaneously sending a submission request to the sub-fragment, if the sub-fragment has already processed the task, the sub-fragment directly returns an ACK message to the main fragment, and if the sub-fragment has not finished the task, the sub-fragment returns a NACK message to the main fragment to trigger whether waiting is needed or not and log recording is carried out.
Illustratively, based on a big data access structure, the system adopts a master-slave cluster structure: the cluster consists of a name node, a backup name node, several data nodes and several user sides.
The name node is a key component of the cluster. As the central management server of the file system, it mainly provides the internal metadata service: it manages the file system namespace, responds to users' file accesses, and keeps the system's descriptive metadata in list form in memory so users can access it quickly. If the name node fails, the entire file system becomes unusable, because the information about all data blocks is stored there and files cannot be reconstructed without it. The name node is therefore backed up at regular intervals, and automatic switchover guarantees normal operation of the cluster. The name node holds the basic information of each file, the mapping between files and data blocks, and the storage locations of the data blocks.
The data node is responsible for storing user data. It divides the local disk into blocks or slices for storage and by default keeps three copies: one on the local machine, one on another machine in the same rack, and one on a machine in a different rack. It keeps the metadata of its blocks and slices in memory, creates, deletes and copies data blocks under the unified scheduling of the name node, and periodically reports its metadata to the name node. The user side is the user interface: it is responsible for interacting with the cluster and for reading and writing files.
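The default three-copy placement (local machine, same rack, different rack) can be sketched as below. The topology format and node/rack names are assumptions made for the example; the sketch assumes each rack has at least two nodes and at least two racks exist.

```python
def place_replicas(local_node, topology):
    """Pick three replica locations: the local node, another node in
    the same rack, and a node in a different rack.

    topology maps rack name -> list of node names in that rack.
    """
    local_rack = next(r for r, nodes in topology.items() if local_node in nodes)
    same_rack = next(n for n in topology[local_rack] if n != local_node)
    other_rack = next(r for r in topology if r != local_rack)
    return [local_node, same_rack, topology[other_rack][0]]
```

With a two-rack topology, a write originating on `n1` would place its copies on `n1`, its rack-mate `n2`, and a node in the other rack.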
When the user side wants to create a file, it caches the file data in a local temporary file; once the accumulated data reaches a threshold, it contacts the name node. When the user side wants to read a file, it queries for the file's storage location; the name node returns the address of the data node holding the data together with the addresses of the other copies, after which the user side can exchange data directly with the data node and finally close the connection.
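The client-side buffering step described above can be sketched as follows. The threshold value and the `contact_name_node` method are placeholders invented for the sketch; in the real system the flush would request block locations from the name node and stream the buffered data out.

```python
THRESHOLD = 64  # illustrative buffering threshold in bytes


class ClientWriter:
    """Buffers written data locally until the threshold is crossed,
    then contacts the name node (modeled here as a counter)."""

    def __init__(self):
        self.buffer = bytearray()
        self.flushes = 0              # number of contacts with the name node

    def write(self, data: bytes):
        self.buffer.extend(data)
        if len(self.buffer) >= THRESHOLD:
            self.contact_name_node()

    def contact_name_node(self):
        # stand-in for requesting block locations and shipping the data
        self.flushes += 1
        self.buffer.clear()
```

Writes below the threshold stay local; the first write that pushes the buffer over the threshold triggers the connection and empties the buffer.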
Each user-side node in the system sets valid-flag (Valid_flag) state information for the metadata in a data node. When a system fault occurs, the Valid_flag state prior to the operation is updated, and setting the Valid_flag is made transactional by committing it in write-ahead-log fashion. For data that a transaction generates by referencing itself, a multi-version management mechanism for the Valid_flag ensures that the latest version is referenced during the self-update.
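A minimal sketch of this Valid_flag bookkeeping, under the assumption that "write-ahead" here means recording the pre-operation flag state before applying the update, and that multi-version management means keeping prior (version, flag) pairs so the latest can always be referenced. Class and field names are invented for the illustration.

```python
class MetadataEntry:
    """Keeps versioned Valid_flag states with a write-ahead log of the
    state that existed before each update."""

    def __init__(self):
        self.versions = [("v0", True)]   # (version id, Valid_flag)
        self.log = []                    # pre-operation states, oldest first

    def update_flag(self, version_id, flag):
        # write-ahead: persist the current state before changing it
        self.log.append(self.versions[-1])
        self.versions.append((version_id, flag))

    def latest(self):
        # the multi-version list always exposes the newest flag
        return self.versions[-1]
```

After a fault, the last entry of `log` gives the pre-operation state, while `latest()` gives the newest version for self-referencing updates.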
A primary fragment and a secondary fragment of the data are set on the storage node. To establish data consistency between them, the primary/secondary consistency mechanism is implemented in stages. Optionally, the relational database operations associated with the two fragments are processed on both simultaneously. When the primary fragment needs to commit, it may, following the Valid_flag scheme above, send a commit request to the secondary fragment. If the secondary has already finished the task, it returns an ACK message directly to the primary; if it has not, it returns a NACK message to indicate that waiting is still needed, or to trigger an indication of whether waiting is needed together with a log record.
Optionally, the waiting period is set by the name node; when the primary fragment receives an ACK message from the secondary, it instructs the secondary to commit the updated data.
Optionally, the data on the primary fragment is updated first; the primary then forwards the update to the secondary, transmitting the changed data incrementally. Once the primary has completed its update and needs to commit, it may, following the Valid_flag scheme above, send a commit request to the secondary. If the secondary has already finished the task, it returns an ACK directly; otherwise it returns a NACK to indicate that waiting is needed, or to trigger an indication of whether waiting is needed together with a log record. Illustratively, a back-off value for primary/secondary consistency is set in the data node; it marks the tolerated degree of inconsistency between the secondary and the primary. That is, when either fragment cannot work, the system allows the data of the primary and secondary fragments to diverge and allows the transactions of the whole cluster to be forcibly committed.
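The back-off decision described above can be sketched as a small rule: commit normally when the secondary is reachable, force-commit when it is down but the divergence stays within the configured tolerance, and refuse otherwise. How divergence is measured is not specified in the text, so the numeric `divergence` parameter here is an assumption made for the illustration.

```python
def decide_commit(secondary_alive: bool, divergence: int, backoff: int) -> str:
    """Return 'commit', 'force-commit', or 'abort'.

    backoff is the tolerated degree of primary/secondary inconsistency;
    divergence is how far apart the two fragments currently are.
    """
    if secondary_alive:
        return "commit"
    # secondary unavailable: tolerate divergence up to the back-off value
    return "force-commit" if divergence <= backoff else "abort"
```

With a back-off value of 3, a transaction is force-committed while the secondary lags by up to 3 operations, and refused beyond that.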
At this point, on a default or reserved partition of the storage data node, an event log is kept that records the above operations; these event records capture the operations lost on the failed fragment, such as data update operations.
Later, when the fragment on the failed node returns to normal, the operations that need to be recovered are detected from the event log, and a process is invoked to level the data on the failed node's fragment against the healthy fragment, completing data recovery and safety after the fault. When the user side accesses data, the identification information of the retained event log is sent to it synchronously, indicating which data on the storage node has not yet been successfully synchronized.
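The recovery step described above amounts to replaying the logged, lost operations against the recovering fragment in order. A minimal sketch, modeling the fragment as a key-value dictionary and each logged event as an update (the data model is an assumption for the example):

```python
def replay(event_log, shard):
    """Apply each logged (key, value) update to the shard, oldest first,
    so the recovering fragment converges on the healthy fragment's state."""
    for key, value in event_log:
        shard[key] = value
    return shard
```

Replaying in log order matters: a later update to the same key must win, which is exactly what sequential application gives.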
For example, when the name node detects that the number of copies in the system has fallen below the preset copy-count threshold, it can quickly identify the data blocks contained in the lost copies and replicate the missing copies when the system is idle until the threshold is reached. Optionally, the operations to be recovered may also be detected from the event log, so that a process can be invoked to level the data from the failed node's fragment against the healthy fragments and assist in updating the copy data.
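The replica-count check can be sketched as below: for each block whose observed replica count is under the threshold, compute how many additional copies must be scheduled (to run when the system is idle). The input format is an assumption for the sketch.

```python
def plan_rereplication(replica_counts, threshold):
    """Return {block: copies_needed} for every under-replicated block.

    replica_counts maps block id -> current number of live replicas.
    """
    return {b: threshold - n for b, n in replica_counts.items() if n < threshold}
```

Blocks already at or above the threshold produce no work; a block with a single surviving copy against a threshold of three needs two new copies.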
Optionally, the data storage nodes in the system are distributed as copies of virtual shards. Illustratively, one physical node may host multiple logical shards, and the replica shards of each logical shard may be deployed on different physical machines. When a physical storage node goes down, the replicas of its logical shards are spread across multiple storage nodes, so failover is achieved and, after failover, the load of the failed server is balanced across the surviving nodes. Illustratively, each data item specifies a storage node and a storage directory; the item is stored on the data node as a file and accessed through the storage unit deployed on that node. If the storage directory on the designated node runs out of space, the main node asks whether other storage units have space and, if so, allocates space to the data from other data nodes. If an entire storage node must be migrated, the name node or the data node acting as main node suspends storage requests for the data unit and queries whether other nodes have suitable space to take over the data configured on that server. If a suitable location exists, the data files are synchronized to the other servers and a request to change the file storage location information is sent to the name node, which records the change and synchronizes it to the virtual logical shard management unit corresponding to the storage space mapping.
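One simple way to realize "replicas of each logical shard on different physical machines" is round-robin assignment offset by shard index, sketched below. This is an illustrative placement policy, not the one the patent specifies; it keeps a shard's replicas on distinct machines as long as the copy count does not exceed the machine count.

```python
def assign_replicas(shards, machines, copies=2):
    """Assign each logical shard `copies` replicas on distinct machines,
    spreading shards round-robin so one machine's loss scatters its
    shards' surviving replicas across the cluster."""
    placement = {}
    for i, shard in enumerate(shards):
        placement[shard] = [machines[(i + j) % len(machines)] for j in range(copies)]
    return placement
```

With three shards and three machines, each machine ends up hosting replicas of two different shards, so no single machine failure takes out both copies of any shard.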
Optionally, concurrent operation is configured for writes. When writing data to the corresponding shard and the shard file is found to be locked, the data belonging to another hash segment can be written to other shard files, reducing the loss of write performance caused by locking.
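The lock-avoidance idea can be sketched as follows: a key is hashed to a home shard, and if that shard's file is locked the write is redirected to the next unlocked shard instead of blocking. The probing order and the toy hash function are assumptions for the example (a deterministic character-sum hash is used so the behavior is reproducible).

```python
def simple_hash(key: str) -> int:
    # deterministic toy hash for the sketch (Python's built-in str hash
    # is salted per process, so it is avoided here)
    return sum(ord(c) for c in key)


def choose_shard(key: str, locked: set, n_shards: int):
    """Return the first unlocked shard index starting from the key's
    home shard, or None if every shard is locked."""
    start = simple_hash(key) % n_shards
    for step in range(n_shards):
        shard = (start + step) % n_shards
        if shard not in locked:
            return shard
    return None
```

So a write whose home shard is locked lands on the next free shard rather than waiting on the lock.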
Optionally, each piece of data has three copies: one is the primary and the other two are its backups. When data changes, the primary copy is modified first and the changes are then synchronized over the network to the two backup nodes. If the data server holding the primary copy goes down or suffers a network fault, one replica server is chosen according to load to bring its copy up to date and synchronize the others; it becomes the new primary storage node, and the original server, once restarted, obtains the change information for the tag data from it and resynchronizes. The primary and replica information of all tag data is compressed and broadcast to the cluster; on reading, the broadcast data is decompressed, so the namespace data is accessible to every user side in the whole system.
Optionally, the name node receives heartbeat signals and block reports from the data nodes at regular intervals. If the name node finds the metadata abnormal according to a block report, or a data node fails to send its heartbeat on time, the corresponding data node is marked as down, and from that point the system sends no further I/O requests to it.
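The heartbeat-timeout part of that check can be sketched as below: any node whose last heartbeat is older than the timeout is marked down and excluded from I/O scheduling. Timestamps are plain numbers here for simplicity; the timeout value is an illustrative parameter.

```python
def mark_down(last_heartbeat, now, timeout):
    """Return the set of node ids whose last heartbeat is older than
    `timeout` time units relative to `now`."""
    return {n for n, t in last_heartbeat.items() if now - t > timeout}
```

A scheduler would then skip any node in the returned set when routing I/O requests.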
To access the file system, the user side interacts with both the name node and the data nodes. The user side first locates the name node; on receiving the request, the name node responds and the user side obtains the file metadata, through which the name node maps data blocks to data nodes.
When the user side requests to write a file, the request goes first to the name node, which then exchanges information with the data nodes. Specifically, the user side sends the file size and configuration information to the name node, which returns the address information of the data nodes it manages according to what it received; the user side then splits the file to be written into several small data blocks according to that address information and writes them in sequence to the corresponding data nodes.
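The splitting step of the write path can be sketched in one line: the file bytes are cut into fixed-size blocks, with the final block possibly shorter. The block size is a parameter chosen by the caller in this sketch.

```python
def split_blocks(data: bytes, block_size: int):
    """Split file bytes into fixed-size blocks; the last block may be
    shorter than block_size."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]
```

Each resulting block would then be written to the data node address the name node returned for it.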
Optionally, to read a file, the user side first sends a read request to the name node; on receiving it, the name node promptly returns the address information of the data nodes storing the requested file, allowing the user side to read the file through those addresses.
Those skilled in the art will appreciate that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and can include the processes of the embodiments of the methods described above when executed. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk Drive (Hard Disk Drive, abbreviated as HDD), or a Solid State Drive (SSD); the storage medium may also comprise a combination of memories of the kind described above.
As used in this application, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims (10)

1. A data configuration method for big data, characterized in that: a name node, a data node and a user side are configured,
the name node is configured as a central management server, descriptive metadata is stored in a memory in a list form, and internal metadata service is provided in response to the access requirement of a user on a file;
the data nodes are used for storing data required by a user side, storing the data in a blocking mode, setting the size of each fixed block and performing backup storage; receiving control information forwarded by the name node, performing creation, deletion and duplication work of the data block under unified scheduling of the name node, and periodically reporting the control information to the name node;
the user side executes data access through the name node; setting a main fragment and an auxiliary fragment of data in a data node; in order to keep the data consistency between the main fragment and the sub-fragment, the main fragment and the sub-fragment complete the data synchronization of the main fragment and the sub-fragment through the interaction of confirmation messages.
2. The method of claim 1, wherein: the data synchronization of the primary and secondary fragments through confirmation messages proceeds as follows: the relational database operations associated with the two fragments are processed on both fragments simultaneously; when the primary fragment needs to commit, it sends a commit request to the secondary fragment; if the secondary fragment has already finished the task, it returns an ACK message directly to the primary fragment, and if it has not, it returns a NACK message, triggering an indication of whether waiting is needed and a log record.
3. The method of claim 2, wherein: storing data in blocks of a fixed size with backup copies comprises: keeping three copies by default, one on the local machine, one on another machine in the same rack, and one on a machine in a different rack.
4. The method of claim 3, wherein the name node is configured with at least one backup name node.
5. The method of claim 4, wherein the backup name node backs up the name node at regular intervals and ensures continued operation through automatic switchover.
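The periodic backup and automatic switchover of claims 4–5 can be sketched as follows. The class layout, the `alive` flag, and the snapshot-copy backup are hypothetical simplifications; claims 4–5 do not prescribe a mechanism:

```python
class NameNode:
    def __init__(self):
        self.metadata = {}        # descriptive metadata kept in memory
        self.alive = True

class BackupNameNode:
    """Backup name node: snapshots metadata on a timer and takes over
    automatically when the primary name node fails."""
    def __init__(self, primary):
        self.primary = primary
        self.metadata = {}
        self.active = False

    def periodic_backup(self):
        # Called at regular intervals (timer not shown).
        if self.primary.alive:
            self.metadata = dict(self.primary.metadata)

    def check_and_switch(self):
        # Automatic switchover: become active once the primary is down.
        if not self.primary.alive:
            self.active = True
        return self.active

nn = NameNode()
nn.metadata["/f"] = {"blocks": [1, 2]}
bk = BackupNameNode(nn)
bk.periodic_backup()
nn.alive = False
assert bk.check_and_switch() is True
assert bk.metadata == {"/f": {"blocks": [1, 2]}}
```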
6. The method of claim 5, wherein when the user side creates a file, it first caches the file data in a local temporary file, and initiates a connection to the name node once the data accumulated in the temporary file reaches a threshold.
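The client-side buffering of claim 6 can be sketched as a threshold-triggered flush. The 64 KiB threshold, the in-memory `bytearray` standing in for the temporary file, and the `allocate_and_store` call are all hypothetical — the claim leaves these details open:

```python
class Client:
    """Buffer file data locally; contact the name node only once the
    accumulated data reaches the threshold (64 KiB is an assumption)."""
    THRESHOLD = 64 * 1024

    def __init__(self, name_node):
        self.name_node = name_node
        self.buffer = bytearray()   # stands in for the local temp file

    def write(self, data: bytes):
        self.buffer.extend(data)
        if len(self.buffer) >= self.THRESHOLD:
            self.flush()

    def flush(self):
        # Initiate the connection to the name node and hand off the data.
        self.name_node.allocate_and_store(bytes(self.buffer))
        self.buffer.clear()

class FakeNameNode:
    """Test stand-in capturing what the client hands over."""
    def __init__(self):
        self.stored = []
    def allocate_and_store(self, data):
        self.stored.append(data)

nn = FakeNameNode()
c = Client(nn)
c.write(b"x" * (64 * 1024 - 1))
assert nn.stored == []             # below threshold: still buffered
c.write(b"y")                      # reaches threshold: flush happens
assert len(nn.stored) == 1
```

Deferring the name-node contact this way keeps small, incremental writes from flooding the central metadata server with requests.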
7. The method of claim 6, wherein each user side sets valid-flag state information for the metadata in the data nodes, and when a system fault occurs, the valid-flag state information as it stood before the operation is updated and stored in a log.
8. The method of claim 7, wherein triggering the decision on whether to wait and producing the log record comprises: setting a back-off value for the consistency of the primary and secondary fragments in the data nodes, the back-off value marking the tolerated degree of inconsistency between the replica and the master; when either the primary or the secondary fragment is unable to work, data inconsistency between them is permitted and the whole transaction of the cluster is allowed to be forcibly committed.
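The back-off rule of claim 8 can be expressed as a small decision function. Measuring divergence as an integer count and the exact decision boundary are assumptions; the claim only states that a bounded inconsistency is tolerated when one fragment is down:

```python
def can_force_commit(primary_ok, secondary_ok, divergence, backoff):
    """Decide whether the cluster may forcibly commit a transaction.
    `backoff` marks the tolerated primary/secondary inconsistency
    (hypothetically counted in unsynchronized operations)."""
    if primary_ok and secondary_ok:
        return divergence == 0          # both up: require full consistency
    # One side cannot work: tolerate divergence up to the back-off value.
    return divergence <= backoff

assert can_force_commit(True, True, 0, 5) is True
assert can_force_commit(True, False, 3, 5) is True    # within tolerance
assert can_force_commit(True, False, 7, 5) is False   # too far diverged
```

The back-off value thus trades strict replica consistency for availability while a fragment is unreachable, bounding how far the surviving copy may run ahead.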
9. The method of claim 8, wherein when the name node detects that the number of replicas in the system has fallen below a preset replica-count threshold, it identifies the data blocks contained in the lost replicas and, when the system is idle, copies the missing replicas until the preset replica-count threshold is reached; the operations needing recovery are detected from the event log, and the recovery process is then invoked to complete them.
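The idle-time re-replication of claim 9 can be sketched with a per-block replica count. The `{block_id: replica_count}` map and the `system_idle` flag are hypothetical; the claim does not fix the bookkeeping:

```python
def find_under_replicated(blocks, threshold):
    """blocks: hypothetical map {block_id: replica_count}. Return the
    blocks whose count fell below the preset threshold, with the
    number of missing copies for each."""
    return {b: threshold - n for b, n in blocks.items() if n < threshold}

def repair_when_idle(blocks, threshold, system_idle):
    # Per claim 9: only copy missing replicas while the system is idle.
    if not system_idle:
        return blocks
    for b, missing in find_under_replicated(blocks, threshold).items():
        blocks[b] += missing          # re-replicate up to the threshold
    return blocks

state = {"blk1": 3, "blk2": 1}
assert find_under_replicated(state, 3) == {"blk2": 2}
assert repair_when_idle(state, 3, system_idle=True) == {"blk1": 3, "blk2": 3}
```

Deferring the copies to idle periods keeps repair traffic from competing with user reads and writes for disk and network bandwidth.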
10. A big data configuration system comprising a name node, data nodes, and a user side, for implementing the method of any one of claims 1 to 9.
CN202211091952.XA 2022-09-07 2022-09-07 Data configuration method and system for big data Active CN115168367B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211091952.XA CN115168367B (en) 2022-09-07 2022-09-07 Data configuration method and system for big data
PCT/CN2022/139829 WO2024051027A1 (en) 2022-09-07 2022-12-18 Data configuration method and system for big data

Publications (2)

Publication Number Publication Date
CN115168367A true CN115168367A (en) 2022-10-11
CN115168367B CN115168367B (en) 2022-11-25

Family

ID=83481954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211091952.XA Active CN115168367B (en) 2022-09-07 2022-09-07 Data configuration method and system for big data

Country Status (2)

Country Link
CN (1) CN115168367B (en)
WO (1) WO2024051027A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024051027A1 (en) * 2022-09-07 2024-03-14 中电科大数据研究院有限公司 Data configuration method and system for big data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682227A (en) * 2017-01-06 2017-05-17 Zhengzhou Yunhai Information Technology Co., Ltd. Log data storage system based on distributed file system and reading-writing method
CN106991113A (en) * 2015-12-18 2017-07-28 SAP SE Table replication in a database environment
CN108572976A (en) * 2017-03-10 2018-09-25 Huawei Software Technologies Co., Ltd. Data recovery method, related device, and system in a distributed database
CN109976941A (en) * 2017-12-28 2019-07-05 Huawei Software Technologies Co., Ltd. Data recovery method and apparatus
CN110413687A (en) * 2019-05-09 2019-11-05 State Grid Jibei Electric Power Co., Ltd. Distributed transaction fault handling method based on mutual node verification, and related device
CN114637475A (en) * 2022-04-13 2022-06-17 Suzhou Inspur Intelligent Technology Co., Ltd. Distributed storage system control method and device and readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10469396B2 (en) * 2014-10-10 2019-11-05 Pegasystems, Inc. Event processing with enhanced throughput
CN105930498A (en) * 2016-05-06 2016-09-07 China UnionPay Co., Ltd. Distributed database management method and system
CN112711596B (en) * 2019-10-24 2023-10-27 Alibaba Cloud Computing Co., Ltd. Multi-copy database system, data processing method, electronic device, and computer-readable storage medium
CN115168367B (en) * 2022-09-07 2022-11-25 Taiji Computer Corporation Ltd. Data configuration method and system for big data

Also Published As

Publication number Publication date
WO2024051027A1 (en) 2024-03-14
CN115168367B (en) 2022-11-25

Similar Documents

Publication Publication Date Title
US11550675B2 (en) Remote data replication method and system
US9946460B2 (en) Storage subsystem and storage system architecture performing storage virtualization and method thereof
US10229011B2 (en) Log-structured distributed storage using a single log sequence number space
US10831614B2 (en) Visualizing restoration operation granularity for a database
KR101833114B1 (en) Fast crash recovery for distributed database systems
KR101827239B1 (en) System-wide checkpoint avoidance for distributed database systems
KR101923334B1 (en) Database system with database engine and separate distributed storage service
EP3502877B1 (en) Data loading method and apparatus for virtual machines
US7487311B2 (en) System and method for asynchronous backup of virtual disks in a distributed storage array
US9424140B1 (en) Providing data volume recovery access in a distributed data store to multiple recovery agents
US20040103104A1 (en) Snapshot creating method and apparatus
JP2010102738A (en) Apparatus and method for hardware-based file system
US20100023532A1 (en) Remote file system, terminal device, and server device
CN115168367B (en) Data configuration method and system for big data
CN113885809B (en) Data management system and method
CN112748865A (en) Method, electronic device and computer program product for storage management
US20240020278A1 (en) Dynamic storage journaling partitions for efficient resource use and improved system throughput
CN114780043A (en) Data processing method and device based on multilayer cache and electronic equipment
US10846012B2 (en) Storage system for minimizing required storage capacity during remote volume replication pair duplication
US10656867B2 (en) Computer system, data management method, and data management program
CN115858410A (en) IO processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant