CN112699080A

CN112699080A - High-speed multi-path network data migration method

Info

Publication number: CN112699080A
Application number: CN202110030467.0A
Authority: CN
Inventors: 邓金祥; 胡勇; 谢宗明; 王炜; 代先勇; 谷峰; 刘洋; 田晓东; 王念
Original assignee: Chengdu Shensi Science & Technology Co ltd
Current assignee: Chengdu Shensi Science & Technology Co ltd
Priority date: 2021-01-11
Filing date: 2021-01-11
Publication date: 2021-04-23

Abstract

The invention relates to a high-speed multi-path network traffic data migration method comprising the following steps. First, the user logs in to the system hosting the ES cluster whose data is to be migrated, extracts index files according to extraction rule conditions, and configures the corresponding information. Next, the extracted index files of the data to be migrated are divided equally according to the number of available cluster servers, and the resulting index file fragments are issued to the data migration cluster, which allocates them to its servers. Each server then extracts the corresponding data from the original cluster according to its allocated index file fragment, and the cluster server receiving the migrated data receives it, resets the data offset positions, and stores it. Finally, the migration status is reported once the migration is complete. The advantage of the invention is that the indexes are distributed to different servers and each server extracts the original data independently, which increases the data extraction speed.

Description

High-speed multi-path network data migration method

Technical Field

The invention relates to the technical field of data processing, in particular to a high-speed multi-path network data migration method.

Background

With the continuous development of modern informatization and digitization technologies, more and more data needs to be collected, data is generated ever faster, and the volume of stored data keeps multiplying. When equipment is upgraded or data is backed up, the stored data must be migrated. Traditional data migration simply copies files, which runs into problems such as IO port bottlenecks and the need to re-parse and rebuild the data after migration, making it extremely inefficient. A better and more efficient network traffic data migration solution is therefore urgently needed.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art by providing a high-speed multi-path network traffic data migration method that remedies the defects of existing data migration methods.

The purpose of the invention is realized by the following technical scheme: a high-speed multi-path network traffic data migration method comprises the following steps:

logging in to the system hosting the ES cluster whose data is to be migrated, extracting index files according to extraction rule conditions, and configuring the corresponding information;

dividing the extracted index files of the data to be migrated equally according to the number of available cluster servers, issuing the resulting index file fragments to the data migration cluster, and having the cluster allocate the index file fragments to its servers;

each server extracting the corresponding data from the original cluster according to its allocated index file fragment, and the cluster server receiving the migrated data receiving it, resetting the data offset positions, and storing it;

and, once the migration is complete, reporting the migration status.

Further, the step of logging in to the system hosting the ES cluster whose data is to be migrated, extracting the index files according to the extraction rule conditions, and configuring the corresponding information comprises:

a user logs in to the system hosting the ES cluster whose data is to be migrated, sets the extraction rule conditions, executes the extraction query, and extracts the index files of the data to be migrated;

and configuring the IP address, port, and login password of the ES cluster into which the data is to be migrated, and verifying whether the configuration information has been entered correctly.

Further, dividing the extracted index files of the data to be migrated equally according to the number of available cluster servers comprises:

dividing the index files using a fragment size of: number of index files / (number of available servers + 1);

and, starting from the first fragment, adding the remaining index files left over by this division one at a time until all of them have been placed into fragments.
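The division rule above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the text does not say how many fragments are produced, so the sketch assumes one fragment per available server (matching the one-to-one allocation in S4), reads the formula as fixing the initial fragment size, and deals the leftover index files out round-robin starting from the first fragment. The function name `split_index_files` is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_index_files(index_files: Sequence[T], available_servers: int) -> List[List[T]]:
    """Hypothetical sketch: split index files into one fragment per server."""
    if available_servers < 1:
        raise ValueError("need at least one available server")
    n = len(index_files)
    # Initial fragment size per the stated formula: index count / (servers + 1).
    base = n // (available_servers + 1)
    # Assumption: one fragment per available server, filled with `base` files each.
    fragments = [list(index_files[i * base:(i + 1) * base])
                 for i in range(available_servers)]
    # Leftover files are added one at a time, starting again from the first fragment.
    leftover = index_files[available_servers * base:]
    for i, item in enumerate(leftover):
        fragments[i % available_servers].append(item)
    return fragments
```

For example, 10 index files and 3 servers give an initial size of 10 // 4 = 2, and the 4 leftover files are dealt to fragments 1, 2, 3, 1, so the fragment sizes end up as 4, 3, and 3.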

Further, each server extracting the corresponding data from the original cluster according to its allocated index file fragment comprises:

A1, reading an index file and extracting the secondary index data according to the offsets in the index file;

A2, reading the secondary index data in a loop and extracting the original data according to the secondary index offsets;

A3, sending the original data, the secondary index, and the primary index, in that order, to the cluster server into which the data is migrated;

A4, after the transmission is complete, reading the next index file and repeating steps A1 to A3 until all index files have been processed;

and A5, deleting the allocated index files and returning the processing status to the current cluster system.
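Steps A1 to A5 can be sketched against a toy in-memory model of one source server. This is a minimal sketch, not the patent's implementation: `SourceStore`, `extract_fragment`, and the `send` callback are hypothetical, and the assumption is that the primary index stores positions of secondary index records, while each secondary record stores an (offset, length) pair into the original data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SourceStore:
    """Hypothetical in-memory model of one source server's storage."""
    raw: bytes                           # original network traffic data
    secondary: List[Tuple[int, int]]     # secondary index records: (raw offset, length)
    primary_files: Dict[str, List[int]]  # primary index file -> positions in `secondary`


# Hypothetical point-to-point transport to the paired receiving server.
Send = Callable[[str, object], None]


def extract_fragment(store: SourceStore, fragment: List[str], send: Send) -> str:
    """Process every index file in an allocated fragment (steps A1-A5)."""
    for name in fragment:  # A4: move on to the next index file
        positions = store.primary_files[name]
        # A1: read the index file and pull the secondary index records it points to.
        secondary = [store.secondary[p] for p in positions]
        # A2: walk the secondary index and pull the original data it points to.
        raw_chunks = [store.raw[off:off + length] for off, length in secondary]
        # A3: send original data first, then the secondary index, then the primary index.
        send("raw", raw_chunks)
        send("secondary", secondary)
        send("primary", (name, positions))
    fragment.clear()    # A5: delete the allocated index files
    return "completed"  # A5: processing status returned to the cluster system
```

Sending the original data before the indexes lets the receiver know where each chunk landed before it has to rewrite the index offsets that point at it.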

Further, the step in which the cluster server receiving the migrated data receives the data, resets the data offset positions, and then stores it comprises:

receiving and storing the original data;

receiving the secondary index data and modifying the secondary index offsets according to the offsets at which the original data was stored;

receiving the primary index data and modifying the primary index offsets according to the offsets at which the secondary index was stored;

and saving the received data to the cluster file system.
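The offset reset on the receiving side can be sketched as follows. This is a minimal sketch assuming the same toy layout as a source server (secondary records as (offset, length) pairs into the original data, primary index as positions of secondary records) and assuming the chunks arrive in secondary-index order; the `Receiver` class and its methods are hypothetical, and saving to the cluster file system is not modeled.

```python
from typing import Dict, List, Tuple


class Receiver:
    """Hypothetical receiving server that re-bases offsets as data arrives."""

    def __init__(self) -> None:
        self.raw = bytearray()                         # this server's original-data store
        self.secondary: List[Tuple[int, int]] = []     # (raw offset, length) records
        self.primary_files: Dict[str, List[int]] = {}  # index file -> positions in secondary
        self._raw_offsets: List[int] = []              # where the last raw batch landed
        self._positions: List[int] = []                # where the last secondary batch landed

    def receive_raw(self, chunks: List[bytes]) -> None:
        # Store the original data, remembering the new offset of each chunk.
        self._raw_offsets = []
        for chunk in chunks:
            self._raw_offsets.append(len(self.raw))
            self.raw.extend(chunk)

    def receive_secondary(self, records: List[Tuple[int, int]]) -> None:
        # Replace each record's source offset with the offset assigned in receive_raw.
        if len(records) != len(self._raw_offsets):
            raise ValueError("secondary index does not match the last raw batch")
        self._positions = []
        for new_off, (_old_off, length) in zip(self._raw_offsets, records):
            self._positions.append(len(self.secondary))
            self.secondary.append((new_off, length))

    def receive_primary(self, name: str, old_positions: List[int]) -> None:
        # Point the primary index at where the secondary records were stored here.
        if len(old_positions) != len(self._positions):
            raise ValueError("primary index does not match the last secondary batch")
        self.primary_files[name] = list(self._positions)

    def read(self, name: str) -> List[bytes]:
        # Follow primary -> secondary -> raw to check that the remapping is consistent.
        return [bytes(self.raw[off:off + length])
                for off, length in (self.secondary[p] for p in self.primary_files[name])]
```

Because the receiver may already hold data, the source offsets cannot be reused; each index level is rewritten to point at where the level below it was actually stored.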

The invention has the following advantages. The high-speed multi-path network traffic data migration method distributes the indexes to different servers and each server extracts the original data independently, which increases the data extraction speed. Each server uploads the primary index, secondary index, and original data independently, which increases the transmission speed. A CRC (cyclic redundancy check) is added, which reduces transmission errors that would otherwise cause the migration to fail. After the data migration is complete, the mapping associations are automatically re-established, so the migrated data needs no manual processing.
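The text states that a CRC is added but not where it is applied. The sketch below assumes a 4-byte CRC-32 trailer on each transmitted payload, computed with the standard library's `zlib.crc32`; the `frame`/`unframe` names and the resend-on-mismatch behavior are hypothetical.

```python
import struct
import zlib


def frame(payload: bytes) -> bytes:
    """Append a big-endian CRC-32 of the payload before sending."""
    return payload + struct.pack(">I", zlib.crc32(payload) & 0xFFFFFFFF)


def unframe(data: bytes) -> bytes:
    """Verify the trailer on receipt; a mismatch means the payload must be resent."""
    if len(data) < 4:
        raise ValueError("frame too short")
    payload = data[:-4]
    (expected,) = struct.unpack(">I", data[-4:])
    if zlib.crc32(payload) & 0xFFFFFFFF != expected:
        raise ValueError("CRC mismatch: resend required")
    return payload
```

Detecting a corrupted chunk at the receiver allows that one chunk to be retransmitted instead of the whole migration failing.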

Drawings

FIG. 1 is a schematic flow chart of the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings, but the scope of the invention is not limited to the following.

As shown in FIG. 1, the invention relates to a high-speed multi-path network traffic data migration method that performs high-speed multi-path migration of original network traffic data files for which a multi-level ES index and hierarchical distributed storage have already been built. Specifically, according to the set migration conditions, the filtered index files are divided into segments and distributed to different servers of the ES cluster; each server independently extracts the secondary index files and original data and, once extraction is complete, synchronously migrates the index files and original data to the new ES cluster; after receiving all the data, the new ES cluster re-establishes the mapping associations between the original data and the indexes. The method specifically comprises the following steps:

S1, a user logs in to the system hosting the ES cluster whose data is to be migrated and sets an extraction rule condition. The condition can be a single data marker, such as a fixed IP, an IP segment, a fixed port, a port range, a home location, or a protocol type, or a combination of several such markers. The extraction query is executed using the specified data markers, and finally the index files of the data to be migrated are extracted;
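A combined extraction rule from S1 can be sketched as a predicate in which every condition that is set must hold. This is a minimal sketch: the `Record` and `ExtractionRule` types are hypothetical, "home location" is omitted because it would need a geolocation database, an IP segment is assumed to be written in CIDR notation, and a fixed port is expressed as a one-port range.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Record:
    """Hypothetical flow record summarised by one index entry."""
    ip: str
    port: int
    protocol: str


@dataclass
class ExtractionRule:
    """Hypothetical rule: each field left as None is ignored; set fields are ANDed."""
    ip: Optional[str] = None                      # fixed IP
    ip_segment: Optional[str] = None              # IP segment in CIDR form, e.g. "10.0.0.0/24"
    port_range: Optional[Tuple[int, int]] = None  # inclusive; use (p, p) for a fixed port
    protocol: Optional[str] = None                # protocol type, compared case-insensitively

    def matches(self, rec: Record) -> bool:
        addr = ipaddress.ip_address(rec.ip)
        if self.ip is not None and addr != ipaddress.ip_address(self.ip):
            return False
        if self.ip_segment is not None and addr not in ipaddress.ip_network(self.ip_segment):
            return False
        if self.port_range is not None:
            low, high = self.port_range
            if not low <= rec.port <= high:
                return False
        if self.protocol is not None and rec.protocol.lower() != self.protocol.lower():
            return False
        return True
```

For instance, `ExtractionRule(ip_segment="10.0.0.0/24", port_range=(80, 443), protocol="tcp")` selects only TCP records from that segment on ports 80 to 443.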

S2, configuring the IP address, port, and login password of the ES cluster into which the data is to be migrated, and verifying whether the configuration information has been entered correctly. If the configuration is verified as correct, the subsequent processing flow proceeds; if it is wrong, a configuration error is reported, and the subsequent processing flow proceeds only after correct configuration information has been re-entered.
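The verify-and-re-enter loop of S2 can be sketched as below. This is a minimal sketch that only checks the form of the input (a parseable IP address, a port in 1 to 65535, a non-empty password); the text does not say whether verification also tests a live connection, and `ClusterConfig`, `config_errors`, and `first_valid_config` are hypothetical names. The port 9200 in the example is just an illustrative value (the usual Elasticsearch HTTP port).

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ClusterConfig:
    """Hypothetical connection settings for the destination ES cluster."""
    ip: str
    port: int
    password: str


def config_errors(cfg: ClusterConfig) -> List[str]:
    """Return the problems found in one configuration (empty list means valid)."""
    errors = []
    try:
        ipaddress.ip_address(cfg.ip)
    except ValueError:
        errors.append(f"invalid IP address: {cfg.ip!r}")
    if not 1 <= cfg.port <= 65535:
        errors.append(f"port out of range: {cfg.port}")
    if not cfg.password:
        errors.append("login password is empty")
    return errors


def first_valid_config(attempts: Iterable[ClusterConfig]) -> Optional[ClusterConfig]:
    """Report each wrong entry and proceed only once a correct one is supplied."""
    for cfg in attempts:
        errors = config_errors(cfg)
        if not errors:
            return cfg
        print("configuration error:", "; ".join(errors))
    return None
```

Here `attempts` stands in for the user re-entering the configuration; in an interactive tool it would be fed by successive prompts.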

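S3, dividing the extracted index files of the data to be migrated equally according to the number of available cluster servers, wherein the index file division algorithm is as follows: the fragment size is number of index files / (number of available servers + 1), and, starting from the first fragment, the remaining index files left over by this division are added one at a time until all of them have been placed into fragments;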

S4, issuing the divided index file fragments to a data migration cluster, and distributing the index file fragments to each server in the cluster in a one-to-one correspondence manner by the cluster;

S5, each server extracts the corresponding data from the original cluster according to its allocated index file fragment;

specifically, steps A1 to A5 above are performed, beginning with A1: reading an index file and extracting the secondary index data according to the offsets in the index file;

S6, the cluster server receiving the migrated data receives it, resets the data offset positions, and finally stores it in the storage system;

specifically, receiving and storing the original data, modifying the secondary and primary index offsets according to where the original data and the secondary index were stored, and saving the received data to the cluster file system.

S7, completing the data migration and reporting the migration status.

The invention extracts data at high speed over multiple concurrent paths, without servers contending for one another's IO, CPU, or memory, and can quickly transfer the extracted data to the servers of the cluster into which it is migrated. The high-speed property means that a single server executes only one extraction task at a time, so data extraction is free of external interference and fast. The multi-path property means that all servers in the cluster extract data independently and simultaneously and transfer the extracted data independently to their corresponding servers in the receiving cluster; each extraction server is paired with a receiving server, and no server interferes with another. Not contending for IO, CPU, and memory means that, for both extraction and reception, the cluster center only distributes tasks and takes no part in transferring the migrated data at any point in the process. All servers in the extraction and receiving clusters are paired and independent, and data extraction, transmission, and storage are point-to-point. Rapid transfer of the extracted data to the destination cluster servers means that each extraction server corresponds to a receiving server and data is transmitted point-to-point, which greatly reduces the effect of intermediate network layers on transmission speed. In addition, the receiving server remaps the indexes each time it receives a batch of original files, so the data is usable as soon as extraction is complete.
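The division of labor described above (the center only hands out tasks, each extraction server runs one task against its paired receiving server, and the center collects statuses for S7) can be sketched with a thread pool. This is a minimal sketch: threads in one process stand in for separate machines, and `run_migration` and the `migrate_pair` worker are hypothetical; a real `migrate_pair` would perform the per-fragment extraction and point-to-point transfer.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical worker: (extraction server, receiving server, fragment) -> status.
MigratePair = Callable[[str, str, List[str]], str]


def run_migration(pairs: Sequence[Tuple[str, str]],
                  fragments: Sequence[List[str]],
                  migrate_pair: MigratePair) -> Dict[str, str]:
    """Cluster-center role: hand out one task per pair, then collect statuses (S7)."""
    if len(pairs) != len(fragments):
        raise ValueError("need exactly one fragment per extraction/receiving pair")
    if not pairs:
        return {}
    # One worker per pair, so each pair runs a single task independently of the others.
    with ThreadPoolExecutor(max_workers=len(pairs)) as pool:
        futures = {src: pool.submit(migrate_pair, src, dst, frag)
                   for (src, dst), frag in zip(pairs, fragments)}
        # The center only waits for statuses; the data itself moves inside migrate_pair.
        return {src: fut.result() for src, fut in futures.items()}
```

Keeping the center out of the data path means its own IO and bandwidth never become the bottleneck; it only sees one small status value per pair.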

The invention can extract data according to user-set conditions. The extracted data index files are distributed in segments to the servers of the data receiving cluster service, and each server then independently extracts the secondary index files and original data from the target server and transfers them to itself, achieving high-speed multi-path concurrent extraction and migration of network traffic data. Finally, after the data migration is complete, the new cluster re-establishes the mapping associations between the indexes and the original data, completing the whole data migration process.

The foregoing describes preferred embodiments of the invention. It is to be understood that the invention is not limited to the precise form disclosed herein, and that various other combinations, modifications, and environments may be used within the scope of the concept disclosed herein, whether described above or apparent to those skilled in the relevant art. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of the invention shall all fall within the protection scope of the appended claims.

Claims

1. A high-speed multi-path network traffic data migration method, characterized in that the data migration method comprises the following steps:

2. The high-speed multi-path network traffic data migration method according to claim 1, wherein the step of logging in to the system hosting the ES cluster whose data is to be migrated, extracting the index files according to the extraction rule conditions, and configuring the corresponding information comprises the following steps:

3. The high-speed multi-path network traffic data migration method according to claim 1, wherein dividing the extracted index files of the data to be migrated equally according to the number of available cluster servers comprises the following steps:

4. The high-speed multi-path network traffic data migration method according to claim 1, wherein the step of each server extracting the corresponding data from the original cluster according to its allocated index file fragment comprises the following steps:

5. The high-speed multi-path network traffic data migration method according to claim 1, wherein the step in which the cluster server receiving the migrated data receives the data, resets the data offset positions, and then stores it comprises the following steps:

receiving and storing the original data;

and saving the received data to the cluster file system.