CN112699080A - High-speed multi-path network data migration method - Google Patents

Publication number
CN112699080A
Authority
CN
China
Prior art keywords
data
cluster
index
index file
migration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110030467.0A
Other languages
Chinese (zh)
Inventor
邓金祥
胡勇
谢宗明
王炜
代先勇
谷峰
刘洋
田晓东
王念
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Shensi Science & Technology Co ltd
Original Assignee
Chengdu Shensi Science & Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Shensi Science & Technology Co ltd
Priority to CN202110030467.0A
Publication of CN112699080A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/119 Details of migration of file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures

Abstract

The invention relates to a high-speed multi-path network traffic data migration method comprising the following steps: logging in to the system hosting the ES cluster whose data is to be migrated, extracting index files according to extraction rule conditions, and configuring the corresponding information; dividing the extracted index files of the data to be migrated equally according to the number of available cluster servers, issuing the resulting index file fragments to the data migration cluster, and having the cluster allocate the fragments to its servers in one-to-one correspondence; each server extracting the corresponding data from the original cluster according to its allocated index file fragment, and the cluster server receiving the migrated data receiving the data, resetting the data offset positions, and storing the data; and feeding back the status of the migrated data after the migration is complete. The advantage of the invention is that the indexes are distributed to different servers and each server extracts original data independently, which increases the data extraction speed.

Description

High-speed multi-path network data migration method
Technical Field
The invention relates to the technical field of data processing, in particular to a high-speed multi-path network data migration method.
Background
With the continuous development of modern informatization and digitization technologies, more and more data needs to be collected, data is generated faster and faster, and the volume of stored data keeps multiplying. When equipment needs to be upgraded or data needs to be backed up, the stored data must be migrated. Traditional data migration only copies files, so it runs into problems such as IO port bottlenecks and the need to re-parse the data and rebuild its indexes after migration, and its efficiency is extremely low. A better and more efficient network traffic data migration solution is therefore urgently needed.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a high-speed multi-path network traffic data migration method that remedies the defects of existing data migration methods.
The purpose of the invention is achieved by the following technical scheme: a high-speed multi-path network traffic data migration method comprising the following steps:
logging in to the system hosting the ES cluster whose data is to be migrated, extracting index files according to extraction rule conditions, and configuring the corresponding information;
dividing the extracted index files of the data to be migrated equally according to the number of available cluster servers, issuing the resulting index file fragments to the data migration cluster, and having the cluster allocate the fragments to its servers in one-to-one correspondence;
each server extracting the corresponding data from the original cluster according to its allocated index file fragment, and the cluster server receiving the migrated data receiving the data, resetting the data offset positions, and storing the data;
and feeding back the status of the migrated data after the migration is complete.
Further, the step of logging in to the system hosting the ES cluster whose data is to be migrated, extracting index files according to extraction rule conditions, and configuring the corresponding information includes:
a user logging in to the system hosting the ES cluster of the data to be migrated, setting the extraction rule conditions, executing the extraction query, and extracting the index files of the data to be migrated;
and configuring the IP, port, and login password of the ES cluster into which the data is to be migrated, and verifying whether the configuration information has been entered correctly.
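The configuration check can be illustrated with a minimal Python sketch. The patent does not say how verification is performed; it may well include an actual login attempt against the receiving cluster. The `TargetClusterConfig` class and `validate_config` function below are hypothetical names, and the sketch only checks that the entered values are well-formed.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class TargetClusterConfig:
    """Connection settings for the receiving ES cluster (hypothetical shape)."""
    ip: str
    port: int
    password: str


def validate_config(cfg: TargetClusterConfig) -> List[str]:
    """Return a list of problems; an empty list means the input is well-formed."""
    problems = []
    try:
        ipaddress.ip_address(cfg.ip)  # raises ValueError for malformed addresses
    except ValueError:
        problems.append(f"invalid IP address: {cfg.ip!r}")
    if not 1 <= cfg.port <= 65535:
        problems.append(f"port out of range: {cfg.port}")
    if not cfg.password:
        problems.append("login password is empty")
    return problems
```

A caller would report the returned problems to the user and continue to the next step only once the list is empty.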
Further, dividing the extracted index files of the data to be migrated equally according to the number of available cluster servers includes:
dividing the index files using the rule: index fragment size = number of indexes / (number of available servers + 1);
and, starting from the first fragment, placing the remainder index files left over by this division into the fragments one by one until all remainder index files have been assigned to fragments.
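The division rule can be read in more than one way, so the following Python sketch states its assumptions explicitly: the division is integer division, one fragment is created per available server, and the leftover index files are dealt out round-robin starting from the first fragment, wrapping around when there are more leftovers than fragments. The function name `split_indexes` is hypothetical.

```python
from typing import List, Sequence


def split_indexes(index_files: Sequence[str], available_servers: int) -> List[List[str]]:
    """Split index files into one fragment per server (one reading of the rule)."""
    if available_servers < 1:
        raise ValueError("at least one available server is required")
    # Fragment size = number of indexes / (number of available servers + 1),
    # taken here as integer division (an assumption).
    size = len(index_files) // (available_servers + 1)
    fragments = [list(index_files[i * size:(i + 1) * size])
                 for i in range(available_servers)]
    # Everything not covered by the fixed-size fragments is the remainder.
    remainder = index_files[available_servers * size:]
    # Starting from the first fragment, place remainder files one by one.
    for i, name in enumerate(remainder):
        fragments[i % available_servers].append(name)
    return fragments
```

Under these assumptions, 10 index files and 3 servers give a base size of 2 and a remainder of 4, so the fragments end up holding 4, 3, and 3 files.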
Further, each server extracting the corresponding data from the original cluster according to its allocated index file fragment includes:
A1, reading an index file and extracting the secondary index data according to the offsets in the index file;
A2, reading the secondary index data in a loop and extracting the original data according to the secondary index offsets;
A3, sending the original data, the secondary index, and the primary index, in that order, to the cluster server into which the data is migrated;
A4, after the data transmission is complete, reading the next index file and repeating steps A1 to A3 until all index files have been processed;
and A5, deleting the allocated index files and returning the processing status to the current cluster system.
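Steps A1 to A5 can be sketched in Python over a simplified in-memory model. The on-disk layout of the indexes is not described in the patent, so everything here is an assumption made for illustration: a primary index entry is a `(start, count)` pair pointing at a run of secondary entries, a secondary entry is an `(offset, length)` pair pointing into the raw data, and `send` is a hypothetical callback standing in for the network transfer.

```python
from typing import Callable, Dict, List, Tuple

PrimaryEntry = Tuple[int, int]    # (start, count) into the secondary index
SecondaryEntry = Tuple[int, int]  # (offset, length) into the raw data


def migrate_fragment(fragment: Dict[str, List[PrimaryEntry]],
                     secondary: List[SecondaryEntry],
                     raw: bytes,
                     send: Callable[[str, object], None]) -> str:
    """Process every index file in the fragment, then report a status."""
    for name, primary in fragment.items():
        # A1: read the index file and follow its offsets to the secondary index.
        entries: List[SecondaryEntry] = []
        for start, count in primary:
            entries.extend(secondary[start:start + count])
        # A2: loop over the secondary entries and pull out the original data.
        records = [raw[off:off + length] for off, length in entries]
        # A3: send original data, then secondary index, then primary index.
        send("raw", records)
        send("secondary", entries)
        send("primary", primary)
        # A4: the loop continues with the next index file.
    # A5: delete the allocated index files and return the processing status.
    fragment.clear()
    return "completed"
```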
Further, the step in which the cluster server receiving the migrated data receives the data, resets the data offset positions, and stores the data includes:
receiving the original data and storing it;
receiving the secondary index data and modifying the secondary index offsets according to the offsets at which the original data was stored;
receiving the primary index data and modifying the primary index offsets according to the offsets at which the secondary index was stored;
and saving the received data to the cluster file system.
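The offset reset on the receiving side can be sketched as follows. This is an illustrative model, not the patented implementation: the hypothetical `Receiver` class appends everything to in-memory stores, assumes secondary entries are `(offset, length)` pairs and primary entries are `(start, count)` pairs, and assumes that records arrive in the same order as the secondary entries that describe them.

```python
from typing import List, Tuple


class Receiver:
    """Stores migrated data and rewrites index offsets to the new positions."""

    def __init__(self) -> None:
        self.raw = bytearray()
        self.secondary: List[Tuple[int, int]] = []
        self.primary: List[Tuple[int, int]] = []

    def receive(self, records: List[bytes],
                secondary_entries: List[Tuple[int, int]],
                primary_entries: List[Tuple[int, int]]) -> None:
        if len(records) != len(secondary_entries):
            raise ValueError("one secondary entry is expected per record")
        # Store the original data, noting where each record lands.
        new_secondary = []
        for record in records:
            new_secondary.append((len(self.raw), len(record)))
            self.raw.extend(record)
        # Secondary index: the source offsets are replaced by the new ones.
        cursor = len(self.secondary)
        self.secondary.extend(new_secondary)
        # Primary index: point each entry at the new secondary positions.
        for _, count in primary_entries:
            self.primary.append((cursor, count))
            cursor += count
```

After two batches, a lookup through the rewritten primary and secondary entries resolves against the receiver's own storage, regardless of the offsets the data had in the source cluster.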
The invention has the following advantages: the high-speed multi-path network traffic data migration method distributes the indexes to different servers and each server extracts original data independently, which increases the data extraction speed; each server independently uploads the primary index, the secondary index, and the original data, which increases the transmission speed; a CRC (cyclic redundancy check) is added, which reduces migration failures caused by errors occurring during transmission; and after the data migration is complete, the mapping association is re-established automatically, so the migrated data needs no manual processing.
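The patent names a CRC check but does not specify the CRC variant or the message framing. As a minimal sketch under those assumptions, the Python below appends a big-endian CRC-32 (computed with the standard `zlib.crc32`) to each payload and verifies it on receipt; `frame` and `unframe` are hypothetical helper names.

```python
import struct
import zlib


def frame(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 trailer to the payload."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def unframe(message: bytes) -> bytes:
    """Verify the CRC-32 trailer and return the payload, or raise ValueError."""
    if len(message) < 4:
        raise ValueError("message too short to contain a CRC trailer")
    payload, trailer = message[:-4], message[-4:]
    (expected,) = struct.unpack(">I", trailer)
    if zlib.crc32(payload) != expected:
        raise ValueError("CRC mismatch: payload was corrupted in transit")
    return payload
```

A receiver that gets a `ValueError` could request retransmission of that message rather than failing the whole migration.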
Drawings
FIG. 1 is a schematic flow chart of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings, but the scope of the invention is not limited to the following.
As shown in FIG. 1, the invention relates to a high-speed multi-path network traffic data migration method, which performs high-speed multi-path migration of original network traffic data files for which a multi-level ES index has been built and which are stored in a hierarchical, distributed manner. Specifically, according to the migration conditions that have been set, the filtered index files are distributed in segments to different servers of the ES cluster; each server independently extracts the secondary index files and the original data and, once extraction is complete, migrates the index files and the original data together to the new ES cluster; after the new ES cluster has received all the data, it rebuilds the mapping association between the original data and the indexes. The method specifically comprises the following steps:
S1, a user logs in to the system hosting the ES cluster of the data to be migrated and sets the extraction rule conditions; a condition may be a single data mark, such as a fixed IP, an IP segment, a fixed port, a port range, a home location, or a protocol type, or a combination of several data marks; the extraction query is executed with the specified data marks, and the index files of the data to be migrated are extracted;
S2, the IP, port, and login password of the ES cluster into which the data is to be migrated are configured, and the configuration information is verified; if verification succeeds, the subsequent processing flow proceeds; if it fails, an error is reported, and the subsequent flow can proceed only after the configuration information has been re-entered correctly;
S3, dividing the extracted index files of the data to be migrated equally according to the number of available cluster servers, wherein the index file division algorithm is as follows:
dividing the index files using the rule: index fragment size = number of indexes / (number of available servers + 1);
and, starting from the first fragment, placing the remainder index files left over by this division into the fragments one by one until all remainder index files have been assigned to fragments;
S4, issuing the divided index file fragments to the data migration cluster, the cluster allocating the fragments to its servers in one-to-one correspondence;
S5, each server extracts the corresponding data from the original cluster according to its allocated index file fragment;
specifically, A1, reading an index file and extracting the secondary index data according to the offsets in the index file;
A2, reading the secondary index data in a loop and extracting the original data according to the secondary index offsets;
A3, sending the original data, the secondary index, and the primary index, in that order, to the cluster server into which the data is migrated;
A4, after the data transmission is complete, reading the next index file and repeating steps A1 to A3 until all index files have been processed;
and A5, deleting the allocated index files and returning the processing status to the current cluster system;
S6, the cluster server receiving the migrated data receives the data, resets the data offset positions, and finally stores the data in the storage system;
specifically, receiving the original data and storing it;
receiving the secondary index data and modifying the secondary index offsets according to the offsets at which the original data was stored;
receiving the primary index data and modifying the primary index offsets according to the offsets at which the secondary index was stored;
and saving the received data to the cluster file system;
and S7, the data migration is complete, and the status of the migrated data is fed back.
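The extraction rule conditions of step S1 (fixed IP, IP segment, fixed port, port range, protocol type, or a combination of these) can be modeled as a simple predicate. The Python sketch below is only an illustration: the `ExtractionRule` class and the record fields `ip`, `port`, and `protocol` are hypothetical, an IP segment is assumed to be written in CIDR notation, and the home-location mark is omitted because the patent does not say how it is represented.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ExtractionRule:
    """A combination of data marks; unset marks are not checked."""
    ip: Optional[str] = None                      # fixed IP
    ip_segment: Optional[str] = None              # IP segment in CIDR form
    port: Optional[int] = None                    # fixed port
    port_range: Optional[Tuple[int, int]] = None  # inclusive port range
    protocol: Optional[str] = None                # protocol type

    def matches(self, record: dict) -> bool:
        addr = ipaddress.ip_address(record["ip"])
        if self.ip is not None and addr != ipaddress.ip_address(self.ip):
            return False
        if self.ip_segment is not None and addr not in ipaddress.ip_network(self.ip_segment):
            return False
        if self.port is not None and record["port"] != self.port:
            return False
        if self.port_range is not None:
            low, high = self.port_range
            if not low <= record["port"] <= high:
                return False
        if self.protocol is not None and record["protocol"] != self.protocol:
            return False
        return True
```

In a real deployment the conditions would presumably be translated into a query against the ES index rather than evaluated record by record; the predicate form is used here only to make the combination logic explicit.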
The invention extracts data at high speed over multiple concurrent paths, the servers do not compete for one another's IO, CPU, or memory, and the extracted data is transferred quickly to the servers of the cluster being migrated into. High speed means that a single server executes only one extraction task at a time, so extraction proceeds without external interference. Multi-path means that all servers in the cluster extract data independently and simultaneously, and each transfers its extracted data to the corresponding server in the receiving cluster; extraction servers and receiving servers are paired, and no server interferes with another. Not competing for IO, CPU, or memory means that, for both extraction and reception, the cluster center only distributes tasks and takes no part in transferring the migrated data at any point in the process; the servers in the extraction and receiving clusters are paired and independent, so extraction, transmission, and storage of data are point-to-point. Fast transfer to the receiving cluster means that, because each extraction server corresponds to a receiving server and data is transmitted point-to-point, the influence of intermediate network layers on transmission speed is greatly reduced. In addition, the receiving server can perform index remapping each time it receives a batch of original files, so the data is ready for use as soon as extraction finishes.
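The multi-path arrangement can be sketched in Python with a coordinator that only hands out tasks. This is an analogy rather than the patented deployment: threads in one process stand in for separate extraction servers, and `run_migration` and the `transfer` callback are hypothetical names. The point illustrated is that each extractor is paired with exactly one receiver and the coordinator never touches the migrated data itself.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[str, str]  # (extraction server, receiving server)


def run_migration(assignments: Sequence[Tuple[str, str, List[str]]],
                  transfer: Callable[[str, str, List[str]], object]) -> Dict[Pair, object]:
    """Run one transfer per (extractor, receiver, fragment) and collect statuses."""
    if not assignments:
        return {}
    # One worker per pair, so every pair runs exactly one task at a time.
    with ThreadPoolExecutor(max_workers=len(assignments)) as pool:
        futures = {
            pool.submit(transfer, src, dst, fragment): (src, dst)
            for src, dst, fragment in assignments
        }
        # The coordinator only gathers the status reported by each pair.
        return {pair: future.result() for future, pair in futures.items()}
```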
The invention can extract data according to conditions set by the user. The index files of the extracted data are distributed in segments to the servers of the cluster that receives the data, and each server then independently extracts the secondary index files and the original data from the target server and transfers them, thereby achieving high-speed, multi-path, concurrent extraction and migration of network traffic data. Finally, after the data migration is finished, the new cluster re-establishes the mapping association between the indexes and the original data, which completes the whole data migration process.
The foregoing illustrates the preferred embodiments of the invention. It is to be understood that the invention is not limited to the precise form disclosed herein and that various other combinations, modifications, and environments may be used within the scope of the concept disclosed herein, whether as described above or as apparent to those skilled in the relevant art. Modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the invention fall within the scope of the appended claims.

Claims (5)

1. A high-speed multi-path network traffic data migration method, characterized in that the data migration method comprises the following steps:
logging in to the system hosting the ES cluster whose data is to be migrated, extracting index files according to extraction rule conditions, and configuring the corresponding information;
dividing the extracted index files of the data to be migrated equally according to the number of available cluster servers, issuing the resulting index file fragments to the data migration cluster, and having the cluster allocate the fragments to its servers in one-to-one correspondence;
each server extracting the corresponding data from the original cluster according to its allocated index file fragment, and the cluster server receiving the migrated data receiving the data, resetting the data offset positions, and storing the data;
and feeding back the status of the migrated data after the migration is complete.
2. The high-speed multi-path network traffic data migration method according to claim 1, characterized in that the step of logging in to the system hosting the ES cluster whose data is to be migrated, extracting index files according to extraction rule conditions, and configuring the corresponding information comprises:
a user logging in to the system hosting the ES cluster of the data to be migrated, setting the extraction rule conditions, executing the extraction query, and extracting the index files of the data to be migrated;
and configuring the IP, port, and login password of the ES cluster into which the data is to be migrated, and verifying whether the configuration information has been entered correctly.
3. The high-speed multi-path network traffic data migration method according to claim 1, characterized in that dividing the extracted index files of the data to be migrated equally according to the number of available cluster servers comprises:
dividing the index files using the rule: index fragment size = number of indexes / (number of available servers + 1);
and, starting from the first fragment, placing the remainder index files left over by this division into the fragments one by one until all remainder index files have been assigned to fragments.
4. The high-speed multi-path network traffic data migration method according to claim 1, characterized in that the step of each server extracting the corresponding data from the original cluster according to its allocated index file fragment comprises:
A1, reading an index file and extracting the secondary index data according to the offsets in the index file;
A2, reading the secondary index data in a loop and extracting the original data according to the secondary index offsets;
A3, sending the original data, the secondary index, and the primary index, in that order, to the cluster server into which the data is migrated;
A4, after the data transmission is complete, reading the next index file and repeating steps A1 to A3 until all index files have been processed;
and A5, deleting the allocated index files and returning the processing status to the current cluster system.
5. The high-speed multi-path network traffic data migration method according to claim 1, characterized in that the step in which the cluster server receiving the migrated data receives the data, resets the data offset positions, and stores the data comprises:
receiving the original data and storing it;
receiving the secondary index data and modifying the secondary index offsets according to the offsets at which the original data was stored;
receiving the primary index data and modifying the primary index offsets according to the offsets at which the secondary index was stored;
and saving the received data to the cluster file system.
CN202110030467.0A 2021-01-11 2021-01-11 High-speed multi-path network data migration method Pending CN112699080A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110030467.0A CN112699080A (en) 2021-01-11 2021-01-11 High-speed multi-path network data migration method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110030467.0A CN112699080A (en) 2021-01-11 2021-01-11 High-speed multi-path network data migration method

Publications (1)

Publication Number Publication Date
CN112699080A true CN112699080A (en) 2021-04-23

Family

ID=75513733

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN202110030467.0A Pending CN112699080A (en) 2021-01-11 2021-01-11 High-speed multi-path network data migration method

Country Status (1)

Country Link
CN (1) CN112699080A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113282537A (en) * 2021-06-15 2021-08-20 成都深思科技有限公司 ES data migration system and migration method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100070474A1 (en) * 2008-09-12 2010-03-18 Lad Kamleshkumar K Transferring or migrating portions of data objects, such as block-level data migration or chunk-based data migration
CN104348862A (en) * 2013-07-31 2015-02-11 华为技术有限公司 Data migration processing method, apparatus, and system
CN105205154A (en) * 2015-09-24 2015-12-30 浙江宇视科技有限公司 Data migration method and device
US20160253339A1 (en) * 2015-02-26 2016-09-01 Bittitan, Inc. Data migration systems and methods including archive migration
CN106973091A (en) * 2017-03-23 2017-07-21 中国工商银行股份有限公司 Distributed memory fast resampling method and system, main control server
CN111064789A (en) * 2019-12-18 2020-04-24 北京三快在线科技有限公司 Data migration method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210423