CN113535463A - Data recovery method and device - Google Patents

Info

Publication number
CN113535463A
Authority
CN
China
Prior art keywords
data
devices
data device
network
controller
Prior art date
Legal status
Pending
Application number
CN202010289275.7A
Other languages
Chinese (zh)
Inventor
高帅
陈加怀
陈俊杰
周敏均
Current Assignee
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202010289275.7A
Priority to PCT/CN2021/076675 (WO2021208585A1)
Publication of CN113535463A
Status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14 Error detection or correction of the data by redundancy in operation
    • G06F 11/1402 Saving, restoring, recovering or retrying
    • G06F 11/1446 Point-in-time backing up or restoration of persistent data
    • G06F 11/1448 Management of the data involved in backup or backup restore
    • G06F 11/1458 Management of the backup or restore process
    • G06F 11/1464 Management of the backup or restore process for networked environments
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Abstract

This application relates to the field of communication technologies, and discloses a data recovery method and device for improving the security of data recovery. When a controller determines that a first data device in a data switching network has failed, it may select a second data device and add it to the first check group in which the first data device is located, where the data devices of the first check group belong to the same logical network. The first data device stores first data. The controller may notify the other data devices of the first check group to send data for repairing the first data, so that the second data device obtains the repaired first data. Because the second data device joins the first check group, and the data devices in the first check group belong to the same logical network, network security is enhanced.

Description

Data recovery method and device
Technical Field
The embodiments of this application relate to the field of communication technologies, and in particular to a data recovery method and device.
Background
Edge devices and edge computing increase the speed at which terminal data is processed: an edge device can process terminal data locally and thus respond to user services more quickly than a cloud data center alone.
An edge device processes and stores the acquired data, and sends the processed data to a cloud data center through a switching device (switch) and a network.
When an edge device fails, or the switching device connected to it fails, the data stored on that edge device becomes unavailable and needs to be recovered. How to improve security during data recovery is the technical problem to be solved.
Disclosure of Invention
The embodiments of this application provide a data recovery method and device for improving the security of data recovery.
In a first aspect, a data recovery method is provided, which may be applied to a data switching network including a controller and a plurality of data devices, where data is stored in the data devices. The controller may detect whether a data device in the data switching network has failed. When it determines that a first data device in the data switching network has failed, it may select a second data device and add the second data device to the first check group in which the first data device is located, where the data devices of the first check group belong to the same logical network; specifically, the fault-tolerant data stream ports of the data devices of the first check group belong to the same logical network. The first data device stores first data. The controller may notify the other data devices of the first check group to send data for repairing the first data, so that the second data device obtains the repaired first data. When the first data device of the first check group fails, the second data device joins the first check group; since the data devices of the first check group belong to the same logical network, network security is enhanced.
In one possible implementation, the data devices of one check group are divided into one logical network. Data in different logical networks is isolated during transmission; that is, a data device in one logical network cannot directly communicate with a data device in another logical network, which further enhances network security. Setting up logical networks also limits the broadcast domain, saves bandwidth, and improves network processing capacity. It also improves the robustness of the network: a fault is confined to one logical network and does not affect the normal operation of other logical networks.
In one possible implementation, before selecting the second data device, the controller may further determine, according to the states of the other data devices in the first check group in which the first data device is located, whether the first data can be recovered. When the first data can be recovered, the second data device is selected for data recovery; if it is determined that the first data cannot be recovered, selecting a second data device may be unnecessary. Specifically, when more than a set number of the data devices of the first check group have failed, the first data cannot be recovered.
In a possible implementation, the data switching network may further include a plurality of switching devices, each connected to at least one data device, with different data devices of the first check group connected to different switching devices. If multiple data devices of one check group were connected to the same switching device, then when that switching device failed, the data on all of those data devices would fail at once and could not be recovered. If the data devices of the same check group are connected to different switching devices, even if one switching device fails, only one data device of the check group is affected, and the data in the failed data device can be recovered.
In one possible implementation, when selecting the second data device, the controller first selects it from among data devices located in the same edge domain as the first data device. When the second data device cannot be selected there, the controller may select, as the second data device, the data device with the smallest network hop count from among data devices located in edge domains different from that of the first data device. Compared with crossing edge domains, the network conditions within one edge domain are better and the bandwidth is larger, so selecting the second data device in the same edge domain reduces the overhead on the global network.
In one possible implementation, when selecting the second data device from among data devices located in the same edge domain as the first data device, the controller may first select it from among data devices located under different switching devices from the first data device. When no second data device can be selected there, the controller may select the second data device from among data devices located under the same switching device as the first data device. In this way, the data devices of one check group are connected to different switching devices as far as possible: even if one switching device fails, only a small number of data devices of the check group are affected, and the data in the failed data device can be recovered.
In a possible implementation, when selecting the second data device from among data devices located in the same edge domain as the first data device, the controller may first select an idle data device as the second data device, and may select a detachable data device as the second data device when there is no idle data device.
In a possible implementation, when the controller selects, as the second data device, the data device with the smallest network hop count from among data devices located in a different edge domain from the first data device, if there are multiple data devices with the smallest network hop count, an idle data device may be selected first from among them as the second data device, and a detachable data device may be selected when there is no idle one.
In a possible implementation, when the controller selects, as the second data device, the data device with the smallest network hop count from among data devices located in a different edge domain from the first data device, if there are multiple data devices with the smallest network hop count, a data device under the switching device or aggregation switching device with the smallest network load among them may be selected first as the second data device.
In a possible implementation, when the controller adds the second data device to the first check group in which the first data device is located, the controller may notify the second data device to modify the port. Specifically, the controller may notify the switching device connected to the second data device to modify a port, so that the second data device joins the first check group in which the first data device is located.
In a possible implementation, when the first data device fails but the switching device connected to it has not failed, the controller may further delete the first data device from the first check group in which it is located; specifically, the controller may notify the switching device connected to the first data device to modify a port, so that the first data device is deleted from the first check group. If a failure of the switching device connected to the first data device causes the data in the first data device to fail, the controller may send an instruction to all devices connected to the failed switching device (including data devices, other switching devices, aggregation switching devices, and the like) indicating that they should not exchange data with the failed switching device.
In one possible implementation, three types of ports are provided in a data device. One is the fault-tolerant data stream port, i.e., the port used for data recovery. Another is the service data stream port, i.e., the port used for transmitting data to the cloud data center. The third is the data source stream port, i.e., the port through which the data device obtains data from a data source. When a port is modified, at least one of the three types may be modified, so that the selected second data device replaces the original first data device for data fault tolerance, service traffic, and data source acquisition.
In one possible implementation, the fault-tolerant data stream port, the service data stream port, and the data source stream port of a data device belong to different logical networks. This isolates the service data stream, the fault-tolerant data stream, and the data source stream from one another, which further improves the security of data recovery, the security of data transmission to the cloud data center, and the security of data source acquisition. The fault-tolerant data stream port and/or the service data stream port and/or the data source stream port may be virtual ports or physical ports.
In one possible implementation, the logical network may be obtained by network division in any of the following ways, including but not limited to: virtual private network (VPN), virtual local area network (VLAN), virtual extensible LAN (VXLAN), or generic routing encapsulation (GRE).
In a second aspect, there is provided a communication device having functionality to implement the first aspect and any possible implementation of the first aspect. These functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more functional modules corresponding to the above functions.
In one possible implementation, the apparatus includes:
a determining module, configured to determine that a first data device in the data switching network fails, where the first data device stores first data;
a selection module for selecting a second data device;
an adding module, configured to add the second data device to a first check group in which the first data device is located, where multiple data devices of the first check group belong to a same logical network;
a notification module, configured to notify other data devices of the first check group to send data for repairing the first data, so that the second data device obtains the repaired first data.
In a possible implementation, the apparatus is applied to a data switching network including the apparatus, a plurality of data devices, and a plurality of switching devices, each switching device connected to at least one data device, with different data devices of the first check group connected to different switching devices.
In a possible implementation, when configured to select the second data device, the selection module is specifically configured to select the second data device from among data devices located in the same edge domain as the first data device.
In a possible implementation, when configured to select the second data device, the selection module is specifically configured to: when the second data device cannot be selected from among the data devices located in the same edge domain as the first data device, select, as the second data device, the data device with the smallest network hop count from among data devices located in an edge domain different from that of the first data device.
In a possible implementation, when configured to add the second data device to the first check group in which the first data device is located, the adding module is specifically configured to notify the second data device to modify a port, so that the second data device joins the first check group in which the first data device is located.
In one possible implementation, the port of the apparatus comprises: a fault tolerant data stream port.
In one possible implementation, the port of the apparatus further comprises: a traffic data stream port and/or a data source stream port, wherein the fault-tolerant data stream port, the traffic data stream port and the data source stream port belong to different logical networks.
In one possible implementation, the logical network is obtained by network division in any one of the following ways: virtual private network VPN, virtual local area network VLAN, virtual extensible LAN VXLAN, or generic routing encapsulation GRE.
In a third aspect, a communication device is provided, which may be the controller in the above method embodiments or a chip disposed in the controller. The device includes a transceiver, a processor, and optionally a memory. The memory is configured to store a computer program or instructions; the processor is coupled to the memory and the transceiver; and when the processor executes the computer program or instructions, the device performs, via the transceiver, the method performed by the controller in any of the first aspect and its possible implementations.
In a fourth aspect, there is provided a computer program product comprising: computer program code for causing a computer to perform the method performed by the controller in any of the above described first aspect and possible implementations of the first aspect when said computer program code is run on a computer.
In a fifth aspect, this application provides a chip system, which includes a processor and a memory that are electrically coupled. The memory is configured to store computer program instructions; the processor is configured to execute some or all of the computer program instructions in the memory, and when they are executed, to implement the functions of the controller in the method according to any one of the first aspect and its possible implementations.
In one possible design, the chip system may further include a transceiver configured to transmit a signal processed by the processor or receive a signal input to the processor. The chip system may be formed by a chip, or may include a chip and other discrete devices.
In a sixth aspect, a computer-readable storage medium is provided, storing a computer program which, when executed, performs the method performed by the controller in any one of the first aspect and its possible implementations.
In a seventh aspect, a communication system is provided, the system including: a controller configured to perform the method of the first aspect and any possible implementation of the first aspect, a plurality of data devices, and switching devices connected to the data devices.
Drawings
Fig. 1 is a diagram of a communication architecture provided in an embodiment of the present application;
fig. 2 is a schematic diagram of check group and logical network partitioning provided in an embodiment of the present application;
fig. 3 is a schematic flow chart of data recovery provided in an embodiment of the present application;
fig. 4 is a schematic flowchart of selecting a second data device according to an embodiment of the present application;
FIGS. 5, 6 and 7 are schematic diagrams of selecting a data device according to an embodiment of the present application;
fig. 8 is a structural diagram of a communication apparatus provided in an embodiment of the present application;
fig. 9 is a structural diagram of a communication apparatus provided in an embodiment of the present application.
Detailed Description
The embodiments of the present application will be described in detail below with reference to the accompanying drawings.
In order to facilitate understanding of the embodiments of the present application, some terms of the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.
1) A terminal, also referred to as user equipment (UE), a mobile station (MS), a mobile terminal (MT), etc., is a device that provides voice and/or data connectivity to a user; for example, a handheld device, a vehicle-mounted device, or an internet of things device with a wireless connection function. Currently, the terminal device may be: a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a mobile internet device (MID), a wearable device, a virtual reality (VR) device, an augmented reality (AR) device, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in remote medicine (for example, remote surgery), a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, and the like.
2) An edge domain is an area close to the objects or data sources, relative to the cloud data center. An edge domain can be large or small and generally has the following characteristics: within the edge domain there are one or more local area networks, which access a network such as a wide area network, a private network, a public network, a cellular network, or a satellite network through one or several edge routers (access routers). Devices in the edge domain interact with the cloud data center through switching devices and the wide area network. Devices in the edge domain are interconnected through in-domain networks (such as a wireless network, Wi-Fi, a wired network, or a cellular network), and the in-domain network quality (indexes such as available bandwidth, delay, packet loss, and jitter) is better than that of the wide area network. Data sources within the edge domain (e.g., sensors, cameras) are typically managed by devices within the domain.
3) An edge device mainly serves an edge domain and processes the data acquired from data sources, where the data sources include network cameras (IP Camera, IPC), face-recognition gates, smart retail terminals, mobile surveillance, and various light, heat, and electromagnetic sensors. Edge devices and the cloud data center form an end-cloud cooperation solution: independent software vendors (ISVs) can conveniently deploy their services and algorithms on both the edge side and the cloud data center side, rapidly achieving end-to-end service delivery.
4) Edge computing (edge computing): edge computing refers to providing services at the near end, on the side close to the objects or data sources, by using edge devices that integrate network, computing, storage, and application core capabilities. Applications initiated at the edge side produce faster network service responses and meet the basic industry requirements for real-time business, application intelligence, security, and privacy protection. Edge devices and edge computing mainly address the high latency and low bandwidth incurred when a cloud data center processes terminal device data; by processing data to some extent at the edge side, user services can be answered more quickly.
5) Redundant array of independent disks (RAID): a disk array combines multiple independent disks into one disk group with huge capacity; data is divided into multiple blocks stored on different disks, and when the data on some disks is lost, it can be recovered from the data on the remaining disks. In this application, if the disk of a data device is used to store data, that disk may be regarded as one disk in the redundant array.
6) Erasure coding (EC): erasure code technology encodes original data using an erasure code algorithm to obtain redundant data, and stores the original data and the redundant data together to achieve fault tolerance. The basic idea is to compute m blocks of redundant data (check data) from k blocks of original data. For these k + m blocks of data, when no more than m blocks are lost (the lost blocks may be original data or redundant data), the lost data can be recovered through a corresponding reconstruction algorithm. The process of generating redundant/check data is called encoding, and the process of recovering lost data is called decoding. The disk utilization is k/(k + m). Compared with multi-copy storage, erasure coding has advantages such as low redundancy and high disk utilization. In general, k is an integer of 2 or more, and m is an integer of 1 or more. For example, if the original data blocks are d1, d2, and d3 and the redundant data is y1 = d1 + d2 + d3, a total of 4 blocks are stored: d1, d2, d3, and y1. If d1 is lost, it can be recovered as d1 = y1 - d2 - d3; if y1 is lost, it can be recomputed as y1 = d1 + d2 + d3. As another example, if the original data blocks are d1 and d2 and the redundant data is the XOR of d1 and d2, 3 blocks are stored in total, and when any one block is lost, the remaining two blocks can be XORed to obtain the lost block.
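The arithmetic in the example above can be reproduced with a short sketch. The following Python fragment is an illustration only (a single-parity scheme with k = 3 and m = 1, matching the y1 = d1 + d2 + d3 example), not an implementation of the claimed method:

```python
# Illustrative only: single-parity erasure coding with k = 3 data blocks
# and m = 1 check block, matching the y1 = d1 + d2 + d3 example above.

def encode(data_blocks):
    """Return the check block as the sum of the original data blocks."""
    return sum(data_blocks)

def recover(surviving_blocks, check_block):
    """Recover one lost data block: d_lost = y1 - (sum of survivors)."""
    return check_block - sum(surviving_blocks)

d1, d2, d3 = 7, 11, 23
y1 = encode([d1, d2, d3])       # y1 = 41; store d1, d2, d3, y1 on 4 devices

# Suppose the device holding d1 fails: rebuild d1 from the other three.
assert recover([d2, d3], y1) == d1   # 41 - 11 - 23 = 7
```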
7) Regeneration code (RC): the basic idea is to compute k + m blocks of check data from k blocks of original data, where the check data differ from the original data, and to store the k + m blocks of check data separately to achieve fault tolerance. In general, k is an integer of 2 or more, and m is an integer of 1 or more.
8) A logical network is one or more networks simulated on top of a physical network based on network virtualization technology. Common network virtualization techniques include: virtual private network (VPN), virtual local area network (VLAN), virtual extensible LAN (VXLAN), generic routing encapsulation (GRE), virtual network devices, and the like. From the perspective of the users and devices in the network, the virtualized logical network is consistent with the physical network experience. For example, a VLAN is a group of logical devices and users that are not limited by physical location and can be organized by function, department, application, and other factors; they communicate with each other as if on the same network segment, hence the name virtual LAN. From a security perspective, because the networks are isolated from each other, the security of the devices in a logical network is improved.
9) Software-defined networking (SDN) is an implementation of network virtualization. Its core technology, OpenFlow, separates the control plane of network devices from the data plane, enabling flexible control of network traffic, making the network smarter as a pipeline, and providing a good platform for innovation in core networks and applications. The essence of SDN is network softwarization, which improves network programmability; it is a reconstruction of the network architecture rather than a new feature or function. SDN achieves functional characteristics better, faster, and more simply than the original network architecture.
10) Network hop count: how many nodes the message has passed through for forwarding. In this application, a node may be a data device, a switching device, or an aggregation switching device, or a switching device in a wide area network, such as a secondary router or a network device.
"and/or" in the present application, describing an association relationship of associated objects, means that there may be three relationships, for example, a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The plural in the present application means two or more.
In the description of the present application, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance, nor order.
In addition, in the embodiments of this application, the word "exemplary" is used to mean serving as an example, instance, or illustration. Any embodiment or implementation described herein as "exemplary" is not to be construed as preferred or advantageous over other embodiments or implementations; rather, the word is intended to present concepts in a concrete manner.
The technical solutions of the embodiments of this application can be applied to various communication systems, for example: long term evolution (LTE) systems, worldwide interoperability for microwave access (WiMAX) communication systems, fifth generation (5G) systems such as new radio (NR) access technology, future communication systems, and the like.
To facilitate understanding, an application scenario of this application is introduced next. The service scenarios described in the embodiments of this application are intended to explain the technical solutions more clearly and do not limit them; as a person skilled in the art knows, with the emergence of new service scenarios, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
As shown in the architecture diagram of fig. 1, a controller, such as an SDN controller, is disposed in a cloud data center. The cloud data center may be a public cloud, a private cloud, and the like. An edge domain may be a physical area such as an intersection or a street. The edge side includes data devices and switching devices. Since a data device is located in the edge domain, it may also be referred to as an edge device. One switching device is connected to at least one edge device and is used to connect the edge devices to the network for data transmission. Edge devices are deployed in an edge domain close to the data sources; an edge device may be a small intelligent station, a network video recorder (NVR), or another terminal device. A data source can be a sensor, a camera, and the like. The edge device may store or process data collected from the data sources for subsequent querying, and may also send the acquired or processed data to the cloud data center through the switching device and the network. Optionally, the system may further include an aggregation switching device (spine switch) connected to the switching devices and used for data stream aggregation within the edge domain. The aggregation switching device is connected to the data center through a network.
To avoid the problem that data stored on a single data device is easily lost, this application sets up check groups. One check group includes multiple data devices, and data is stored across the multiple data devices of the check group. When the data stored on one data device is lost, it can be recovered from the data stored on the other data devices of the check group, which improves the reliability of data storage.
The number of data devices included in a check group may depend on the data verification/recovery scheme. For example, data may be replicated into at least two copies; a check group then includes at least two data devices, each storing one copy. As another example, when erasure coding (EC) is used for verification/recovery, a check group includes k + m = n data devices, where k data devices each store one block of original data and m data devices store check data. When data in i of these data devices is lost, the lost data can be recovered from the data stored in the other data devices of the check group, provided that i is not greater than m; when i is greater than m, too much data has been lost and recovery cannot be completed. As another example, when a regeneration code (RC) is used for verification/recovery, a check group includes n data devices, each storing one block of check data. Likewise, when data in i data devices is lost, it can be recovered from the other devices of the check group only if i is not greater than m.
When data storage and recovery are performed, a data controller can be provided for management; the data controller for data storage and recovery management is not the same as the controller of the cloud data center. During data storage, the data devices may send data to the data controller, which performs the redundant-data calculation, such as data replication or computing k + m blocks of data from k blocks, and sends the results to the data devices of a check group to implement redundant backup. Accordingly, during data recovery, the data devices send the data that has not been lost to the data controller, which manages the recovery. In one example, the data controller may be deployed on a switching device, an edge device, or another device, and the data devices send data to that one device; this may be referred to as centralized management. In another example, the data controller may be deployed on multiple devices, such as multiple data devices, and the data devices send data to multiple devices; this may be referred to as distributed management.
Typically, a data device may store both original data and check data. For example, as shown in Table 1, one check group includes 4 data devices a, b, c, and d. Data devices a, b, and c store the original data 1, 2, and 3, and data device d stores the check data p1 computed from original data 1, 2, and 3. Data devices a, b, and d store the original data 4, 5, and 6, and data device c stores the check data p2 computed from original data 4, 5, and 6. Data devices a, c, and d store the original data 7, 8, and 9, and data device b stores the check data p3 computed from original data 7, 8, and 9. Data devices b, c, and d store the original data 10, 11, and 12, and data device a stores the check data p4 computed from original data 10, 11, and 12.
Device a    Device b    Device c    Device d
   1           2           3          p1
   4           5          p2           6
   7          p3           8           9
  p4          10          11          12

TABLE 1
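The rotating placement shown in Table 1, where each stripe puts its check block on a different device, can be illustrated with a short sketch; the layout() helper is a hypothetical name used for this illustration only:

```python
# Illustrative sketch: place one check block per stripe, rotating it
# across the 4 devices of a check group as in Table 1.

def layout(num_stripes, num_devices):
    rows = []
    for stripe in range(num_stripes):
        # Rotate the parity position: device d for stripe 0, c for stripe 1, ...
        parity_pos = num_devices - 1 - stripe % num_devices
        rows.append(["p%d" % (stripe + 1) if i == parity_pos else "data"
                     for i in range(num_devices)])
    return rows

for row in layout(4, 4):
    print(row)
# ['data', 'data', 'data', 'p1']
# ['data', 'data', 'p2', 'data']
# ['data', 'p3', 'data', 'data']
# ['p4', 'data', 'data', 'data']
```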
The flow of check data between data devices is very large; transmitting check data over a wide area network or cellular network may make the network bandwidth, overhead, and cost unaffordable. In Table 1, for example, each data device receives data from the other 3 data devices. When considering which data devices should form a check group, randomly selecting data devices may make normal verification unachievable. Therefore, multiple data devices in the same edge domain may be selected to form a check group, reducing the overall network overhead: within one edge domain, the network bandwidth is large and the network environment is good, so normal verification can be achieved. In addition, within the same edge domain, data devices under different switching devices are selected as far as possible to form a check group. In this way, the data devices of one check group are connected to different switching devices wherever possible: even if one switching device fails, only a small number of data devices of the check group are affected, and the data in the affected data devices can be recovered.
It should be noted that a data device may store data using its disk, and in that case the entire disk of the data device may be regarded as one disk in the RAID. JBOD, JBOF, and the like may also be used to organize the disks of the data devices into blocks. A data device may store data on a magnetic disk, or in a memory, a memory card, a storage space, and the like.
A process for forming a check group is described as follows:
printing a two-dimensional label on each data device: (data device number, switch device number) to identify which switch device this data device is connected to. For example, (a1, a) represents data device a1 connected with switch device a. These two-dimensional tags constitute a Set _ idle Set. Referring to the example of fig. 2, the Set of devices included in the edge domain Set _ idle { (a1, a), (a2, a), (B1, B), (B2, B), (C1, C), (C2, C), (D1, D), (D2, D) }.
Next, the data verification mode, i.e., the data recovery mode, is determined, and the number of data devices in one check group is determined accordingly (this process is described above and is not repeated here). Referring to the example of fig. 2, a check group includes 3 data devices; 3 labels are selected from Set_idle, preferring labels with different switching devices. For example, one check group is Set_0 = {(a1, A), (b1, B), (c1, C)} and another is Set_1 = {(b2, B), (c2, C), (d1, D)}. The 3 data devices of a group exchange check data streams through the switching devices or the aggregation switching device, and the load brought by verification is evenly distributed across the 4 switching devices. The remaining two labels, (a2, A) and (d2, D), cannot form another check group and are left idle for now. A data device that does not belong to any check group is defined as an idle data device; for example, data device a2 and data device d2 in fig. 2 are idle data devices. When another data device fails, an idle data device can replace the failed data device to achieve data recovery. Alternatively, no data device is left idle: such a device can be placed into a check group and put into operation first to spread the service pressure, and later split out of that check group to replace a failed data device and achieve data recovery.
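As an illustration of this grouping, the following sketch forms check groups from (data device, switching device) labels, preferring distinct switching devices and leaving the remainder idle; the function name make_check_groups is an assumption for this example, not part of the disclosure:

```python
# Illustrative sketch: form check groups of a given size from
# (data_device, switch_device) labels, preferring distinct switches.

def make_check_groups(set_idle, group_size):
    groups, pool = [], list(set_idle)
    while len(pool) >= group_size:
        group, used_switches = [], set()
        for label in list(pool):           # iterate over a copy of the pool
            device, switch = label
            if switch not in used_switches:
                group.append(label)
                used_switches.add(switch)
                pool.remove(label)
                if len(group) == group_size:
                    break
        # Not enough distinct switches left: fall back to any remaining devices.
        while len(group) < group_size and pool:
            group.append(pool.pop(0))
        groups.append(group)
    return groups, pool                    # pool holds the idle devices

set_idle = [("a1", "A"), ("a2", "A"), ("b1", "B"), ("b2", "B"),
            ("c1", "C"), ("c2", "C"), ("d1", "D"), ("d2", "D")]
groups, idle = make_check_groups(set_idle, 3)
# groups = [[("a1","A"), ("b1","B"), ("c1","C")],
#           [("a2","A"), ("b2","B"), ("c2","C")]]
# idle   = [("d1","D"), ("d2","D")]
```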
If the multiple data devices of a check group are in a public network, data transmission during redundant backup and data recovery may cause security problems. To solve this, this application proposes dividing the data devices of a check group into one logical network. One logical network may contain one or more check groups, but the data devices of one check group must be in one logical network. The logical network may be obtained by network division through techniques such as VPN, VLAN, VXLAN, and GRE. Data in different logical networks is isolated during transmission; that is, a data device in one logical network cannot directly communicate with a data device in another logical network, which enhances network security. Setting up logical networks also limits the broadcast domain, saves bandwidth, and improves network processing capacity. It also improves the robustness of the network: a fault is confined to one logical network and does not affect the normal operation of other logical networks.
In the example of fig. 2, one VLAN contains one check group: check group Set_0 is assigned to vlan10, and check group Set_1 is assigned to vlan20.
It should further be noted that the number of data devices in a check group is determined by the check algorithm. For example, when erasure coding (EC) is used, a check group should include at least 2 + 1 = 3 data devices. The minimum number of data devices required by the check algorithm may be selected to form a check group, for example 3 data devices. More than the minimum number may also be selected, for example 4 or more data devices. The one or more data devices beyond the minimum number can be split out: when a data device of another check group fails, they replace the failed data device of that other group to achieve data recovery. In this application, the one or more data devices exceeding the minimum number required by a check group are defined as detachable data devices.
In this application, the data source (e.g., a sensor or camera) corresponding to each data device may be tagged with a two-dimensional label (data source number, data device number) indicating which data device the data source is connected to. For example, Sensor1 is connected to data device a1, Sensor2 to data device b1, Sensor3 to data device c1, Sensor4 to data device b2, Sensor5 to data device c2, and Sensor6 to data device d1, giving the labels (Sensor1, a1), (Sensor2, b1), (Sensor3, c1), (Sensor4, b2), (Sensor5, c2), and (Sensor6, d1). When a data device fails, the data source connected to it needs to transmit data to a data device that has not failed; that is, the data device connected to the data source changes, and the two-dimensional label of the data source must be modified accordingly.
It should be noted that three types of ports are provided on a data device. One is the fault-tolerant data stream port, i.e., the port used for data verification/recovery. Another is the service data stream port, i.e., the port used for transmitting data to the cloud data center. The third is the data source stream port, i.e., the port through which the data device obtains data from a data source. When ports are modified, at least one of the three types may be modified, so that the selected second data device replaces the original first data device for data fault tolerance, service traffic, and data source acquisition. The fault-tolerant data stream port, the service data stream port, and the data source stream port of a data device are not in the same logical network. This isolates the service data stream, the fault-tolerant data stream, and the data source stream from one another, which further improves the security of data recovery, of data transmission to the cloud data center, and of data source acquisition. The fault-tolerant data stream port and/or the service data stream port and/or the data source stream port may be virtual ports or physical ports. For example, in fig. 2, data devices a1, b1, and c1 are divided into the logical network vlan10, and data devices b2, c2, and d1 into the logical network vlan20; this division may apply to the fault-tolerant data stream ports of the data devices.
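The three port types and their isolation into separate logical networks could be represented as a simple configuration, as in the sketch below; the VLAN IDs and key names are assumptions for illustration only:

```python
# Illustrative sketch: each data device exposes three port types, and each
# type is assigned to its own logical network (the VLAN IDs are assumed).

port_config = {
    "a1": {
        "fault_tolerant_port": {"vlan": 10},   # data verification/recovery
        "service_port":        {"vlan": 100},  # traffic to the cloud data center
        "data_source_port":    {"vlan": 200},  # ingest from sensors/cameras
    },
}

def isolated(device):
    """The three streams are isolated iff their VLANs are pairwise distinct."""
    vlans = [p["vlan"] for p in port_config[device].values()]
    return len(set(vlans)) == len(vlans)

assert isolated("a1")
```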
The above process of dividing check groups and logical networks may be performed by the controller of the cloud data center, or by one device in the edge domain, for example a switching device or a data device. Management may be centralized or distributed.
The data recovery process will be described in detail below with reference to the accompanying drawings.
As shown in fig. 3, a process diagram of data recovery is provided. The method is applied to a data switching network comprising a controller and a plurality of data devices; the data device may be an edge device in the edge domain in fig. 1. The controller may be a controller in the cloud data center in fig. 1. Optionally, the data switching network is a network architecture as described in fig. 1.
Step 301: the controller detects whether each data device in said data switching network has failed and may perform step 303 when a failure of the first data device is detected.
A failure of the first data device may mean that the first data device itself has failed, or that the switching device connected to the first data device has failed. The first data device belongs to a first check group and stores first data. The first data may be data acquired by the first data device from a data source, or data obtained by processing data acquired from a data source. The number of failed first data devices in one check group may be one or more.
Optionally, before executing step 303 (selecting the second data device), the controller may perform step 302.
Step 302: and determining whether the first data can be recovered according to the states of other data devices in the first check group in which the first data device is positioned. Upon determining that the first data can be recovered, step 303 is performed again. If it is determined that the first data cannot be recovered, it may be ended without performing step 303. Specifically, when more than a set number of data devices in the plurality of data devices in the first check group fail, the first data cannot be recovered.
Step 303: the controller selects a second data device.
When a failure of the first data device is detected, the first check group lacks a data device for data storage. The controller may select a second data device and add it to the first check group to replace the failed first data device for storing data, thereby achieving data recovery. When multiple first data devices have failed, a corresponding number of second data devices is selected. The process of selecting each second data device is the same; the following description takes the selection of one second data device as an example.
When selecting the second data device, the controller can choose the data device with the smallest network hop count, confining data recovery to a local network and reducing network overhead. The data device with the smallest network hop count is the data device with the smallest sum of network hops to the non-failed data devices of the first check group. The network hop count can be calculated using the following formula:
Y=X1+X2+…+Xn-1。
In one example, X1 represents the network hop count from the selected second data device to the 1st non-failed data device in the first check group, and so on up to X(n-1), the network hop count from the selected second data device to the (n-1)th non-failed data device; Y is the sum of the network hop counts from the selected second data device to each of the remaining non-failed data devices in the first check group.
In another example, the hop counts X1 through X(n-1) are measured from the switching device connected to the selected second data device to each non-failed data device in the first check group, and Y is their sum.
In another example, the hop counts are measured from the aggregation switching device connected to the selected second data device to each non-failed data device in the first check group, and Y is their sum.
In another example, the hop counts are measured from the selected second data device to the switching device connected to each non-failed data device in the first check group, and Y is their sum.
In another example, the hop counts are measured from the selected second data device to the aggregation switching device connected to each non-failed data device in the first check group, and Y is their sum.
In another example, the hop counts are measured from the switching device connected to the selected second data device to the switching device connected to each non-failed data device in the first check group, and Y is their sum.
In another example, the hop counts are measured from the switching device connected to the selected second data device to the aggregation switching device connected to each non-failed data device in the first check group, and Y is their sum.
In another example, the hop counts are measured from the aggregation switching device connected to the selected second data device to the aggregation switching device connected to each non-failed data device in the first check group, and Y is their sum.
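All of the variants above reduce to the same computation: summing hop counts from one endpoint associated with the candidate device to an endpoint associated with each surviving group member. A minimal sketch of the first variant follows; it is an illustration only, and hops() is an assumed helper backed by the controller's topology view, not an interface defined in this application:

```python
# Illustrative sketch: Y = X1 + X2 + ... + X(n-1), the sum of network hop
# counts from a candidate device to each non-failed device of the group.

def hop_count_sum(candidate, surviving_devices, hops):
    # hops(a, b) is an assumed helper returning the hop count between two
    # nodes, backed by the controller's view of the topology.
    return sum(hops(candidate, dev) for dev in surviving_devices)

def best_candidate(candidates, surviving_devices, hops):
    # Pick the candidate with the smallest total hop count Y.
    return min(candidates,
               key=lambda c: hop_count_sum(c, surviving_devices, hops))

# Example with a static hop table (illustrative values):
table = {("a2", "c2"): 2, ("a2", "d1"): 2, ("d2", "c2"): 3, ("d2", "d1"): 2}
hops = lambda a, b: table[(a, b)]
assert best_candidate(["a2", "d2"], ["c2", "d1"], hops) == "a2"  # Y = 4 vs 5
```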
The manner in which the controller selects the second data device is described in detail below.
First, the controller selects the second data device from among data devices located in the same edge domain as the first data device. When no second data device can be selected in that edge domain, the controller may select, as the second data device, the data device with the smallest network hop count from among data devices located in edge domains different from that of the first data device. The controller may traverse all the other edge domains and select the data device with the smallest network hop count.
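This two-stage order could be sketched as follows; the helper names (pick_in_domain, hop_sum) are illustrative assumptions, not part of the disclosed method:

```python
# Illustrative sketch of the selection order: prefer a device in the same
# edge domain as the failed device; otherwise take the minimum-hop device
# from the other edge domains.

def select_second_device(same_domain_candidates, other_domain_candidates,
                         pick_in_domain, hop_sum):
    # Stage 1: try to pick a device in the failed device's own edge domain.
    candidate = pick_in_domain(same_domain_candidates)
    if candidate is not None:
        return candidate
    # Stage 2: fall back to the minimum-hop-count device in other domains.
    if other_domain_candidates:
        return min(other_domain_candidates, key=hop_sum)
    return None
```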
Several examples of the controller selecting the second data device from among data devices located in the same edge domain as the first data device are described below:
In one example, the controller may select any data device in the same edge domain as the second data device.
In one example, as shown in fig. 4, step 401: the controller may first select an idle data device in the same edge domain as the second data device. If there are multiple idle data devices in the same edge domain, a data device located under a different switching device from the other (non-failed) data devices of the first check group may be preferentially selected as the second data device. Step 402: if no idle data device exists in the same edge domain, the controller may select a detachable data device in the same edge domain as the second data device. If there are multiple detachable data devices in the same edge domain, a data device located under a different switching device from the other (non-failed) data devices of the first check group may be preferentially selected. As described above in the division of check groups, an idle data device is a data device that does not belong to any check group. The second check group to which a detachable data device belongs is different from the first check group to which the first data device belongs; the second check group contains more data devices than the minimum required by its check algorithm, and the data devices beyond that minimum are the detachable data devices.
As in the example of fig. 5, and still in conjunction with fig. 2, data devices a1, b1, and c1 form a check group located in vlan10, and data devices b2, c2, and d1 form a check group located in vlan20. After switching device B fails, data devices b1 and b2 fail. The controller needs to select two data devices to replace data devices b1 and b2, respectively. When selecting, the controller may first consider whether there are idle data devices in the same edge domain; here, data devices a2 and d2 are both idle. Data device a2 may be selected to replace data device b2 and added to vlan20, and data device d2 selected to replace data device b1 and added to vlan10. In this case, the device performing data recovery performs check calculation on the data in data devices c2 and d1 to obtain the data of data device b2 and stores the result in data device a2; it performs check calculation on the data in data devices a1 and c1 to obtain the data of data device b1 and stores the result in data device d2. Two modes of data recovery, distributed management and centralized management, were described above. In the example of fig. 5, centralized management may be employed, in which the device performing data recovery may be a switching device or a data device. Distributed management may also be adopted: when recovering the data of data device b1, the recovering devices are the non-failed data devices a1 and c1; when recovering the data of data device b2, the recovering devices are the non-failed data devices c2 and d1.
In the example of fig. 5, data device a2 may also be selected to replace data device b1 and added to vlan 10, with data device d2 selected to replace data device b2 and added to vlan 20. The principle is the same as above and is not repeated.
In the example of fig. 6, data devices a1, a2, b1 and c1 form a check group belonging to vlan 10. Data devices b2, c2 and d1 form a check group belonging to vlan 20. When data device d1 fails, the controller needs to select one data device to replace it. The controller finds that no idle data device exists in the same edge domain, but that a data device can be split out of the check group consisting of data devices a1, a2, b1 and c1. Preferring a data device under a different switching device, the controller selects data device a2 to replace data device d1. Data device a1 may also be selected instead.
In another example, the controller may select the second data device in the following manner:
First, the controller attempts to select the second data device from data devices located under different switching devices from the other data devices in the first check group (except the failed first data device). One example: any data device under those different switching devices is selected as the second data device. Another example: an idle data device under those different switching devices is preferentially selected as the second data device; if no idle data device exists there, the controller may select a detachable data device under those different switching devices. If no detachable data device can be selected either, the second data device cannot be chosen from the data devices under different switching devices.
When choosing among different switching devices, the controller may prefer switching devices that belong to the same aggregation switch as the switching devices connected to the other data devices in the first check group (except the failed first data device). If no second data device can be selected under that aggregation switch, the controller then selects the second data device from switching devices under an aggregation switch different from that of the first switching device.
Then, when the second data device cannot be selected from the data devices under different switching devices, the controller may select the second data device from the data devices located under the same switching device as the other data devices in the first check group (except the failed first data device). In this way, the data devices in one check group are connected to different switching devices as far as possible, so that even if one switching device fails, only a small number of data devices in the group are affected and the data in the failed data device can be conveniently recovered. When selecting under the same switching device, one example is to select any data device there as the second data device. Another example is to preferentially select an idle data device; if no idle data device exists under the same switching device, the controller may select a detachable data device as the second data device. This candidate ordering is sketched below.
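The fallback order just described can be expressed as a single sort key; this fragment is illustrative and reuses the hypothetical DataDevice fields from the earlier sketch.

```python
# Candidates under switches not used by surviving group members come first;
# within each tier, idle devices precede detachable ones.
def candidate_order(dev, survivor_switches):
    return (dev.switch_id in survivor_switches,  # False (different switch) sorts first
            dev.group is not None)               # idle (None) before detachable

# Usage: candidates.sort(key=lambda d: candidate_order(d, survivor_switches))
```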
As shown in fig. 4, when neither an idle data device nor a detachable data device can be selected in the same edge domain, no second data device can be selected there, and a second data device may instead be selected from the data devices located in a different edge domain from the first data device. The specific process is as follows:
Step 403: the controller may determine whether there are idle data devices and/or detachable data devices in different edge domains.
If not, data recovery cannot be performed and the process ends. If there are idle and/or detachable data devices, step 404 may be performed.
Step 404: determine whether there is exactly one data device with the minimum network hop count in the different edge domains; if so, take that data device as the second data device. If there are multiple data devices with the minimum hop count, a data device under the switching device or aggregation switching device with the minimum network load may be preferentially selected from among them, proceeding to step 405. Alternatively, any of the data devices with the minimum hop count may be selected as the second data device.
Step 405: determine whether there is exactly one data device, among the candidates with the minimum hop count, under the switching device or aggregation switching device with the minimum network load; if so, take it as the second data device. If there are multiple such data devices, idle data devices may be preferred and detachable data devices chosen second, proceeding to step 406. Alternatively, any data device under the minimum-load switching device or aggregation switching device may be selected as the second data device.
Step 406: if idle data devices exist among the data devices under the minimum-load switching device or aggregation switching device, any idle data device may be selected as the second data device.
If no idle data device exists, step 407 may be performed: preferentially select, among the detachable data devices, the one with the largest remaining storage space as the second data device. Alternatively, any detachable data device may be selected as the second data device. The whole cross-domain selection (steps 403-407) is sketched below.
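A minimal sketch of steps 403-407, under the assumption that the controller has helpers for the three criteria. hop_count(), network_load() and remaining_space() are hypothetical names; the patent names the criteria (hops, load, idleness, free space), not any concrete API.

```python
def select_in_other_edges(failed, candidates, hop_count, network_load,
                          remaining_space):
    # candidates: idle or detachable data devices in other edge domains.
    if not candidates:
        return None                                    # step 403: recovery fails
    min_hops = min(hop_count(d, failed) for d in candidates)
    nearest = [d for d in candidates if hop_count(d, failed) == min_hops]
    if len(nearest) == 1:                              # step 404
        return nearest[0]
    min_load = min(network_load(d) for d in nearest)
    lightest = [d for d in nearest if network_load(d) == min_load]
    if len(lightest) == 1:                             # step 405
        return lightest[0]
    idle = [d for d in lightest if d.group is None]    # step 406
    if idle:
        return idle[0]
    return max(lightest, key=remaining_space)          # step 407
```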
In the example of fig. 7, data devices a1, b1 and c1 in edge domain 1 form a check group belonging to vlan 0. Data device c1 fails. The controller finds neither an idle nor a detachable data device in edge domain 1, so a data device may be selected in a different edge domain (2 or 3) to replace data device c1. The network hop counts from data devices d1 and d2 in edge domain 2, and from data device e1 in edge domain 3, to data devices a1 and b1 in edge domain 1 are the same. But the network load of the aggregation switching device in edge domain 2 is twice that of the aggregation switching device in edge domain 3, so data device e1, under the less loaded aggregation switching device, may be picked to replace data device c1. If the network load contributed by each data device to its aggregation switching device is the same, the actually monitored network load of the aggregation switching devices can be used as the reference when selecting the data device for recovery.
The process after the controller selects the second data device is described next.
Step 304: the controller adds the second data device to the first check group in which the first data device is located. The data devices of the first check group belong to the same logical network. Specifically, the fault-tolerant data stream ports of the data devices in the first check group belong to one logical network, the service data stream ports belong to one logical network, and the data source stream ports belong to one logical network. The logical networks to which the three types of ports belong may be the same or different. When they are different, different types of data flows are isolated from one another, which improves the security of each type of data transmission.
When adding the second data device to the first check group, the controller may notify the second data device to modify its port; specifically, the controller notifies the switching device connected to the second data device to modify the corresponding port, so that the second data device joins the first check group in which the first data device is located.
Further, when the first data device fails but the switching device connected to it does not, the controller may also delete the first data device from the first check group; specifically, the controller may notify the switching device connected to the first data device to modify the corresponding port so that the first data device is removed from the group. If instead the failure of the switching device connected to the first data device caused the data failure, the controller may send an instruction to all devices connected to the failed switching device (including data devices, other switching devices, aggregation switching devices, and so on) indicating that they should not exchange data with the failed switching device. A sketch of this bookkeeping follows.
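An illustrative sketch of the controller-side bookkeeping for the addition and deletion above. notify_switch(), switch_alive(), neighbours() and send() stand in for whatever southbound interface the controller really uses; none of these names come from the patent.

```python
def replace_in_check_group(controller, group, failed, second):
    # Add the second device: have its switch move the device's port into the
    # group's logical network (e.g. the group's VLAN).
    controller.notify_switch(second.switch_id, device=second.name,
                             vlan=group.vlan_id)
    group.members.remove(failed.name)
    group.members.append(second.name)
    if controller.switch_alive(failed.switch_id):
        # Failed device, healthy switch: prune the stale port from the group.
        controller.notify_switch(failed.switch_id, device=failed.name,
                                 vlan=None)
    else:
        # Failed switch: tell every attached device to stop exchanging data
        # with it.
        for dev in controller.neighbours(failed.switch_id):
            controller.send(dev, "avoid", failed.switch_id)
```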
Step 305: the controller notifies other data devices of the first check group to send data for repairing the first data, so that the second data device obtains the repaired first data.
The above describes two management methods for data recovery. One is centralized management: the data devices send their surviving data to a single device that performs the recovery; this device may be a switching device, a data device, or another device. In this case the controller notifies the other data devices of the first check group to send their data to the recovery device so that it can repair the first data. For example, if a switching device performs the recovery, the data devices in the first check group other than the first data device each send their stored data to that switching device, which recovers the first data from the received data and sends it to the second data device.
The other is distributed management: the data devices send the surviving data to multiple devices that jointly perform the recovery. In this case the controller notifies the other data devices of the first check group to send data to those devices so that they can repair the first data. Both styles are sketched below.
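An illustrative contrast of the two recovery styles, reusing xor_blocks() from the earlier sketch and again assuming XOR parity. In centralized recovery one device gathers every surviving block; in distributed recovery survivors fold partial results hop by hop, so no single extra device has to receive all the data.

```python
def recover_centralized(surviving_blocks):
    # One recovery device (switch or data device) receives every surviving
    # block and combines them in one place.
    return xor_blocks(surviving_blocks)

def recover_distributed(surviving_blocks):
    # Each survivor adds its own block to a running partial result that is
    # forwarded along; the last hop delivers the repaired data.
    partial = surviving_blocks[0]
    for blk in surviving_blocks[1:]:
        partial = xor_blocks([partial, blk])
    return partial
```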
Optionally, after the data recovery is completed, step 306 may be performed: the controller is notified that data recovery is complete, either by the device that performed the recovery or by the selected second data device.
The foregoing describes a method for data recovery according to an embodiment of the present application; a communication apparatus for data recovery according to an embodiment of the present application is described below. The method and the apparatus are based on the same technical concept, and since their problem-solving principles are similar, the implementations of the apparatus and the method may refer to each other; repeated parts are not described again.
Based on the same technical concept as the method for data recovery, as shown in fig. 8, a communication apparatus 800 for data recovery is provided; the apparatus 800 can perform the steps performed by the controller in fig. 3 and fig. 4. The apparatus 800 may include: a determining module 810, a selecting module 820, an adding module 830 and a notifying module 840.
In one example, the determining module 810 is configured to determine that a first data device in the data switching network has failed, where the first data device stores first data; the selecting module 820 is configured to select a second data device; the adding module 830 is configured to add the second data device to a first check group in which the first data device is located, where the data devices of the first check group belong to a same logical network; and the notifying module 840 is configured to notify other data devices of the first check group to send data for repairing the first data, so that the second data device obtains the repaired first data.
In an example, the selecting module 820, when configured to select the second data device, is specifically configured to: select the second data device among data devices located in the same edge domain as the first data device.
In an example, the selecting module 820, when configured to select the second data device, is specifically configured to: when the second data device cannot be selected from the data devices located in the same edge domain as the first data device, select the data device with the minimum network hop count, from among the data devices located in an edge domain different from that of the first data device, as the second data device.
In an example, the adding module 830, when configured to add the second data device to the first check group in which the first data device is located, is specifically configured to: notify the second data device to modify the port, so that the second data device joins the first check group in which the first data device is located.
Fig. 9 is a schematic block diagram of a communication apparatus 900 according to an embodiment of the present application. It is understood that the communication apparatus 900 is capable of performing the steps described above in the methods of fig. 3 and fig. 4 as being performed by the controller. The communication apparatus 900 includes a processor 910 and a transceiver 920, and optionally further includes a memory 930. The processor 910 and the memory 930 are electrically coupled.
Illustratively, the memory 930 is configured to store a computer program; the processor 910 may be configured to call the computer program or instructions stored in the memory to execute the above-mentioned method via the transceiver 920.
In one example, the processor 910 is configured to: determine that a first data device in the data switching network has failed, where the first data device stores first data; select a second data device and add the second data device to the first check group in which the first data device is located, where the data devices of the first check group belong to the same logical network; and notify, through the transceiver 920, the other data devices of the first check group to send data for repairing the first data, so that the second data device obtains the repaired first data.
In one example, the processor 910, when configured to select the second data device, is specifically configured to: selecting the second data device among data devices located in the same edge domain as the first data device.
In one example, the processor 910, when configured to select the second data device, is specifically configured to: when the second data device cannot be selected from the data devices located in the same edge domain as the first data device, select the data device with the minimum network hop count, from among the data devices located in an edge domain different from that of the first data device, as the second data device.
In an example, the processor 910, when configured to add the second data device to the first check group in which the first data device is located, is specifically configured to: notify, through the transceiver 920, the second data device to modify the port, so that the second data device joins the first check group in which the first data device is located.
The processor may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP. The processor may further include a hardware chip or another general-purpose processor. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The aforementioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or any combination thereof. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It will also be appreciated that the memory referred to in the embodiments of the application may be volatile memory or nonvolatile memory, or may include both. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM). It should be noted that the memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The embodiment of the application also provides a computer storage medium storing a computer program; when the computer program is executed by a computer, the computer performs the above data recovery method, for example, the steps performed by the controller in the methods of fig. 3 and fig. 4.
Embodiments of the present application also provide a computer program product containing instructions, which when run on a computer, enable the computer to perform the above-provided data recovery method. For example, to perform the various steps performed by the controller in the methods of fig. 3 and 4 described above.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the embodiments of the present application without departing from the spirit and scope of the embodiments of the present application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to include such modifications and variations.

Claims (18)

1. A method of data recovery, the method being applied to a data switching network comprising a controller and a plurality of data devices; the method comprises the following steps:
the controller determines that a first data device in the data switching network, which stores first data, is failed;
the controller selects a second data device, and adds the second data device into a first check group in which the first data device is located, wherein a plurality of data devices of the first check group belong to the same logic network;
the controller notifies other data devices of the first check group to send data for repairing the first data, so that the second data device obtains the repaired first data.
2. The method of claim 1, wherein the data switching network further comprises a plurality of switching devices, each switching device coupled to at least one data device, different data devices in the first parity group coupled to different switching devices.
3. The method of claim 1 or 2, wherein the controller selecting the second data device comprises:
the controller selects the second data device among data devices located in the same edge domain as the first data device.
4. The method of any of claims 1-3, wherein the controller selecting the second data device comprises:
when the second data device cannot be selected from the data devices located in the same edge domain as the first data device, the controller selects the data device with the smallest network hop count from the data devices located in different edge domains from the first data device as the second data device.
5. The method of any of claims 1-4, wherein the controller adding the second data device to the first parity group in which the first data device is located comprises:
the controller notifies the second data device to modify the port so that the second data device joins the first check group in which the first data device is located.
6. The method of claim 5, wherein the port of the second data device comprises: a fault tolerant data stream port.
7. The method of claim 6, wherein the port of the second data device further comprises: a traffic data stream port and/or a data source stream port, wherein the fault-tolerant data stream port, the traffic data stream port and the data source stream port belong to different logical networks.
8. The method according to any of claims 1-7, wherein the logical network is obtained by network partitioning by any of:
VPN, VLAN, virtual extensible LAN VXLAN, and generic routing encapsulation GRE.
9. A communications apparatus, the apparatus comprising:
a determining module, configured to determine that a first data device in the data switching network fails, where the first data device stores first data;
a selection module for selecting a second data device;
an adding module, configured to add the second data device to a first check group in which the first data device is located, where multiple data devices of the first check group belong to a same logical network;
a notification module, configured to notify other data devices of the first check group to send data for repairing the first data, so that the second data device obtains the repaired first data.
10. The apparatus of claim 9, wherein the apparatus is applied to a data switching network comprising the apparatus, a plurality of data devices, and a plurality of switching devices, each switching device connected to at least one data device, different data devices of the first parity group connected to different switching devices.
11. The apparatus according to claim 9 or 10, wherein the selection module, when configured to select the second data device, is specifically configured to: selecting the second data device among data devices located in the same edge domain as the first data device.
12. The apparatus according to any of claims 9 to 11, wherein the selection module, when configured to select the second data device, is specifically configured to: when the second data device cannot be selected from the data devices located in the same edge domain as the first data device, select the data device with the minimum network hop count, from among the data devices located in an edge domain different from that of the first data device, as the second data device.
13. The apparatus according to any one of claims 9 to 12, wherein the adding module, when configured to add the second data device to the first check group in which the first data device is located, is specifically configured to: notify the second data device to modify the port, so that the second data device joins the first check group in which the first data device is located.
14. The apparatus of claim 13, wherein the port of the apparatus comprises: a fault tolerant data stream port.
15. The apparatus of claim 14, wherein the port of the apparatus further comprises: a traffic data stream port and/or a data source stream port, wherein the fault-tolerant data stream port, the traffic data stream port and the data source stream port belong to different logical networks.
16. The apparatus according to any of claims 9-15, wherein the logical network is obtained by network partitioning by any of:
VPN, VLAN, virtual extensible LAN VXLAN, and generic routing encapsulation GRE.
17. A communications apparatus, the apparatus comprising: a processor and a memory;
the memory to store computer program instructions;
the processor, configured to execute some or all of the computer program instructions in the memory to cause the communication device to perform the method of any of claims 1-8.
18. A computer-readable storage medium having computer-readable instructions stored thereon which, when read and executed by a computer, cause the computer to perform the method of any one of claims 1-8.
CN202010289275.7A 2020-04-14 2020-04-14 Data recovery method and device Pending CN113535463A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010289275.7A CN113535463A (en) 2020-04-14 2020-04-14 Data recovery method and device
PCT/CN2021/076675 WO2021208585A1 (en) 2020-04-14 2021-02-18 Data recovery method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010289275.7A CN113535463A (en) 2020-04-14 2020-04-14 Data recovery method and device

Publications (1)

Publication Number Publication Date
CN113535463A true CN113535463A (en) 2021-10-22

Family

ID=78083655

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010289275.7A Pending CN113535463A (en) 2020-04-14 2020-04-14 Data recovery method and device

Country Status (2)

Country Link
CN (1) CN113535463A (en)
WO (1) WO2021208585A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103856400A (en) * 2012-11-29 2014-06-11 华为技术有限公司 FCoE message forwarding method, device and system
CN104052576A (en) * 2014-06-07 2014-09-17 华中科技大学 Data recovery method based on error correcting codes in cloud storage
US20160020921A1 (en) * 2014-07-17 2016-01-21 Cisco Technology, Inc. Multiple mobility domains with vlan translation in a multi-tenant network environment
US20180176073A1 (en) * 2016-12-21 2018-06-21 Nicira, Inc. Dynamic recovery from a split-brain failure in edge nodes
CN109698757A (en) * 2017-10-20 2019-04-30 中兴通讯股份有限公司 Switch master/slave device, the method for restoring user data, server and the network equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1744486B1 (en) * 2005-07-12 2007-11-14 Alcatel Lucent Simplified remote interface device and method for determining a failure at an optical fiber link terminated with a remote interface device
CN101330424B (en) * 2007-06-18 2011-11-02 华为技术有限公司 Method, system and apparatus for processing service fault of virtual special network
CN102510354A (en) * 2011-12-15 2012-06-20 盛科网络(苏州)有限公司 Dual-home-supported ring network method and system based on virtual private local area network (LAN) service (VPLS) and G8032
CN103716281B (en) * 2012-09-28 2017-05-24 联想(北京)有限公司 control method, electronic device and server
CN104717089A (en) * 2013-12-16 2015-06-17 华为技术有限公司 Equipment switching method and routing bridge equipment and system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103856400A (en) * 2012-11-29 2014-06-11 华为技术有限公司 FCoE message forwarding method, device and system
CN104052576A (en) * 2014-06-07 2014-09-17 华中科技大学 Data recovery method based on error correcting codes in cloud storage
US20160020921A1 (en) * 2014-07-17 2016-01-21 Cisco Technology, Inc. Multiple mobility domains with vlan translation in a multi-tenant network environment
US20160204986A1 (en) * 2014-07-17 2016-07-14 Cisco Technology, Inc. Multiple mobility domains with vlan translation in a multi-tenant network environment
US20180176073A1 (en) * 2016-12-21 2018-06-21 Nicira, Inc. Dynamic recovery from a split-brain failure in edge nodes
CN109698757A (en) * 2017-10-20 2019-04-30 中兴通讯股份有限公司 Switch master/slave device, the method for restoring user data, server and the network equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李大志 et al.: 《路由交换技术》 (Routing and Switching Technology), 现代出版社 (Modern Press), pages 138-139 *

Also Published As

Publication number Publication date
WO2021208585A1 (en) 2021-10-21

Similar Documents

Publication Publication Date Title
CN109828868B (en) Data storage method, device, management equipment and double-active data storage system
CN106664216B (en) VNF switching method and device
EP3353952B1 (en) Managing groups of servers
US10764119B2 (en) Link handover method for service in storage system, and storage device
Kobo et al. Efficient controller placement and reelection mechanism in distributed control system for software defined wireless sensor networks
US9692686B2 (en) Method and system for implementing a multi-chassis link aggregation group in a network
US20140219289A1 (en) Handling stacking link failures in mdc device
US10652145B2 (en) Managing data frames in switched networks
EP3213441B1 (en) Redundancy for port extender chains
WO2017000832A1 (en) Mac address synchronization method, device and system
WO2017012383A1 (en) Service registration method, usage method and relevant apparatus
US20230007049A1 (en) Event detection and management for quantum communications
US10819628B1 (en) Virtual link trunking control of virtual router redundancy protocol master designation
US11695856B2 (en) Scheduling solution configuration method and apparatus, computer readable storage medium thereof, and computer device
CN104780201A (en) Data packet processing method and device for use in IPVS (Internet Protocol Virtual Server) cluster
CN113535463A (en) Data recovery method and device
EP3468286A1 (en) Method, device and system for data transmission, physical residential gateway and access node
WO2013166978A1 (en) Node routing method for multi-processor system, controller, and multi-processor system
CN102647424B (en) Data transmission method and data transmission device
US10608956B2 (en) Adaptive fabric multicast schemes
US10708166B2 (en) Method and a first device for managing data frames in switched networks
WO2017101016A1 (en) Method and apparatus for synchronizing service request of storage node
Wang et al. Optimal consensus achievement for the internet of things based on fog computing within dual faulty transmission media
CN108418716B (en) Network connection recovery method, device and system and readable storage medium
CN116319354B (en) Network topology updating method based on cloud instance migration

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20220216

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Applicant after: Huawei Cloud Computing Technology Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Applicant before: HUAWEI TECHNOLOGIES Co.,Ltd.