WO2010143989A2

WO2010143989A2 - "rateless packet" scheme for distributed rateless coding in networked systems

Info

Publication number: WO2010143989A2
Application number: PCT/RS2009/000018
Authority: WO
Inventors: Dejan Vukobratovic; Cedomir Stefanovic; Vladimir Crnojevic; Vojin Senk
Original assignee: Dejan Vukobratovic; Cedomir Stefanovic; Vladimir Crnojevic; Vojin Senk
Priority date: 2009-06-10
Filing date: 2009-06-10
Publication date: 2010-12-16

Description

"RATELESS PACKET" SCHEME FOR DISTRIBUTED RATELESS CODING IN NETWORKED SYSTEMS

Technical Field

The present invention relates in general to coding theory, rateless codes and distributed rateless coding in communication networks. More specifically, it relates to network storage of information and/or data gathering in networks, such as Wireless Sensor Networks, Wireless Ad-Hoc Networks, Peer-to-Peer (Overlay) Networks, etc, where the distributed data set residing in different network nodes is encoded using distributed rateless codes and dispersed over the network in random manner.

Background Art

Rateless codes are a class of error-correcting codes with universally capacity-approaching performance on any erasure channel. Using rateless codes, a source message of N information symbols can be encoded into a potentially infinite number of encoded symbols. Encoded symbols are random and equally important representations of the source message. Rateless codes were conceptually introduced in J. Byers, M. Luby, M. Mitzenmacher, and A. Rege, "A digital fountain approach to reliable distribution of bulk data," ACM SIGCOMM 98, pp. 56-67, Vancouver, Canada, September 1998. The major classes of rateless codes proposed so far are LT codes, M. Luby, "LT Codes," Proc. of the 43rd Annual IEEE Symp. Foundations of Computer Science (FOCS), Vancouver, Canada, November 2002, and Raptor codes, A. Shokrollahi, "Raptor Codes," IEEE Trans. on Information Theory, vol. 52, no. 6, pp. 2551-2567, June 2006.

In practical applications, both information and encoded symbols are usually equal-length binary data packets. In the standard point-to-point communication scenario, encoded symbols are transmitted over an erasure channel and a subset of the transmitted encoded symbols reaches the receiver. A desirable property of rateless codes is that once the receiver collects any subset of N' encoded symbols, where N' is only slightly larger than N, decoding the source message is possible with high probability. A useful measure of rateless code efficiency is the average number of received encoded symbols N'_avg needed for successful decoding at the receiver. It is more often expressed using the average reception overhead ε > 0, defined by N'_avg = (1 + ε)N. A class of rateless codes is asymptotically capacity-achieving if the average reception overhead ε → 0 as N → ∞. The first practical realization of an asymptotically capacity-achieving class of rateless codes are LT codes, M. Luby, "LT Codes," Proc. of the 43rd Annual IEEE Symp. Foundations of Computer Science (FOCS), Vancouver, Canada, November 2002. LT codes are encoded by selecting uniformly at random d different information symbols and bitwise XOR-ing them into the encoded symbol. The degree d of each encoded symbol is drawn independently using a discrete probability distribution Ω(d) called the degree distribution. For LT codes, a particular degree distribution called the Robust Soliton degree distribution Ω_RS(d) is designed for capacity-achieving performance with the iterative Belief-Propagation (BP) decoder.
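The LT encoding rule described above can be sketched as follows. This is a minimal illustration, not the encoder of the cited references: the function names are hypothetical, the Robust Soliton formula is written from the standard definition in the LT codes literature, and the default tuning constants c and delta are assumptions chosen only for the example.

```python
import math
import random


def robust_soliton(n, c=0.1, delta=0.5):
    """Return [Omega(1), ..., Omega(n)] for the Robust Soliton distribution.

    c and delta are tuning constants; the defaults are illustrative
    assumptions, not values taken from the text.
    """
    R = c * math.log(n / delta) * math.sqrt(n)
    if not delta < R:
        raise ValueError("parameters must give R > delta")
    spike = max(1, min(n, round(n / R)))  # degree that receives the extra spike
    rho = [0.0] * (n + 1)  # ideal soliton part, indexed by degree
    tau = [0.0] * (n + 1)  # robust correction part, indexed by degree
    rho[1] = 1.0 / n
    for d in range(2, n + 1):
        rho[d] = 1.0 / (d * (d - 1))
    for d in range(1, spike):
        tau[d] = R / (d * n)
    tau[spike] = R * math.log(R / delta) / n
    beta = sum(rho) + sum(tau)  # normalisation constant
    return [(rho[d] + tau[d]) / beta for d in range(1, n + 1)]


def lt_encode_symbol(packets, dist, rng=random):
    """Draw a degree d from dist, pick d distinct packets uniformly at
    random and XOR them. Returns (sorted indices, encoded packet)."""
    n = len(packets)
    d = rng.choices(range(1, n + 1), weights=dist, k=1)[0]
    chosen = rng.sample(range(n), d)
    encoded = bytes(len(packets[0]))  # all-zero packet of the same length
    for i in chosen:
        encoded = bytes(a ^ b for a, b in zip(encoded, packets[i]))
    return sorted(chosen), encoded
```

A centralized encoder would call `lt_encode_symbol` repeatedly to produce as many encoded symbols as needed; the distributed scheme described later aims to reproduce the same statistics without a central encoder.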

Rateless codes were developed to deal with communication scenarios where the receiver and transmitter do not know the channel statistics before transmitting, or where these statistics are subject to change. They adapt to these situations by generating as many encoded symbols as needed, so that the receiver can always decode the original information symbols once it collects a sufficient number of encoded symbols. They are particularly attractive in multicast/broadcast scenarios where a feedback channel from the receivers does not exist, or where feedback messages from a large number of receivers would congest the multicast/broadcast server.

The schemes based on distributed rateless codes that employ simple binary-coefficient data combining (XOR-ing) of a small number of packets became of particular interest in network applications because of their low complexity and high performance. However, in all of the distributed rateless coding schemes proposed so far, collecting the required number of uniformly sampled different data packets and performing rateless encoding is the responsibility of network nodes, see A. Dimakis, V. Prabhakaran, K. Ramchandran, "Distributed Fountain Codes for Networked Storage," Proc. IEEE ICASSP 2006, 2006; A. Kamra, J. Feldman, V. Misra and D. Rubenstein, "Growth Codes: Maximizing Sensor Network Data Persistence," Proc. ACM SIGCOMM 2006, Pisa, Italy, 2006; Y. Lin, B. Liang and B. Li, "Data Persistence in Large-Scale Sensor Networks with Decentralized Fountain Codes," Proc. IEEE INFOCOM 2007, Anchorage, AK, USA, 2007, and S. Aly, Z. Kong, and E. Soljanin, "Fountain codes based distributed storage algorithms for large-scale wireless sensor networks," Proc. IEEE/ACM IPSN 2008, St. Louis, MO, USA, 2008. The scheme described in the proposed invention is different in nature as it shifts the encoding control from network nodes to the encoded packets themselves. This shift in paradigm makes the proposed invention a simple and natural solution for distributed rateless coding in networked environments.

Disclosure of the Invention

The present invention is based on the idea of giving the task of distributed rateless encoding to the encoded packets themselves (rateless packets). Each rateless packet is initially associated with a randomly selected degree d from a given degree distribution, after which it randomly traverses the network collecting original data packets until the given degree is reached and the rateless packet is finally stored in a random network node. That is, the main task of the rateless packet is to collect and add to its content d data packets selected uniformly at random by performing a random walk across the network. Therefore, rateless packet encoding relies on an efficient procedure for selecting network nodes uniformly at random by applying a random walk on a graph representing the given network. This topic, random walks on graphs, has a well-developed mathematical background, which we briefly describe in the following paragraph for the purpose of terminology and notation.

A random walk on a graph is a sequence of nodes such that the next node j in the sequence is selected from the set of neighbors of the previous node i in the sequence with probability p_ij. A random walk on a graph can be modeled as a Markov Chain, where the Markov Chain states correspond to the graph vertices (network nodes), and the transition probability matrix P = [p_ij] is defined by the random walk next-hop probabilities p_ij. Performing sufficiently long random walks on such a graph leads to the stationary distribution π = (π_1, π_2, ..., π_N) such that π = πP, which gives the probabilities π_i that the random walk will finish in node i. For the rateless packet encoding problem, the goal is to reach the uniform stationary distribution π while performing as short a random walk as possible, so the matrix P should be designed to meet these conditions. Two popular solutions of this problem are the maximum degree (MD) and Metropolis-Hastings (MH) algorithms. The length of the random walk needed to closely approach the stationary distribution is called the mixing time τ of the Markov Chain. It represents the number of steps of the random walk required for the probability of visiting any node i after τ steps to be "very close" to the stationary probability π_i, for any node i, and it should be as small as possible.
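The notion of mixing time can be made concrete with a small numerical sketch. The code below is an illustration under stated assumptions: "very close" is quantified as a total variation distance of at most eps (a common convention, not something the text specifies), the function names are hypothetical, and the example matrix is a 4-node path graph with self-loops at its end nodes, chosen because it is symmetric and therefore has a uniform stationary distribution.

```python
def step(dist, P):
    """One step of the chain: the row vector dist multiplied by P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def mixing_time(P, pi, eps=0.01, max_steps=10_000):
    """Smallest t such that, from every start node, the total variation
    distance between the t-step distribution and pi is at most eps.

    eps is an assumed tolerance standing in for "very close".
    """
    n = len(P)
    # One point-mass starting distribution per start node.
    dists = [[1.0 if j == s else 0.0 for j in range(n)] for s in range(n)]
    for t in range(max_steps + 1):
        worst = max(0.5 * sum(abs(a - b) for a, b in zip(d, pi)) for d in dists)
        if worst <= eps:
            return t
        dists = [step(d, P) for d in dists]
    raise RuntimeError("chain did not mix within max_steps")


# Example: path graph 0-1-2-3 with self-loops of probability 1/2 at the ends.
P_EXAMPLE = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]
PI_UNIFORM = [0.25, 0.25, 0.25, 0.25]
```

Calling `mixing_time(P_EXAMPLE, PI_UNIFORM)` gives the number of hops after which a walk started anywhere is within the chosen tolerance of uniform; a value of this kind is what the constant τ used later is meant to capture.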

With the rateless packet encoding scheme, following the encoding phase, rateless packets are uniformly distributed over the network, and by collecting a sufficient number of them the original data packets can be recovered using the usual iterative BP decoding. Depending on the number of rateless packets created and dispersed across the network, only a subset of network nodes will have to be visited to gather enough rateless packets for successful decoding of the original network data. The rateless packet scheme is presented in detail in the following sections.
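The iterative BP decoding mentioned above reduces, on an erasure channel, to a "peeling" procedure: any collected packet that contains exactly one unknown information packet reveals it, and that packet is then XOR-ed out of all others. The following is a minimal sketch of that idea; the function name and the (node IDs, data) input layout are assumptions made for illustration.

```python
def bp_decode(rateless_packets, n):
    """Iterative (peeling) decoder for XOR-combined packets.

    rateless_packets: iterable of (node_ids, data) pairs, where data is the
    XOR of the information packets of the listed nodes.
    n: total number of information packets to recover.
    Returns {node_id: info_packet} for every packet that could be recovered.
    """
    pending = [(set(ids), bytes(data)) for ids, data in rateless_packets]
    recovered = {}
    progress = True
    while progress and len(recovered) < n:
        progress = False
        remaining = []
        for ids, data in pending:
            # XOR out every information packet that is already known.
            for i in [i for i in ids if i in recovered]:
                data = bytes(a ^ b for a, b in zip(data, recovered[i]))
                ids.discard(i)
            if len(ids) == 1:
                (i,) = ids
                recovered[i] = data  # degree-one packet reveals a new packet
                progress = True
            elif len(ids) > 1:
                remaining.append((ids, data))
        pending = remaining
    return recovered
```

If the collected set contains no packet of reduced degree one at some point, the loop stops and returns a partial result; the collector would then gather more rateless packets and retry.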

Brief Description of Drawings

FIG. 1 illustrates a generic network of N nodes, each node containing a single information packet.

FIG. 2 describes a format of the rateless packet. Rateless packet header fields are presented in detail.

FIG. 3 presents an example of the initialization phase in creating a single rateless packet according to an embodiment of the invention.

FIG. 4 presents an example of a part of the encoding phase in encoding a single rateless packet according to an embodiment of the invention.

FIG. 5 presents an example of the finalized encoding phase of a single rateless packet according to an embodiment of the invention.

FIG. 6 presents an example of the dispersion phase of a single rateless packet according to an embodiment of the invention.

FIG. 7 presents a detailed rateless packet processing algorithm in each network node according to an embodiment of the invention.

FIG. 8 presents a possible embodiment of a system employing the rateless packet scheme described herein.

Best Mode for Carrying Out of the Invention

FIG. 1 describes a generic network (100) of N network nodes (102) capable of storing and communicating information packets (101). The network nodes and the corresponding network links may be wireless or wired. The network example in FIG. 1 and the example of the proposed invention in FIGS. 3-6 illustrate the case of wireless network nodes as part of, for example, a Wireless Ad-Hoc Network or Wireless Sensor Network. In one embodiment, each network node (102) generates an information packet (101) of equal length in bits/bytes. In FIG. 1, this is explicitly represented only for the node with Node ID equal to 37 (101), for the sake of figure clarity. In one embodiment, the goal of the invention is to enable efficient gathering of the set of N information packets from all network nodes. The invention describes a method of distributed rateless encoding of N information packets. In one embodiment, the method is able to produce an arbitrary number of encoded packets called rateless packets, which are of equal length as the information packets, and distribute them uniformly across the network. The invention enables recovery of all N information packets if any set of slightly more than N rateless packets is gathered from any set of network nodes. Information packet recovery may be efficiently performed by using standard iterative rateless code decoders. The major and novel property of the invention is that the process of rateless encoding is packet-centric, that is, it is the task of the rateless packets themselves to control the encoding process. The invention may be applied in different data gathering applications in different networking technologies. Examples are data gathering applications, applications that preserve data persistence in case of massive node failures, and distributed network storage applications, as part of Wireless Sensor Networks, Wireless Ad-Hoc Networks, Peer-to-Peer Networks, etc. The following description illustrates one possible embodiment of the proposed scheme.

In one embodiment, equal-length information packets are generated periodically by all N nodes in the network, one per sensor node. In one embodiment, the information packet is kept in the memory of the network node until replaced by the next information packet generated in the following time period. The set of all information packets produced at the beginning of each period represents one data generation and we restrict our attention to distributed rateless encoding of information packets belonging to a single generation. In one embodiment, the distributed rateless coding scheme creates and disperses a sufficient number of rateless packets in a distributed fashion uniformly across the WSN. Distributed rateless encoding should preserve the same properties of rateless packets as if they were encoded in a centralized manner, using the standard rateless code encoders. In each network node, rateless packets are stored in the memory buffer of sufficient capacity.

FIG. 2 describes one embodiment of the rateless packet format (200). Rateless packet header fields of the described embodiment are presented in detail. Rateless packets are generated from information packets. In the described embodiment, rateless packet contains header fields and rateless packet data field, as depicted in Fig. 2. Generation ID header field (201) denotes the time period when the information packets, which are encoded into the rateless packet, were created. Degree counter header field (202) and Mixing time counter header field (203) control the encoding process. Node ID's header field (204) denotes the network nodes whose information packets are encoded in the rateless packet data field (205). The process of creating rateless packet consists of three phases: initialization, encoding and dispersion of the rateless packet. In the following, we describe one embodiment of each phase.

In one embodiment of the initialization phase, every network node initializes b rateless packets. This results in a total of bN rateless packets initialized across N network nodes. Note that b need not be an integer. For example, if b = 2.5, each of the N network nodes will initialize 2 rateless packets, and will initialize a third rateless packet with probability 0.5. Note that b defines the actual rate of the distributed rateless packet encoding scheme and can be arbitrarily large.
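The handling of a non-integer b can be sketched in a few lines. The function name is hypothetical; the rule itself (the integer part of b, plus one more packet with probability equal to the fractional part) is the one stated above.

```python
import math
import random


def packets_to_initialize(b, rng=random):
    """Number of rateless packets one node initializes for rate parameter b:
    floor(b), plus one more with probability equal to the fractional part."""
    whole = math.floor(b)
    extra = 1 if rng.random() < b - whole else 0
    return whole + extra
```

Averaged over N nodes, the expected total is bN, so the network-wide number of rateless packets matches the target even though each node decides locally.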

In one embodiment of the initialization phase, after the network node initializes b rateless packets, it is responsible for filling these rateless packets with the initial information. In one embodiment, the network node places a copy of its own information packet into the rateless packet data field of every initialized rateless packet. In one embodiment, the network node places its node ID in the node ID's header field of every initialized rateless packet. In one embodiment, the network node independently associates a degree d drawn randomly from a selected degree distribution Ω(d) to each of b initialized rateless packets. As the rateless packet data field is initialized with the local node information packet, the Degree counter is initialized to value d - 1, which is the remaining degree (number of information packets) to be collected by the initialized rateless packet. Finally, the Mixing time counter is set to the chosen global mixing time constant value τ.
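The initialization steps above map naturally onto a small data structure. The sketch below is illustrative only: the class name, field names and the layout of `degree_dist` (entry k holds the probability of degree k + 1) are assumptions, while the header fields and their initial values follow the description and the reference numerals of FIG. 2.

```python
import random
from dataclasses import dataclass, field


@dataclass
class RatelessPacket:
    generation_id: int      # Generation ID header field (201)
    degree_counter: int     # Degree counter header field (202)
    mixing_counter: int     # Mixing time counter header field (203)
    node_ids: list = field(default_factory=list)  # Node ID's header field (204)
    data: bytes = b""       # rateless packet data field (205)


def initialize_packet(node_id, info_packet, generation_id, degree_dist, tau,
                      rng=random):
    """Initialize one rateless packet in the node with ID node_id.

    degree_dist[k] is the probability of degree k + 1 (assumed layout).
    """
    degrees = range(1, len(degree_dist) + 1)
    d = rng.choices(degrees, weights=degree_dist, k=1)[0]
    return RatelessPacket(
        generation_id=generation_id,
        degree_counter=d - 1,     # the node's own packet already counts as one
        mixing_counter=tau,       # global mixing time constant
        node_ids=[node_id],
        data=bytes(info_packet),  # copy of the local information packet
    )
```

For instance, a node with ID 37 in generation 5 that draws d = 3 with τ = 3 would obtain a packet with degree counter 2, mixing time counter 3 and the single node ID 37 in its header.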

FIG. 3 illustrates an example of the described embodiment of the rateless packet initialization phase. In FIG. 3, the rateless packet (200) belonging to the 5-th data generation (201) is initialized in the network node with ID 37 with the local copy of the network information packet (204, 205). The degree d = 3 is randomly drawn (210) from the degree distribution Ω(d) and the degree counter is initialized to the value d - 1 = 2 (202). The mixing time counter is initialized with the value τ = 3 (203). Note that the same procedure is applied simultaneously for all the bN rateless packets initialized in the network.

Following the initialization phase, bN rateless packets start the encoding phase. In one embodiment of the encoding phase, the task of each rateless packet is to add to its content the remaining d - 1 information packets selected uniformly at random by performing a random walk across the network. The transition probabilities p_ij of selecting network node j from the set N(i) of neighbors of node i are obtained locally by each network node i. A detailed review of the distributed algorithms for the design of the matrix P is provided later. While performing the random walk, every rateless packet is processed by every network node on the path using the node processing algorithm, one embodiment of which is presented in FIG. 7. For the sake of clarity, we describe the basic idea of the node processing algorithm now, and provide a detailed algorithm description later, while discussing FIG. 7. The network node applies the node processing algorithm to every rateless packet it receives. Following the inspection of the rateless packet header fields, the network node performs the following actions. If the Mixing time counter of the rateless packet is larger than zero, the network node only updates the rateless packet header by decreasing the value of its Mixing time counter header field by one and forwards the rateless packet to the next random hop. Otherwise, if the Mixing time counter header field is equal to zero, the network node adds (bitwise XORs) its information packet to the rateless packet content, updates the rateless packet header fields (decreases the Degree counter header field value by one, puts its own Node ID in the list of the Node ID's header field, and resets the Mixing time counter header field to its initial value τ), and finally forwards the rateless packet to the next random hop.
An exception to this rule occurs when the rateless packet Mixing time counter header field expires (reaches zero) in a network node that has already contributed its information packet to the content of the rateless packet, which is easily checked by inspecting the Node ID's header field. In that case, the rateless packet continues its random walk until the first network node whose information packet has not been included in the rateless packet content. For the standard LT code degree distributions, where the low degrees dominate, this situation happens rarely and is usually resolved in a small number of additional hops. Finally, upon collecting d information packets, the rateless packet completes its encoding phase.
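The node processing rule, including the exception for nodes that have already contributed, can be sketched as below. This is one possible reading rather than the algorithm of FIG. 7 itself: it assumes the counter is decremented on arrival and the XOR happens in the node where it reaches zero, and that a packet whose degree counter is already zero is stored once its mixing time counter expires (the dispersion phase). All names are hypothetical.

```python
from dataclasses import dataclass

FORWARD, STORE = "forward", "store"


@dataclass
class Packet:
    dc: int          # Degree counter (remaining packets to collect)
    mtc: int         # Mixing time counter
    node_ids: list   # IDs of nodes whose packets are already included
    data: bytes      # XOR of the included information packets


def process(packet, node_id, info_packet, tau):
    """Process one received rateless packet; returns FORWARD or STORE."""
    if packet.mtc > 0:
        packet.mtc -= 1
    if packet.mtc > 0:
        return FORWARD                 # still mixing: just pass it on
    if packet.dc == 0:
        return STORE                   # dispersion finished: keep it here
    if node_id in packet.node_ids:
        return FORWARD                 # exception: node already contributed
    # Counter expired in a fresh node: add its information packet.
    packet.data = bytes(a ^ b for a, b in zip(packet.data, info_packet))
    packet.node_ids.append(node_id)
    packet.dc -= 1
    packet.mtc = tau                   # restart mixing (or start dispersion)
    return FORWARD
```

Under this reading, a packet with degree counter 2 and τ = 3 is forwarded twice untouched, picks up a packet on the third hop, repeats this once more, and is stored three hops after its degree counter reaches zero.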

FIG. 4 illustrates an example of the described embodiment of a part of the rateless packet encoding phase. FIG. 4 illustrates the first τ = 3 hops (301, 302, 303) of the encoding phase of the rateless packet initialized by the network node 37 (300) as described in FIG. 3. The first two hops (301, 302) illustrate the case when the Mixing time counter header field is larger than zero, which is why in these nodes the Mixing time counter is decremented and the rateless packet is forwarded to the next hop. In the third hop (303), which is the sensor node with ID 49, the Mixing time counter becomes zero. Therefore, the information packet of node 49 is XOR-ed into the rateless packet, its ID is appended to the Node ID's header field, the Degree counter is decremented by one (from 2 to 1), and the Mixing time counter is reset to τ = 3, as illustrated in FIG. 4 (303). Since the Degree counter does not reach zero in network node 49, the rateless packet makes another τ = 3 hops. FIG. 5 illustrates the situation after these three hops (304) and the additional encoding step inside the network node with ID 76. After node 76 includes its information packet in the rateless packet, the required degree d = 3 is achieved (i.e., the Degree counter header field is zero) and the encoding phase is finished (304).

After the encoding phase, the rateless packet is located in the last network node which appended its information packet to the rateless packet content (i.e., the network node that decreased the degree counter from value 1 to value 0). To prevent correlation between the content of the rateless packet and the network node where it is finally stored, each rateless packet continues its random walk for another τ hops, as part of one embodiment of the dispersion phase. The goal of the dispersion phase is to place the rateless packet in its final random position in the network.

An example of the described embodiment of the dispersion phase is illustrated in FIG. 6. The dispersion phase consists of additional τ = 3 hops, as illustrated in FIG. 6, after which the rateless packet is finally stored in the network node with ID 14 (305). It is important to note that the same procedure is performed simultaneously by all of the bN rateless packets in the network.

In the following, we describe several possibilities for probabilistic forwarding of rateless packets in network nodes. Each network node with ID i should be able to locally initialize the transition probabilities p_ij of forwarding rateless packets to any of its neighbors j.

In one embodiment, the network node selects any of its neighbors with equal probability, i.e., any neighbor j of a node i from the neighbor set N(i) is selected with probability p_ij = 1/d(i), where d(i) = |N(i)| is the degree of node i. Such probabilistic packet forwarding is called normal random walk (NRW). It converges to the stationary distribution π where π_i = d(i)/2m, with m the number of edges of the network graph, which is uniform only if each network node has the same degree, i.e., the underlying network graph is regular.
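The claim that NRW favors high-degree nodes can be checked numerically on a small graph. The sketch below uses hypothetical function names and represents the network as an adjacency list with nodes numbered from 0; the example is a star graph, which is deliberately non-regular.

```python
def nrw_matrix(neighbors):
    """Transition matrix of the normal random walk: p_ij = 1/d(i), j in N(i)."""
    n = len(neighbors)
    P = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            P[i][j] = 1.0 / len(nbrs)
    return P


def nrw_stationary(neighbors):
    """Stationary distribution pi_i = d(i) / 2m, with m the number of edges."""
    two_m = sum(len(nbrs) for nbrs in neighbors)
    return [len(nbrs) / two_m for nbrs in neighbors]


# Star graph: node 0 is connected to nodes 1, 2 and 3.
STAR = [[1, 2, 3], [0], [0], [0]]
```

For the star graph, `nrw_stationary(STAR)` assigns probability 1/2 to the center and 1/6 to each leaf, so packets collected by an NRW would over-represent the center node; this is the motivation for the MD and MH designs.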

In another embodiment, probabilistic rateless packet forwarding can be based on the so-called Maximum-Degree (MD) algorithm. In MD, a network node i associates transition probabilities to each of its neighbors j as p_ij = 1/d_max, where d_max is the maximum degree of any network node. Also, the node associates the self-transition probability p_ii = 1 - d(i)/d_max, where the next hop is the same node itself (note that the NRW algorithm does not associate self-transition probabilities). The value d_max has to be available in all network nodes before the transition probabilities are initialized, which is the major drawback of the MD algorithm. However, an upper bound for d_max may be used by each network node if the limit on the number of network node neighbors is known, for example, due to technical limitations.
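A node-local computation of the MD probabilities might look as follows. The function name and the dictionary representation of one row of P are assumptions for illustration; the formulas are the ones given above.

```python
def md_row(i, neighbors_of_i, d_max):
    """Row i of the MD transition matrix as {next_hop: probability}.

    neighbors_of_i must not contain i itself; d_max is the maximum node
    degree in the network (or a known upper bound on it).
    """
    degree = len(neighbors_of_i)
    if degree > d_max:
        raise ValueError("d_max must be at least the degree of node i")
    row = {j: 1.0 / d_max for j in neighbors_of_i}
    row[i] = 1.0 - degree / d_max  # self-transition probability p_ii
    return row
```

Because p_ij = p_ji = 1/d_max for every edge, the resulting matrix is symmetric, which is what makes the uniform distribution stationary; the price is that low-degree nodes hold a packet for many steps.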

In another embodiment, probabilistic rateless packet forwarding can be based on the so-called Metropolis-Hastings (MH) algorithm. In MH, network nodes first exchange information on their degrees with their neighbors. After that, a network node i associates a transition probability to its neighbor j as p_ij = 1/max(d(i), d(j)), where max() selects the larger of the two degree values. Also, the node associates the self-transition probability p_ii = 1 - Σ_j p_ij, where the sum runs over the neighbors j of node i.
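The MH rule needs only the degrees of a node and of its direct neighbors, as the following sketch shows. The function name and the `degree` dictionary (standing in for the values learned through the degree exchange) are assumptions for illustration.

```python
def mh_row(i, degree, neighbors_of_i):
    """Row i of the MH transition matrix as {next_hop: probability}.

    degree maps node IDs to node degrees; only the entries for i and its
    neighbors are used. neighbors_of_i must not contain i itself.
    """
    row = {j: 1.0 / max(degree[i], degree[j]) for j in neighbors_of_i}
    row[i] = 1.0 - sum(row.values())  # self-transition probability p_ii
    return row
```

On a path graph 0-1-2 with degrees 1, 2 and 1, node 0 forwards to node 1 with probability 1/2 and keeps the packet with probability 1/2, while node 1 forwards to either neighbor with probability 1/2; the matrix is again symmetric, so the stationary distribution is uniform without any global knowledge such as d_max.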

FIG. 7 describes one embodiment of the network node processing algorithm. The algorithm operates in every network node and is applied on every rateless packet received by the network node. The key variables used by the algorithm are extracted from the rateless packet header fields. These variables are the following:

MTC - Mixing Time Counter;

DC - Degree Counter;

NODE ID - The ID of the network node in which the rateless packet is processed;

NODE(i) ID - The ID of the i-th node from the list contained in the Node ID's header field.

Note that the information contained in the Generation ID header field is not relevant in the node processing algorithm. This information is relevant as part of the decoding process described later. The embodiment of the algorithm presented in FIG. 7 closely follows the embodiment of the proposed rateless packet scheme as described in the text above, which is why we do not repeat the node processing algorithm steps.

The described invention proposes a method for distributed rateless encoding of the set of N information packets residing in N different network nodes, one information packet per node. Clearly, the situations where the number of nodes and packets differ are easily encodable using the same rateless packet principle. The resources invested in distributed rateless encoding are justified by considerably simplified process of data gathering. The simplicity of data collection results from the fact that the original set of N information packets can be recovered from any slightly more than N rateless packets collected from any network nodes by applying the iterative rateless decoding procedures which are efficient and of low complexity. This useful property can be used in different ways, such as the following.

In one embodiment of the rateless packet based system, a mobile collector (400) can be used for data gathering. In this scenario, which is depicted in FIG. 8, the mobile collector (400) makes a random and unplanned tour (401) around the network communicating with random network nodes and retrieving their stored rateless packets (402) until it collects slightly more than N rateless packets sufficient for recovery of the original network information packets. This scenario may be useful in large scale Wireless Sensor Networks deployed at inaccessible regions (e.g., mountainous, desert areas) or regions uncovered by external network connectivity (e.g., large agricultural fields). For example, rateless packet scheme is suitable for system implementation where the data gathering is performed periodically and infrequently (e.g., once per week) in large scale Wireless Sensor Networks deployed for agricultural applications. Prior to scheduled periodic collector arrival, rateless packet scheme is initiated in order to encode and disperse sensor data across the Wireless Sensor Network. Data gathering can be performed without mobile collector path planning or optimization, by simple random "sweep" of ground or aerial mobile collector through or over the field of Wireless Sensor Network deployment as presented in FIG. 8. Even if mobile collector skips its scheduled arrival, the rateless packets of a given generation remain dispersed in the Wireless Sensor Network, stored for the future "data harvesting" cycles.

The scheme is fully distributed, adaptable to topology changes and robust to massive node failures. It is rich in parameters and allows for flexible system design, where an appropriate trade-off between the system efficiency, the total energy consumption and the mobile collector path length sufficient for data recovery can be obtained. The proposed invention has many advantages over the state-of-the-art solutions; most notably, it is very simple and its efficiency approaches the performance of centralized rateless codes.

In another embodiment of the rateless packet based system, data gathering may be performed by any network node from its local neighborhood. Because the rateless packets are uniformly dispersed all over the network, and depending on the total number bN of rateless packets "injected" into the network, each network node can collect a sufficient number of rateless packets to decode the network information from its own neighborhood. As an alternative to this approach, several (instead of one) rateless packet gathering points, which collect rateless packets from their own local neighborhood, may be selected in the network. From these points of local gathering, rateless packets can be routed either to the sink node or to the mobile collector.

Apart from the data gathering, rateless packet scheme can be applied for other possible applications. Examples include data persistence in networks where there is a possibility of sudden massive node failures. Also, the distributed network storage applications may be simply and efficiently designed using the rateless packet principles.

Claims

1. A method of distributed encoding of N information packets residing in network nodes into bN so called rateless packets uniformly distributed over the network using the following steps:

- the initialization step where each network node initializes b rateless packets as described in the preferred embodiment of the proposed invention

- the rateless packets encoding step where the total of bN rateless packets randomly traverse the network performing distributed rateless encoding using principles described in the preferred embodiment of the proposed invention

- the rateless packet dispersion step where the rateless packets that have finished their encoding phase are let for another set of random network hops in order to find their random and uniformly selected position in the network, using principles described in the preferred embodiment of the proposed invention,

2. A method of claim 1 where the different and arbitrary number of rateless packets are created per network node,

3. A method of claim 1 where the rateless packet is initialized not as the local copy of the information packet from the network node, but by some other initialization strategy, for example, as an all-zero rateless packet,

4. A method of claim 1 where random walks of rateless packets are determined by NRW, MH, MD or similar transition probability design algorithms locally computed in each network node,

5. A method of claim 1 where the information on which network node information packets are combined into the rateless packet is determined not by employing separate and variable-length rateless packet header field for this purposes, but by some other means, for example, by using synchronized random number generators,

6. A method of claim 1 where the information packets that are combined into the rateless packets are not determined by their generation time as it is described in the preferred embodiment of the proposed invention using the concept of data generations, but by some other means, for example, by different classes of data content of information packets (e.g., layers of layered video content, or different sensor measurement data, etc.)

7. A method of claim 1 where the rateless packet dispersion phase is designed using different probabilistic packet forwarding in network nodes than applied in encoding phase. The application may be oriented towards "pushing" the rateless packets in certain direction towards certain point or points of collection for easier data gathering purposes.

8. A method of claim 1 where the rateless packet dispersion phase is separately designed using different probabilistic packet forwarding in network nodes for each of the different rateless packet classes (such as temporal generations or other possible rateless packet classifications described in claim 6). The application may be oriented towards "pushing" the rateless packets of different content in certain direction towards different gathering points inside the network,

9. A method of claim 1 where the set of information packets are first "precoded" by a distributed version of Low Density Parity Check (LDPC) code or of Low Density Generator Matrix (LDGM) code or suitable distributed high-rate erasure correcting codes using the same principles as in the described invention, after which the set of obtained encoded packets are encoded by distributed rateless packet scheme as described in the proposed invention. The obtained combination of outer distributed LDPC and inner distributed rateless packet code resembles centralized Raptor code construction and shares its advantages,

10. A wireless communication system consisting of network nodes (such as sensor nodes in WSN, or lap-top computers with wireless interface cards in Wireless Ad-Hoc Networks) where each node contains or periodically generates information packets;

A rateless coding algorithm implemented in network nodes as described in the preferred embodiment of the proposed invention and illustrated in FIG. 7, which may be triggered at desired time instants automatically or by user requests;

A mobile collector (such as ground or aerial vehicle) with built-in iterative decoding functionality that randomly passes through the network establishing wireless connections with network nodes and collecting sufficient number of rateless packets until successful network data recovery,

11. The wireless communication system of claim 10, where data gathering is performed by any single or group of network nodes by querying rateless packets from the network nodes in their local neighborhood,

12. The communication system of claim 10, where inter-node connections are not wireless but wired (e.g., LAN/WAN computer networks consisting of PC stations, such as Internet),

13. The communication system of claim 12, where distributed rateless coding using rateless packet approach is implemented as part of the application software for peer-to-peer network node communications (such as file sharing), where each node (application layer software) is able to recover the data contained in plurality of network nodes by collecting sufficient number of rateless packets containing encoded desired content by querying their neighbor nodes.