WO2022066030A1 - Graph-based list decoding with early list size reduction - Google Patents

Graph-based list decoding with early list size reduction

Info

Publication number
WO2022066030A1
Authority
WO
WIPO (PCT)
Prior art keywords
codeword
list
decoding
candidates
dnn
Prior art date
Application number
PCT/RU2020/000494
Other languages
French (fr)
Inventor
German Viktorovich SVISTUNOV
Kedi WU
Jing Liang
Liang Ma
Alexey FROLOV
Kirill ANDREEV
Original Assignee
Huawei Technologies Co., Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd filed Critical Huawei Technologies Co., Ltd
Priority to PCT/RU2020/000494 priority Critical patent/WO2022066030A1/en
Publication of WO2022066030A1 publication Critical patent/WO2022066030A1/en

Classifications

    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/65Purpose and implementation aspects
    • H03M13/6597Implementations using analogue techniques for coding or decoding, e.g. analogue Viterbi decoder
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/042Knowledge-based neural networks; Logical representations of neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/11Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits using multiple parity bits
    • H03M13/1102Codes on graphs and decoding on graphs, e.g. low-density parity check [LDPC] codes
    • H03M13/1105Decoding
    • H03M13/1111Soft-decision decoding, e.g. by means of message passing or belief propagation algorithms
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/004Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0045Arrangements at the receiver end
    • H04L1/0052Realisations of complexity reduction techniques, e.g. pipelining or use of look-up tables
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/004Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0056Systems characterized by the type of code used
    • H04L1/0057Block codes

Definitions

  • the present disclosure relates generally to the field of information decoding, and particularly to list decoding that uses a graph representation of a code and provides early list size reduction.
  • Error correcting codes are used for controlling errors in information signals transmitted over a noisy communication channel. More specifically, a transmitter applies an appropriate encoding algorithm to an information signal to be transmitted over the communication channel. In turn, a receiver applies an appropriate decoding algorithm to determine whether the received information signal was corrupted after said transmission and to correct any errors detected therein.
  • Low-density parity-check (LDPC) codes are one example of linear ECC, which are applicable for error correction coding in a variety of next generation communication systems.
  • An LDPC code is typically represented using a graph representation technique, and many characteristics thereof may be analyzed using methods based on graph theory, algebra, and probability theory.
  • By mapping information on encoded bits constituting a codeword to vertexes (or nodes) in a graph and mapping relations between the encoded bits to edges in the graph, it is possible to consider a communication network in which the vertexes exchange predetermined messages through the edges. This makes it possible to derive an appropriate decoding algorithm for the codeword that is obtained using the LDPC code.
  • the graph representation technique may involve using a bipartite graph, which is referred to as a Tanner graph.
  • the decoding algorithm may be based on using a Tanner-based deep neural network (DNN).
  • DNN Tanner-based deep neural network
  • This DNN may be trained on a zero-codeword only, thereby solving a dimensionality problem.
  • a number of trainable parameters for such a DNN equals the number of messages passed over the Tanner graph.
  • the Tanner-graph DNN has previously been tested mostly on short Bose-Chaudhuri-Hocquenghem (BCH) codes.
  • BCH Bose-Chaudhuri-Hocquenghem
  • any communication system operates over a set of different codes of different lengths and rates, so that optimal weights have to be stored for every coding scheme. All of this makes the use of the Tanner-graph DNN resource-intensive.
  • there are also other DNN-based decoding algorithms for the ECCs, such as, for example, a syndrome-based approach and a graph partitioning approach. However, both approaches demonstrate bad decoding performance when applied to the LDPC codes.
  • an apparatus for decoding a codeword comprises at least one processor and a memory coupled to the at least one processor.
  • the memory stores processor-executable instructions, and the codeword has bits encoded therein by using an Error Correcting Code (ECC).
  • ECC Error Correcting Code
  • the processor-executable instructions cause the at least one processor to operate as follows. At first, the at least one processor receives initial bit estimations for the bits of the codeword. Each bit estimation is indicative of a probable value of corresponding one of the bits of the codeword. Then, the at least one processor generates a list of codeword candidates corresponding to the codeword based on the initial bit estimations.
  • the at least one processor obtains intermediate bit estimations for the list of codeword candidates by performing at least one iteration of iterative decoding on each codeword candidate of the list of codeword candidates using a graph representation for the ECC.
  • the at least one processor reduces the list of codeword candidates by discarding therefrom codeword candidates less relevant to the codeword using the intermediate bit estimations.
  • the at least one processor obtains final bit estimations for the reduced list of codeword candidates by performing the complete iterative decoding on each codeword candidate of the reduced list of codeword candidates.
  • the at least one processor selects, in the reduced list of codeword candidates, a most relevant codeword candidate for the codeword based on the final bit estimations.
  • the ECC used to encode the bits into the codeword is a Low-Density Parity-Check (LDPC) code.
  • LDPC Low-Density Parity-Check
  • the iterative decoding is performed by a first machine-learning means.
  • the first machine-learning means comprises a Deep Neural Network (DNN) using a Tanner graph representation as the graph representation for the ECC.
  • DNN Deep Neural Network
  • Using the Tanner graph representation may reduce a number of trainable parameters (i.e. weights) and a number of iterations needed for decoding convergence (e.g., from 50 to 20), thereby reducing time and computational costs for the whole decoding process.
  • a Tanner-based DNN may be used for short- and moderate-length codes (with a length N < 1000) and may be trained on a zero codeword only. The latter is critical because the code may contain 2^n codewords, where n = 0, 1, 2, ..., and it is not possible to use all of them in the training procedure of the Tanner-based DNN.
  • the DNN is pre-trained with a combined loss function that comprises a linear combination of a Bit Error Rate (BER) loss function and a Frame Error Rate (FER) loss function.
  • BER Bit Error Rate
  • FER Frame Error Rate
  • the DNN comprises multiple layers of neurons, and at least two of the multiple layers of neurons comprise at least one residual connection therebetween.
  • the DNN architecture with at least one residual connection may be considered as one way to efficiently confront the vanishing gradient problem and significantly speed-up the training procedure.
  • the at least two layers of neurons comprise multiple residual connections therebetween, and the multiple residual connections are configured with linear and/or non-linear dependency. This may provide a variety of architectures for the DNN, thereby making the apparatus according to the first aspect more flexible in use.
  • the reducing of the list of codeword candidates is performed by a second machine-learning means.
  • the second machine-learning means is configured to use a soft syndrome feature to define, among the list of codeword candidates, codeword candidates that are more relevant for the codeword.
  • Using the soft syndrome, it is possible to reduce the size of the list of codeword candidates more efficiently, especially in case of short codes.
  • the first and second machine-learning means may use the same set of trained weights for multiple code lengths and rates, thereby providing code length and rate adaptation.
  • the iterative decoding comprises min-sum iterative decoding. This may make the iterative decoding easy to use.
  • a method for decoding a codeword is provided, where the codeword has bits encoded therein by using an ECC.
  • the method starts with the step of receiving initial bit estimations for the bits of the codeword. Each initial bit estimation is indicative of a probable value of corresponding one of the bits of the codeword. Then, the method proceeds to the step of generating a list of codeword candidates corresponding to the codeword based on the initial bit estimations. After that, the method goes on to the step of obtaining intermediate bit estimations for the list of codeword candidates by performing at least one iteration of iterative decoding on each codeword candidate of the list of codeword candidates using a graph representation for the ECC.
  • the method proceeds to the step of reducing the list of codeword candidates by discarding therefrom codeword candidates less relevant to the codeword using the intermediate bit estimations. Further, the method proceeds to the step of obtaining final bit estimations for the reduced list of codeword candidates by performing the complete iterative decoding on each codeword candidate of the reduced list of codeword candidates. The method ends up with the step of selecting, in the reduced list of codeword candidates, a most relevant codeword candidate for the codeword based on the final bit estimations. With this method, it is possible to implement the graph-based list decoding algorithm with early list-size reduction, thereby achieving acceptable decoding performance for big list sizes at low computational costs.
  • the ECC used to encode the bits into the codeword is an LDPC code. This may make the method according to the second aspect more flexible in use, since LDPC codes may be used, for example, in the 5G communication technology.
  • the iterative decoding is performed by a first machine-learning means.
  • the first machine-learning means comprises a DNN using a Tanner graph representation as the graph representation for the ECC.
  • Using the Tanner graph representation may reduce a number of trainable parameters (i.e. weights) and a number of iterations needed for decoding convergence (e.g., from 50 to 20), thereby reducing time and computational costs for the whole decoding process.
  • the DNN is pre-trained with a combined loss function that comprises a linear combination of a BER loss function and a FER loss function. Using the combined loss function may optimize the decoding performance.
  • the DNN comprises multiple layers of neurons, and at least two of the multiple layers of neurons comprise at least one residual connection therebetween.
  • the DNN architecture with at least one residual connection may be considered as one way to efficiently confront the vanishing gradient problem and significantly speed-up the training procedure.
  • the at least two layers of neurons comprise multiple residual connections therebetween, and the multiple residual connections are configured with linear and/or non-linear dependency. This may provide a variety of architectures for the DNN, thereby making the method according to the second aspect more flexible in use.
  • the reducing of the list of codeword candidates is performed by a second machine-learning means.
  • the second machine-learning means is configured to use a soft syndrome feature to define, among the list of codeword candidates, codeword candidates that are more relevant for the codeword.
  • Using the soft syndrome, it is possible to reduce the size of the list of codeword candidates more efficiently, especially in case of short codes.
  • the first and second machine-learning means may use the same set of trained weights for multiple code lengths and rates, thereby providing code length and rate adaptation.
  • the iterative decoding comprises min-sum iterative decoding. This may make the iterative decoding easy to use.
  • a computer program product comprises computer code.
  • When executed by at least one processor, the computer code causes the at least one processor to perform the method according to the second aspect.
  • a UE for wireless communications comprises a transceiver and the apparatus according to the first aspect.
  • the transceiver is configured to receive the codeword transmitted over a communication channel. After that, the transceiver is configured to obtain the initial bit estimations for the bits of the codeword and provide the initial bit estimations to the apparatus according to the first aspect for decoding the codeword.
  • the UE may implement the graph-based list decoding algorithm with early list-size reduction, thereby achieving acceptable decoding performance for big list sizes at low computational costs.
  • a base station (BS) for wireless communications comprises a transceiver and the apparatus according to the first aspect.
  • the transceiver is configured to receive the codeword transmitted over a communication channel. After that, the transceiver is configured to obtain the initial bit estimations for the bits of the codeword and provide the initial bit estimations to the apparatus according to the first aspect for decoding the codeword.
  • the BS may implement the graph-based list decoding algorithm with early list-size reduction, thereby achieving acceptable decoding performance for big list sizes at low computational costs.
  • FIG. 1 shows a Tanner graph for an LDPC code in accordance with the prior art
  • FIG. 2 shows a Tanner-based DNN obtained by using the Tanner graph shown in FIG. 1 ;
  • FIG. 3 shows a block diagram of an apparatus for decoding a codeword in accordance with one exemplary embodiment
  • FIG. 4 shows a flowchart of a method for decoding a codeword in accordance with one exemplary embodiment
  • FIG. 5 shows a block diagram of a Tanner-based DNN with residual connections in accordance with one exemplary embodiment
  • FIG. 6 shows a block diagram of a Tanner-based DNN with residual connections in accordance with one other exemplary embodiment
  • FIG. 7 shows a block diagram of a Tanner-based DNN with residual connections in accordance with one more other exemplary embodiment
  • FIG. 8 explains rate adaptation by means of an example of two parity-check matrices
  • FIG. 9 and 10 show tables descriptive of a certain rate adaptation scheme for the case of short LDPC codes
  • FIG. 11-13 show comparison results of decoding performances provided by the method shown in FIG. 4 and the existing decoding algorithms for the case of LDPC codes;
  • FIG. 14 shows a block diagram of a UE for wireless communications in accordance with one exemplary embodiment
  • FIG. 15 shows a block diagram of a BS for wireless communications in accordance with one exemplary embodiment.
  • a graph may refer to a data structure that consists of the following two components: a finite set of vertices (also known as nodes), and a finite set of ordered pairs of the form (u, v), also known as edges. Each pair is ordered because (u, v) is not the same as (v, u) in the case of a directed graph.
  • the pair of the form (u, v) indicates that there is an edge from vertex u to vertex v.
  • the edges may contain a weight/value/cost.
  • the graph is used to represent a code that is in turn used to encode bits into a codeword.
  • An information signal carrying the codeword is transmitted from a transmitting side to a receiving side over a communication channel. Since a noiseless communication channel does not exist, the information signal comes to the receiving side in a corrupted form, i.e. with errors.
  • the presence of noise in the communication channel causes ECCs to be used for controlling the errors in the information signal transmitted over the noisy communication channel.
  • one example of such ECCs is an LDPC code.
  • the LDPC code may be represented by using a bipartite graph, which is referred to as a Tanner graph.
  • the present disclosure is not limited to the LDPC codes, and other embodiments are possible, in which other ECCs, such as, for example, Bose-Chaudhuri-Hocquenghem (BCH), Reed-Muller, polar codes, etc., or any other codes that allow controlling the errors in the information signal may be used depending on particular applications.
  • ECCs such as, for example, Bose-Chaudhuri-Hocquenghem (BCH), Reed-Muller, polar codes, etc.
  • BCH Bose-Chaudhuri-Hocquenghem
  • Reed-Muller Reed-Muller
  • polar codes etc.
  • as for the Tanner graph representation, it may be replaced with any other graph representation that enables message-passing implementation, depending on particular applications.
  • the Tanner graph may be defined as a graph whose nodes may be separated into two classes, i.e. variable and check nodes, where edges may only connect two nodes not residing in the same class.
  • FIG. 1 shows a Tanner graph 100 for an ECC having a parity-check matrix H 102 of size 4 x 5.
  • the Tanner graph 100 comprises a set 104 of variable nodes x_0, x_1, x_2, x_3, x_4 (schematically shown as circles) and a set 106 of check nodes (schematically shown as squares), with a set 108 of edges properly connecting the variable and check nodes.
  • the symbolic notation e_ab for each edge means that this edge connects the variable node with index a to the check node with index b.
  • the symbolic notation e_02 means that there is an edge connecting the variable node x_0 to the check node c_2.
  • the variable nodes x_3 and x_4 are involved in the check performed at the check node c_0.
  • the check node c_0 is connected to the variable nodes x_3 and x_4 via two edges e_30 and e_40, as shown in FIG. 1.
  • the check node c_1 is connected to the variable nodes x_1 and x_2 via two edges e_11 and e_21.
  • the check node c_2 is connected to the variable nodes x_0, x_2, and x_4 via three edges e_02, e_22, and e_42.
  • the check node c_3 is connected to the variable nodes x_0, x_1, and x_3 via three edges e_03, e_13, and e_33.
  • the Tanner graph 100 is derived from the parity-check matrix H 102 of the ECC.
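  • As a concrete illustration of how the Tanner graph is read off a parity-check matrix, the Python sketch below enumerates the edges e_ab of a small code. The matrix H used here is only reconstructed from the connectivity described above and should be treated as an assumption, not as the exact matrix 102 of FIG. 1.

```python
import numpy as np

# Illustrative 4 x 5 parity-check matrix (an assumption reconstructed from the
# edge description above; rows = check nodes c_0..c_3, columns = variable
# nodes x_0..x_4).
H = np.array([[0, 0, 0, 1, 1],
              [0, 1, 1, 0, 0],
              [1, 0, 1, 0, 1],
              [1, 1, 0, 1, 0]])

# Every nonzero entry H[b, a] corresponds to an edge e_ab connecting the
# variable node x_a to the check node c_b in the Tanner graph.
edges = [(a, b) for b in range(H.shape[0]) for a in range(H.shape[1]) if H[b, a]]
for a, b in edges:
    print(f"e_{a}{b}: variable node x_{a} <-> check node c_{b}")

# The edge count equals the number of messages passed in each direction
# per decoding iteration.
print("number of edges:", len(edges))
```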
  • a closed path in the Tanner graph 100 comprising l edges that closes back on itself is called a cycle of length l.
  • the standard iterative decoding algorithm also known as belief propagation (BP)
  • BP belief propagation
  • LLR logarithmic likelihood-ratio
  • the BP algorithm involves recalculating the LLR values iteratively through the check nodes c_0, c_1, c_2, c_3 until the valid codeword is obtained. It can be said that each edge of the Tanner graph 100 "carries" a certain LLR value in each iteration of the BP algorithm.
  • the BP algorithm often operates well for codes with cycles, but it is not guaranteed to be optimal, i.e. it may occasionally be unreliable.
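  • For reference, the following Python sketch shows a plain flooding-schedule belief-propagation (sum-product) decoder over a parity-check matrix H. It is a textbook baseline, not the weighted Tanner-based DNN decoder described further below; the convention that a positive LLR favours bit 0 is an assumption.

```python
import numpy as np

def bp_decode(llr, H, n_iters=20):
    """Simplified flooding-schedule belief propagation over the Tanner graph of H.

    llr : 1-D array of initial LLR values, one per variable node
    H   : binary parity-check matrix (check nodes x variable nodes)
    Returns the refined (posterior) LLRs after decoding.
    """
    m, n = H.shape
    v2c = H * llr                         # variable-to-check messages (one per edge)
    c2v = np.zeros((m, n))                # check-to-variable messages (one per edge)
    total = llr.astype(float)

    for _ in range(n_iters):
        # Check-node update ("tanh rule"), excluding the target edge itself.
        for i in range(m):
            idx = np.flatnonzero(H[i])
            t = np.tanh(v2c[i, idx] / 2.0)
            for k, j in enumerate(idx):
                prod = np.prod(np.delete(t, k))
                c2v[i, j] = 2.0 * np.arctanh(np.clip(prod, -0.999999, 0.999999))
        # Variable-node update: channel LLR plus all other incoming messages.
        total = llr + c2v.sum(axis=0)
        v2c = H * (total[None, :] - c2v)
        # Early stop once all parity checks are satisfied by the hard decisions.
        if not np.any((H @ (total < 0).astype(int)) % 2):
            break

    return total
```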
  • the Tanner graph may be used to represent a DNN. Similar to the idea described above, this approach involves a direct correspondence between the Tanner graph and the DNN with the only difference: every edge in the original Tanner graph is represented by a node in the DNN, and vice versa. As a result, the so-called Tanner-based DNN architecture is obtained, which is then used for decoding the codeword in a more reliable manner.
  • FIG. 2 shows a Tanner-based DNN 200 obtained by using the Tanner graph 100.
  • the edges of the Tanner graph 100 are represented by nodes (or neurons).
  • the initially obtained LLR values denoted as L_1, L_2, L_3, L_4, L_5 are mapped to a first variable layer 202 of the Tanner-based DNN 200, which is called an input mapping layer in FIG. 2, in accordance with the number of the edges of the Tanner graph 100 that originate from each of the variable nodes x_0, x_1, x_2, x_3, x_4 (as follows from FIG. 1, there are two such edges for each variable node).
  • all layers of the Tanner-based DNN 200 represent the LLR values that are "carried" by the edges of the Tanner graph 100 at internodal transitions (i.e. the transitions between the variable and check nodes) in each decoding iteration. These LLR values, in turn, depend on the other LLR values that have come to the nodes at the previous internodal transitions. For example, at the transition from the variable nodes to the check nodes in the first decoding iteration, the edge e_02 (which connects the variable node x_0 to the check node c_2, as shown in FIG. 1) "carries" the LLR value defined only by the initially obtained value L_1 that has come to the variable node x_0.
  • at the transition from the check nodes back to the variable nodes, the LLR value of the edge e_02 is now defined by the LLR values "carried" by the edges e_22 and e_42. Therefore, the two edges e_22 and e_42 come to a first (from above) node in a first check layer 204 of the Tanner-based DNN 200. These LLR values are then used together with the initially obtained LLR values in the next (second) decoding iteration, which starts with a second variable layer 206 of the Tanner-based DNN 200, and similar considerations are applied to build the remaining relationships between the nodes.
  • each pair “variable layer + check layer” corresponds to one decoding iteration in the Tanner graph 100.
  • a last layer 208 in the Tanner-based DNN 200, which is called an output mapping layer in FIG. 2, maps the LLR values of the last but one layer to a lesser number of nodes, which is equal to the number of the variable nodes x_0, x_1, x_2, x_3, x_4 in the Tanner graph 100, depending on the edges "carrying" the LLR values to these nodes.
  • the Tanner-based DNN may be trained on a zero-codeword only, thereby solving a dimensionality problem. Moreover, a number of trainable parameters (i.e. weights) for such a DNN equals the number of probability messages passed over the Tanner graph.
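  • To make the statement about the number of trainable parameters concrete, the short sketch below counts the messages (edges) of the Tanner graph; whether one weight is kept per message and shared across decoding iterations, or a separate weight is trained per iteration, is an implementation choice assumed here for illustration.

```python
import numpy as np

def tanner_dnn_weight_count(H, n_iters, share_across_iterations=True):
    """Rough count of trainable weights of a DNN unrolled from the Tanner graph of H."""
    n_edges = int(np.count_nonzero(H))    # one message (and one weight) per edge
    return n_edges if share_across_iterations else n_edges * n_iters

# For the illustrative 4 x 5 matrix above (10 edges) and 20 iterations this
# gives 10 shared weights, or 200 weights without weight sharing.
```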
  • the Tanner-graph DNN has been previously tested mostly on short BCH codes.
  • the Tanner-based DNN suffers from the vanishing gradient problem, which makes its training time very long.
  • any communication system operates over a set of different codes of different lengths and rates, so that optimal weights have to be stored for every coding scheme. All of this makes the use of the Tanner-graph DNN resource-intensive.
  • DNN-based decoding algorithms for the ECCs, such as, for example, a syndrome-based approach and a graph partitioning approach. Both approaches may be based on using a properly configured DNN.
  • the graph partitioning approach is applicable to the ECCs based on Plotkin or (U, U+V) construction (e.g., polar and Reed-Muller codes).
  • the syndrome-based approach is applicable to any linear code, and its main benefits are the possibilities to train the DNN on a zero codeword and utilize the general DNN with fully connected layers.
  • both the approaches demonstrate bad decoding performance when applied, for example, to the LDPC codes.
  • the exemplary embodiments disclosed herein provide a technical solution that allows mitigating or even eliminating the above-mentioned drawbacks peculiar to the prior art.
  • the technical solution disclosed herein provides a graph-based list decoding algorithm with early list-size reduction. Contrary to the existing list decoding algorithms that perform iterative decoding on the whole list of codeword candidates for a received (noisy) codeword, the proposed algorithm involves reducing the size of the list of codeword candidates after a minimum number of decoding iterations and using machine-learning means to improve a rule for selecting the most relevant codeword candidate for the received codeword.
  • the iterative decoding is performed by using a graph representation (e.g., the Tanner graph) for an ECC by which the received codeword has been obtained.
  • a graph representation e.g., the Tanner graph
  • the proposed list-based decoding architecture may efficiently be used for decoding codewords obtained by using different ECCs, including (but not limited thereto) the LDPC codes.
  • FIG. 3 shows a block diagram of an apparatus 300 for decoding a codeword in accordance with one exemplary embodiment.
  • the apparatus 300 is intended to be used on the receiving side for the purpose of decoding the codeword obtained by using different ECCs, including (but not limited to) the LDPC codes.
  • ECCs including (but not limited to) the LDPC codes.
  • the codeword comes to the receiving side with noise errors.
  • some other errors may also be present in the codeword, such, for example, as those caused by any interfering signals. Therefore, whenever the codeword is mentioned in the foregoing description, it should be construed as the codeword with these and/or any other errors.
  • the apparatus 300 comprises a processor 302 and a memory 304.
  • the memory 304 stores processor-executable instructions 306 which, when executed by the processor 302, cause the processor 302 to receive initial bit estimations 308 for bits encoded in the codeword and use them to find the most relevant codeword candidate 310, as will be described below in more detail.
  • the number, arrangement and interconnection of the constructive elements constituting the apparatus 300, which are shown in FIG. 3, are not intended to be any limitation of the present disclosure, but merely used to provide a general idea of how the constructive elements may be implemented within the apparatus 300.
  • the processor 302 may be replaced with several processors, as well as the memory 304 may be replaced with several removable and/or fixed storage devices, depending on particular applications.
  • the processor 302 may be implemented as a CPU, general-purpose processor, single-purpose processor, microcontroller, microprocessor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), digital signal processor (DSP), complex programmable logic device, etc. It should be also noted that the processor 302 may be implemented as any combination of one or more of the aforesaid. As an example, the processor 302 may be a combination of two or more microprocessors.
  • the memory 304 may be implemented as a classical nonvolatile or volatile memory used in the modern electronic computing machines.
  • the nonvolatile memory may include Read-Only Memory (ROM), ferroelectric Random-Access Memory (RAM), Programmable ROM (PROM), Electrically Erasable PROM (EEPROM), solid state drive (SSD), flash memory, magnetic disk storage (such as hard drives and magnetic tapes), optical disc storage (such as CD, DVD and Blu-ray discs), etc.
  • ROM Read-Only Memory
  • RAM ferroelectric Random-Access Memory
  • PROM Programmable ROM
  • EEPROM Electrically Erasable PROM
  • SSD solid state drive
  • as for the volatile memory, examples thereof include Dynamic RAM, Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Static RAM, etc.
  • the processor-executable instructions 306 stored in the memory 304 may be configured as a computer-executable code which causes the processor 302 to perform the aspects of the present disclosure.
  • the computer-executable code for carrying out operations or steps for the aspects of the present disclosure may be written in any combination of one or more programming languages, such as Java, C++, or the like.
  • the computer-executable code may be in the form of a high-level language or in a pre-compiled form, and be generated by an interpreter (also pre-stored in the memory 304) on the fly.
  • FIG. 4 shows a flowchart of a method 400 for decoding a codeword in accordance with one exemplary embodiment.
  • the method 400 substantially describes the operation of the apparatus 300. Therefore, each of the steps of the method 400 is performed by the processor 302 of the apparatus 300.
  • the method 400 starts with a step S402, in which the processor 302 receives the initial (rough) bit estimations 308 for bits of the codeword. Each bit estimation is indicative of a probable value of corresponding one of the bits of the codeword.
  • the initial bit estimations 308 may be represented by LLR values or any other values which could be reduced to probability.
  • the initial bit estimations 308 are provided to the processor 302 from a transceiver residing on the receiving side (in particular, they are obtained upon demodulating the received information signal carrying the codeword).
  • the method 400 proceeds to a step S404, in which the processor 302 generates a list of codeword candidates corresponding to the codeword based on the initial bit estimations 308.
  • a step S406 is initiated, in which the processor 302 obtains intermediate bit estimations for the list of codeword candidates by performing at least one iteration of iterative decoding on each codeword candidate of the list of codeword candidates.
  • the iterative decoding may be performed by a first machine-learning means using a graph representation for the ECC by which the codeword has been obtained.
  • the iterative decoding may represent min-sum iterative decoding.
  • the minimum number of the iterations performed in the step S406 depends on particular applications (for example, acceptable or allowable computational costs).
  • the method 400 proceeds to a step S408, in which the processor 302 reduces the list of codeword candidates by discarding therefrom codeword candidates less relevant to the codeword.
  • the processor 302 may use a second machine-learning means using the intermediate bit estimations as inputs.
  • the second machine-learning means is different from the first machine-learning means.
  • the method 400 then goes on to a step S410, in which the processor 302 obtains final bit estimations for the reduced list of codeword candidates by performing the complete iterative decoding on each codeword candidate of the reduced list of codeword candidates.
  • the method 400 eventually ends up with a step S412, in which the processor 302 selects, in the reduced list of codeword candidates, a most relevant codeword candidate for the codeword based on the final bit estimations.
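  • The Python sketch below mirrors the order of the steps S402-S412. The candidate-generation heuristic, the default list sizes and iteration counts, and the two callables are placeholders rather than details taken from the disclosure; for example, a decoder like the belief-propagation sketch above could be passed as decode_fn, and the soft-syndrome score sketched further below as score_fn.

```python
import numpy as np

def list_decode_early_reduction(llr, H, decode_fn, score_fn,
                                list_size=8, keep=2,
                                early_iters=2, full_iters=20):
    """Minimal sketch of the method 400 (hypothetical helpers and defaults).

    decode_fn(llr, H, n_iters) -> refined LLRs    (graph-based iterative decoder)
    score_fn(llr, H)           -> relevance score (e.g. soft-syndrome sum)
    """
    # S404: build a list of codeword candidates, here by flipping the signs of
    # the least reliable initial bit estimations (one possible heuristic).
    weakest = np.argsort(np.abs(llr))[:list_size - 1]
    candidates = [llr.copy()]
    for j in weakest:
        c = llr.copy()
        c[j] = -c[j]                     # hypothesise the opposite bit value
        candidates.append(c)

    # S406: a few decoding iterations per candidate -> intermediate estimations.
    intermediate = [decode_fn(c, H, early_iters) for c in candidates]

    # S408: early list-size reduction using the intermediate estimations.
    scores = np.array([score_fn(x, H) for x in intermediate])
    reduced = [candidates[i] for i in np.argsort(scores)[::-1][:keep]]

    # S410: complete iterative decoding on the reduced list only.
    final = [decode_fn(c, H, full_iters) for c in reduced]

    # S412: select the most relevant codeword candidate from the reduced list.
    best = max(final, key=lambda x: score_fn(x, H))
    return (best < 0).astype(int)        # hard decisions on the final LLRs
```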
  • the initial bit estimations 308 comprise, for example, the LLR values
  • the intermediate and final bit estimations relate to the same type of the bit estimations (i.e. refined LLR values).
  • the first and second machine-learning means may use the same set of trained weights for multiple code lengths and rates (i.e. to apply a weight sharing procedure), thereby providing code length and rate adaptation.
  • the steps S406 and S408 may be iteratively repeated before the step S410 to reduce the size of the list of codeword candidates in several steps. This may also influence the choice of the minimum number of the iterations performed in the step S406 of the method 400.
  • the first machine-learning means may be implemented as a DNN using the Tanner graph representation as the graph representation for the ECC by which the codeword has been obtained.
  • a Tanner-based DNN may have an architecture like the one shown in FIG. 2 and may use different activation functions for its constituent neurons, depending on the code type involved. For example, when applied to the LDPC codes, the Tanner-based DNN may use offset min-sum activation functions for the neurons.
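  • A minimal sketch of an offset min-sum check-node update, which is one way such an activation function could look; the offset value used here is an arbitrary illustrative choice (in a Tanner-based DNN it would typically be a trainable parameter).

```python
import numpy as np

def offset_min_sum_check_update(msgs, beta=0.15):
    """Offset min-sum update for a single check node.

    msgs : incoming variable-to-check messages on the edges of this check node
    beta : offset; returns one outgoing message per edge, each computed from
           all incoming messages except that edge's own input.
    """
    msgs = np.asarray(msgs, float)
    out = np.empty_like(msgs)
    for k in range(len(msgs)):
        others = np.delete(msgs, k)
        magnitude = max(np.min(np.abs(others)) - beta, 0.0)   # offset-corrected minimum
        out[k] = np.prod(np.sign(others)) * magnitude
    return out

# Example: messages arriving at a degree-4 check node.
print(offset_min_sum_check_update([1.2, -0.4, 0.9, -2.0]))
```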
  • the Tanner-based DNN may also use the weight sharing procedure.
  • the weight sharing procedure provides the following benefits: the number of trainable parameters (i.e. weights) is reduced; the Tanner-based DNN is trained faster; and less memory is required to store the weights.
  • the Tanner-based DNN may be pre-trained with any loss function or with a linear combination of loss functions.
  • the DNN is pre-trained with a combined loss function that comprises the linear combination of a Bit Error Rate (BER) loss function and a Frame Error Rate (FER) loss function.
  • the combined loss function may be calculated with an adjustable parameter α (0 ≤ α ≤ 1), which allows switching between the two loss function components as follows: L_i^combined = α · L_i^FER + (1 - α) · L_i^BER, where i is the number of the decoding iteration, L_i^combined is the combined loss function, L_i^FER is the FER loss function, and L_i^BER is the BER loss function.
  • the training procedure continues until the combined loss function is minimized.
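  • Since the text does not spell out how the BER and FER components are computed, the sketch below assumes bitwise binary cross-entropy for the BER part and a soft frame-error proxy for the FER part; both choices, as well as the sign convention for the LLRs, are assumptions made for illustration only.

```python
import numpy as np

def combined_loss(llr_out, bits_true, alpha=0.5):
    """Sketch of L_combined = alpha * L_FER + (1 - alpha) * L_BER."""
    # Probability of bit value 1 from the output LLRs (positive LLR favours 0).
    p1 = 1.0 / (1.0 + np.exp(np.clip(llr_out, -30.0, 30.0)))
    p_true = np.where(bits_true == 1, p1, 1.0 - p1)   # prob. of the correct bit
    p_true = np.clip(p_true, 1e-12, 1.0)

    ber_loss = -np.mean(np.log(p_true))               # bitwise cross-entropy
    fer_loss = 1.0 - np.prod(p_true)                  # soft "frame in error" proxy
    return alpha * fer_loss + (1.0 - alpha) * ber_loss

# With zero-codeword training, bits_true is simply the all-zero vector.
loss = combined_loss(np.array([2.1, -0.3, 4.0, 1.2]), np.zeros(4, dtype=int), alpha=0.3)
```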
  • FIG. 5 shows a block diagram of a Tanner-based DNN 500 in accordance with one exemplary embodiment. Similar to the Tanner-based DNN 200, the Tanner-based DNN 500 comprises the same types of layers: an input mapping layer 502, check layers 504-1 - 504-4, variable layers 506-1 - 506-3, and an output mapping layer 508.
  • the input mapping layer 502 receives the initial bit estimations 308, and the output mapping layer 508 outputs refined bit estimations 510 (which are represented by the above-mentioned intermediate or final bit estimations, depending on which of the steps S406 and S410 of the method 400 is performed at the moment).
  • the check layers 504-1 - 504-4 and the variable layers 506-1 - 506-3 alternate with each other between the input mapping layer 502 and the output mapping layer 508. It should be noted that the number of the check and variable layers is given for illustrative purposes only, and may be replaced with any other number depending on particular applications (for example, based on a required number of decoding iterations). As shown in FIG. 5, the Tanner-based DNN 500 further comprises residual connections 512-1 and 512-2 between the variable layers 506-1 - 506-3, which are provided due to the vanishing gradient problem - as more layers using certain activation functions are added to an NN, the gradients of a loss function approach zero, making the NN hard to train.
  • the Tanner-based DNN 500 may comprise only one or more than two residual connections.
  • the residual connections 512-1 and 512-2 are applied to the variable layers 506-1 - 506-3.
  • the output vector may be constructed as a linear combination of the output vectors of multiple previous variable layers: y^(t) = Σ_{k=0..L} α_k · x^(t-k), where x^(t-k) denotes the output vector of the (t-k)-th variable layer, α_k are the combination coefficients, and L is the depth of the residual connections.
  • the Tanner-based DNN architectures may be modified by different residual-connection schemes. These modifications may be obtained based on the following criteria: the number of the previous variable layers (i.e. the depth L) that is taken into consideration when calculating the residual connections; and the arrangement of the residual connections - they could be calculated at every iteration of the iterative decoding, or the whole set of the iterations of the iterative decoding may be split into several subsets.
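  • A small sketch of how such a residual combination over variable layers could be computed; the combination coefficients below are illustrative placeholders (in a Tanner-based DNN they would be trainable or fixed by the chosen residual-connection scheme).

```python
import numpy as np

def residual_combine(layer_outputs, coeffs):
    """Linear combination of the current and previous variable-layer outputs.

    layer_outputs : list of output vectors of the variable layers, newest last
    coeffs        : one coefficient per considered layer (the residual depth L)
    """
    depth = min(len(coeffs), len(layer_outputs))
    recent = layer_outputs[-depth:]
    return sum(c * x for c, x in zip(coeffs[-depth:], recent))

# Example: combine the current variable-layer output with two earlier ones.
outs = [np.array([0.2, -1.0]), np.array([0.5, -0.8]), np.array([0.7, -1.1])]
combined = residual_combine(outs, coeffs=[0.2, 0.3, 0.5])
```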
  • FIG. 6 shows a block diagram of a Tanner-based DNN 600 in accordance with one other exemplary embodiment.
  • the Tanner-based DNN 600 comprises check layers 602- 1 - 602-7 and variable layers 604-1 - 604-6 which alternate with each other.
  • the Tanner-based DNN 600 further comprises residual connections 606-1 - 606-4 between the variable layers 604-1 - 604-6, with the residual connections being arranged without linear dependency.
  • the number of the check and variable layers and the number of the residual connections are given for illustrative purposes only and may change depending on particular applications.
  • FIG. 7 shows a block diagram of a Tanner-based DNN 700 in accordance with one more other exemplary embodiment.
  • the Tanner-based DNN 700 comprises check layers 702- 1 - 702-7 and variable layers 704-1 - 704-6 which alternate with each other.
  • the Tanner-based DNN 700 further comprises residual connections 706-1 - 706-8 between the variable layers 704-1 - 704-6, with the residual connections being arranged with linear dependency.
  • the number of the check and variable layers and the number of the residual connections are given for illustrative purposes only and may change depending on particular applications.
  • the second machine-learning means receives, as the inputs, the intermediate bit estimations obtained by using the first machine-learning means (for example, the Tanner-based DNN 500) in the step S406 of the method 400, and uses them to provide the reduced list of codeword candidates in the step S408 of the method 400.
  • the second machine-learning means may also be implemented as a DNN in one exemplary embodiment. Let us call it a list reducing DNN to avoid any confusion with the Tanner-based DNN.
  • the list reducing DNN may comprise dense layers configured such that the list reducing DNN outputs a vector of scores for the list of codeword candidates. The vector of scores is then used by the processor 302 to discard the codeword candidates less relevant to the target codeword.
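  • The disclosure only states that the list reducing DNN comprises dense layers producing a vector of scores; the feature set, layer sizes, activation and random weights in the sketch below are therefore assumptions meant purely to show the shape of such a scorer.

```python
import numpy as np

def score_candidates(features, W1, b1, W2, b2):
    """Two dense layers mapping per-candidate features to relevance scores.

    features : array of shape (list_size, n_features), one row per candidate
               (e.g. soft-syndrome statistics of its intermediate LLRs)
    """
    hidden = np.maximum(features @ W1 + b1, 0.0)   # dense layer + ReLU
    return (hidden @ W2 + b2).ravel()              # dense layer -> one score each

# Example with random (untrained) weights: 8 candidates, 3 features each.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 3))
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)
scores = score_candidates(feats, W1, b1, W2, b2)
keep = np.argsort(scores)[-4:]                     # indices of the 4 best candidates
```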
  • the list reducing DNN may use the so-called soft syndrome that has turned out to be the most powerful feature for reducing the list size, especially for short ECCs.
  • the soft syndrome is calculated as follows. Suppose that the initial bit estimations 308 are represented by LLR values, and let us multiply the vector constituted by the LLR values (hereinafter - the LLR vector) element-wise by the parity-check matrix H of the ECC involved in the method 400.
  • the soft syndrome is then calculated as s_i = 2 · artanh( Π_{j: H_ij = 1} tanh(L_j / 2) ), where L_j is the j-th element of the LLR vector and s_i is the i-th element of a soft-syndrome vector.
  • the physical meaning of the value s_i is the logarithm of the probability ratio between the events that the i-th parity check is satisfied versus not satisfied. The higher the sum of the soft-syndrome elements, the higher the probability that the whole decoding process converges to some valid codeword candidate.
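  • The soft syndrome and its sum can be computed directly from an LLR vector and the parity-check matrix, as in the sketch below; the clipping is only a numerical-stability detail added here.

```python
import numpy as np

def soft_syndrome(llr, H):
    """Soft syndrome: s_i = 2 * artanh( prod_{j: H[i, j] = 1} tanh(L_j / 2) )."""
    t = np.tanh(llr / 2.0)
    s = np.empty(H.shape[0])
    for i in range(H.shape[0]):
        prod = np.prod(t[H[i] == 1])
        s[i] = 2.0 * np.arctanh(np.clip(prod, -0.999999, 0.999999))
    return s

def soft_syndrome_score(llr, H):
    """Sum of the soft-syndrome elements, usable as a deterministic relevance
    score: candidates with a higher sum are more likely to converge to a valid
    codeword and are therefore kept in the reduced list."""
    return float(np.sum(soft_syndrome(llr, H)))
```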
  • the list reducing DNN demonstrates the performance similar to the soft-syndrome sum deterministic criterion (in terms of a list reduction error).
  • once the reduced list of codeword candidates is obtained, it is then used in the step S410 of the method 400 to obtain the final bit estimations.
  • for this purpose, the same Tanner-based DNN as the one used to obtain the intermediate bit estimations (e.g., the Tanner-based DNN 500, 600 or 700) may be used, with the only difference that the iterative decoding is now performed completely.
  • the final bit estimations are used by the processor 302 to determine the most relevant codeword candidate for the target codeword.
  • the Tanner-based DNNs 500, 600 and 700 used in the method 400 are adaptive in the sense that each of them allows using a single set of trained weights for multiple code lengths and rates.
  • the goal of adaptive training is to derive the training procedure that will provide an improvement in the performance of multiple coding constructions by training just a single DNN. This will provide a reduction of memory consumption when operating at multiple code rates or code lengths.
  • the idea of rate adaptation is to cover a wide range of code rates by a limited number of trained weights. In particular, by enabling or disabling some weights for different batches during training, one can simultaneously train the Tanner-based DNN that will perform well for different code rates.
  • the idea of length adaptation is to train the Tanner-based DNN and use multiple code lengths optimally, keeping the code rate constant.
  • by changing a factor value (circulant size), different code lengths may be obtained while the code rate remains fixed;
  • by changing a number of rows in the parity-check matrix H, the code length remains the same while the code rate may be adjusted;
  • by changing a number of information bits in the parity-check matrix H, the code length is adjusted.
  • the parity-check matrices have a nested structure. This means that different codes can be generated from the same parity-check matrix by changing the factor value (circulant size) and bordering the parity-check matrix for the lowest code rate. As soon as a single weight is used for a single circulant, the overall weight structure remains the same for different codes. In this case, the length and rate adaptation may be performed as follows.
  • Length adaptation: Different factor values may produce different code lengths, but the code rate and the weight structure of the corresponding Tanner-based DNN will remain the same. The weights remain acceptable for a different circulant size, but the residual connections should be disabled in this case. At the same time, the combined loss function still allows training the Tanner-based DNN efficiently without encountering the vanishing gradient problem.
  • Rate adaptation: A different number of rows in the parity-check matrix may produce different code rates, while keeping the code length constant. The lowest code rate may be treated as a higher code rate plus additional parity checks.
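  • The toy example below illustrates the nested-matrix view of rate adaptation: the higher-rate code and the weights of its Tanner-based DNN are obtained as subsets of the lower-rate configuration. The matrices and the one-weight-per-edge layout are assumptions chosen only to keep the example small.

```python
import numpy as np

# Hypothetical nested parity-check matrices: the lower-rate code is the
# higher-rate one plus additional parity checks (extra rows, same length).
H_high = np.array([[1, 1, 0, 1, 0, 0],
                   [0, 1, 1, 0, 1, 0],
                   [1, 0, 1, 0, 0, 1]])           # higher code rate
extra  = np.array([[1, 0, 0, 1, 1, 1]])           # additional parity checks
H_low  = np.vstack([H_high, extra])               # lower code rate, same length

# One weight per Tanner-graph edge (nonzero entry). The higher-rate DNN simply
# reuses the subset of weights that belongs to the shared rows, which is what
# lets a single stored set of weights serve several code rates.
weights_low = {(i, j): 1.0 for i, j in zip(*np.nonzero(H_low))}
weights_high = {e: w for e, w in weights_low.items() if e[0] < H_high.shape[0]}

assert set(weights_high) == set(zip(*np.nonzero(H_high)))
```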
  • FIG. 8 explains the above-described rate adaptation by means of an example of two paritycheck matrices.
  • Node puncturing is used in the 5G LDPC codes to improve the decoding performance and implies that some variable nodes of the parity-check matrix (more precisely, its Tanner graph representation) are declared as "punctured” and do not take part in the transmission of the codeword bits corresponding to these variable nodes. On the receiving side, the punctured bits are interpreted as erasures.
  • the dots shown in FIG. 8 correspond to nonzero elements in each of the parity-check matrices.
  • the weights for the Tanner-based DNN corresponding to the higher code rate may be selected as a subset of the lower code rate configuration.
  • This kind of rate reuse is the rate adaptation.
  • the method 400 allows a joint training procedure at the rate adaptation. As mentioned above, one can treat the lower code rate as the higher code rate plus additional parity checks. Given the proper weights are set to zero, it is possible to obtain the Tanner-based DNN corresponding to the higher code rate.
  • the final rate adaptation scheme for the considered case is described in two tables 900 and 1000 shown in FIG. 9 and 10, respectively.
  • the table 900 lists signal-to-noise ratio (SNR) values at which the FER equal to 1% is achieved, while the table 1000 lists performance losses (in dB) corresponding to the SNR values.
  • SNR signal-to-noise ratio
  • dB performance losses
  • in each of the tables 900 and 1000, columns and rows are numbered from 4 to 26 in accordance with the code rates under consideration. The rows indicate where the weights have been taken from. For example, the 26th row means that a proper set of weights has been taken from code rate 0.2.
  • the columns indicate at which code rate the set of weights has been reused. From the table 1000, it is easy to derive the sets of weights which should be used to provide acceptable performance losses (for example, if the acceptable performance losses are those less than 0.09 dB, then it is recommended to use the sets of weights corresponding to the light-gray cells in the table 1000). At the same time, those table cells which are dark-grayed in the tables 900 and 1000 correspond to the set of weights which are not good to use (i.e. correspond to the weight reuse scenarios with significant performance losses).
  • FIG. 11-13 show comparison results of decoding performances provided by the method 400 and the existing decoding algorithms for the case of the LDPC codes.
  • the existing decoding algorithms are represented by the BP algorithm and the scaled min-sum algorithm.
  • as for the decoding performances, they are represented by dependences of a Block Error Rate (BLER) on an SNR for different code lengths and rates.
  • FIG. 11-13 also show the Polyanskiy bound calculated under a normal approximation assumption.
  • the Tanner-based DNN used for comparison had residual connections from four previous layers, and the weight sharing approach was utilized. This allowed training the Tanner-based DNN with the FER-targeted loss function. Multiple random initial points were also tested during the training procedure.
  • 20 decoding iterations performed by the Tanner-based DNN in the method 400 provide better results than 20 decoding iterations in the BP algorithm and the scaled min-sum algorithm, and get closer, in terms of the decoding performance, to 50 decoding iterations in the BP algorithm.
  • the method 400 again outperforms the BP algorithm and the scaled min-sum algorithm with 20 decoding iterations and gets closer, in terms of the decoding performance, to the BP algorithm with 50 decoding iterations.
  • the method 400 even outperforms, in terms of the decoding performance, the BP algorithm with 50 decoding iterations.
  • FIG. 14 shows a block diagram of a UE 1400 for wireless communications in accordance with one exemplary embodiment.
  • the UE 1400 may refer to a mobile device, a mobile station, a terminal, a subscriber unit, a mobile phone, a cellular phone, a smart phone, a cordless phone, a personal digital assistant (PDA), a wireless communication device, a laptop computer, a tablet computer, a gaming device, a netbook, a smartbook, an ultrabook, a medical device or medical equipment, a biometric sensor, a wearable device (for example, a smart watch, smart glasses, a smart wrist band), an entertainment device (for example, an audio player, a video player, etc.), a vehicular component or sensor, a smart meter/sensor, industrial manufacturing equipment, a global positioning system (GPS) device, an Internet-of-Things (IoT) device, a machine-type communication (MTC) device, a group of Massive IoT (MIoT) or Massive MTC (mMTC) devices, etc.
  • the UE 1400 comprises a transceiver 1402 and the apparatus 300.
  • the transceiver 1402 is configured to receive the codeword transmitted over a communication channel (for example, from another UE). After that, the transceiver 1402 is configured to obtain the initial bit estimations 308 for the bits of the codeword and provide the initial bit estimations 308 to the apparatus 300 for decoding the codeword in accordance with the method 400.
  • the number, arrangement and interconnection of the constructive elements constituting the UE 1400 which are shown in FIG. 14, are not intended to be any limitation of the present disclosure, but merely used to provide a general idea of how the constructive elements may be implemented within the UE 1400.
  • the transceiver 1402 may be implemented as two individual devices, with one for a receiving operation and another for a transmitting operation. Irrespective of its implementation, the transceiver 1402 is implied to be capable of performing different operations required to perform the reception and transmission of different signals, such, for example, as signal modulation/demodulation, encoding/decoding, etc.
  • FIG. 15 shows a block diagram of a BS 1500 for wireless communications in accordance with one exemplary embodiment.
  • the BS 1500 may refer to a node of a Radio Access Network (RAN), such as a Global System for Mobile Communications (GSM) RAN (GRAN), a GSM EDGE RAN (GERAN), a Universal Mobile Telecommunications System (UMTS) RAN (UTRAN), a Long-Term Evolution (LTE) UTRAN (E-UTRAN), or a Next-Generation (NG) RAN.
  • the BS 1500 may be used to connect the UE 1400 to a Data Network (DN) through a Core Network (CN), and is referred to as a base transceiver station (BTS) in terms of the 2G communication technology, a NodeB in terms of the 3G communication technology, an evolved NodeB (eNodeB) in terms of the 4G communication technology, and a gNodeB (gNB) in terms of the 5G communication technology and New Radio (NR) air interface.
  • BTS base transceiver station
  • eNodeB evolved NodeB
  • gNB gNodeB
  • the BS 1500 comprises a transceiver 1502 and the apparatus 300.
  • the transceiver 1502 is configured to receive the codeword transmitted over a communication channel (for example, from another UE or BS). After that, the transceiver 1502 is configured to obtain the initial bit estimations 308 for the bits of the codeword and provide the initial bit estimations 308 to the apparatus 300 for decoding the codeword in accordance with the method 400.
  • the number, arrangement and interconnection of the constructive elements constituting the BS 1500 which are shown in FIG. 15, are not intended to be any limitation of the present disclosure, but merely used to provide a general idea of how the constructive elements may be implemented within the BS 1500.
  • the transceiver 1502 may be implemented in the same or similar manner as the transceiver 1402 discussed above.
  • each step or operation of the method 400 can be implemented by various means, such as hardware, firmware, and/or software.
  • one or more of the steps or operations described above can be embodied by processor executable instructions, data structures, program modules, and other suitable data representations.
  • the executable instructions which embody the steps or operations described above can be stored on a corresponding data carrier and executed by the processor 302.
  • This data carrier can be implemented as any computer- readable storage medium configured to be readable by said at least one processor to execute the processor executable instructions.
  • Such computer-readable storage media can include both volatile and nonvolatile media, removable and non-removable media.
  • the computer-readable media comprise media implemented in any method or technology suitable for storing information.
  • the practical examples of the computer-readable media include, but are not limited to information-delivery media, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD), holographic media or other optical disc storage, magnetic tape, magnetic cassettes, magnetic disk storage, and other magnetic storage devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Error Detection And Correction (AREA)

Abstract

The present disclosure relates to list decoding that uses a graph representation of an Error Correcting Code (ECC) and provides early list size reduction. The list decoding involves reducing a size of a list of codeword candidates for a received codeword after a minimum number of decoding iterations, and using machine learning means to improve a rule for selecting the most relevant codeword candidate for the received codeword. At the same time, the iterative decoding is performed by using the graph representation for the ECC by which the received codeword has been obtained. By so doing, it is possible to reduce the overall list-based decoding complexity and achieve acceptable decoding performance for bigger sizes of the list of codeword candidates at lower computational costs. Moreover, such graph-based list decoding may efficiently be used for decoding codewords obtained by using different ECCs, including (but not limited thereto) low-density parity-check codes.

Description

GRAPH-BASED LIST DECODING WITH EARLY LIST SIZE REDUCTION
TECHNICAL FIELD
The present disclosure relates generally to the field of information decoding, and particularly to list decoding that uses a graph representation of a code and provides early list size reduction.
BACKGROUND
Error correcting codes (ECCs) are used for controlling errors in information signals transmitted over a noisy communication channel. More specifically, a transmitter applies an appropriate encoding algorithm to an information signal to be transmitted over the communication channel. In turn, a receiver applies an appropriate decoding algorithm to determine whether the received information signal was corrupted after said transmission and to correct any errors detected therein. Low-density parity-check (LDPC) codes are one example of linear ECC, which are applicable for error correction coding in a variety of next generation communication systems.
An LDPC code is typically represented using a graph representation technique, and many characteristics thereof may be analyzed using methods based on graph theory, algebra, and probability theory. By mapping information on encoded bits constituting a codeword to vertexes (or nodes) in a graph and mapping relations between the encoded bits to edges in the graph, it is possible to consider a communication network in which the vertexes exchange predetermined messages through the edges. This makes it possible to derive an appropriate decoding algorithm for the codeword that is obtained using the LDPC code.
In particular, the graph representation technique may involve using a bipartite graph, which is referred to as a Tanner graph. In this case, the decoding algorithm may be based on using a Tanner-based deep neural network (DNN). This DNN may be trained on a zero-codeword only, thereby solving a dimensionality problem. A number of trainable parameters for such a DNN equals the number of messages passed over the Tanner graph. However, the Tanner-graph DNN has previously been tested mostly on short Bose-Chaudhuri-Hocquenghem (BCH) codes. Moreover, as any DNN, the Tanner-based DNN suffers from the vanishing gradient problem, which makes its training time very long. On top of that, any communication system operates over a set of different codes of different lengths and rates, so that optimal weights have to be stored for every coding scheme. All of this makes the use of the Tanner-graph DNN resource-intensive. There are also other existing DNN-based decoding algorithms for the ECCs, such as, for example, a syndrome-based approach and a graph partitioning approach. However, both approaches demonstrate bad decoding performance when applied to the LDPC codes.
SUMMARY
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features of the present disclosure, nor is it intended to be used to limit the scope of the present disclosure.
It is an objective of the present disclosure to provide a technical solution that enables graph-based list decoding with early list-size reduction.
The objective above is achieved by the features of the independent claims in the appended claims. Further embodiments and examples are apparent from the dependent claims, the detailed description and the accompanying drawings.
According to a first aspect, an apparatus for decoding a codeword is provided. The apparatus comprises at least one processor and a memory coupled to the at least one processor. The memory stores processor-executable instructions, and the codeword has bits encoded therein by using an Error Correcting Code (ECC). When executed, the processor-executable instructions cause the at least one processor to operate as follows. At first, the at least one processor receives initial bit estimations for the bits of the codeword. Each bit estimation is indicative of a probable value of corresponding one of the bits of the codeword. Then, the at least one processor generates a list of codeword candidates corresponding to the codeword based on the initial bit estimations. After that, the at least one processor obtains intermediate bit estimations for the list of codeword candidates by performing at least one iteration of iterative decoding on each codeword candidate of the list of codeword candidates using a graph representation for the ECC. Next, the at least one processor reduces the list of codeword candidates by discarding therefrom codeword candidates less relevant to the codeword using the intermediate bit estimations. Further, the at least one processor obtains final bit estimations for the reduced list of codeword candidates by performing the complete iterative decoding on each codeword candidate of the reduced list of codeword candidates. Finally, the at least one processor selects, in the reduced list of codeword candidates, a most relevant codeword candidate for the codeword based on the final bit estimations. With this apparatus configuration, it is possible to implement the graph-based list decoding algorithm with early list size reduction, thereby achieving acceptable decoding performance for big list sizes at low computational costs.
In one embodiment of the first aspect, the ECC used to encode the bits into the codeword is a Low-Density Parity-Check (LDPC) code. This may make the apparatus according to the first aspect more flexible in use, since LDPC codes may be used, for example, in the fifth generation (5G) communication technology.
In one embodiment of the first aspect, the iterative decoding is performed by a first machine-learning means. The first machine-learning means comprises a Deep Neural Network (DNN) using a Tanner graph representation as the graph representation for the ECC. Using the Tanner graph representation may reduce the number of trainable parameters (i.e. weights) and the number of iterations needed for decoding convergence (e.g., from 50 to 20), thereby reducing time and computational costs for the whole decoding process. Moreover, such a Tanner-based DNN may be used for short- and moderate-length codes (with a length N < 1000) and may be trained on a zero codeword only. The latter is critical because the code may contain 2^n codewords, where n = 0, 1, 2, ..., and it is not possible to use all of them in the training procedure of the Tanner-based DNN.
In one embodiment of the first aspect, the DNN is pre-trained with a combined loss function that comprises a linear combination of a Bit Error Rate (BER) loss function and a Frame Error Rate (FER) loss function. Using the combined loss function may optimize the decoding performance.
In one embodiment of the first aspect, the DNN comprises multiple layers of neurons, and at least two of the multiple layers of neurons comprise at least one residual connection therebetween. The DNN architecture with at least one residual connection may be considered as one way to efficiently confront the vanishing gradient problem and significantly speed-up the training procedure.
In one embodiment of the first aspect, the at least two layers of neurons comprise multiple residual connections therebetween, and the multiple residual connections are configured with linear and/or non-linear dependency. This may provide a variety of architectures for the DNN, thereby making the apparatus according to the first aspect more flexible in use.
In one embodiment of the first aspect, the reducing of the list of codeword candidates is performed by a second machine-learning means. The second machine-learning means is configured to use a soft syndrome feature to define, among the list of codeword candidates, codeword candidates that are more relevant for the codeword. By using the soft syndrome, it is possible to reduce the size of the list of codeword candidates more efficiently, especially in case of short codes. Moreover, once trained, the first and second machine-learning means may use the same set of trained weights for multiple code lengths and rates, thereby providing code length and rate adaptation.
In one embodiment of the first aspect, the iterative decoding comprises min-sum iterative decoding. This may make the iterative decoding easy to use.
According to a second aspect, a method for decoding a codeword is provided. The codeword has bits encoded therein by using an ECC. The method starts with the step of receiving initial bit estimations for the bits of the codeword. Each initial bit estimation is indicative of a probable value of corresponding one of the bits of the codeword. Then, the method proceeds to the step of generating a list of codeword candidates corresponding to the codeword based on the initial bit estimations. After that, the method goes on to the step of obtaining intermediate bit estimations for the list of codeword candidates by performing at least one iteration of iterative decoding on each codeword candidate of the list of codeword candidates using a graph representation for the ECC. Next, the method proceeds to the step of reducing the list of codeword candidates by discarding therefrom codeword candidates less relevant to the codeword using the intermediate bit estimations. Further, the method proceeds to the step of obtaining final bit estimations for the reduced list of codeword candidates by performing the complete iterative decoding on each codeword candidate of the reduced list of codeword candidates. The method ends up with the step of selecting, in the reduced list of codeword candidates, a most relevant codeword candidate for the codeword based on the final bit estimations. With this method, it is possible to implement the graph-based list decoding algorithm with early list-size reduction, thereby achieving acceptable decoding performance for big list sizes at low computational costs.
In one embodiment of the second aspect, the ECC used to encode the bits into the codeword is an LDPC code. This may make the method according to the second aspect more flexible in use, since LDPC codes may be used, for example, in the 5G communication technology.
In one embodiment of the second aspect, the iterative decoding is performed by a first machine-learning means. The first machine-learning means comprises a DNN using a Tanner graph representation as the graph representation for the ECC. Using the Tanner graph representation may reduce the number of trainable parameters (i.e. weights) and the number of iterations needed for decoding convergence (e.g., from 50 to 20), thereby reducing time and computational costs for the whole decoding process. Moreover, such a Tanner-based DNN may be used for short- and moderate-length codes (with a length N < 1000) and may be trained on a zero codeword only. The latter is critical because the code may contain 2^n codewords, where n = 0, 1, 2, ..., and it is not possible to use all of them in the training procedure of the Tanner-based DNN.
In one embodiment of the second aspect, the DNN is pre-trained with a combined loss function that comprises a linear combination of a BER loss function and a FER loss function. Using the combined loss function may optimize the decoding performance.
In one embodiment of the second aspect, the DNN comprises multiple layers of neurons, and at least two of the multiple layers of neurons comprise at least one residual connection therebetween. The DNN architecture with at least one residual connection may be considered as one way to efficiently confront the vanishing gradient problem and significantly speed-up the training procedure.
In one embodiment of the second aspect, the at least two layers of neurons comprise multiple residual connections therebetween, and the multiple residual connections are configured with linear and/or non-linear dependency. This may provide a variety of architectures for the DNN, thereby making the method according to the second aspect more flexible in use.
In one embodiment of the second aspect, the reducing of the list of codeword candidates is performed by a second machine-learning means. The second machine-learning means is configured to use a soft syndrome feature to define, among the list of codeword candidates, codeword candidates that are more relevant for the codeword. By using the soft syndrome, it is possible to reduce the size of the list of codeword candidates more efficiently, especially in case of short codes. Moreover, once trained, the first and second machine-learning means may use the same set of trained weights for multiple code lengths and rates, thereby providing code length and rate adaptation.
In one embodiment of the second aspect, the iterative decoding comprises min-sum iterative decoding. This may make the iterative decoding easy to use.
According to a third aspect, a computer program product is provided. The computer program product comprises computer code. When executed by at least one processor, the computer code causes the at least one processor to perform the method according to the second aspect. By using such a computer program product, it is possible to simplify the implementation of the method according to the second aspect in any computing device, such, for example, as the apparatus according to the first aspect.
According to a fourth aspect, a user equipment (UE) for wireless communications is provided. The UE comprises a transceiver and the apparatus according to the first aspect. The transceiver is configured to receive the codeword transmitted over a communication channel. After that, the transceiver is configured to obtain the initial bit estimations for the bits of the codeword and provide the initial bit estimations to the apparatus according to the first aspect for decoding the codeword. With this configuration, the UE may implement the graph-based list decoding algorithm with early list-size reduction, thereby achieving acceptable decoding performance for big list sizes at low computational costs.
According to a fifth aspect, a base station (BS) for wireless communications is provided. The BS comprises a transceiver and the apparatus according to the first aspect. The transceiver is configured to receive the codeword transmitted over a communication channel. After that, the transceiver is configured to obtain the initial bit estimations for the bits of the codeword and provide the initial bit estimations to the apparatus according to the first aspect for decoding the codeword. With this configuration, the BS may implement the graph-based list decoding algorithm with early list-size reduction, thereby achieving acceptable decoding performance for big list sizes at low computational costs.
Other features and advantages of the present disclosure will be apparent upon reading the following detailed description and reviewing the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
The present disclosure is explained below with reference to the accompanying drawings, in which:
FIG. 1 shows a Tanner graph for an LDPC code in accordance with the prior art;
FIG. 2 shows a Tanner-based DNN obtained by using the Tanner graph shown in FIG. 1 ;
FIG. 3 shows a block diagram of an apparatus for decoding a codeword in accordance with one exemplary embodiment;
FIG. 4 shows a flowchart of a method for decoding a codeword in accordance with one exemplary embodiment;
FIG. 5 shows a block diagram of a Tanner-based DNN with residual connections in accordance with one exemplary embodiment;
FIG. 6 shows a block diagram of a Tanner-based DNN with residual connections in accordance with one other exemplary embodiment;
FIG. 7 shows a block diagram of a Tanner-based DNN with residual connections in accordance with one more other exemplary embodiment;
FIG. 8 explains rate adaptation by means of an example of two parity-check matrices;
FIG. 9 and 10 show tables descriptive of a certain rate adaptation scheme for the case of short LDPC codes;
FIG. 11-13 show comparison results of decoding performances provided by the method shown in FIG. 4 and the existing decoding algorithms for the case of LDPC codes;
FIG. 14 shows a block diagram of a UE for wireless communications in accordance with one exemplary embodiment; and
FIG. 15 shows a block diagram of a BS for wireless communications in accordance with one exemplary embodiment.
DETAILED DESCRIPTION
Various embodiments of the present disclosure are further described in more detail with reference to the accompanying drawings. However, the present disclosure may be embodied in many other forms and should not be construed as limited to any certain structure or function discussed in the following description. In contrast, these embodiments are provided to make the description of the present disclosure detailed and complete.
According to the detailed description, it will be apparent to the ones skilled in the art that the scope of the present disclosure encompasses any embodiment thereof, which is disclosed herein, irrespective of whether this embodiment is implemented independently or in concert with any other embodiment of the present disclosure. For example, the apparatuses and method disclosed herein may be implemented in practice by using any numbers of the embodiments provided herein. Furthermore, it should be understood that any embodiment of the present disclosure may be implemented using one or more of the elements presented in the appended claims.
The word “exemplary” is used herein in the meaning of “used as an illustration”. Unless otherwise stated, any embodiment described herein as “exemplary” should not be construed as preferable or having an advantage over other embodiments.
According to the embodiments disclosed herein, a graph may refer to a data structure that consists of the following two components:
1. a finite set of vertices also known as nodes;
2. a finite set of ordered pairs of the form (u, v), which are also known as edges. Each pair is ordered because (u, v) is not the same as (v, u) in case of a directed graph. The pair of the form (u, v) indicates that there is an edge from vertex u to vertex v. The edges may contain a weight/value/cost.
In the embodiments disclosed herein, the graph is used to represent a code that is in turn used to encode bits into a codeword. An information signal carrying the codeword is transmitted from a transmitting side to a receiving side over a communication channel. Since a noiseless communication channel does not exist, the information signal comes to the receiving side in a corrupted form, i.e. with errors. The presence of noise in the communication channel is the reason why ECCs are used for controlling the errors in the information signal transmitted over the noisy communication channel. One example of the ECCs is an LDPC code, and the LDPC code may be represented by using a bipartite graph, which is referred to as a Tanner graph. It should be noted that the present disclosure is not limited to the LDPC codes, and other embodiments are possible in which other ECCs, such as, for example, Bose-Chaudhuri-Hocquenghem (BCH), Reed-Muller or polar codes, or any other codes that allow controlling the errors in the information signal, may be used depending on particular applications. The same also applies to the Tanner graph representation: it may be replaced with any other graph representation that enables a message-passing implementation, depending on particular applications.
As used in the embodiments disclosed herein, the Tanner graph may be defined as a graph whose nodes may be separated into two classes, i.e. variable and check nodes, where edges may only connect two nodes not residing in the same class. The Tanner graph of an ECC is drawn according to the following rule: check node c is connected to variable node v whenever matrix element hcv in a parity-check matrix H of the ECC is 1. If the parity-check matrix H has a size m x n, there are m = n - k check nodes and n variable nodes, where k is the length of an information bit sequence.
FIG. 1 shows a Tanner graph 100 for an ECC having a parity-check matrix H 102 of size 4 x 5. The Tanner graph 100 comprises a set 104 of variable nodes x0, x1, x2, x3, x4 (schematically shown as circles) and a set 106 of check nodes c0, c1, c2, c3 (schematically shown as squares), with a set 108 of edges properly connecting the variable and check nodes. The symbolic notation eab for each edge means that this edge connects the variable node with index a to the check node with index b. For example, the symbolic notation e02 means that there is an edge connecting the variable node x0 to the check node c2. Given such H, for any valid codeword x = [x0 x1 x2 x3 x4], the checks performed at the check nodes c0, c1, c2, c3 to decode the codeword are written as: for check node c0: x3 + x4 = 0 (mod 2); for check node c1: x1 + x2 = 0 (mod 2); for check node c2: x0 + x2 + x4 = 0 (mod 2); for check node c3: x0 + x1 + x3 = 0 (mod 2).
Thus, for example, the variable nodes x3 and x4 are involved in the check performed at the check node c0. For this reason, the check node c0 is connected to the variable nodes x3 and x4 via two edges e30 and e40, as shown in FIG. 1. Following the same logic, one can see that the check node c1 is connected to the variable nodes x1 and x2 via two edges e11 and e21, the check node c2 is connected to the variable nodes x0, x2, and x4 via three edges e02, e22, and e42, and the check node c3 is connected to the variable nodes x0, x1, and x3 via three edges e03, e13, and e33. In this way, the Tanner graph 100 is derived from the parity-check matrix H 102 of the ECC.
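Purely as an illustration of the rule above, the set 108 of edges may be derived from a parity-check matrix with a few lines of Python; the matrix used below is only an assumption reconstructed from the parity checks listed in the preceding paragraph, since the exact matrix 102 appears only in FIG. 1.

    import numpy as np

    # Illustrative 4 x 5 parity-check matrix consistent with the checks above
    # (rows = check nodes c0..c3, columns = variable nodes x0..x4); assumption only.
    H = np.array([[0, 0, 0, 1, 1],   # c0: x3 + x4
                  [0, 1, 1, 0, 0],   # c1: x1 + x2
                  [1, 0, 1, 0, 1],   # c2: x0 + x2 + x4
                  [1, 1, 0, 1, 0]])  # c3: x0 + x1 + x3

    # An edge e_ab connects variable node x_a to check node c_b whenever H[b, a] = 1.
    edges = [(a, b) for b in range(H.shape[0]) for a in range(H.shape[1]) if H[b, a] == 1]
    print(edges)  # [(3, 0), (4, 0), (1, 1), (2, 1), (0, 2), (2, 2), (4, 2), (0, 3), (1, 3), (3, 3)]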
A closed path in the Tanner graph 100 that comprises l edges and closes back on itself is called a cycle of length l. The standard iterative decoding algorithm, also known as belief propagation (BP), passes probability messages along the set 108 of edges of the Tanner graph 100 to decode the codeword obtained by using the ECC. In the first iteration, the BP algorithm sets the values of the variable nodes x0, x1, x2, x3, x4 equal to logarithmic likelihood ratio (LLR) values for a received codeword, which are initially obtained by a conventional transceiver on the receiving side. After that, the BP algorithm involves recalculating the LLR values iteratively through the check nodes c0, c1, c2, c3 until a valid codeword is obtained. It can be said that each edge of the Tanner graph 100 "carries" a certain LLR value in each iteration of the BP algorithm. The BP algorithm often operates well for codes with cycles, but it is not guaranteed to be optimal and may therefore be unreliable.
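For illustration, one flooding iteration of such message passing in its min-sum form (a common simplification of BP, also referred to later in connection with the method 400) may be sketched as follows; the function name, the scheduling, and the assumption that every check node has a degree of at least two are illustrative choices rather than the exact decoder of the present disclosure.

    import numpy as np

    def min_sum_iteration(H, llr, v2c):
        """One flooding iteration of min-sum decoding (illustrative sketch).
        H   : (m, n) parity-check matrix of 0/1 entries (every row of degree >= 2 assumed)
        llr : (n,) channel LLR values for the variable nodes
        v2c : (m, n) current variable-to-check messages (0 where H == 0)
        """
        m, n = H.shape
        c2v = np.zeros_like(v2c)
        # Check-node update: sign product and minimum magnitude over the other edges.
        for i in range(m):
            cols = np.flatnonzero(H[i])
            for j in cols:
                others = v2c[i, cols[cols != j]]
                c2v[i, j] = np.prod(np.sign(others)) * np.min(np.abs(others))
        # Variable-node update: channel LLR plus incoming messages from the other checks.
        new_v2c = np.zeros_like(v2c)
        for j in range(n):
            rows = np.flatnonzero(H[:, j])
            for i in rows:
                new_v2c[i, j] = llr[j] + c2v[rows[rows != i], j].sum()
        posterior = llr + c2v.sum(axis=0)  # refined bit estimations per variable node
        return new_v2c, posterior

In such a sketch, the messages v2c may be initialized on the support of H with the channel LLR values, for example as v2c = (H * llr).astype(float).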
The Tanner graph may be used to represent a DNN. Similar to the idea described above, this approach involves a direct correspondence between the Tanner graph and the DNN with the only difference: every edge in the original Tanner graph is represented by a node in the DNN, and vice versa. As a result, the so-called Tanner-based DNN architecture is obtained, which is then used for decoding the codeword in a more reliable manner.
FIG. 2 shows a Tanner-based DNN 200 obtained by using the Tanner graph 100. In the Tanner-based DNN 200, the edges of the Tanner graph 100 are represented by nodes (or neurons). The initially obtained LLR values denoted as L1, L2, L3, L4, L5 are mapped to a first variable layer 202 of the Tanner-based DNN 200, which is called an input mapping layer in FIG. 2, in accordance with the number of the edges of the Tanner graph 100 that originate from each of the variable nodes x0, x1, x2, x3, x4 (as follows from FIG. 1, there are two such edges for each variable node). In fact, all layers of the Tanner-based DNN 200 represent the LLR values that are "carried" by the edges of the Tanner graph 100 at internodal transitions (i.e. the transitions between the variable and check nodes) in each decoding iteration. These LLR values, in turn, depend on the other LLR values that have come to the nodes at the previous internodal transitions. For example, at the transition from the variable nodes to the check nodes in the first decoding iteration, the edge e02 (which connects the variable node x0 to the check node c2, as shown in FIG. 1) "carries" the LLR value defined only by the initially obtained value L1 that has come to the variable node x0. However, at the inverse transition from the check nodes to the variable nodes in the same first decoding iteration, the LLR value of the edge e02 is now defined by the LLR values "carried" by the edges e22 and e42. Therefore, the two edges e22 and e42 come to a first (from above) node in a first check layer 204 of the Tanner-based DNN 200. These LLR values are then used together with the initially obtained LLR values in the next (second) decoding iteration, which starts with a second variable layer 206 of the Tanner-based DNN 200, and similar considerations are applied to build the remaining relationships between the nodes. In general, each pair "variable layer + check layer" corresponds to one decoding iteration in the Tanner graph 100. A last layer 208 in the Tanner-based DNN 200, which is called an output mapping layer in FIG. 2, maps the LLR values of the last-but-one layer to a lesser number of nodes, which is equal to the number of the variable nodes x0, x1, x2, x3, x4 in the Tanner graph 100, depending on the edges "carrying" the LLR values to these nodes.
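The input mapping described above (each channel LLR replicated once per edge originating from its variable node) may be sketched as follows; the function and the edge ordering are assumptions introduced only for illustration.

    import numpy as np

    def input_mapping(H, llr):
        """Map channel LLRs to the input (edge) layer of a Tanner-based DNN:
        every edge e = (v, c) initially carries the LLR of its variable node v."""
        edges = [(v, c) for c in range(H.shape[0]) for v in range(H.shape[1]) if H[c, v]]
        return np.array([llr[v] for v, c in edges]), edges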
The Tanner-based DNN, like the one shown in FIG. 2, may be trained on a zero codeword only, thereby solving a dimensionality problem. Moreover, the number of trainable parameters (i.e. weights) for such a DNN equals the number of probability messages passed over the Tanner graph. However, the Tanner-graph DNN has previously been tested mostly on short BCH codes. Moreover, like any DNN, the Tanner-based DNN suffers from the vanishing gradient problem, which makes its training time very long. On top of that, any communication system operates over a set of different codes of different lengths and rates, so that optimal weights have to be stored for every coding scheme. All of this makes the use of the Tanner-graph DNN resource-intensive.
There are also other existing DNN-based decoding algorithms for ECCs, such as, for example, a syndrome-based approach and a graph partitioning approach. Both approaches may be based on using a properly configured DNN. The graph partitioning approach is applicable to the ECCs based on the Plotkin or (U, U+V) construction (e.g., polar and Reed-Muller codes). The syndrome-based approach is applicable to any linear code, and its main benefits are the possibilities to train the DNN on a zero codeword and to utilize a general DNN with fully connected layers. However, both approaches demonstrate poor decoding performance when applied, for example, to the LDPC codes.

The exemplary embodiments disclosed herein provide a technical solution that allows mitigating or even eliminating the above-mentioned drawbacks of the prior art. In particular, the technical solution disclosed herein provides a graph-based list decoding algorithm with early list-size reduction. Contrary to the existing list decoding algorithms that perform iterative decoding on the whole list of codeword candidates for a received (noisy) codeword, the proposed algorithm involves reducing the size of the list of codeword candidates after a minimum number of decoding iterations and using machine-learning means to improve a rule for selecting the most relevant codeword candidate for the received codeword. Moreover, in the proposed algorithm, the iterative decoding is performed by using a graph representation (e.g., the Tanner graph) for an ECC by which the received codeword has been obtained. By so doing, it is possible to reduce the overall list-based decoding complexity and, consequently, achieve acceptable decoding performance for bigger sizes of the list of codeword candidates at lower computational costs. Moreover, the proposed list-based decoding architecture may efficiently be used for decoding codewords obtained by using different ECCs, including (but not limited to) the LDPC codes.
FIG. 3 shows a block diagram of an apparatus 300 for decoding a codeword in accordance with one exemplary embodiment. The apparatus 300 is intended to be used on the receiving side for the purpose of decoding the codeword obtained by using different ECCs, including (but not limited to) the LDPC codes. It should again be noted that, due to the noisiness of any communication channel, the codeword comes to the receiving side with noise errors. Moreover, some other errors may also be present in the codeword, such as, for example, those caused by any interfering signals. Therefore, whenever the codeword is mentioned in the following description, it should be construed as the codeword with these and/or any other errors.
Referring back to FIG. 3, the apparatus 300 comprises a processor 302 and a memory 304. The memory 304 stores processor-executable instructions 306 which, when executed by the processor 302, cause the processor 302 to receive initial bit estimations 308 for bits encoded in the codeword and use them to find the most relevant codeword candidate 310, as will be described below in more detail. It should be noted that the number, arrangement and interconnection of the constructive elements constituting the apparatus 300, which are shown in FIG. 3, are not intended to be any limitation of the present disclosure, but merely used to provide a general idea of how the constructive elements may be implemented within the apparatus 300. For example, the processor 302 may be replaced with several processors, and the memory 304 may likewise be replaced with several removable and/or fixed storage devices, depending on particular applications. The processor 302 may be implemented as a CPU, general-purpose processor, single-purpose processor, microcontroller, microprocessor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), digital signal processor (DSP), complex programmable logic device, etc. It should also be noted that the processor 302 may be implemented as any combination of one or more of the aforesaid. As an example, the processor 302 may be a combination of two or more microprocessors.
The memory 304 may be implemented as a classical nonvolatile or volatile memory used in the modern electronic computing machines. As an example, the nonvolatile memory may include Read-Only Memory (ROM), ferroelectric Random-Access Memory (RAM), Programmable ROM (PROM), Electrically Erasable PROM (EEPROM), solid state drive (SSD), flash memory, magnetic disk storage (such as hard drives and magnetic tapes), optical disc storage (such as CD, DVD and Blu-ray discs), etc. As for the volatile memory, examples thereof include Dynamic RAM, Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Static RAM, etc.
The processor-executable instructions 306 stored in the memory 304 may be configured as a computer-executable code which causes the processor 302 to perform the aspects of the present disclosure. The computer-executable code for carrying out operations or steps for the aspects of the present disclosure may be written in any combination of one or more programming languages, such as Java, C++, or the like. In some examples, the computer-executable code may be in the form of a high-level language or in a pre-compiled form, and be generated by an interpreter (also pre-stored in the memory 304) on the fly.
FIG. 4 shows a flowchart of a method 400 for decoding a codeword in accordance with one exemplary embodiment. The method 400 substantially describes the operation of the apparatus 300. Therefore, each of the steps of the method 400 is performed by the processor 302 of the apparatus 300. The method 400 starts with a step S402, in which the processor 302 receives the initial (rough) bit estimations 308 for bits of the codeword. Each bit estimation is indicative of a probable value of corresponding one of the bits of the codeword. The initial bit estimations 308 may be represented by LLR values or any other values which could be reduced to probability. The initial bit estimations 308 are provided to the processor 302 from a transceiver residing on the receiving side (in particular, they are obtained upon demodulating the received information signal carrying the codeword). Next, the method 400 proceeds to a step S404, in which the processor 302 generates a list of codeword candidates corresponding to the codeword based on the initial bit estimations 308. After that, a step S406 is initiated, in which the processor 302 obtains intermediate bit estimations for the list of codeword candidates by performing at least one iteration of iterative decoding on each codeword candidate of the list of codeword candidates. The iterative decoding may be performed by a first machine-learning means using a graph representation for the ECC by which the codeword has been obtained. The iterative decoding may represent min-sum iterative decoding. The minimum number of the iterations performed in the step S406 depends on particular applications (for example, acceptable or allowable computational costs). Further, the method 400 proceeds to a step S408, in which the processor 302 reduces the list of codeword candidates by discarding therefrom codeword candidates less relevant to the codeword. To perform said reducing, the processor 302 may use a second machine-learning means using the intermediate bit estimations as inputs. The second machine-learning means is different from the first machine-learning means. The method 400 then goes on to a step S410, in which the processor 302 obtains final bit estimations for the reduced list of codeword candidates by performing the complete iterative decoding on each codeword candidate of the reduced list of codeword candidates. The method 400 eventually ends up with a step S412, in which the processor 302 selects, in the reduced list of codeword candidates, a most relevant codeword candidate for the codeword based on the final bit estimations. It should be noted that if the initial bit estimations 308 comprise, for example, the LLR values, then the intermediate and final bit estimations relate to the same type of the bit estimations (i.e. refined LLR values). By using the method 400, it is possible to implement the graph-based list decoding algorithm with early list-size reduction, thereby achieving acceptable decoding performance for big list sizes at low computational costs. Moreover, once trained, the first and second machine-learning means may use the same set of trained weights for multiple code lengths and rates (i.e. to apply a weight sharing procedure), thereby providing code length and rate adaptation.
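The flow of the steps S402-S412 may be summarized by the following illustrative sketch; generate_candidates, run_iterations and score_candidates are hypothetical placeholders standing for the list generation, the first machine-learning means and the second machine-learning means, respectively.

    def decode_with_early_list_reduction(initial_llrs, generate_candidates,
                                         run_iterations, score_candidates,
                                         n_early_iters, keep, n_total_iters):
        """Illustrative sketch of the method 400 (steps S402-S412)."""
        # S404: build the initial list of codeword candidates from the rough LLRs.
        candidates = generate_candidates(initial_llrs)
        # S406: run only a minimum number of decoding iterations on every candidate.
        intermediate = [run_iterations(c, n_early_iters) for c in candidates]
        # S408: keep only the most relevant candidates (early list size reduction).
        scores = score_candidates(intermediate)
        order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
        reduced = [candidates[i] for i in order[:keep]]
        # S410: complete the iterative decoding on the reduced list only.
        final = [run_iterations(c, n_total_iters) for c in reduced]
        # S412: select the most relevant codeword candidate.
        final_scores = score_candidates(final)
        best = max(range(len(reduced)), key=lambda i: final_scores[i])
        return reduced[best], final[best]

Reducing the list after only n_early_iters iterations, instead of running n_total_iters iterations on every candidate, is what lowers the overall list-decoding complexity.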
In one exemplary embodiment, the steps S406 and S408 may be iteratively repeated before the step S410 to reduce the size of the list of codeword candidates in several steps. This may also influence the choice of the minimum number of the iterations performed in the step S406 of the method 400.
In one exemplary embodiment, the first machine-learning means may be implemented as a DNN using the Tanner graph representation as the graph representation for the ECC by which the codeword has been obtained. Such a Tanner-based DNN may have an architecture like the one shown in FIG. 2 and may use different activation functions for its constituent neurons, depending on the code type involved. For example, when applied to the LDPC codes, the Tanner-based DNN may use offset min-sum activation functions for the neurons. Moreover, due to the quasi-cyclic structure of the LDPC codes (i.e. their parity-check matrix H consists of circulant matrices), it is possible to use the weight sharing procedure for the Tanner-based DNN. In particular, it becomes possible to decode the same codeword multiple times with different shifts. This kind of diversity may improve decoding performance. Another option is to embed this diversity into the Tanner-based DNN, which leads to weight sharing, i.e. using a single weight per circulant rather than a single weight per message passed over the Tanner graph. In general, the weight sharing procedure provides the following benefits: the number of trainable parameters (i.e. weights) is reduced; the Tanner-based DNN is trained faster; and less memory is required to store the weights.
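As a rough illustration of the offset min-sum activation mentioned above, a single check-node neuron may be computed as follows; the use of one trainable offset per circulant reflects the weight sharing discussed in this paragraph and is an assumption of this sketch.

    import numpy as np

    def offset_min_sum_check_neuron(incoming, beta):
        """Offset min-sum activation for one check-node neuron (illustrative).
        incoming : messages from the other variable-node neurons connected to this check
        beta     : trainable offset; with weight sharing, one beta per circulant (assumption)
        """
        sign = np.prod(np.sign(incoming))
        magnitude = max(np.min(np.abs(incoming)) - beta, 0.0)
        return sign * magnitude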
The Tanner-based DNN may be pre-trained with any loss function or with a linear combination of loss functions. In one exemplary embodiment, the DNN is pre-trained with a combined loss function that comprises the linear combination of a Bit Error Rate (BER) loss function and a Frame Error Rate (FER) loss function. In one exemplary embodiment, the combined loss function may be calculated with an adjustable parameter α (0 ≤ α ≤ 1), which allows switching between the two loss function components as follows:
L_i^combined = α · L_i^FER + (1 - α) · L_i^BER,
where i is the number of the decoding iteration, L_i^combined is the combined loss function, L_i^FER is the FER loss function, and L_i^BER is the BER loss function. The training procedure continues until the combined loss function is minimized. In another exemplary embodiment, the weight sharing procedure may avoid the need to adjust the parameter α of the combined loss function, so that the Tanner-based DNN may be trained successfully with α = 1.
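A minimal sketch of such a combined loss is given below; the concrete cross-entropy form of the BER component and the soft frame-error form of the FER component are assumptions, since the present disclosure only fixes the linear combination itself.

    import numpy as np

    def combined_loss(llr_out, bits, alpha):
        """L_combined = alpha * L_FER + (1 - alpha) * L_BER (illustrative forms only).
        llr_out : DNN output LLRs for one decoding iteration
        bits    : transmitted bits (all-zero codeword during training)
        alpha   : adjustable parameter, 0 <= alpha <= 1
        """
        p_error = 1.0 / (1.0 + np.exp((1 - 2 * bits) * llr_out))  # per-bit error probability
        ber_loss = np.mean(-np.log(1.0 - p_error + 1e-12))        # bit-wise cross entropy
        fer_loss = -np.log(np.prod(1.0 - p_error) + 1e-12)        # soft frame-error penalty
        return alpha * fer_loss + (1 - alpha) * ber_loss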
FIG. 5 shows a block diagram of a Tanner-based DNN 500 in accordance with one exemplary embodiment. Similar to the Tanner-based DNN 200, the Tanner-based DNN 500 comprises the same types of layers: an input mapping layer 502, check layers 504-1 - 504-4, variable layers 506-1 - 506-3, and an output mapping layer 508. The input mapping layer 502 receives the initial bit estimations 308, and the output mapping layer 508 outputs refined bit estimations 510 (which are represented by the above-mentioned intermediate or final bit estimations, depending on which of the steps S406 and S410 of the method 400 is performed at the moment). The check layers 504-1 - 504-4 and the variable layers 506-1 - 506-3 alternate with each other between the input mapping layer 502 and the output mapping layer 508. It should be noted that the number of the check and variable layers is given for illustrative purposes only, and may be replaced with any other number depending on particular applications (for example, based on a required number of decoding iterations). As shown in FIG. 5, the Tanner-based DNN 500 further comprises residual connections 512-1 and 512-2 between the variable layers 506-1 - 506-3, which are provided to address the vanishing gradient problem: as more layers using certain activation functions are added to an NN, the gradients of a loss function approach zero, making the NN hard to train. Thus, by using the residual connections 512-1 and 512-2, it is possible to efficiently confront the vanishing gradient problem and significantly speed up the training procedure of the Tanner-based DNN 500. Again, the number and arrangement of the residual connections are given for illustrative purposes only. In some other exemplary embodiments, the Tanner-based DNN 500 may comprise only one or more than two residual connections.
As follows from FIG. 5, the residual connections 512-1 and 512-2 are applied to the variable layers 506-1 - 506-3. Let x^(i) be the output vector of a given variable layer at decoding iteration i. The output vector may be constructed as a linear combination of the output vectors from multiple previous variable layers added to the regular weighted output of that variable layer:
x^(i) = u^(i) + a_1 · x^(i-1) + a_2 · x^(i-2) + ... + a_L · x^(i-L),
where u^(i) denotes the regular output of the variable layer at iteration i, computed with the trainable weights w from the messages L_c arriving at the corresponding variable node over its edges. Here, L is the depth of the residual connections (L = 2 for the Tanner-based DNN 500 shown in FIG. 5), a_1, ..., a_L are the trainable weights for the residual connections, w are the trainable weights of the variable layer which represents an arrival point for the residual connections (every (L + 1)-th variable layer), e_v is the set of edges for that variable node, e^- is the set of edges coming to that variable node, and L_c is the set of messages (bit estimations) outgoing from the check node arranged after that variable node.
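In simplified vector form, the residual combination described above may be sketched as follows; regular_variable_update is a hypothetical placeholder for the ordinary weighted variable-layer computation.

    def variable_layer_with_residuals(prev_outputs, a, regular_variable_update, *update_args):
        """x^(i) = regular update + sum over l of a[l] * x^(i-l), l = 1..L (illustrative sketch).
        prev_outputs : output vectors of the L previous variable layers, most recent first
        a            : trainable residual weights a_1..a_L
        """
        x = regular_variable_update(*update_args)
        for a_l, x_prev in zip(a, prev_outputs):
            x = x + a_l * x_prev
        return x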
Thus, the Tanner-based DNN architectures may be modified by different residual-connection schemes. These modifications may be obtained based on the following criteria: the number of the previous variable layers (i.e. the depth L) that is taken into consideration when calculating the residual connections; and the arrangement of the residual connections, i.e. whether they are calculated at every iteration of the iterative decoding or the whole set of the iterations of the iterative decoding is split into several subsets.
FIG. 6 shows a block diagram of a Tanner-based DNN 600 in accordance with one other exemplary embodiment. To avoid overloading the figure, the input and output mapping layers are not shown, but they are implied to be included in the Tanner-based DNN 600 in the same or similar manner as it is done for the input mapping layer 502 and the output mapping layer 508 of the Tanner-based DNN 500. The Tanner-based DNN 600 comprises check layers 602-1 - 602-7 and variable layers 604-1 - 604-6 which alternate with each other. The Tanner-based DNN 600 further comprises residual connections 606-1 - 606-4 between the variable layers 604-1 - 604-6, with the residual connections being arranged without linear dependency. The number of the check and variable layers and the number of the residual connections are given for illustrative purposes only and may change depending on particular applications.
FIG. 7 shows a block diagram of a Tanner-based DNN 700 in accordance with one more exemplary embodiment. To avoid overloading the figure, the input and output mapping layers are not shown, but they are implied to be included in the Tanner-based DNN 700 in the same or similar manner as it is done for the input mapping layer 502 and the output mapping layer 508 of the Tanner-based DNN 500. The Tanner-based DNN 700 comprises check layers 702-1 - 702-7 and variable layers 704-1 - 704-6 which alternate with each other. The Tanner-based DNN 700 further comprises residual connections 706-1 - 706-8 between the variable layers 704-1 - 704-6, with the residual connections being arranged with linear dependency. Again, the number of the check and variable layers and the number of the residual connections are given for illustrative purposes only and may change depending on particular applications.
Referring back to FIG. 4, the second machine-learning means receives, as the inputs, the intermediate bit estimations obtained by using the first machine-learning means (for example, the Tanner-based DNN 500) in the step S406 of the method 400, and uses them to provide the reduced list of codeword candidates in the step S408 of the method 400. To do this, the second machine-learning means may also be implemented as a DNN in one exemplary embodiment. Let us call it a list reducing DNN to avoid any confusion with the Tanner-based DNN. The list reducing DNN may comprise dense layers configured such that the list reducing DNN outputs a vector of scores for the list of codeword candidates. The vector of scores is then used by the processor 302 to discard the codeword candidates less relevant to the target codeword.
To improve the performance of the list reducing DNN, one needs to find the most relevant inputs or key features that may be used to define proper codeword candidates among the list of codeword candidates. For this purpose, the list reducing DNN may use the so-called soft syndrome that has turned out to be the most powerful feature for reducing the list size, especially for short ECCs. The soft syndrome is calculated as follows. Suppose that the initial bit estimations 308 are represented by LLR values, and let us multiply the vector constituted by the LLR values (hereinafter referred to as the LLR vector) element-wise by the parity-check matrix H of the ECC involved in the method 400. After this, all non-zero elements at every row of a resulting matrix will be transformed as follows:
s_i = 2 · arctanh( ∏_{j: h_ij = 1} tanh(L_j / 2) ).
Here, h_ij are the elements of the parity-check matrix, L_j is the element of the LLR vector, and s_i is the element of a soft-syndrome vector. The physical meaning of the value s_i is the logarithm of the probability ratio between events denoting that the i-th parity check is satisfied versus not satisfied. The higher the sum Σ_i s_i, the higher the probability that the whole decoding process converges to some valid codeword candidate. The list reducing DNN demonstrates performance similar to the soft-syndrome sum deterministic criterion (in terms of a list reduction error). As an alternative to the soft syndrome, it is possible to use scoring based on the dynamics of saturated LLR values (i.e. those LLR values that are assigned with deliberately reliable values) or to use hard decisions for L_j instead of the LLR values.
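A minimal sketch of the soft-syndrome computation and of the resulting deterministic score, assuming a dense 0/1 parity-check matrix and numpy arithmetic, is given below.

    import numpy as np

    def soft_syndrome(H, llr):
        """s_i = 2 * arctanh( prod over j with h_ij = 1 of tanh(L_j / 2) ), for every row i of H."""
        t = np.tanh(np.asarray(llr, dtype=float) / 2.0)
        s = []
        for row in np.asarray(H):
            p = np.prod(t[np.flatnonzero(row)])
            p = np.clip(p, -0.999999, 0.999999)  # avoid infinities for saturated LLRs
            s.append(2.0 * np.arctanh(p))
        return np.array(s)

    def soft_syndrome_score(H, llr):
        """Deterministic list-reduction score: the sum of the soft-syndrome elements (higher is better)."""
        return float(np.sum(soft_syndrome(H, llr)))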
Once the reduced list of codeword candidates is obtained, it is then used in the step S410 of the method 400 to obtain the final bit estimations. For this purpose, the same Tanner-based DNN as the one used to obtain the intermediate bit estimations (e.g., the Tanner-based DNN 500, 600 or 700) is used, but now the iterative decoding is performed completely. After that, the final bit estimations are used by the processor 302 to determine the most relevant codeword candidate for the target codeword.
It should be noted that the Tanner-based DNNs 500, 600, and 700 used in the method 400 are adaptive in the sense that each of them allows using a single set of trained weights for multiple code lengths and rates. The goal of adaptive training is to derive the training procedure that will provide an improvement in the performance of multiple coding constructions by training just a single DNN. This will provide a reduction of memory consumption when operating at multiple code rates or code lengths. The idea of rate adaptation is to cover a wide range of code rates by a limited number of trained weights. In particular, by enabling or disabling some weights for different batches during training, one can simultaneously train the Tanner-based DNN that will perform well for different code rates. In turn, the idea of length adaptation is to train the Tanner-based DNN and use multiple code lengths optimally, keeping the code rate constant. In general, there are three main parameters that may be adjusted: a factor value (different code lengths may be obtained while the code rate remains fixed); a number of rows in the parity-check matrix H (the code length remains the same while the code rate may be adjusted); and a number of information bits in the parity-check matrix H, which adjusts the code length.
With reference to the LDPC codes, their parity-check matrices have a nested structure. This means that different codes can be generated from the same parity-check matrix by changing the factor value (circulant size) and bordering the parity-check matrix for the lowest code rate. As long as a single weight is used per circulant, the overall weight structure remains the same for different codes. In this case, the length and rate adaptation may be performed as follows.
Length adaptation: Different factor values may produce different code lengths, but the code rate and the weight structure of the corresponding Tanner-based DNN will remain the same. The weights remain acceptable for a different circulant size, but the residual connections should be disabled in this case. At the same time, the combined loss function still allows training the Tanner-based DNN efficiently without encountering the vanishing gradient problem.
Rate adaptation: A different number of rows in the parity-check matrix may produce different code rates, while keeping the code length constant. The lowest code rate may be treated as a higher code rate plus additional parity checks.
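The weight reuse underlying this rate adaptation may be sketched as follows; the per-row weight layout and the function name are assumptions introduced only for illustration.

    import numpy as np

    def weights_for_higher_rate(weights_lowest_rate, n_rows_higher_rate):
        """Rate adaptation sketch: reuse, for a higher-rate code, the per-circulant weights
        trained for the lowest code rate by keeping only those base-matrix rows that the
        higher-rate parity-check matrix retains (rows x columns layout assumed)."""
        return np.asarray(weights_lowest_rate)[:n_rows_higher_rate, :]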
FIG. 8 explains the above-described rate adaptation by means of an example of two parity-check matrices. In particular, both the matrices are assumed to have a number of information bits k = 120. The parity-check matrices differ from each other in the code rates: one with a lower code rate R = 0.5, and another with a higher code rate R = 0.75. It should be noted that, in this case, each code rate is calculated as R = (N - M)/(N - n_p), where N is the number of columns in the parity-check matrix, M is the number of rows in the parity-check matrix, and n_p is the number of punctured nodes. Node puncturing is used in the 5G LDPC codes to improve the decoding performance and implies that some variable nodes of the parity-check matrix (more precisely, its Tanner graph representation) are declared as "punctured" and do not take part in the transmission of the codeword bits corresponding to these variable nodes. On the receiving side, the punctured bits are interpreted as erasures. For the matrices shown in FIG. 8, there are 40 punctured variable nodes, so that the code rates are obtained as follows: higher code rate R = (200 - 80)/(200 - 40) = 0.75, and lower code rate R = (280 - 160)/(280 - 40) = 0.5. The dots shown in FIG. 8 correspond to nonzero elements in each of the parity-check matrices. Since the lower code rate may be treated as the higher code rate plus additional parity checks, the weights for the Tanner-based DNN corresponding to the higher code rate may be selected as a subset of the lower code rate configuration. This kind of rate reuse is the rate adaptation. The method 400 allows a joint training procedure at the rate adaptation. As mentioned above, one can treat the lower code rate as the higher code rate plus additional parity checks. Given that the proper weights are set to zero, it is possible to obtain the Tanner-based DNN corresponding to the higher code rate.
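The code rates quoted above may be reproduced with the following trivial check; the function name is illustrative only.

    def code_rate(n_cols, n_rows, n_punctured):
        """R = (N - M) / (N - n_p) for a parity-check matrix with punctured variable nodes."""
        return (n_cols - n_rows) / (n_cols - n_punctured)

    print(code_rate(200, 80, 40))   # 0.75 (higher-rate matrix in FIG. 8)
    print(code_rate(280, 160, 40))  # 0.5  (lower-rate matrix in FIG. 8)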
As an example, let us consider the case of the short LDPC codes with k = 120 information bits, the circulant size equal to 20 and the code rate varying between 0.75 and 0.2 (i.e. the parity-check matrix has 4 to 26 rows). Further, we train the Tanner-based DNN for the lowest code rate (i.e. 0.2, which corresponds to 26 rows of the parity-check matrix) and try to reuse the obtained set of weights for higher code rates. We repeat the same procedure, but next time starting from the parity-check matrix having 25 rows, then 24 rows, and so on. This approach gives us the performance of different sets of weights for different code rates. We may now make a cutoff at a certain FER level and calculate the performance degradation due to the weight reuse, comparing it with the set of weights obtained directly for the higher code rate. The final rate adaptation scheme for the considered case is described in two tables 900 and 1000 shown in FIG. 9 and 10, respectively. The table 900 lists signal-to-noise ratio (SNR) values at which the FER equal to 1% is achieved, while the table 1000 lists performance losses (in dB) corresponding to the SNR values. In each of the tables 900 and 1000, columns and rows are numbered from 4 to 26 in accordance with the code rates under consideration. The rows indicate where the weights have been taken from. For example, the 26th row means that a proper set of weights has been taken from code rate 0.2. The columns indicate at which code rate the set of weights has been reused. From the table 1000, it is easy to derive the sets of weights which should be used to provide acceptable performance losses (for example, if the acceptable performance losses are those less than 0.09 dB, then it is recommended to use the sets of weights corresponding to the light-gray cells in the table 1000). At the same time, those table cells which are dark-grayed in the tables 900 and 1000 correspond to the sets of weights which are not good to use (i.e. correspond to the weight reuse scenarios with significant performance losses).
FIG. 11-13 show comparison results of decoding performances provided by the method 400 and the existing decoding algorithms for the case of the LDPC codes. In particular, the existing decoding algorithms are represented by the BP algorithm and the scaled min-sum algorithm. As for the decoding performances, they are represented by dependences of a Block Error Rate (BLER) on an SNR for different code lengths and rates. FIG. 11-13 also show the Polyanskiy bound calculated under a normal approximation assumption. It should also be noted that the Tanner-based DNN used for comparison had residual connections from four previous layers, and the weight sharing approach was utilized. This allowed training the Tanner-based DNN with the FER-targeted loss function. Multiple random initial points were also tested during the training procedure.
More specifically, the dependences shown in FIG. 11 are obtained for the code length N = 240 and the code rate R = 0.5 by using 20 decoding iterations in the method 400, 20 and 50 decoding iterations for the BP algorithm, and 20 decoding iterations for the scaled min-sum algorithm. One can see that 20 decoding iterations performed by the Tanner-based DNN in the method 400 provide better results than 20 decoding iterations in the BP algorithm and the scaled min-sum algorithm, and get closer, in terms of the decoding performance, to 50 decoding iterations in the BP algorithm.
FIG. 12 and 13 show the BLER-SNR dependences obtained by using the same decoding iterations in each of the method 400, the BP algorithm, and the scaled min-sum algorithm but for other code lengths and rates: N = 600 and R = 0.2 for FIG. 12, and N = 160 and R = 0.75 for FIG. 13. According to the BLER-SNR dependences shown in FIG. 12, the method 400 again outperforms the BP algorithm and the scaled min-sum algorithm with 20 decoding iterations and gets closer, in terms of the decoding performance, to the BP algorithm with 50 decoding iterations. According to the BLER-SNR dependences shown in FIG. 13, the method 400 even outperforms, in terms of the decoding performance, the BP algorithm with 50 decoding iterations.
FIG. 14 shows a block diagram of a UE 1400 for wireless communications in accordance with one exemplary embodiment. The UE 1400 may refer to a mobile device, a mobile station, a terminal, a subscriber unit, a mobile phone, a cellular phone, a smart phone, a cordless phone, a personal digital assistant (PDA), a wireless communication device, a laptop computer, a tablet computer, a gaming device, a netbook, a smartbook, an ultrabook, a medical device or medical equipment, a biometric sensor, a wearable device (for example, a smart watch, smart glasses, a smart wrist band), an entertainment device (for example, an audio player, a video player, etc.), a vehicular component or sensor, a smart meter/sensor, industrial manufacturing equipment, a global positioning system (GPS) device, an Internet-of-Things (IoT) device, a machine-type communication (MTC) device, a group of Massive IoT (MIoT) or Massive MTC (mMTC) devices/sensors, or any other suitable device configured to support wireless communications. In some example embodiments, the UE 1400 may refer to at least two collocated and inter-connected UEs thus defined.
Referring back to FIG. 14, the UE 1400 comprises a transceiver 1402 and the apparatus 300. The transceiver 1402 is configured to receive the codeword transmitted over a communication channel (for example, from another UE). After that, the transceiver 1402 is configured to obtain the initial bit estimations 308 for the bits of the codeword and provide the initial bit estimations 308 to the apparatus 300 for decoding the codeword in accordance with the method 400. It should be noted that the number, arrangement and interconnection of the constructive elements constituting the UE 1400, which are shown in FIG. 14, are not intended to be any limitation of the present disclosure, but merely used to provide a general idea of how the constructive elements may be implemented within the UE 1400. In one other exemplary embodiment, the transceiver 1402 may be implemented as two individual devices, with one for a receiving operation and another for a transmitting operation. Irrespective of its implementation, the transceiver 1402 is implied to be capable of performing different operations required to perform the reception and transmission of different signals, such as, for example, signal modulation/demodulation, encoding/decoding, etc.
FIG. 15 shows a block diagram of a BS 1500 for wireless communications in accordance with one exemplary embodiment. The BS 1500 may refer to a node of a Radio Access Network (RAN), such as a Global System for Mobile Communications (GSM) RAN (GRAN), a GSM EDGE RAN (GERAN), a Universal Mobile Telecommunications System (UMTS) RAN (UTRAN), a Long-Term Evolution (LTE) UTRAN (E-UTRAN), or a Next-Generation (NG) RAN. The BS 1500 may be used to connect the UE 1400 to a Data Network (DN) through a Core Network (CN), and is referred to as a base transceiver station (BTS) in terms of the 2G communication technology, a NodeB in terms of the 3G communication technology, an evolved NodeB (eNodeB) in terms of the 4G communication technology, and a gNodeB (gNB) in terms of the 5G communication technology and New Radio (NR) air interface.
Referring back to FIG. 15, the BS 1500 comprises a transceiver 1502 and the apparatus 300. The transceiver 1502 is configured to receive the codeword transmitted over a communication channel (for example, from another UE or BS). After that, the transceiver 1502 is configured to obtain the initial bit estimations 308 for the bits of the codeword and provide the initial bit estimations 308 to the apparatus 300 for decoding the codeword in accordance with the method 400. It should be noted that the number, arrangement and interconnection of the constructive elements constituting the BS 1500, which are shown in FIG. 15, are not intended to be any limitation of the present disclosure, but merely used to provide a general idea of how the constructive elements may be implemented within the BS 1500. In one other exemplary embodiment, the transceiver 1502 may be implemented in the same or similar manner as the transceiver 1402 discussed above.
It should be noted that each step or operation of the method 400, or any combinations of the steps or operations, can be implemented by various means, such as hardware, firmware, and/or software. As an example, one or more of the steps or operations described above can be embodied by processor-executable instructions, data structures, program modules, and other suitable data representations. Furthermore, the executable instructions which embody the steps or operations described above can be stored on a corresponding data carrier and executed by the processor 302. This data carrier can be implemented as any computer-readable storage medium configured to be readable by said at least one processor to execute the processor-executable instructions. Such computer-readable storage media can include both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, the computer-readable media comprise media implemented in any method or technology suitable for storing information. In more detail, the practical examples of the computer-readable media include, but are not limited to, information-delivery media, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD), holographic media or other optical disc storage, magnetic tape, magnetic cassettes, magnetic disk storage, and other magnetic storage devices.
Although the exemplary embodiments of the present disclosure are described herein, it should be noted that any various changes and modifications could be made in the embodiments of the present disclosure, without departing from the scope of legal protection which is defined by the appended claims. In the appended claims, the word “comprising” does not exclude other elements or operations, and the indefinite article “a” or “an” does not exclude a plurality. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Claims

1. An apparatus for decoding a codeword, comprising: at least one processor; and a memory coupled to the at least one processor and storing processor-executable instructions, wherein the codeword has bits encoded therein by using an Error Correcting Code (ECC), and wherein the at least one processor is configured, when executing the processor-executable instructions, to: receive initial bit estimations for the bits of the codeword, each initial bit estimation being indicative of a probable value of corresponding one of the bits of the codeword; generate a list of codeword candidates corresponding to the codeword based on the initial bit estimations; obtain intermediate bit estimations for the list of codeword candidates by performing at least one iteration of iterative decoding on each codeword candidate of the list of codeword candidates using a graph representation for the ECC; reduce the list of codeword candidates by discarding therefrom codeword candidates less relevant to the codeword using the intermediate bit estimations; obtain final bit estimations for the reduced list of codeword candidates by performing the complete iterative decoding on each codeword candidate of the reduced list of codeword candidates; and select, in the reduced list of codeword candidates, a most relevant codeword candidate for the codeword based on the final bit estimations.

2. The apparatus of claim 1, wherein the ECC used to encode the bits into the codeword is a Low-Density Parity-Check code.

3. The apparatus of claim 1 or 2, wherein the iterative decoding is performed by a first machine-learning means, wherein the first machine-learning means comprises a Deep Neural Network (DNN) using a Tanner graph representation as the graph representation for the ECC.
4. The apparatus of claim 3, wherein the DNN is pre-trained with a combined loss function, the combined loss function comprising a linear combination of a Bit Error Rate loss function and a Frame Error Rate loss function.
5. The apparatus of claim 3 or 4, wherein the DNN comprises multiple layers of neurons, and wherein at least two of the multiple layers of neurons comprise at least one residual connection therebetween.
6. The apparatus of claim 5, wherein the at least two layers of neurons comprise multiple residual connections therebetween, and the multiple residual connections are configured with linear and/or non-linear dependency.
7. The apparatus of any one of claims 1 to 6, wherein the reducing of the list of codeword candidates is performed by a second machine-learning means, wherein the second machine-learning means is configured to use a soft syndrome feature to define, among the list of codeword candidates, codeword candidates that are more relevant for the codeword.
8. The apparatus of any one of claims 1 to 7, wherein the iterative decoding comprises min-sum iterative decoding.
9. A method for decoding a codeword, the codeword having bits encoded therein by using an Error Correcting Code (ECC), wherein the method comprises: receiving initial bit estimations for the bits of the codeword, each initial bit estimation being indicative of a probable value of corresponding one of the bits of the codeword; generating a list of codeword candidates corresponding to the codeword based on the initial bit estimations; obtaining intermediate bit estimations for the list of codeword candidates by performing at least one iteration of iterative decoding on each codeword candidate of the list of codeword candidates using a graph representation for the ECC; reducing the list of codeword candidates by discarding therefrom codeword candidates less relevant to the codeword using the intermediate bit estimations; obtaining final bit estimations for the reduced list of codeword candidates by performing the complete iterative decoding on each codeword candidate of the reduced list of codeword candidates; and selecting, in the reduced list of codeword candidates, a most relevant codeword candidate for the codeword based on the final bit estimations.
10. The method of claim 9, wherein the ECC used to encode the bits into the codeword is a Low-Density Parity-Check code.
11. The method of claim 9 or 10, wherein the iterative decoding is performed by a first machine-learning means, wherein the first machine-learning means comprises a Deep Neural Network (DNN) using a Tanner graph representation as the graph representation for the ECC.
12. The method of claim 11, wherein the DNN is pre-trained with a combined loss function, the combined loss function comprising a linear combination of a Bit Error Rate loss function and a Frame Error Rate loss function.
13. The method of claim 11 or 12, wherein the DNN comprises multiple layers of neurons, and wherein at least two of the multiple layers of neurons comprise at least one residual connection therebetween.
14. The method of claim 13, wherein the at least two layers of neurons comprise multiple residual connections therebetween, and the multiple residual connections are configured with linear and/or non-linear dependency.
15. The method of any one of claims 9 to 14, wherein the reducing of the list of codeword candidates is performed by a second machine-learning means, wherein the second machine-learning means is configured to use a soft syndrome feature to define, among the list of codeword candidates, codeword candidates that are more relevant for the codeword.
16. The method of any one of claims 9 to 15, wherein the iterative decoding comprises min-sum iterative decoding.
17. A computer program product comprising computer code which, when executed by at least one processor, causes the at least one processor to perform the method according to any one of claims 9 to 16.
18. A user equipment for wireless communications, comprising: a transceiver; and the apparatus according to any one of claims 1 to 8, wherein the transceiver is configured to: receive the codeword transmitted over a communication channel, obtain the initial bit estimations for the bits of the codeword, and provide the initial bit estimations to the apparatus.
19. A base station for wireless communications, comprising: a transceiver; and the apparatus according to any one of claims 1 to 8, wherein the transceiver is configured to: receive the codeword transmitted over a communication channel, obtain the initial bit estimations for the bits of the codeword, and provide the initial bit estimations to the apparatus.
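The following two sketches are editorial, non-limiting illustrations of how features of the kind recited in claims 3 to 6 (and 11 to 14) and in claims 4 and 12 are commonly realised in the literature; they do not reproduce the applicant's architecture. First, a DNN operating on the Tanner graph can be obtained by unrolling the min-sum iterations into layers and attaching trainable per-edge weights to the check-to-variable messages, in the spirit of the neural belief-propagation decoders listed among the non-patent citations below. The residual connection between consecutive layers and the residual_gain coefficient shown here are assumptions of this sketch.

```python
import numpy as np


def neural_min_sum_layer(H, llr, msg_c2v_prev, edge_weights, residual_gain=0.5):
    """One unrolled layer of weighted ('neural') min-sum decoding on the Tanner graph of H.

    edge_weights has the same shape as H and holds multiplicative weights learned offline;
    the new check-to-variable messages are mixed with the previous layer's messages
    through a simple residual connection controlled by residual_gain.
    Returns the new messages and the layer's a-posteriori LLRs.
    """
    m, n = H.shape
    total = llr + msg_c2v_prev.sum(axis=0)
    msg_c2v_new = np.zeros((m, n))
    for c in range(m):
        vs = np.flatnonzero(H[c])
        v2c = total[vs] - msg_c2v_prev[c, vs]       # extrinsic variable-to-check messages
        signs = np.where(v2c >= 0.0, 1.0, -1.0)
        mags = np.abs(v2c)
        for i, v in enumerate(vs):
            others = np.delete(np.arange(vs.size), i)
            msg_c2v_new[c, v] = edge_weights[c, v] * signs[others].prod() * mags[others].min()
    msg_c2v_new += residual_gain * msg_c2v_prev     # residual connection between layers
    return msg_c2v_new, llr + msg_c2v_new.sum(axis=0)
```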
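Second, one plausible reading of the combined loss function of claims 4 and 12 is a weighted (linear) combination of a bit-level surrogate for the Bit Error Rate and a frame-level surrogate for the Frame Error Rate. The sketch below uses the per-bit binary cross-entropy for the former and the worst per-bit term of each frame for the latter; the weighting factor alpha and both surrogate choices are editorial assumptions, not the applicant's training objective.

```python
import numpy as np


def combined_loss(output_llrs, true_bits, alpha=0.7):
    """alpha * (BER surrogate) + (1 - alpha) * (FER surrogate).

    output_llrs : decoder output LLRs, shape (batch, n); positive favours bit 0.
    true_bits   : transmitted bits, shape (batch, n), values in {0, 1}.
    """
    # Probability that each bit equals 1, written in a numerically stable form.
    p_one = 0.5 * (1.0 - np.tanh(0.5 * output_llrs))
    eps = 1e-12
    # Per-bit binary cross-entropy: the bit-level (BER-like) term.
    bce = -(true_bits * np.log(p_one + eps) + (1 - true_bits) * np.log(1.0 - p_one + eps))
    ber_term = bce.mean()
    # A frame fails when its worst bit fails: the frame-level (FER-like) term.
    fer_term = bce.max(axis=1).mean()
    return alpha * ber_term + (1.0 - alpha) * fer_term
```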

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/RU2020/000494 WO2022066030A1 (en) 2020-09-28 2020-09-28 Graph-based list decoding with early list size reduction

Publications (1)

Publication Number Publication Date
WO2022066030A1 true WO2022066030A1 (en) 2022-03-31

Family

ID=73856543

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/RU2020/000494 WO2022066030A1 (en) 2020-09-28 2020-09-28 Graph-based list decoding with early list size reduction

Country Status (1)

Country Link
WO (1) WO2022066030A1 (en)

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ELIYA NACHMANI ET AL: "Deep Learning Methods for Improved Decoding of Linear Codes", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 21 June 2017 (2017-06-21), XP081306197, DOI: 10.1109/JSTSP.2017.2788405 *
KAVVOUSANOS: "Deep-Learning-based Forward-Error-Correction Decoding Techniques and Optimizations for Hardware Implementation", Master's Thesis, University of Patras, 29 February 2020 (2020-02-29), pages 1 - 94, XP055809729, Retrieved from the Internet <URL:https://nemertes.library.upatras.gr/jspui/bitstream/10889/13787/11/Nemertes_Kavvousanos(com).pdf> [retrieved on 20210601] *
YE-HUA LIU ET AL: "Neural Belief-Propagation Decoders for Quantum Error-Correcting Codes", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 19 November 2018 (2018-11-19), XP081202446 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20828571

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20828571

Country of ref document: EP

Kind code of ref document: A1