WO2019197043A1 - Multi-composition coding for signal shaping - Google Patents

Multi-composition coding for signal shaping

Info

Publication number
WO2019197043A1
Authority
WO
WIPO (PCT)
Prior art keywords
codebook
output
sequence
symbols
sequences
Prior art date
Application number
PCT/EP2018/059574
Other languages
French (fr)
Inventor
Marcin PIKUS
Wen Xu
Original Assignee
Huawei Technologies Duesseldorf Gmbh
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Duesseldorf Gmbh filed Critical Huawei Technologies Duesseldorf Gmbh
Priority to PCT/EP2018/059574 priority Critical patent/WO2019197043A1/en
Priority to CN201880088403.2A priority patent/CN111670543B/en
Publication of WO2019197043A1 publication Critical patent/WO2019197043A1/en

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/60 General implementation details not specific to a particular type of compression
    • H03M7/6041 Compression optimized for errors
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3068 Precoding preceding compression, e.g. Burrows-Wheeler transformation
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/4006 Conversion to or from arithmetic code

Definitions

  • the present invention relates to the technical field of probabilistic signal shaping, specifically Distribution Matching (DM).
  • the invention presents a device for probabilistic signal shaping, and a transmitter or receiver employing said device.
  • the device encodes input sequences of symbols of a signal into output sequences of symbols of different composition.
  • the device is configured to perform a multi-composition coding.
  • the invention relates also to a corresponding coding method, and relates further to a base codebook including output sequences (codewords) of multiple different compositions.
  • the channel input symbols need to have a certain probability distribution.
  • a Gaussian distribution is required to achieve the capacity of the Additive White Gaussian Noise (AWGN) channel.
  • AWGN Additive White Gaussian Noise
  • uniformly distributed channel input symbols are used, which causes a gap to the capacity. This loss is called the "shaping loss", and can be up to 1.53 dB on AWGN channels if uniformly distributed channel input symbols are used.
  • PSCM Probabilistically Shaped Coded Modulation
  • QAM Quadrature Amplitude Modulation
  • PAS Probabilistic Amplitude Shaping
  • PSCM is able to mimic the optimal distribution and avoid the shaping loss (also referred to as obtaining "the shaping gain") compared to schemes which use uniformly distributed transmit symbols.
  • the PAS scheme consists specifically of a Shaping Encoder (ShEnc) and a Channel Encoder (ChEnc) at the transmitter side, and accordingly of a Channel Decoder (ChDec) followed by a Shaping Decoder (ShDec) at the receiver side. The scheme is shown in FIG. 8.
  • the ShEnc transforms uniformly distributed bits of an input message to a non-uniform distribution, such that channel input symbols are distributed to approach the capacity achieving distribution.
  • the transmitter can adjust the rate of the transmission, without changing the parameters of the Forward Error Correction (FEC) code.
  • FEC Forward Error Correction
  • the key part of the PAS system is the ShEnc.
  • the ShEnc aims to produce at the output a sequence of symbols (random variables) with a desired probability distribution, given a sequence of symbols as an input (usually with a uniform probability distribution).
  • a ShEnc is sometimes referred to as a distribution matcher, and a ShDec is called an inverse distribution matcher or distribution dematcher.
  • a ShEnc and a distribution matcher, and respectively a ShDec and an inverse distribution matcher are assumed to be identical, unless otherwise stated, i.e. these terms are used interchangeably.
  • the PAS system (see e.g. 'G. Bocherer et al., "Bandwidth efficient and rate-matched low-density parity-check coded modulation," IEEE Trans. Commun., vol. 63, no. 12, Dec. 2015') shown in FIG. 8 works as follows, wherein a sequence of n symbols is denoted by A^n = A_1 A_2 ... A_n:
  • a sequence U of k_c uniformly distributed input bits enters the ShEnc.
  • Each amplitude is mapped, in particular independently, by a fixed mapping b_A, to a corresponding bit label of length m − 1.
  • The binary sequence b(S^{n_c}) of n_c parity bits is mapped to sign symbols S^{n_c} via a fixed sign mapping b_S.
  • DM is usually performed on a block-to-block basis, i.e., the ShEnc maps a uniformly distributed input binary sequence of fixed length k_c to a sequence of fixed length n_c of symbols distributed according to a desired target probability distribution. The mapping should be one-to-one.
  • non-binary distribution matching is considered, where the input sequence is binary and the output sequence is non-binary. It was shown that non-binary DM (with a non-binary output sequence) can be performed by parallel binary DMs (with binary output sequences) and a mapper, see e.g. 'M. Pikus and W. Xu, "Bit-level probabilistically shaped coded modulation," IEEE Commun. Lett., vol. 21, no. 9, Sept. 2017'.
  • CCDM Constant Composition Distribution Matching
  • DM can be considered as a coding scheme, where data sequences are encoded into codewords (output sequences) which have a specific distribution.
  • this small example shown in FIG. 9 could be implemented as a look-up table.
  • a ShEnc has a poor performance when the output sequence is short.
  • P(1) = 0.25.
  • FIG. 9 accordingly illustrates that for longer output sequences, data can be encoded more efficiently into shaped sequences. However, for longer output sequences it is no longer feasible to implement the ShEnc by means of a look-up table (since too much memory is used, as n·2^k bits are needed for storage). Therefore, efficient algorithms based on arithmetic coding are used in CCDM and m-out-of-n codes.
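Such a look-up-table ShEnc can be sketched as follows; the table values are hypothetical, chosen to mirror the described small example with k = 2 input bits, n = 4 output bits, Hamming weight 1, and thus P(1) = 0.25:

```python
# Minimal look-up-table distribution matcher (illustrative sketch).
# Each 2-bit uniform input maps one-to-one onto a 4-bit codeword of
# Hamming weight 1, so across the codebook P(1) = 4/16 = 0.25.
SHAPING_TABLE = {
    (0, 0): (0, 0, 0, 1),
    (0, 1): (0, 0, 1, 0),
    (1, 0): (0, 1, 0, 0),
    (1, 1): (1, 0, 0, 0),
}
INVERSE_TABLE = {cw: msg for msg, cw in SHAPING_TABLE.items()}


def shenc(message_bits):
    """Shaping encoder: 2 uniform bits -> shaped 4-bit codeword."""
    return SHAPING_TABLE[tuple(message_bits)]


def shdec(codeword):
    """Shaping decoder: recover the message (the mapping is one-to-one)."""
    return INVERSE_TABLE[tuple(codeword)]
```

Because such a table needs n·2^k bits of memory, this approach only scales to short blocks, which is why arithmetic-coding-based algorithms are used for longer output sequences.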
  • CCDM works block-wise, i.e., it takes a sequence of bits as an input and produces a sequence of symbols at the output.
  • the output distribution P_A is emulated by outputting a sequence a^{n_c} of n_c symbols of a certain type, i.e., the output sequence a^{n_c} contains a fixed number of each individual symbol a_i from A.
  • the CCDM in this case works as follows:
  • CCDM can match the empirical distribution of the output sequence exactly, i.e., no approximation is needed.
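The exact-composition mapping can be illustrated with lexicographic ranking and unranking of constant-composition sequences, which is essentially the computation that an arithmetic-coding-based encoder/decoder performs on the fly; the sketch below is an illustrative reimplementation, not the algorithm of any particular reference:

```python
from math import comb


def cc_unrank(index, n, m):
    """Return the index-th (0-based) length-n binary sequence of Hamming
    weight m, in lexicographic order with 0 < 1 and the MSB on the left.
    This index-to-codeword mapping is what arithmetic-coding-based CCDM
    computes without ever storing the codebook."""
    bits = []
    for pos in range(n):
        zeros_first = comb(n - pos - 1, m)  # sequences continuing with a 0
        if index < zeros_first:
            bits.append(0)
        else:
            bits.append(1)
            index -= zeros_first
            m -= 1
    return tuple(bits)


def cc_rank(bits):
    """Inverse mapping: lexicographic index of a constant-composition word."""
    n, m = len(bits), sum(bits)
    index = 0
    for pos, b in enumerate(bits):
        if b == 1:
            index += comb(n - pos - 1, m)
            m -= 1
    return index
```

For example, the weight-1 words of length 4 in lexicographic order are 0001, 0010, 0100, 1000, so index 3 unranks to 1000.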
  • the actual codebook may be obtained by applying an arithmetic-coding algorithm to the base codebook.
  • the present invention aims to improve the conventional approaches for DM, specifically improve on CCDM.
  • the present invention has the objective to introduce a device and method for block-to-block DM, which allows for higher transmission rates (i.e. more information bits encoded in the output sequence of a ShEnc). Further, the device and method should enable more flexibility of the output target distribution.
  • the present invention generally proposes DM with codebooks, which have more codewords (for the same output length n and the probability P(l)), and can be efficiently encoded by an arithmetic encoding algorithm (encoder/decoder) just as the codebooks used by CCDM.
  • the invention is mainly described with respect to the binary case, but can be adapted to non-binary DM.
  • MC codebooks: base codebooks, or pruned and/or punctured base codebooks
  • MC codebooks with special properties, e.g. containing all codewords of multiple compositions and ordered lexicographically
  • the present invention is based on the MC codebooks and their construction, further on the generation of MC codewords from message symbols, particularly by using an arithmetic coding algorithm, and on the generation of the message symbols from MC codewords, likewise by using an arithmetic coding algorithm.
  • a first aspect of the present invention provides a device for probabilistic signal shaping, the device comprising a processor configured to receive a first input sequence of symbols, perform an encoding based on an arithmetic coding algorithm to map the first input sequence to a first output sequence of symbols, receive a second input sequence of symbols, and perform an encoding based on the same arithmetic coding algorithm to map the second input sequence to a second output sequence of symbols, wherein the first and the second output sequences are encoded to have the same block length, and wherein the first and the second output sequences have different compositions.
  • the same arithmetic coding algorithm means in particular the arithmetic coding algorithm with the same parameters (e.g. branching probabilities, output and input length, pruning parameters, etc).
  • “Probabilistic signal shaping” is an encoding scheme with the aim to approach a certain probability distribution of the output sequence of symbols. The scheme should be able to decode the output sequence of symbols to obtain the data (input sequence of symbols) back.
  • A“block-length” of a sequence is the number of symbols in the sequence.
  • the device of the first aspect uses the output sequences (codewords) of different compositions to map the different input sequences. Accordingly, the device is configured to use MC codewords from a MC codebook.
  • Such a MC codebook is larger than a conventional CC codebook, but can be coded efficiently with the same arithmetic coding algorithm. Hence, the coding has the same complexity. However, the larger codebook allows for conveying more data bits per codeword. In addition, the MC codebook allows for more flexibility when choosing e.g. P(l).
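The size comparison can be made concrete with a short sketch (the parameter values below are illustrative, not taken from the document):

```python
from math import comb, log2


def cc_size(n, m):
    """Codewords in an m-out-of-n (constant-composition) base codebook."""
    return comb(n, m)


def mc_size(n, m_lo, m_hi):
    """Codewords in an [m_lo, m_hi]-out-of-n MC base codebook: all words
    whose Hamming weight lies in {m_lo, ..., m_hi}."""
    return sum(comb(n, m) for m in range(m_lo, m_hi + 1))


n = 16
bits_cc = log2(cc_size(n, 4))     # weight exactly 4: ~10.8 bits per codeword
bits_mc = log2(mc_size(n, 0, 4))  # weights 0..4:     ~11.3 bits per codeword
```

For the same block length n, the MC codebook is strictly larger than any of its constituent CC codebooks, so more data bits can be conveyed per codeword.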
  • the processor is configured to compute the output sequences based on one or more parameters, wherein the parameters are in particular received as an input.
  • the one or more parameters can comprise a probability parameter, a block length, and/or a parameter for arithmetic coding. In this way, the coding becomes more efficient.
  • the step of this implementation form can be performed during the arithmetic-coding based coding.
  • the different compositions are selected based on a characteristic of a channel (e.g. SNR, pathloss, fading) for transmission of the output sequence.
  • the codewords (output sequences) can be selected such that a transmission rate over the channel or a channel capacity is optimized.
  • the processor is further configured to, in particular lexicographically, order the output sequences, in particular based on a most- significant symbol.
  • the order of the sequences means that an input sequence with "higher value", e.g. '11' > '10', is assigned a codeword with "higher value", e.g. '1000' > '0100' (see e.g. FIG. 9).
  • A“lexicographical order” describes an alphabetic order. It is a generalization of the way words are alphabetically ordered based on the alphabetical order of their component letters. This generalization consists primarily in defining a total order over the sequences of elements of a finite alphabet.
  • the processor is further configured to access a base codebook and/or parameters of the base codebook; process the base codebook and/or the parameters of the base codebook to obtain a pruned base codebook, and compute the output sequences from the pruned base codebook.
  • the pruning makes it possible to obtain more general base codebooks. "Pruning" means removing a certain number of codewords from the top or bottom of the base codebook (top and bottom refer to the lexicographical ordering). Removed codewords will never be used by the arithmetic encoding algorithm (encoder/decoder).
  • the step of this implementation form can be performed during the arithmetic-coding based coding.
  • Obtaining a pruned base codebook comprises obtaining parameters of a pruned base codebook.
  • An advantage thereof is that obtaining and/or processing a complete base codebook might be cumbersome if it has a large size.
  • Computation of the codeword can be based on the parameters of the base codebook or the pruned base codebook, and can be performed via arithmetic coding.
  • the processor is further configured to uniformly puncture the base codebook, the parameters of the base codebook and/or the pruned base codebook to obtain the output sequences.
  • the puncturing can select the final codewords, such that e.g. a coding efficiency or transmission rate is optimized.
  • “Puncturing” means skipping (removing) certain codewords from the (pruned or not pruned) base codebook.
  • the arithmetic encoding algorithm selects a certain number of codewords from the (pruned or not pruned) base codebook. If the number of codewords to select is smaller than the number of codewords in the pruned base codebook, some of the codewords will be skipped (punctured). Puncturing is usually done uniformly on the (pruned or not pruned) base codebook, and implicitly by the arithmetic encoder/decoder.
  • Puncturing can be done after or before pruning, when both puncturing and pruning are employed. In a preferred implementation, puncturing is done after pruning.
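A minimal sketch of the two operations on a lexicographically ordered base codebook (the spacing rule used for uniform puncturing is an illustrative choice; a real arithmetic encoder punctures implicitly):

```python
from itertools import product

# [0,2]-out-of-3 base codebook in lexicographic order (0 < 1, MSB left):
base = [w for w in product((0, 1), repeat=3) if sum(w) <= 2]


def prune(codebook, drop_top=0, drop_bottom=0):
    """Pruning: delete codewords at the top and/or bottom of the codebook."""
    return codebook[drop_top:len(codebook) - drop_bottom]


def puncture_uniform(codebook, num_keep):
    """Uniform puncturing: keep num_keep codewords spread evenly over the
    codebook; the skipped codewords are never produced by the encoder."""
    size = len(codebook)
    return [codebook[(i * size) // num_keep] for i in range(num_keep)]


# Puncturing after pruning, as in the preferred implementation:
actual = puncture_uniform(prune(base, drop_top=1), num_keep=4)
```

Here the 7-word base codebook is pruned to 6 codewords and then punctured down to 4, i.e. 2^2 codewords for 2-bit messages.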
  • the step of this implementation form can be performed during the arithmetic coding based coding. Uniform puncturing comprises puncturing by arithmetic coding.
  • the device is a shaping encoder, at least one of the first and the second input sequences has a uniform probability distribution, and at least one of the first and the second output sequences has a predefined target probability distribution.
  • a uniform probability distribution can also comprise an essentially uniform probability distribution.
  • a target distribution comprises also distributions, which are essentially the same as a pre-defined target distribution. Accordingly, probabilistic signal encoding can be performed by the device.
  • the device is a shaping decoder
  • at least one of the first and the second output sequences has a uniform probability distribution
  • at least one of the first and the second input sequences has a predefined target probability distribution.
  • probabilistic signal decoding can be performed by the device.
  • a second aspect of the present invention provides a transmitter comprising a device according to the first aspect or any of its implementation forms.
  • a third aspect of the present invention provides a receiver comprising a device according to the first aspect or any of its implementation forms.
  • the transmitter and receiver of the second and third aspect enjoy all advantages and effects of the device of the first aspect.
  • a fourth aspect of the present invention provides a method for probabilistic signal shaping, comprising receiving a first input sequence of symbols, performing an encoding based on an arithmetic coding algorithm to map the first input sequence to a first output sequence of symbols, receiving a second input sequence of symbols, performing an encoding based on the same arithmetic coding algorithm to map the second input sequence to a second output sequence of symbols, wherein the first and the second output sequences are encoded to have the same block length, and wherein the first and the second output sequences have different compositions.
  • performing the encoding based on the arithmetic coding comprises mapping an input sequence of bits having a uniform probability distribution to an output sequence of bits having a determined target probability distribution, or an input sequence of bits having a determined target probability distribution to an output sequence of bits having a uniform probability distribution.
  • the method of the fourth aspect achieves all advantages and effects of the device of the first aspect.
  • Implementation forms of the method can add further method steps corresponding to the additional features described for the various implementation forms of the device of the first aspect.
  • a fifth aspect of the present invention provides a computer program product comprising a program code for controlling a device according to the first aspect or any of its implementation forms, or for performing, when implemented on a computer, a method according to the fourth aspect or its implementation form.
  • a sixth aspect of the present invention provides a codebook, in particular for probabilistic signal shaping, comprising: a plurality of output sequences related to a first composition; a plurality of output sequences related to a second composition; wherein the codebook is in particular a base codebook or a pruned base codebook and/or a punctured base codebook.
  • Such a MC codebook can be larger than a conventional CC codebook, but can be coded efficiently with the same arithmetic coding algorithm. Hence, coding based on the MC codebook has the same complexity. However, the larger codebook allows for conveying more data bits per codeword. In addition, the MC codebook allows for more flexibility when choosing e.g. P(l).
  • the codebook comprises ordered output sequences, in particular lexicographically ordered output sequences, in particular according to the most-significant symbol.
  • the codebook comprises all possible output sequences of two or more, in particular of each, compositions.
  • Such a MC codebook allows most efficient coding by means of an arithmetic coding algorithm.
  • the first and the second composition are adjacent compositions.
  • Such a codebook allows most efficient coding by means of an arithmetic coding algorithm.
  • a seventh aspect of the present invention provides a shaping encoder that uses the codebook of any one of the preceding claims, wherein the shaping encoder is configured to execute an arithmetic coding based on the codebook.
  • An eighth aspect of the present invention provides a shaping decoder that uses the codebook of any one of the preceding claims, wherein the shaping decoder is configured to execute an arithmetic coding based on the codebook.
  • FIG. 1 shows a device according to an embodiment of the present invention.
  • FIG. 2 shows a MC base codebook according to an embodiment of the present invention.
  • FIG. 3 shows a MC base codebook according to an embodiment of the present invention.
  • FIG. 4 shows a puncturing and/or pruning of a base codebook according to an embodiment of the present invention, in order to obtain a punctured and/or pruned base codebook according to embodiments of the present invention.
  • FIG. 5 compares CC codebooks (for CCDM) and MC codebooks according to embodiments of the present invention.
  • FIG. 6 illustrates MC codebooks according to embodiments of the present invention with a BL-DM (Bit-Level Distribution Matcher) in the PAS framework.
  • BL-DM Bit-Level Distribution Matcher
  • FIG. 7 shows a method according to an embodiment of the present invention.
  • FIG. 8 shows a conventional PAS system.
  • FIG. 9 shows an exemplary conventional CC codebook.
  • FIG. 10 shows an exemplary conventional CC codebook.

DETAILED DESCRIPTION OF EMBODIMENTS
  • FIG. 1 shows a device 100 according to an embodiment of the present invention.
  • the device 100 is configured for performing probabilistic signal shaping.
  • the device 100 may be a ShEnc and/or may be included in a transmitter, or may be a ShDec and/or may be included in a receiver.
  • the device 100 comprises at least one processor 101 configured to implement at least a coding as described below.
  • the processor 101 is configured to receive a first input sequence 102 of symbols. Further, the processor 101 is configured to perform an encoding based on an arithmetic coding algorithm 103 to map the first input sequence 102 to a first output sequence 104 of symbols.
  • the processor 101 is also configured to receive a second input sequence 105 of symbols. Further, the processor 101 is configured to perform an encoding based on the same arithmetic coding algorithm 103 (as used for encoding the first input sequence 102) to map the second input sequence 105 to a second output sequence 106 of symbols.
  • the first and the second output sequences 104, 106 are particularly encoded to have the same block length. Further, the first and the second output sequences 104, 106 have different compositions, i.e. the device 100 is configured to perform MC coding.
  • the first output sequence 104 and the second output sequence 106 may both be codewords selected from the same base codebook, which is accordingly a MC codebook.
  • FIG. 2 shows a base codebook 200 according to an embodiment of the present invention, particularly a MC base codebook 200.
  • the base codebook 200 can be used by the device 100 for performing signal shaping.
  • the device 100 may also use a pruned base codebook 400 and/or punctured base codebook 401 according to embodiments of the present invention (see later FIG. 4), i.e. a codebook obtained by pruning and/or puncturing a base codebook 200 like the one in FIG. 2.
  • a base codebook 200 (and likewise a pruned and/or punctured base codebook 400, 401) of the invention generally includes a plurality of first output sequences 104 related to a first composition 201.
  • the base codebook 200 in FIG. 2 shows a specific example in the binary case, and the first composition 201 is illustrated to be exemplarily (1, 2), i.e. the first output sequences 104 include one "0" and two "1s".
  • a base codebook 200 (and likewise a pruned and/or punctured base codebook 400, 401) of the invention generally includes further a plurality of second output sequences 106 related to a second composition 202.
  • the second composition 202 of the base codebook 200 shown in FIG. 2 is illustrated to be exemplarily (2, 1), i.e. the second output sequences 106 include two "0s" and one "1".
  • A "composition" of a sequence is a tuple containing the number of occurrences of each of the symbols from A in the sequence, i.e., in the binary case, (number of "0s", number of "1s").
  • compositions are said to be lexically adjacent, if they correspond to the sequences of the same length and they differ by one symbol. For instance, compositions (3,2) and (4,1) are lexically adjacent.
  • a set of compositions is adjacent, if for each composition in the set there exists some other adjacent composition in the set. For instance, the set of compositions {(5,0), (4,1), (3,2)} is adjacent, whereas the set of compositions {(5,0), (3,2)} is not adjacent.
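These two notions can be sketched compactly for the binary case (the adjacency test encodes "same sequence length, exactly one symbol occurrence moved"):

```python
from collections import Counter


def composition(seq, alphabet=(0, 1)):
    """Tuple of occurrence counts of each alphabet symbol in seq,
    e.g. (0, 1, 1) has composition (1, 2): one 0 and two 1s."""
    counts = Counter(seq)
    return tuple(counts[a] for a in alphabet)


def adjacent(c1, c2):
    """Lexically adjacent: equal sequence lengths, and the compositions
    differ by moving exactly one occurrence between two symbols."""
    return sum(c1) == sum(c2) and sum(abs(x - y) for x, y in zip(c1, c2)) == 2
```

This reproduces the examples above: (3,2) and (4,1) are adjacent, while (5,0) and (3,2) are not.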
  • CCDM uses CC codebooks (base and actual codebooks are CC). That is, each output sequence has a fixed composition, i.e., has a fixed number of each of the symbols.
  • such a codebook is also called a constant-weight or m-out-of-n codebook, where n is the block length and m is the Hamming weight of the codewords.
  • the codebook shown in FIG. 10 is a CC codebook of weight 2.
  • the device 100 uses a MC codebook 200, as e.g. shown in FIG. 2. That is, generally speaking, the device 100 operates with a MC code (MCC).
  • MCC MC code
  • codewords in such MC codebook are allowed, which have certain compositions 201, 202.
  • the codebooks have a special structure, since then there exist particularly efficient algorithms for encoding data sequences into codewords, and for decoding the codewords back into data sequences.
  • the codebook may comprise all codewords of one or more compositions 201, 202, in particular of each composition 201, 202.
  • the different compositions 201, 202 may be adjacent compositions. Thereby, the different compositions 201, 202 may be selected based on a characteristic of a channel for transmission of the codewords and/or a parameter received by the device 100 as an input.
  • a MCC is a multi-weight or [m_L, m_U]-out-of-n code.
  • a [m_L, m_U]-out-of-n codeword has a Hamming weight among m_L, (m_L + 1), ..., m_U in the base codebook 200.
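For small n, such a base codebook can be enumerated directly; the sketch below is only an illustration, since in practice the arithmetic coder never materializes the codebook:

```python
from itertools import product


def mc_base_codebook(n, m_lo, m_hi):
    """All length-n binary words with Hamming weight in [m_lo, m_hi],
    generated in lexicographic order (0 < 1, MSB on the left)."""
    return [w for w in product((0, 1), repeat=n) if m_lo <= sum(w) <= m_hi]
```

For instance, mc_base_codebook(3, 0, 2) yields the seven words of the [0,2]-out-of-3 codebook, i.e. every 3-bit word except (1, 1, 1).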
  • the device 100 may be configured to access the base codebook 200 (and/or parameters of the base codebook 200), and process the base codebook 200 (and/or the parameters of the base codebook 200) to obtain a pruned base codebook 400.
  • the device 100 may be configured to uniformly puncture the base codebook 200 (and/or the parameters of the base codebook) and/or the pruned base codebook 400 to obtain a punctured base codebook 401.
  • an actual codebook can be obtained according to the following steps:
  • Select the base codebook 200 (C), which is defined as a codebook containing output sequences 104, 106 of multiple compositions 201, 202, here particularly adjacent compositions 201, 202, e.g. a [0,2]-out-of-3 codebook.
  • the output sequences 104, 106 are preferably ordered lexicographically, e.g., according to 0 < 1, Most Significant Bit (MSB) left.
  • MSB Most Significant Bit
  • from C, a sub-codebook 400 (C') containing e.g. M (adjacent) codewords is obtained: C is pruned by deleting some codewords at the beginning and/or at the end of C to result in the pruned codebook 400 (C'). This step makes it possible to obtain more codebooks.
  • the codewords from the punctured codebook 401 defined above form the actual codebook in these examples of FIG. 4.
  • the actual codebook can be efficiently encoded/decoded using low-complexity algorithms based on arithmetic coding.
  • M is selected to be a power of 2, i.e. M = 2^k, since there are 2^k binary input data sequences. See the figures below for more examples.
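The steps above can be combined into one illustrative end-to-end construction; brute-force enumeration and index lookup stand in for the arithmetic coder, which computes the same input-to-codeword mapping without storing the codebook, and the puncturing spacing rule is an illustrative choice:

```python
from itertools import product


def actual_codebook(n, m_lo, m_hi, k, drop_top=0, drop_bottom=0):
    """Base codebook -> lexicographic order -> pruning -> uniform
    puncturing down to M = 2**k codewords."""
    base = [w for w in product((0, 1), repeat=n)
            if m_lo <= sum(w) <= m_hi]                   # ordered base codebook
    pruned = base[drop_top:len(base) - drop_bottom]      # pruning
    M = 2 ** k
    size = len(pruned)
    assert M <= size
    return [pruned[(i * size) // M] for i in range(M)]   # uniform puncturing


def encode(message_bits, codebook):
    """Map k message bits to the codeword at their lexicographic index."""
    return codebook[int("".join(map(str, message_bits)), 2)]
```

For the [0,2]-out-of-3 example with k = 2, this selects 4 of the 7 base codewords, and each 2-bit message is encoded to one of them.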
  • the MC codebooks 200, 400 and 401 of the present invention may each be used with efficient encoding/decoding based on arithmetic coding, just as in CCDM, but the codebooks 200, 400, 401 are larger than the codebooks used by CCDM. This results in a higher information rate, as well as more flexibility when choosing the probability of symbols.
  • the MC codebooks 200, 400, 401 of the invention improve on the performance of CC codebooks, and have the same encoding/decoding algorithms and complexity.
  • the MC codebooks 200, 400, 401 can be used in any scenario where efficient DM is needed. This may include coding schemes where the data should be encoded in biased sequences, e.g. PSCM.
  • the [0,m]-out-of-n MC codebook is able to convey more information than the CC codebook, and also offers more choices of P_C(1).
  • the base [0,m]-out-of-n codebook contains all codewords with Hamming weights 0, 1, 2, ..., m, whereas the CCDM base codebook has codewords only with weight m.
  • FIG. 6 presents the FER (Frame Error Rate) results with the proposed Multi-Composition Distribution Matching (MCDM), i.e. for a ShEnc using a MC codebook ([0,m]-out-of-n), e.g. used as a building block for the Bit-Level Distribution Matcher (BL-DM), see e.g. 'M. Pikus and W. Xu, "Bit-level probabilistically shaped coded modulation," IEEE Commun. Lett., vol. 21, no. 9, Sept. 2017'.
  • the MCDM replaces the inner CCDMs in the BL-DM.
  • the results were obtained for 256-QAM modulation and WiMax LDPC code of length 576 and rate 5/6.
  • FIG. 7 shows a method 700 according to an embodiment of the present invention.
  • the method 700 is particularly configured for probabilistic signal shaping.
  • the method 700 may be carried out by the device 100 shown in FIG. 1, particularly implemented on the processor 101.
  • the method 700 may also be carried out by a transmitter or receiver including the device 100, or a ShEnc or ShDec comprising the device 100.
  • the method 700 includes a step 701 of receiving a first input sequence 102 of symbols. Further, the method 700 includes a step 702 of performing an encoding based on an arithmetic coding algorithm 103 to map the first input sequence 102 to a first output sequence 104 of symbols. Further, the method 700 includes a step 703 of receiving a second input sequence 105 of symbols.
  • the method 700 includes a step 704 of performing an encoding based on the same arithmetic coding algorithm 103 (as in step 702) to map the second input sequence 105 to a second output sequence 106 of symbols.
  • the first and the second output sequences 104, 106 are encoded to have the same block length, and the first and the second output sequences 104, 106 have different compositions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention relates to the technical field of signal shaping, specifically distribution matching. The invention presents a device for probabilistic signal shaping, and a transmitter or receiver employing said device. The device comprises a processor configured to receive a first input sequence of symbols, perform an encoding based on an arithmetic coding algorithm to map the first input sequence to a first output sequence of symbols, receive a second input sequence of symbols, and perform an encoding based on the same arithmetic coding algorithm to map the second input sequence to a second output sequence of symbols. The first and the second output sequences are encoded to have the same block length. Further, the first and the second output sequences have different compositions.

Description

MULTI-COMPOSITION CODING FOR SIGNAL SHAPING
TECHNICAL FIELD The present invention relates to the technical field of probabilistic signal shaping, specifically Distribution Matching (DM). The invention presents a device for probabilistic signal shaping, and a transmitter or receiver employing said device. The device encodes input sequences of symbols of a signal into output sequences of symbols of different composition. Thus, the device is configured to perform a multi-composition coding. The invention relates also to a corresponding coding method, and relates further to a base codebook including output sequences (codewords) of multiple different compositions.
BACKGROUND In order to achieve the capacity of a transmission channel, the channel input symbols need to have a certain probability distribution. For example, a Gaussian distribution is required to achieve the capacity of the Additive White Gaussian Noise (AWGN) channel. However, in many practical systems, uniformly distributed channel input symbols are used, which causes a gap to the capacity. This loss is called the "shaping loss", and can be up to 1.53 dB on AWGN channels if uniformly distributed channel input symbols are used.
Probabilistically Shaped Coded Modulation (PSCM) is a transmission scheme, which transmits symbols from the uniform Quadrature Amplitude Modulation (QAM) alphabet with non-uniform probabilities. Probabilistic Amplitude Shaping (PAS) is one implementation of PSCM. By means of this implementation, PSCM is able to mimic the optimal distribution and avoid the shaping loss (also referred to as obtaining "the shaping gain") compared to schemes which use uniformly distributed transmit symbols. The PAS scheme consists specifically of a Shaping Encoder (ShEnc) and a Channel Encoder (ChEnc) at the transmitter side, and accordingly of a Channel Decoder (ChDec) followed by a Shaping Decoder (ShDec) at the receiver side. The scheme is shown in FIG. 8, and brings the following advantages: Firstly, the ShEnc transforms uniformly distributed bits of an input message to a non-uniform distribution, such that channel input symbols are distributed to approach the capacity-achieving distribution. Secondly, by changing the parameters of the ShEnc, the transmitter can adjust the rate of the transmission without changing the parameters of the Forward Error Correction (FEC) code. These two aspects are different compared to a conventional coded modulation scheme (such as Bit-Interleaved Coded Modulation (BICM)), where there is no distribution matching to optimize the distribution of the channel input symbols, and where the rate matching is done by adjusting the parameters of the FEC code.
The key part of the PAS system is the ShEnc. In general, the ShEnc aims to produce at the output a sequence of symbols (random variables) with a desired probability distribution, given a sequence of symbols as an input (usually with a uniform probability distribution). For this reason, a ShEnc is sometimes referred to as a distribution matcher, and a ShDec is called an inverse distribution matcher or distribution dematcher. In this document, a ShEnc and a distribution matcher, and respectively a ShDec and an inverse distribution matcher, are assumed to be identical unless otherwise stated, i.e. these terms are used interchangeably.
The PAS system (see e.g. 'G. Bocherer et al., "Bandwidth efficient and rate-matched low-density parity-check coded modulation," IEEE Trans. Commun., vol. 63, no. 12, Dec 2015') shown in FIG. 8 works as follows, wherein a sequence of n symbols (modeled as random variables) is denoted by A^n, i.e. A^n = A1 A2 ... An:
0. Assume a transmission of a block of nc symbols from a 2^m-ASK (amplitude shift keying) alphabet.
1. A sequence U of kc uniformly distributed input bits enters the ShEnc.
2. The ShEnc outputs a sequence A^nc of nc amplitudes with distribution PA on the alphabet A = {1, 3, ..., (2^m − 1)}.
3. Each amplitude is mapped, in particular independently, by a fixed mapping bA, to a corresponding bit label of length m − 1.
4. The binary sequence b(A^nc), consisting of the concatenated bit labels, i.e. (m − 1)·nc bits, is encoded by a systematic FEC encoder of rate R = (m − 1)/m, i.e., for each amplitude one parity bit is produced.
5. The binary sequence b(A^nc) of (m − 1)·nc bits is mapped back to amplitudes using the inverse mapping bA^−1. The binary sequence b(S^nc) of nc parity bits is mapped to sign symbols S^nc via the inverse mapping bS^−1.
6. The sequences A^nc and S^nc of nc amplitudes and signs are multiplied element-wise and scaled by Δ to obtain the channel input symbols.

DM is usually performed on a block-to-block basis, i.e., the ShEnc maps a uniformly distributed input binary sequence of fixed length kc to a sequence of fixed length nc of symbols distributed according to a desired target probability distribution. The mapping should be one-to-one. Generally, non-binary distribution matching is considered, where the input sequence is binary and the output sequence is non-binary. It was shown that non-binary DM (with a non-binary output sequence) can be performed by parallel binary DMs (with binary output sequences) and a mapper, see e.g. 'M. Pikus and W. Xu, "Bit-level probabilistically shaped coded modulation," IEEE Commun. Lett., vol. 21, no. 9, Sept
2017'. So far, DM has been performed by Constant Composition Distribution Matching (CCDM), see e.g. 'P. Schulte, G. Bocherer, "Constant Composition Distribution Matching", IEEE Trans. Inf. Theory, vol. 62, no. 1, 2016', or in the binary case by m-out-of-n codes or constant-weight codes, see e.g. 'T. V. Ramabadran, "A coding scheme for m-out-of-n codes," IEEE Trans. Commun., vol. 38, Aug 1990'. Note that in the binary case, CCDM reduces to an m-out-of-n code.
DM can be considered as a coding scheme where data sequences are encoded into codewords (output sequences) which have a specific distribution. DM can always be represented as a mapping from data sequences to codewords; see, e.g., the codebook for CCDM shown in FIG. 9 with an output length n = 4 and an output probability of bit 1 of P(1) = 0.25.
This small example shown in FIG. 9 could be implemented as a look-up table. However, a ShEnc has poor performance when the output sequence is short. Here in FIG. 9, k = 2 information bits are encoded into n = 4 output symbols with P(1) = 0.25. This gives a coding rate of k/n = 0.5. If the output length is increased to e.g. n = 16, there are (16 choose 4) = 1820 possible output sequences, from which 2^⌊log2 1820⌋ = 1024 sequences can be used and labelled bijectively with binary input sequences of length k = ⌊log2 1820⌋ = 10. The coding rate is then k/n = 10/16 = 5/8, which is better than the 0.5 of the ShEnc from the codebook shown in FIG. 9 (more details about CCDM can be found below). This small example of FIG. 9 accordingly illustrates that for longer output sequences, data can be encoded more efficiently into shaped sequences. However, for longer output sequences it is not possible to still implement the ShEnc by means of a look-up table (too much memory is used, since n·2^k bits are needed to store the table). Therefore, efficient algorithms based on arithmetic coding are used in CCDM and m-out-of-n codes.
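The codebook sizes and rates in this example can be reproduced with a short sketch (illustrative only; the helper name ccdm_params is not from the patent):

```python
from math import comb, floor, log2

def ccdm_params(n, weight):
    """Size of a binary CC (weight-out-of-n) base codebook, the number of
    input bits k that can be labelled bijectively, and the coding rate k/n."""
    num_seq = comb(n, weight)   # all sequences of length n with the given weight
    k = floor(log2(num_seq))    # maximum input length for a bijective labelling
    return num_seq, k, k / n

# n = 16 with P(1) = 0.25, i.e. weight 4, as in the text above
num_seq, k, rate = ccdm_params(16, 4)
print(num_seq, k, rate)        # 1820 10 0.625
```

A look-up-table implementation would need to store n·2^k bits, i.e. 16·2^10 = 16384 bits already for this small case, which motivates the arithmetic-coding approach.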
The following describes briefly how CCDM works. The description considers non-binary CCDM for a more general perspective. CCDM works block-wise, i.e., it takes a sequence of bits as an input and produces a sequence of symbols at the output. The output distribution PA is emulated by outputting a sequence a^nc of nc symbols of a certain type, i.e., the output sequence a^nc contains a fixed number of each individual symbol ai from A.
Then, the empirical distribution of the symbols in the sequence is

P̄A(ai) = n_ai / nc,

where n_ai is the number of occurrences of the symbol ai in the output sequence a^nc, with Σ_{ai ∈ A} n_ai = nc. The CCDM in this case works as follows:
0. Input parameters: PA, nc
1. Find an empirical distribution P̄A for the output sequence which is close to the target distribution PA. Finding an empirical distribution is equivalent to finding the n_ai, i = 1, ..., 2^(m−1). E.g., a simple rounding can be performed: n_ai ≈ nc·PA(ai).
2. Calculate the number N of sequences with the found empirical distribution P̄A(ai):

N = nc! / (Π_i n_ai!),

i.e., the multinomial coefficient.
3. Select the input length kc = ⌊log2 N⌋ (where ⌊x⌋ denotes the floor function, i.e., the largest integer not greater than x), as this is the maximum number of bits which can be used to label the sequences bijectively. Select randomly 2^kc sequences and define a one-to-one mapping between binary input sequences and the selected output sequences. An efficient implementation of the mapping based on arithmetic coding can be found in 'P. Schulte, G. Bocherer, "Constant Composition Distribution Matching", IEEE Trans. Inf. Theory, vol. 62, no. 1, 2016'.
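The three steps above can be sketched as follows (a minimal illustration; the helper name ccdm_setup and the simple sum-correction after rounding are assumptions, not part of the patent):

```python
from math import factorial, floor, log2

def ccdm_setup(target_probs, nc):
    """CCDM steps 1-3: round the target distribution to a composition,
    count the sequences with that composition (multinomial coefficient N),
    and derive the input length kc = floor(log2 N)."""
    # Step 1: simple rounding n_ai ~ nc * PA(ai); adjust the largest count
    # so that the composition sums to nc.
    counts = [round(nc * p) for p in target_probs]
    counts[counts.index(max(counts))] += nc - sum(counts)
    # Step 2: multinomial coefficient N = nc! / (n_a1! * ... * n_aM!)
    N = factorial(nc)
    for c in counts:
        N //= factorial(c)
    # Step 3: maximum number of bits for a bijective labelling
    kc = floor(log2(N))
    return counts, N, kc

# The first specific example below: PA = (6/12, 3/12, 2/12, 1/12), nc = 12
print(ccdm_setup([6/12, 3/12, 2/12, 1/12], 12))  # ([6, 3, 2, 1], 55440, 15)
```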
A first specific example is now described. Assume A = {A, B, C, D} and the corresponding target probabilities PA = (6/12, 3/12, 2/12, 1/12), with output length nc = 12.
CCDM can match the empirical distribution of the output sequence exactly, i.e., no approximation is needed. In fact, CCDM looks for an empirical distribution P̄A which minimizes the Kullback-Leibler (KL) divergence D(P̄A || PA), which is a function of two probability distributions P̄A, PA defined on the same alphabet A, i.e.,

D(P̄A || PA) := Σ_{a ∈ A} P̄A(a) log2( P̄A(a) / PA(a) ).

Here the composition (6, 3, 2, 1) matches PA exactly, and the number of such sequences is

N = 12! / (6! · 3! · 2! · 1!) = 55440.
That is, there are 55440 sequences with the desired target distribution and length nc = 12, i.e., there are 55440 sequences with 6 occurrences of A, 3 of B, 2 of C and 1 of D. Since a binary sequence is used to label the sequences, CCDM chooses randomly 2^⌊log2 N⌋ = 32768 sequences and labels each sequence with kc = ⌊log2 N⌋ = 15 bits. The labeling can be done efficiently using arithmetic coding.
A second specific example is now described with respect to the codebook shown in FIG. 10. Now binary CCDM is considered (with alphabet A = {0,1}), an output length n = 4, and the output probability P(1) = 0.5 (in fact, in this case shaping is not needed, but it is an insightful example). The codebook in FIG. 10 is used by the CCDM (all possible output sequences). In fact, the CCDM will only use 4 of the sequences, as described above. This corresponds to 2 data bits encoded in one codeword. Notably, the codebook in FIG. 10 is referred to as a "base codebook", and the codebook which is actually used by the CCDM, i.e., the sub-codebook of size 4 of the base codebook, is referred to as an "actual codebook". The actual codebook may be obtained by applying an arithmetic-coding algorithm to the base codebook.
Disadvantageously, all the above-described approaches and examples, particularly of CCDM, show a low information rate (i.e. too little data conveyed in the shaped sequences) and less flexibility in terms of the output target distribution (fewer choices of P(1) available).
SUMMARY
In view of the above-mentioned disadvantages, the present invention aims to improve the conventional approaches for DM, specifically to improve on CCDM. The present invention has the objective to introduce a device and method for block-to-block DM which allow for higher transmission rates (i.e. more information bits encoded in the output sequence of a ShEnc). Further, the device and method should enable more flexibility of the output target distribution.
The objective of the present invention is achieved by the solution provided in the enclosed independent claims. Advantageous implementations of the present invention are further defined in the dependent claims.
The present invention generally proposes DM with codebooks which have more codewords (for the same output length n and the probability P(1)), and which can be efficiently encoded by an arithmetic coding algorithm (encoder/decoder), just as the codebooks used by CCDM. The invention is mainly described with respect to the binary case, but can be adapted to non-binary DM.
One main idea of the invention is to use Multi-Composition (MC) codebooks (base codebooks, or pruned and/or punctured base codebooks) for the signal shaping, i.e., in a device or method according to embodiments of the present invention, in order to realize the ShEnc and/or ShDec. In particular, MC codebooks with special properties (e.g. containing all codewords of multiple compositions, ordered lexicographically) can be used efficiently with an arithmetic coding algorithm. Consequently, the present invention is based on the MC codebooks as well as their construction, further on the generation of MC codewords from message symbols, particularly by using an arithmetic coding algorithm, and on the generation of the message symbols from MC codewords, likewise by using an arithmetic coding algorithm.
A first aspect of the present invention provides a device for probabilistic signal shaping, the device comprising a processor configured to receive a first input sequence of symbols, perform an encoding based on an arithmetic coding algorithm to map the first input sequence to a first output sequence of symbols, receive a second input sequence of symbols, and perform an encoding based on the same arithmetic coding algorithm to map the second input sequence to a second output sequence of symbols, wherein the first and the second output sequences are encoded to have the same block length, and wherein the first and the second output sequences have different compositions. Here, the same arithmetic coding algorithm means in particular the arithmetic coding algorithm with the same parameters (e.g. branching probabilities, output and input length, pruning parameters, etc.).
At least for this document the term“output sequence” is also referred to as“codeword”. That is, these two terms can be used interchangeably. Codewords / output sequences can be based on a base codebook.
“Probabilistic signal shaping” is an encoding scheme with the aim to approach a certain probability distribution of the output sequence of symbols. The scheme should be able to decode the output sequence of symbols to obtain the data (input sequence of symbols) back.
A“block-length” of a sequence is the number of symbols in the sequence.
A "composition" of a sequence describes and/or comprises a tuple containing the numbers of occurrences in the sequence of the particular symbols from an alphabet. For instance, for a binary alphabet A = (0,1), the composition of the sequence 1011 is (number of 0s, number of 1s) = (1,3). The device of the first aspect uses output sequences (codewords) of different compositions to map the different input sequences. Accordingly, the device is configured to use MC codewords from a MC codebook. Such a MC codebook is larger than a conventional CC codebook, but can be coded efficiently with the same arithmetic coding algorithm. Hence, the coding has the same complexity. However, the larger codebook allows for conveying more data bits per codeword. In addition, the MC codebook allows for more flexibility when choosing e.g. P(1).
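For illustration, the composition of a sequence can be computed as follows (a sketch, not part of the claimed subject-matter):

```python
def composition(seq, alphabet):
    """Tuple with the number of occurrences of each alphabet symbol in seq."""
    return tuple(seq.count(a) for a in alphabet)

print(composition("1011", "01"))   # (1, 3), as in the example above
```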
In an implementation form of the first aspect, the processor is configured to compute the output sequences based on one or more parameters, wherein the parameters are in particular received as an input.
The one or more parameters can comprise a probability parameter, a block length, and/or a parameter for arithmetic coding. In this way, the coding becomes more efficient. The step of this implementation form can be performed during the arithmetic-coding based coding.
In a further implementation form of the first aspect, the different compositions are selected based on a characteristic of a channel (e.g. SNR, pathloss, fading) for transmission of the output sequence.
Thus, the codewords (output sequences) can be selected such that a transmission rate over the channel or a channel capacity is optimized.
In a further implementation form of the first aspect, the processor is further configured to, in particular lexicographically, order the output sequences, in particular based on a most-significant symbol. Here, the order of the sequences means that an input sequence with a "higher value", e.g., '11' > '10', is assigned a codeword with a "higher value", e.g., '1000' > '0100' (see e.g. FIG. 9).
The ordering enables an efficient coding of the input sequences into the output sequences. The step of this implementation form can be performed during the arithmetic-coding based coding. A“lexicographical order” describes an alphabetic order. It is a generalization of the way words are alphabetically ordered based on the alphabetical order of their component letters. This generalization consists primarily in defining a total order over the sequences of elements of a finite alphabet.
In a further implementation form of the first aspect, the processor is further configured to access a base codebook and/or parameters of the base codebook; process the base codebook and/or the parameters of the base codebook to obtain a pruned base codebook, and compute the output sequences from the pruned base codebook.
The pruning allows to obtain more general base codebooks. "Pruning" means removing a certain number of codewords from the top or bottom of the base codebook (top and bottom refer to the lexicographical ordering). Removed codewords will never be used by the arithmetic encoding algorithm (encoder/decoder). The step of this implementation form can be performed during the arithmetic-coding based coding.
Obtaining a pruned base codebook comprises obtaining parameters of a pruned base codebook. An advantage thereof is that obtaining and/or processing a complete base codebook might be cumbersome if it has a large size. Computation of the codeword (i.e. an output sequence) can be based on the parameters of the base codebook or the pruned base codebook, and can be performed via arithmetic coding.
In a further implementation form of the first aspect, the processor is further configured to uniformly puncture the base codebook, the parameters of the base codebook and/or the pruned base codebook to obtain the output sequences.
The puncturing can select the final codewords such that e.g. a coding efficiency or transmission rate is optimized. "Puncturing" means skipping (removing) certain codewords from the (pruned or not pruned) base codebook. The arithmetic encoding algorithm selects a certain number of codewords from the (pruned or not pruned) base codebook. If the number of codewords to select is smaller than the number of codewords in the pruned base codebook, some of the codewords will be skipped (punctured). Puncturing is usually done uniformly on the (pruned or not pruned) base codebook, and implicitly by the arithmetic encoder/decoder. Puncturing can be done after or before pruning, when both puncturing and pruning are employed. In our preferred implementation, puncturing is done after pruning. The step of this implementation form can be performed during the arithmetic-coding based coding. Uniform puncturing comprises puncturing by arithmetic coding.
In a further implementation form of the first aspect, the device is a shaping encoder, at least one of the first and the second input sequences has a uniform probability distribution, and at least one of the first and the second output sequences has a predefined target probability distribution.
A uniform probability distribution can also comprise an essentially uniform probability distribution. A target distribution comprises also distributions, which are essentially the same as a pre-defined target distribution. Accordingly, probabilistic signal encoding can be performed by the device.
In a further implementation form of the first aspect, the device is a shaping decoder, at least one of the first and the second output sequences has a uniform probability distribution, and at least one of the first and the second input sequences has a predefined target probability distribution.
Accordingly, probabilistic signal decoding can be performed by the device.
A second aspect of the present invention provides a transmitter comprising a device according to the first aspect or any of its implementation forms.
A third aspect of the present invention provides a receiver comprising a device according to the first aspect or any of its implementation forms.
Accordingly, the transmitter and receiver of the second and third aspect, respectively, enjoy all advantages and effects of the device of the first aspect.
A fourth aspect of the present invention provides a method for probabilistic signal shaping, comprising receiving a first input sequence of symbols, performing an encoding based on an arithmetic coding algorithm to map the first input sequence to a first output sequence of symbols, receiving a second input sequence of symbols, performing an encoding based on the same arithmetic coding algorithm to map the second input sequence to a second output sequence of symbols, wherein the first and the second output sequences are encoded to have the same block length, and wherein the first and the second output sequences have different compositions.
According to an implementation form of the fourth aspect, performing the encoding based on the arithmetic coding comprises mapping an input sequence of bits having a uniform probability distribution to an output sequence of bits having a determined target probability distribution, or an input sequence of bits having a determined target probability distribution to an output sequence of bits having a uniform probability distribution.
The method of the fourth aspect achieves all advantages and effects of the device of the first aspect. Implementation forms of the method can add further method steps corresponding to the additional features described for the various implementation forms of the device of the first aspect.
A fifth aspect of the present invention provides a computer program product comprising a program code for controlling a device according to the first aspect or any of its implementation forms, or for performing, when implemented on a computer, a method according to the fourth aspect or its implementation form.
A sixth aspect of the present invention provides a codebook, in particular for probabilistic signal shaping, comprising: a plurality of output sequences related to a first composition; a plurality of output sequences related to a second composition; wherein the codebook is in particular a base codebook or a pruned base codebook and/or a punctured base codebook.
Such a MC codebook can be larger than a conventional CC codebook, but can be coded efficiently with the same arithmetic coding algorithm. Hence, coding based on the MC codebook has the same complexity. However, the larger codebook allows for conveying more data bits per codeword. In addition, the MC codebook allows for more flexibility when choosing e.g. P(l).
According to an implementation form of the sixth aspect, the codebook comprises ordered output sequences, in particular lexicographically ordered output sequences, in particular according to the most-significant symbol. According to a further implementation form of the sixth aspect, the codebook comprises all possible output sequences of two or more, in particular of each, compositions.
Such a MC codebook allows most efficient coding by means of an arithmetic coding algorithm.
According to a further implementation form of the sixth aspect, the first and the second composition are adjacent compositions.
Such a codebook allows most efficient coding by means of an arithmetic coding algorithm.
A seventh aspect of the present invention provides a shaping encoder that uses the codebook of the sixth aspect or any of its implementation forms, wherein the shaping encoder is configured to execute an arithmetic coding based on the codebook.
An eighth aspect of the present invention provides a shaping decoder that uses the codebook of the sixth aspect or any of its implementation forms, wherein the shaping decoder is configured to execute an arithmetic coding based on the codebook.
It has to be noted that all devices, elements, units and means described in the present application could be implemented in software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application, as well as the functionalities described to be performed by the various entities, are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities. Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity which performs that specific step or functionality, it should be clear for a skilled person that these methods and functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof.

BRIEF DESCRIPTION OF DRAWINGS
The above described aspects and implementation forms of the present invention will be explained in the following description of specific embodiments in relation to the enclosed drawings, in which
FIG. 1 shows a device according to an embodiment of the present invention.
FIG. 2 shows a MC base codebook according to an embodiment of the present invention.
FIG. 3 shows a MC base codebook according to an embodiment of the present invention.

FIG. 4 shows a puncturing and/or pruning of a base codebook according to an embodiment of the present invention, in order to obtain a punctured and/or pruned base codebook according to embodiments of the present invention.
FIG. 5 compares CC codebooks (for CCDM) and MC codebooks according to embodiments of the present invention.
FIG 6 illustrates MC codebooks according to embodiments of the present invention with BL-DM (Bit-Level Distribution Matcher) in the PAS framework.
FIG. 7 shows a method according to an embodiment of the present invention.
FIG. 8 shows a conventional PAS system.

FIG. 9 shows an exemplary conventional CC codebook.
FIG. 10 shows an exemplary conventional CC codebook.

DETAILED DESCRIPTION OF EMBODIMENTS
FIG. 1 shows a device 100 according to an embodiment of the present invention. The device 100 is configured for performing probabilistic signal shaping. The device 100 may be a ShEnc and/or may be included in a transmitter, or may be a ShDec and/or may be included in a receiver. The device 100 comprises at least one processor 101 configured to implement at least a coding as described below.
The processor 101 is configured to receive a first input sequence 102 of symbols. Further, the processor 101 is configured to perform an encoding based on an arithmetic coding algorithm 103 to map the first input sequence 102 to a first output sequence 104 of symbols.
The processor 101 is also configured to receive a second input sequence 105 of symbols. Further, the processor 101 is configured to perform an encoding based on the same arithmetic coding algorithm 103 (as used for encoding the first input sequence 102) to map the second input sequence 105 to a second output sequence 106 of symbols.
The first and the second output sequences 104, 106 are particularly encoded to have the same block length. Further, the first and the second output sequences 104, 106 have different compositions, i.e. the device 100 is configured to perform MC coding. The first output sequence 104 and the second output sequence 106 may both be codewords selected from the same base codebook, which is accordingly a MC codebook.
FIG. 2 shows a base codebook 200 according to an embodiment of the present invention, particularly a MC base codebook 200. The base codebook 200 can be used by the device 100 for performing signal shaping. However, the device 100 may also use a pruned base codebook 400 and/or punctured base codebook 401 according to embodiments of the present invention (see later FIG. 4), i.e. a codebook obtained by pruning and/or puncturing a base codebook 200 like the one in FIG. 2.
A base codebook 200 (and likewise a pruned and/or punctured base codebook 400, 401) of the invention generally includes a plurality of first output sequences 104 related to a first composition 201. The base codebook 200 shown in FIG. 2 shows a specific example in the binary case, and the first composition 201 is illustrated to be exemplarily (1, 2), i.e. the first output sequences 104 include one "0" and two "1s". A base codebook 200 (and likewise a pruned and/or punctured base codebook 400, 401) of the invention generally further includes a plurality of second output sequences 106 related to a second composition 202. The second composition 202 of the base codebook 200 shown in FIG. 2 is illustrated to be exemplarily (2, 1), i.e. the second output sequences 106 include two "0s" and one "1".
In the following, details of the present invention - as implemented by means of the device 100 shown in FIG. 1, particularly its processor 101, and the codebooks 200 (shown in FIG. 2), 400 or 401 - are described.
Assume a sequence S = s1 ... sn of n symbols from a certain alphabet A = {a1, ..., aM}. A "composition" of the sequence is a tuple containing the number of occurrences of each of the symbols from A in the sequence, i.e.:

(|{i: si = a1}|, |{i: si = a2}|, ..., |{i: si = aM}|),

where |x| denotes the number of elements in x. For example, in the binary case A = (0, 1) and M = 2. The sequence 10111 of length n = 5 has, for instance, the composition (1, 4), where "1" is the number of "0s" and "4" the number of "1s".
Two compositions are said to be lexically adjacent if they correspond to sequences of the same length and they differ by one symbol. For instance, the compositions (3,2) and (4,1) are lexically adjacent.
A set of compositions is adjacent if, for each composition in the set, there exists some other adjacent composition in the set. For instance, the set of compositions {(5,0), (4,1), (3,2)} is adjacent, whereas the set of compositions {(5,0), (3,2)} is not adjacent.
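Under one reading of these definitions (two equal-length compositions are adjacent when exactly one symbol occurrence is moved between two counts), the adjacency checks can be sketched as follows (illustrative only; the function names are not from the patent):

```python
def adjacent(c1, c2):
    """Lexically adjacent: same sequence length, differing by one symbol,
    i.e. one count is +1 and another is -1."""
    return sum(c1) == sum(c2) and sum(abs(a - b) for a, b in zip(c1, c2)) == 2

def adjacent_set(comps):
    """A set is adjacent if every composition has an adjacent partner in it."""
    return all(any(adjacent(c, d) for d in comps if d != c) for c in comps)

print(adjacent((3, 2), (4, 1)))                # True
print(adjacent_set([(5, 0), (4, 1), (3, 2)]))  # True
print(adjacent_set([(5, 0), (3, 2)]))          # False
```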
In this sense, CCDM uses CC codebooks (both the base and the actual codebook are CC). That is, each output sequence has a fixed composition, i.e., has a fixed number of each of the symbols. In the binary case, such a codebook is also called a constant-weight or m-out-of-n codebook, where n is the block length and m is the Hamming weight of the codewords. For instance, the codebook shown in FIG. 10 is a CC codebook of weight 2. The device 100 according to an embodiment of the present invention instead uses a MC codebook 200, as e.g. shown in FIG. 2. That is, the device 100 operates, generally speaking, with a MC code (MCC). Concretely, codewords which have certain compositions 201, 202 are allowed in such a MC codebook. Preferably, the codebooks have a special structure, since then there exist particularly efficient algorithms for encoding data sequences into codewords, and for decoding the codewords back into data sequences. In particular, the codebook may comprise all codewords of one or more compositions 201, 202, in particular of each composition 201, 202. The different compositions 201, 202 may be adjacent compositions. Thereby, the different compositions 201, 202 may be selected based on a characteristic of a channel for transmission of the codewords and/or a parameter received by the device 100 as an input.
In the binary case, a MCC is a multi-weight or [mL, mU]-out-of-n code. Specifically, the base codebook 200 of a [mL, mU]-out-of-n code contains codewords with Hamming weights mL, (mL + 1), ..., mU.
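Such a base codebook can be enumerated directly for small n (an illustrative sketch; the function name is an assumption):

```python
from itertools import product

def mc_base_codebook(n, m_lo, m_hi):
    """All binary codewords of length n with Hamming weight in [m_lo, m_hi],
    in lexicographic order (0 < 1, MSB left)."""
    return ["".join(bits) for bits in product("01", repeat=n)
            if m_lo <= bits.count("1") <= m_hi]

# e.g. a [0,2]-out-of-3 codebook, as used in the construction steps below
print(mc_base_codebook(3, 0, 2))
# ['000', '001', '010', '011', '100', '101', '110']
```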
FIG. 3 shows an example of a MC base codebook 200 for specifically P(0) = 0.5 and n = 4, i.e., a [1,3]-out-of-4 codebook. This MC base codebook 200, when compared to the CC codebook shown in FIG. 10, has the same parameters P(0) = 0.5 and n = 4, but has many more codewords (output sequences). Accordingly, a ShEnc using this codebook 200 can use 8 out of 13 codewords, which results in a transmission of 3 data bits per codeword (as opposed to 2 bits for CCDM in the example illustrated by FIG. 10).
When an arithmetic coding algorithm 103 is applied to any base codebook 200, only a specific set of 2^k codewords will be chosen. These 2^k codewords may be chosen from the base codebook 200 and constitute the "actual codebook", which will be effectively used by the device 100 (e.g. a ShEnc). The actual codebook may also be obtained before the coding from the base codebook 200. To this end, as shown in FIG. 4, the device 100 may be configured to access the base codebook 200 (and/or parameters of the base codebook 200), and process the base codebook 200 (and/or the parameters of the base codebook 200) to obtain a pruned base codebook 400. Further, the device 100 may be configured to uniformly puncture the base codebook 200 (and/or the parameters of the base codebook) and/or the pruned base codebook 400 to obtain a punctured base codebook 401. For example, an actual codebook can be obtained according to the following steps:
1. Select the base codebook 200 (C), which is defined as a codebook containing output sequences 104, 106 of multiple compositions 201, 202, here particularly adjacent compositions 201, 202, e.g., a [0,2]-out-of-3 codebook. The output sequences 104, 106 (codewords) are preferably ordered lexicographically, e.g., according to 0 < 1, Most Significant Bit (MSB) left. (The lexicographical ordering and the different compositions 201, 202 allow to efficiently encode/decode the data into the codewords 104, 106 via arithmetic encoding/decoding.)
2. Select from C a sub-codebook 400 (C') containing e.g. M (adjacent) codewords, and re-index the codewords in C'. That is, C is pruned by deleting some codewords at the beginning and/or at the end of C to obtain the pruned codebook 400 (C'). This step allows to obtain more codebooks.
3. Puncture C' uniformly, such that K codewords are left, which form the punctured MC codebook 401 (C''). That is, C'' contains K codewords of C' whose indexes are spread uniformly over the indexes 0, ..., M − 1 of C'. This step selects the final codewords uniformly via arithmetic coding, such that the actual codebook used in the device 100 is obtained. It can be assumed that the proposed punctured codebook 401 has codewords indexed 0, 1, ..., K − 1.
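One possible realization of the uniform puncturing is sketched below; the index rule i -> floor(i·M/K) is an assumption for illustration (the patent's exact index set is given in its figure):

```python
def puncture_uniform(M, K):
    """Indexes of the K codewords kept from a pruned codebook of size M,
    spread uniformly; assumed rule i -> floor(i * M / K)."""
    return [i * M // K for i in range(K)]

# e.g. keeping K = 8 codewords out of a pruned codebook with M = 13
print(puncture_uniform(13, 8))  # [0, 1, 3, 4, 6, 8, 9, 11]
```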
The codewords from the punctured codebook 401 defined above form the actual codebook in these examples of FIG. 4. The actual codebook can be efficiently encoded/decoded using low-complexity algorithms based on arithmetic coding. Usually, K is selected to be a power of 2, since there are 2^k binary input data sequences. See the figures below for more examples. The MC codebooks 200, 400 and 401 of the present invention may each be used with efficient encoding/decoding based on arithmetic coding, just as in CCDM, but the codebooks 200, 400, 401 are larger than the codebooks used by CCDM. This results in a higher information rate, as well as more flexibility when choosing the probability of the symbols. Since the MC codebooks 200, 400, 401 of the invention improve on the performance of CC codebooks, and have the same algorithms for encoding/decoding and the same complexity, the MC codebooks 200, 400, 401 can be used in any scenario where efficient DM is needed. This may include coding schemes where the data should be encoded into biased sequences, e.g. PSCM.
In FIG. 5 it can be seen that the [0,m]-out-of-n MC codebook is able to convey more information than the CC codebook, as well as offering more choices of Pc(1). CC codebooks are only able to achieve Pc(1) ∈ {0, 1/10, 2/10, 3/10, 4/10, 5/10}, whereas the [0,m]-out-of-n MC codebook can achieve a finer set of Pc(1) values. Recall that the base [0,m]-out-of-n codebook contains all codewords with Hamming weights 0, 1, 2, ..., m, whereas the CCDM base codebook contains only codewords of weight m.
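The coarse versus fine Pc(1) grids can be illustrated numerically. The snippet below is an illustrative calculation (not taken from the patent): it compares the fraction of ones in an m-out-of-n CC codebook with the average fraction of ones over a [0,m]-out-of-n base codebook, for n = 10 as in the set quoted above.

```python
from math import comb

def cc_p1(n, m):
    # m-out-of-n (constant composition): every codeword has exactly m ones.
    return m / n

def mc_p1(n, m):
    # [0,m]-out-of-n: average fraction of ones over the whole base codebook,
    # where the weight-w layer contributes comb(n, w) codewords of weight w.
    total = sum(comb(n, w) for w in range(m + 1))
    ones = sum(comb(n, w) * w for w in range(m + 1))
    return ones / (n * total)

n = 10
cc_grid = [cc_p1(n, m) for m in range(6)]  # the coarse grid 0.0, 0.1, ..., 0.5
mc_grid = [mc_p1(n, m) for m in range(6)]  # values falling between the CC points
```

For example, mc_p1(10, 1) = 10/110 ≈ 0.091, strictly between the CC points 0 and 0.1, showing how the MC construction fills in intermediate bias values.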
FIG. 6 presents FER (Frame Error Rate) results with the proposed Multi-Composition Distribution Matching (MCDM), i.e. for a ShEnc using a [0,m]-out-of-n MC codebook, e.g. used as a building block for the Bit-Level Distribution Matcher (BL-DM), see e.g. 'M. Pikus and W. Xu, "Bit-level probabilistically shaped coded modulation," IEEE Commun. Lett., vol. 21, no. 9, Sept 2017'. The MCDM replaces the inner CCDMs in the BL-DM. The results were obtained for 256-QAM modulation and a WiMAX LDPC code of length 576 and rate 5/6. Simulations were performed for three transmission rates: 1.8, 2.8, and 5.5 b/CU. The gain of the proposed solution at FER = 10^-3 over the BL-DM with CCDMs was 0.01 dB, 0.2 dB, and 0.3 dB, respectively. The higher gain at higher transmission rates can be explained by FIG. 5: higher transmission rates have less biased bit-level distributions, and in FIG. 6 the gain of the MCDM over the CCDM is higher for less biased distributions. It can be concluded that the gain is also higher for higher modulation orders (sum of the gains for each bit-level).
FIG. 7 shows a method 700 according to an embodiment of the present invention. The method 700 is particularly configured for probabilistic signal shaping. The method 700 may be carried out by the device 100 shown in FIG. 1, particularly implemented on the processor 101. The method 700 may also be carried out by a transmitter or receiver including the device 100, or a ShEnc or ShDec comprising the device 100. The method 700 includes a step 701 of receiving a first input sequence 102 of symbols. Further, the method 700 includes a step 702 of performing an encoding based on an arithmetic coding algorithm 103 to map the first input sequence 102 to a first output sequence 104 of symbols. Further, the method 700 includes a step 703 of receiving a second input sequence 105 of symbols. Further, the method 700 includes a step 704 of performing an encoding based on the same arithmetic coding algorithm 103 (as in step 702) to map the second input sequence 105 to a second output sequence 106 of symbols. Thereby, the first and the second output sequences 104, 106 are encoded to have the same block length, and the first and the second output sequences 104, 106 have different compositions.
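A minimal sketch of the mapping performed in steps 702 and 704 of method 700, using a small hypothetical punctured codebook; the direct table lookup stands in for the arithmetic-coding computation, and the codebook contents are illustrative assumptions.

```python
def shaping_encode(bits, codebook):
    # Steps 702/704 sketch: interpret the k input bits as an index and emit
    # the corresponding codeword (a table lookup standing in for the
    # arithmetic coding algorithm 103).
    idx = int("".join(str(b) for b in bits), 2)
    return codebook[idx]

# Hypothetical punctured codebook: K = 4 codewords of block length 3.
cb = [(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0)]

out1 = shaping_encode([0, 0], cb)  # first output sequence, Hamming weight 1
out2 = shaping_encode([1, 0], cb)  # second output sequence, Hamming weight 2
# Same block length, different compositions -- the property required of the
# first and second output sequences 104, 106.
```

Here two input sequences of equal length map to output sequences of the same block length 3 but different compositions (one 1 versus two 1s).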
The present invention has been described in conjunction with various embodiments as examples, as well as implementations. However, other variations can be understood and effected by persons skilled in the art practicing the claimed invention, from a study of the drawings, this disclosure and the independent claims. In the claims as well as in the description, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single element or other unit may fulfill the functions of several entities or items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used in an advantageous implementation.

Claims

1. Device (100) for probabilistic signal shaping, the device (100) comprising a processor (101) configured to
receive a first input sequence (102) of symbols,
perform an encoding based on an arithmetic coding algorithm (103) to map the first input sequence (102) to a first output sequence (104) of symbols,
receive a second input sequence (105) of symbols,
perform an encoding based on the same arithmetic coding algorithm (103) to map the second input sequence (105) to a second output sequence (106) of symbols,
wherein the first and the second output sequences (104, 106) are encoded to have the same block length; and
wherein the first and the second output sequences (104, 106) have different compositions.
2. Device (100) according to claim 1, wherein the processor (101) is configured to compute the output sequences based on one or more parameters, wherein the parameters are in particular received as an input.
3. Device (100) according to one of the preceding claims, wherein
the different compositions are selected based on a characteristic of a channel for transmission of the output sequences (104, 106).
4. Device (100) according to one of the preceding claims, wherein the processor (101) is further configured to,
in particular lexicographically, order the output sequences (104, 106), in particular based on a most-significant symbol.
5. Device (100) according to one of the preceding claims, wherein the processor (101) is further configured to
access a base codebook (200) and/or parameters of the base codebook (200); process the base codebook (200) and/or the parameters of the base codebook (200) to obtain a pruned base codebook (400), and
compute the output sequences (104, 106) from the pruned base codebook (400).
6. Device (100) according to claim 5, wherein the processor (101) is further configured to
uniformly puncture the base codebook (200), the parameters of the base codebook (200) and/or the pruned base codebook (400) to obtain the output sequences (104, 106).
7. Device (100) according to one of the claims 1 to 6, wherein
the device (100) is a shaping encoder,
at least one of the first and the second input sequences (102, 105) has a uniform probability distribution, and
at least one of the first and the second output sequences (104, 106) has a predefined target probability distribution.
8. Device (100) according to one of the claims 1 to 7, wherein
the device (100) is a shaping decoder,
at least one of the first and the second output sequences (104, 106) has a uniform probability distribution, and
at least one of the first and the second input sequences (102, 105) has a predefined target probability distribution.
9. Transmitter comprising a device (100) according to one of the claims 1 to 7.
10. Receiver comprising a device (100) according to one of the claims 1 to 6 and 8.
11. Method for probabilistic signal shaping, comprising
receiving a first input sequence (102) of symbols,
performing an encoding based on an arithmetic coding algorithm (103) to map the first input sequence (102) to a first output sequence (104) of symbols,
receiving a second input sequence (105) of symbols,
performing an encoding based on the same arithmetic coding algorithm (103) to map the second input sequence (105) to a second output sequence (106) of symbols,
wherein the first and the second output sequences (104, 106) are encoded to have the same block length, and wherein the first and the second output sequences (104, 106) have different compositions.
12. Method according to claim 11, wherein performing the encoding based on the arithmetic coding algorithm (103) comprises mapping
an input sequence (102, 105) of bits having a uniform probability distribution to an output sequence (104, 106) of bits having a determined target probability distribution, or an input sequence (102, 105) of bits having a determined target probability distribution to an output sequence (104, 106) of bits having a uniform probability distribution.
13. Computer program product comprising a program code for controlling a device (100) according to one of the claims 1 to 8, or for performing, when implemented on a computer, a method according to claim 11 or 12.
14. Codebook (200, 400, 401), in particular for probabilistic signal shaping, comprising:
- a plurality of first output sequences (104) related to a first composition (201);
- a plurality of second output sequences (106) related to a second composition (202);
wherein the codebook (200, 400, 401) is in particular a base codebook (200) or a pruned base codebook (400) and/or a punctured base codebook (401).
15. The codebook (200, 400, 401) according to the preceding claim, comprising ordered output sequences (104, 106), in particular lexicographically ordered output sequences (104, 106), in particular according to the most-significant symbol.
16. The codebook (200, 400, 401) according to one of the preceding claims, comprising all possible output sequences (104, 106) of one or more, in particular of each, composition (201, 202).
17. The codebook (200, 400, 401) according to one of the preceding claims, wherein the first composition (201) and the second composition (202) are adjacent compositions (201, 202).
18. A shaping encoder that uses the codebook (200, 400, 401) of any one of the preceding claims, wherein
the shaping encoder is configured to execute an arithmetic coding algorithm (103) based on the codebook (200, 400, 401).
19. A shaping decoder that uses the codebook (200, 400, 401) of any one of the preceding claims, wherein
the shaping decoder is configured to execute an arithmetic coding algorithm (103) based on the codebook (200, 400, 401).
PCT/EP2018/059574 2018-04-13 2018-04-13 Multi-composition coding for signal shaping WO2019197043A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/EP2018/059574 WO2019197043A1 (en) 2018-04-13 2018-04-13 Multi-composition coding for signal shaping
CN201880088403.2A CN111670543B (en) 2018-04-13 2018-04-13 Multi-component encoding for signal shaping

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2018/059574 WO2019197043A1 (en) 2018-04-13 2018-04-13 Multi-composition coding for signal shaping

Publications (1)

Publication Number Publication Date
WO2019197043A1 true WO2019197043A1 (en) 2019-10-17

Family

ID=62025833

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/059574 WO2019197043A1 (en) 2018-04-13 2018-04-13 Multi-composition coding for signal shaping

Country Status (2)

Country Link
CN (1) CN111670543B (en)
WO (1) WO2019197043A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4278492A1 (en) * 2021-01-13 2023-11-22 Qualcomm Incorporated Interleaver for constellation shaping

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3971008B2 (en) * 1998-01-21 2007-09-05 株式会社大宇エレクトロニクス Binary shape signal encoding / decoding device
DE60110622T2 (en) * 2001-12-28 2006-01-19 Sony International (Europe) Gmbh Radio transmitter and transmitter method for digital signals with multiple resolution using a gauss-distributed trellis shaping to reduce the transmission power and corresponding, multi-level decoder
CN101247137B (en) * 2008-03-24 2011-08-24 西安电子科技大学 Ultra-broadband analogue signal parallel sampling system based on accidental projection
KR101880990B1 (en) * 2011-11-16 2018-08-24 삼성전자주식회사 Method and apparatus for transmitting and receiving signals in multi-antenna system

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
G. BOCHERER ET AL.: "Bandwidth efficient and rate-matched low-density parity-check coded modulation", IEEE TRANS. COMMUN., vol. 63, no. 12, December 2015 (2015-12-01), XP011593618, DOI: doi:10.1109/TCOMM.2015.2494016
M. PIKUS; W. XU: "Bit-level probabilistically shaped coded modulation", IEEE COMMUN. LETT., vol. 21, no. 9, September 2017 (2017-09-01)
P. SCHULTE; G. BOCHERER: "Constant Composition Distribution Matching", IEEE TRANS. INF. THEORY, vol. 62, no. 1, 2016, XP011594649, DOI: doi:10.1109/TIT.2015.2499181
PATRICK SCHULTE ET AL: "Shell Mapping for Distribution matching", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 9 March 2018 (2018-03-09), XP080863205 *
SCHULTE PATRICK ET AL: "Constant Composition Distribution Matching", IEEE TRANSACTIONS ON INFORMATION THEORY, IEEE PRESS, USA, vol. 62, no. 1, 1 January 2016 (2016-01-01), pages 430 - 434, XP011594649, ISSN: 0018-9448, [retrieved on 20151218], DOI: 10.1109/TIT.2015.2499181 *
STEINER FABIAN ET AL: "Experimental Verification of Rate Flexibility and Probabilistic Shaping by 4D Signaling", 2018 OPTICAL FIBER COMMUNICATIONS CONFERENCE AND EXPOSITION (OFC), OSA, 11 March 2018 (2018-03-11), pages 1 - 3, XP033357378 *
T. V. RAMABADRAN: "A coding scheme for m-out-of-n codes", IEEE TRANS. COMMUN., vol. 38, August 1990 (1990-08-01), XP000162507, DOI: doi:10.1109/26.58748
TOBIAS FEHENBERGER ET AL: "Partition-Based Distribution Matching", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 25 January 2018 (2018-01-25), XP080854685 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021083488A1 (en) * 2019-10-28 2021-05-06 Huawei Technologies Co., Ltd. A distribution matcher and distribution matching method
WO2021105548A1 (en) 2019-11-26 2021-06-03 Nokia Technologies Oy Design of fixed length coding scheme for probabilistic shaping applied to new radio physical layer
EP4066418A4 (en) * 2019-11-26 2023-12-20 Nokia Technologies Oy Design of fixed length coding scheme for probabilistic shaping applied to new radio physical layer
WO2022217576A1 (en) * 2021-04-16 2022-10-20 Qualcomm Incorporated Methods and apparatus to facilitate distribution matching via reversed compression
WO2023065202A1 (en) * 2021-10-21 2023-04-27 Qualcomm Incorporated Multiple composition distribution matching based on arithmetic coding and geometry-specific parameters
WO2024016149A1 (en) * 2022-07-19 2024-01-25 Qualcomm Incorporated Distribution matching with adaptive block segmentation

Also Published As

Publication number Publication date
CN111670543A (en) 2020-09-15
CN111670543B (en) 2023-10-13

Similar Documents

Publication Publication Date Title
WO2019197043A1 (en) Multi-composition coding for signal shaping
EP3718228B1 (en) Communication system and method using a set of distribution matchers
Schulte et al. Divergence-optimal fixed-to-fixed length distribution matching with shell mapping
Pikus et al. Bit-level probabilistically shaped coded modulation
Pyndiah et al. Near optimum decoding of product codes
CN107395319B (en) Code rate compatible polarization code coding method and system based on punching
Fehenberger et al. Parallel-amplitude architecture and subset ranking for fast distribution matching
KR20220085049A (en) Device for multi-level encoding
Chen et al. Polar coded modulation with optimal constellation labeling
CN110892658B (en) Device and method for coding a message with a target probability distribution of coded symbols
KR102277758B1 (en) Method and apparatus for decoding in a system using binary serial concatenated code
Runge et al. Multilevel binary polar-coded modulation achieving the capacity of asymmetric channels
WO2019015743A1 (en) Apparatus and method for encoding a message having a target probability distribution of code symbols
CN111954990A (en) Multi-stage encoder and decoder with shaping and methods for multi-stage encoding and decoding with shaping
CN113067676A (en) Novel bit mapping method in polar code high-order modulation system
CN112332985A (en) Quantum key distribution data negotiation method and system based on LDPC-Polar joint coding
İşcan et al. Polar codes with integrated probabilistic shaping for 5G new radio
CN115225202B (en) Cascade decoding method
İşcan et al. Probabilistically shaped multi-level coding with polar codes for fading channels
WO2019164416A1 (en) Devices and methods for generating block punctured polar codes
Alberge et al. From maximum likelihood to iterative decoding
Runge et al. Improved list decoding for polar-coded probabilistic shaping
Wesel et al. ELF codes: Concatenated codes with an expurgating linear function as the outer code
CN106953647B (en) Adaptive Chase decoding method for algebraic geometric code
CN111245568A (en) Polar code decoding method based on feedback retransmission technology in low-earth orbit satellite

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18718782

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18718782

Country of ref document: EP

Kind code of ref document: A1