WO2024003524A1

WO2024003524A1 - Steganography

Info

Publication number: WO2024003524A1
Application number: PCT/GB2023/051501
Authority: WO
Inventors: Christian Schroeder DE WITT; Jakob Foerster; Martin STROHMEIR; Samuel SOKOTA; Jeremy KOLTER
Original assignee: Oxford University Innovation Limited
Priority date: 2022-07-01
Filing date: 2023-06-09
Publication date: 2024-01-04
Also published as: GB202209681D0

Abstract

A communication method for communicating a secret message between a sender and a receiver, the method comprising: the sender and the receiver obtaining a shared private key and a shared context; the sender obtaining a ciphertext corresponding to the secret message using the shared private key (K); the sender sampling a stegotext space represented by the shared context using the ciphertext to obtain a stegotext encoding the secret message; wherein the sampling is performed by reference to a minimum entropy coupling between the stegotext space and a ciphertext space containing all possible ciphertexts; the sender transmitting the stegotext; the receiver receiving the stegotext; the receiver sampling a stegotext space using the stegotext to obtain the ciphertext encoding the secret message; and obtaining the secret message from the ciphertext using the private key (K).

Description

STEGANOGRAPHY

Field of the Invention

[0001] The present invention relates to steganography, in particular to steganography using minimum entropy coupling, and provides methods, systems and computer programs for steganographic encoding and decoding.

Background

[0002] Modern applications, such as mobile-to-mobile communication or app-to-server communication, often require communicating sensitive information over insecure channels. Such applications motivate the development of methodologies for communicating over these channels while concealing sensitive content from adversarial third parties. Cryptographic procedures are one class of methods designed for this use case [Katz and Lindell, 2007, Chamberlain, 2017]. However, cryptographic procedures possess a drawback — they reveal to adversaries that sensitive information is being communicated, by virtue of the fact that encrypted messages (which appear as random content) are being sent over the channel. If an adversary controls the channel, it may simply block attempts to send encrypted messages, making cryptographic procedures inapplicable. Even if the adversary does not control the channel, it may engage in other undesirable activities, such as cyber-attacks.

[0003] A complementary approach to communicating sensitive information over insecure channels is steganography [Blum and Hopper, 2004, Cachin, 2004]. In steganography, the goal, informally speaking, is to encode a plaintext message in a manner that appears similar enough to innocuous communication (called covertext) that an adversary would not realize that hidden communication is occurring in the first place. Because steganographic procedures hide the existence of sensitive communication from adversaries altogether, they provide a complementary kind of security to that of cryptographic methods.

[0004] The concept of perfect security was first defined in [Cachin, 1998]. One case in which perfect security is possible is when the covertext distribution is uniform. In this case, perfect security can be achieved by embedding the message in uniform random ciphertext over the same domain as the covertext. However, constructing algorithms that both guarantee perfect security and transmit information at non-vanishing rates for more general covertext distributions has proved challenging. One notable result is that of Wang and Moulin [2008], who show that, in the case that letters of the covertext are independently and identically distributed, public watermarking codes disclosed by Moulin and O’Sullivan [2003] and Somekh-Baruch and Merhav [2003, 2004] that preserve first-order statistics can be used to construct perfectly secure steganography protocols of the same error rate. Another important line of research is that of Ryabko and Ryabko [2009], who show that perfectly secure steganography can be achieved in the case that letters of the covertext are independently and identically distributed under the weaker assumption of black-box access to the covertext distribution. In follow-up work, Ryabko and Ryabko [2011] generalize their earlier work to a setting in which the letters of the covertext need only follow a k-order Markov distribution. Unlike these works, our approach does not make any assumptions on the structure of the distribution of covertext, though, unlike Ryabko and Ryabko, we do assume that this distribution is known.

[0005] There is also a body of related literature concerned with the combination of steganography and deep generative models. For example, Volkhonskiy et al. [2017] investigate the idea of training a generative adversarial network for steganography. In their setup, the generator is trained to be robust against both 1) a discriminator attempting to discriminate between real and generated images and 2) a discriminator attempting to discriminate between unmodified images and images for which a least significant bit matching has been used to embed a secret message. Another important example is the work of Dai and Cai [2019], who introduce an algorithm using Huffman coding to modify the covertext distribution in a manner that controls the total variation distance between the covertext distribution and the stegotext distribution. Perhaps the most closely related work is that of Ziegler et al. [2019], who build on the work of Sallee [2003]. Sallee showed that compression algorithms can be used for steganography and that, in cases in which perfect compression is possible, the corresponding steganography procedure achieves perfect security. Ziegler et al. [2019] leverage this insight to perform experiments on language models, showing that arithmetic coding offers favorable trade-offs empirically compared to Huffman coding. Most recently, Kaptchuk et al. [2021] also propose a steganography method called Meteor that resembles a modification of arithmetic coding. Kaptchuk et al. [2021] show empirical results in which Meteor outperforms arithmetic coding in terms of closeness to the covertext distribution, but is outperformed by arithmetic coding in terms of throughput.

[0006] Therefore, there remains a need for improved steganographic methods.

Summary

[0007] It is an aim of the invention to provide improved steganographic methods, in particular by providing security guarantees. Embodiments of the invention can provide a desired level of security guarantee, up to and including a provable perfect security guarantee.

[0008] Embodiments of the invention provide a method of encoding a secret message, the method comprising: obtaining a ciphertext corresponding to the secret message using a private key (K); and sampling a stegotext space using the ciphertext to obtain a stegotext encoding the secret message; wherein the sampling is performed by reference to a minimum entropy coupling between the stegotext space and a ciphertext space containing all possible ciphertexts.

[0009] Embodiments of the invention provide a method of decoding stegotext, the method comprising: sampling a stegotext space using the stegotext to obtain a ciphertext encoding the secret message; and obtaining a secret message from the ciphertext using a private key (K); wherein the sampling is performed by reference to a minimum entropy coupling between the stegotext space and a ciphertext space containing all possible ciphertexts.

[0010] Embodiments of the invention provide a communication method for communicating a secret message between a sender and a receiver, the method comprising: the sender and the receiver obtaining a shared private key and a shared context; the sender obtaining a ciphertext corresponding to the secret message using the shared private key (K); the sender sampling a stegotext space represented by the shared context using the ciphertext to obtain a stegotext encoding the secret message; wherein the sampling is performed by reference to a minimum entropy coupling between the stegotext space and a ciphertext space containing all possible ciphertexts; the sender transmitting the stegotext; the receiver receiving the stegotext; the receiver sampling a stegotext space using the stegotext to obtain the ciphertext encoding the secret message; and obtaining the secret message from the ciphertext using the private key (K).

[0011] The present invention approaches the problem of steganography as a problem of minimum entropy coupling (MEC). Given two marginal distributions for two random variables, the minimum entropy coupling is the joint distribution over these two random variables that has minimal joint entropy, subject to the constraint that it marginalizes correctly [Kovacevic et al., 2015]. Our theoretical contribution proves that minimum entropy coupling between the covertext distribution and the ciphertext distribution (an encoded form of the plaintext that can be made to look uniformly random) yields a steganographic procedure that communicates the maximal possible amount of information about the plaintext message, subject to perfect steganographic security. In theory, iMEC is provably perfectly secure in the sense that covertext and stegotext distributions are statistically indistinguishable. In real-world embodiments, iMEC can be implemented in software such that the stegotext bias is limited by numerical precision only. It is shown below that the stegotext bias can thus be reduced to several orders of magnitude below the output precision of common covertext distribution models. In fact, stegotext bias can be driven arbitrarily small if higher-precision floating-point data types are employed in the calculation of the minimum entropy coupling algorithm.

[0012] While minimum entropy coupling is an NP-hard problem, there exist O(N log N) approximation algorithms [Kocaoglu et al., 2017, Cicalese et al., 2019, Rossi, 2019] that are suboptimal (in terms of joint entropy) by no more than one bit, while retaining exact marginalization guarantees. Furthermore, Sokota et al. [2022] introduced an iterative minimum entropy coupling approach (iMEC) that iteratively applies these approximation procedures to construct couplings between one uniform distribution and one autoregressively specified distribution, both having arbitrarily large supports, while still retaining marginalization guarantees. Because ciphertext can be made to look uniformly random, and any distribution of covertext can be specified autoregressively, we can leverage iMEC to perform steganography for arbitrary covertext distributions and plaintext messages. Excitingly, this yields the first instance of a steganography algorithm with perfect security guarantees that scales to arbitrary distributions of covertext.

[0013] In our experiments, we evaluate iMEC using language and audio models, specifically GPT-2 and WaveRNN. We compare against arithmetic coding [Ziegler et al., 2019] and Meteor [Kaptchuk et al., 2021], other recent methods for performing steganography with deep generative models. To examine empirical security, we estimate the KL divergence between the stegotext and the covertext for each method. For iMEC, we find that the KL divergence is on the order of the numerical precision of a 64-bit floating-point number, in agreement with our theoretical guarantees. In contrast, arithmetic coding and Meteor yield KL divergences many orders of magnitude larger, reflecting their weaker security guarantees. To examine encoding efficiency, we measure the number of bits transmitted per step. We find that iMEC yields superior efficiency results to those of arithmetic coding and Meteor, despite its stricter constraints.

Brief Introduction To The Drawings

[0014] The present invention will be described below with reference to exemplary embodiments and the accompanying schematic drawings, in which:

[0015] Figure 1 depicts a steganography process in which the present invention can be applied;

[0016] Figure 2 depicts two example couplings, shown in magenta, between two marginal distributions, shown in red and blue;

[0017] Figure 3 is a graph of Kullback-Leibler divergences between the stegotext distribution and covertext distribution for various methods including embodiments of the invention and comparative examples;

[0018] Figure 4 shows graphs of comparative encoding efficiencies (left) and bit rates (right) of methods of the invention and comparative examples;

[0019] Figure 5 is a graph showing comparative speed evaluation from the cover distribution of methods of the invention and comparative examples;

[0020] Figure 6 is a graph showing bit error rate as a function of threshold size;

[0021] Figure 7 is a graph of bit error rate (X) and non-termination frequencies (circles) for arithmetic coding vs. precision hyperparameter; and

[0022] Figure 8 is a diagram of a communication method of an embodiment.

Exemplary Embodiments

[0023] Before a detailed description of embodiments, a review of steganography and minimum entropy coupling is provided along with a description of a specific steganographic problem setting in which the invention can be applied.

[0024] Steganography

As an example, it is helpful to consider a problem setting in which the distribution of normally occurring content C, called the covertext distribution, is known to all parties (the sender, the receiver, and the adversary) and the adversary is unbounded but passive. Unbounded means that the adversary may use arbitrarily expensive computational operations toward the end of determining whether the distribution of stegotext S (i.e., the distribution of text being sent by the sender), differs from the distribution of covertext; passive means that it is not allowed to modify the content sent by the sender. It is desirable to achieve so-called perfect security [Cachin, 1998], wherein the distribution of stegotext is exactly equal to the distribution of covertext (and, resultantly, even an unbounded passive adversary cannot distinguish between them by statistical means), while simultaneously communicating as much information as possible to the receiver about the content of the plaintext message through the stegotext.

[0025] It is to be noted that if the covertext distribution being used does not exactly match that of real messages in the relevant communication channel, then stegotext messages may be detectable statistically or by human review. As discussed further below, the covertext distribution is, in embodiments of the invention, represented by a trained machine learning model and such models are increasingly capable of producing text, audio and even video messages that cannot easily be detected as artificially generated.

[0026] The term “covertext” is used herein to refer to any distribution of messages that can be exchanged between two communication partners over a [digital] channel, often, but not necessarily, in the form of text. Examples of suitable message types include: e-mails; chat messages (e.g. using services such as WhatsApp, Telegram, WeChat, Slack, Yammer, Discord, Facebook Messenger, etc.); direct messages in a social media platform (e.g. Twitter, Facebook, Instagram, Reddit, vKontakte, Weibo, etc.); posts or comments in a social media platform (e.g. as before); audio files (e.g. voice messages); pictures; video; documents, etc. Files containing messages may be exchanged via known file sharing services. A message that is sent in an embodiment of the invention may include metadata and/or formatting imposed by the channel or platform and not used to encode the ciphertext. However, in some cases elements of such metadata, e.g. sender names, may be included in the stegotext. Especially in channels or platforms where messages are ordinarily short, a stegotext may be made up of a series of messages rather than a single message. In other words, the ciphertext is encoded across a plurality of messages.

[0027] A distribution of messages of a channel is the combination of the message space for that channel (all messages that can be sent by that channel) and the probability of each message arising in natural communication on that channel. In some channels the message space may be unbounded, at least in theory, if the channel imposes no maximum message length. Even if the maximum message length is bounded in practice, it may often be much larger than required for steganographic applications. However, many messages have an effectively zero probability of arising in natural communication and so can be disregarded. The distribution of messages can therefore be regarded as the collection of messages having a non-negligible probability of arising in natural communication.

[0028] Problem Setting

The objects involved in steganography can be divided into two classes: those which are externally specified and those which require algorithmic specification. Each class contains three objects. The externally specified objects include the distribution over plaintext messages M, the distribution over covertext C, and the random source generator R.

[0029] The distribution over plaintext messages may be known by the adversary, but is not known by the sender or the receiver. However, the sender and receiver are aware of the domain M over which M ranges. The realized plaintext message is explicitly made known to the sender, but not to the receiver or the adversary.

[0030] The covertext distribution C is assumed to be known by the sender, the receiver, and the adversary.

[0031] The random source generator R provides the sender with a mechanism to take random samples from distributions. This random source is known to the sender but not to the receiver or the adversary. As a result, randomness involved in the sender’s encoding process cannot be exactly reproduced by the receiver or the adversary.

[0032] The objects requiring algorithmic specification, which are collectively referred to as a stegosystem, are the key generator KG, the encoder ℰ, and the decoder D.

[0033] The key generator KG produces a private key K, whose realization is an element of $\{0,1\}^{\ell}$ for some positive integer $\ell$. This private key is shared between the sender and receiver over a secure channel prior to the start of the stegoprocess and can be used to coordinate communication. The key generation process used by the key generator KG may be known to the adversary, but the realization of the key K is not.

[0034] The encoder ℰ takes a private key K, a plaintext message M, and a source of randomness R as input and produces a stegotext S in the space of covertexts C.

[0035] The decoder D takes a private key K and a stegotext S as input and returns an estimated plaintext message M̂.

[0036] Many of the objects described above are depicted graphically in Figure 1, which depicts a general steganography process in which the present invention can be applied. The sender receives a plaintext message M, a source of randomness R, and a private key K, and outputs a stegotext S. The receiver receives the same private key K as the sender, along with the stegotext S. The adversary also receives the stegotext S.

[0037] Security

There are multiple ways to quantify the security level of a steganographic procedure. An aim of the invention is to achieve perfectly secure steganography.

[0038] Definition 3.1. [Cachin, 1998] Given covertext distribution C and plaintext message space M, a stegosystem (KG, ℰ, D) is ε-secure against passive adversaries if the KL divergence between the distribution of covertext C and the distribution of stegotext S is less than ε; i.e., KL(C, S) < ε. It is perfectly secure if the KL divergence is zero; i.e., KL(C, S) = 0.
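As a small numerical illustration of this definition, the following sketch computes KL(C, S) for two discrete distributions and tests ε-security. The function names `kl_divergence` and `is_epsilon_secure` and the three-message example distributions are hypothetical, and base-2 logarithms (bits) are an assumption, since the definition does not fix the base.

```python
import math


def kl_divergence(cover, stego):
    """KL(C, S) = sum_z C(z) * log2(C(z) / S(z)) for dict-valued distributions."""
    total = 0.0
    for z, p in cover.items():
        if p == 0.0:
            continue  # terms with C(z) = 0 contribute nothing
        q = stego.get(z, 0.0)
        if q == 0.0:
            return math.inf  # C places mass where S has none
        total += p * math.log2(p / q)
    return total


def is_epsilon_secure(cover, stego, eps):
    """True if the stegotext distribution is eps-secure with respect to the covertext."""
    return kl_divergence(cover, stego) < eps


# Hypothetical covertext distribution over three messages.
covertext = {"a": 0.5, "b": 0.25, "c": 0.25}
stego_exact = dict(covertext)                    # identical distribution
stego_biased = {"a": 0.6, "b": 0.2, "c": 0.2}    # slightly biased distribution

print(kl_divergence(covertext, stego_exact))     # zero divergence: perfectly secure
print(kl_divergence(covertext, stego_biased))    # small positive divergence
```

In this toy case the biased stegotext is ε-secure for ε = 0.05 but not for ε = 0.01, whereas the exact copy has zero divergence.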

[0039] In other words, a steganographic system is perfectly secure if the distribution of stegotext S communicated by the sender is exactly the same as the distribution of covertext C.

[0040] Methodological Outline

One class of steganographic solution methods follows the following outline:

1. The sender and receiver use their shared private key K to inject the plaintext message space M into a space of binary sequences $X = \{0,1\}^{\ell}$ called ciphertext. By using a random key, this injection can be done in such a way that the distribution over $\{0,1\}^{\ell}$ is uniformly random, regardless of the distribution of M. (For example, one could generate K uniformly at random, convert each m to binary x = bin(m), and use the mapping m → bin(m) XOR K.)

2. The sender uses an encoder acting on $\{0,1\}^{\ell}$ to map the ciphertext X into stegotext S (which exists in the space of covertexts).

3. The sender sends the stegotext S over the channel.

4. The receiver decodes the stegotext back into binary ciphertext.

5. The receiver decodes the binary back to the plaintext message space. (For the example above, the receiver can recover the binarized message bin(m) = (bin(m) XOR K) XOR K using the shared private key, and invert the binarization to recover the plaintext m.)

[0041] In the outline above, steps 1, 3, and 5 can be accomplished using standard operations in steganography literature and, thus, are not discussed in detail herein. Our methodological contribution specifically concerns steps 2 and 4.
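Steps 1 and 5 of the outline can be sketched with the XOR example given in the text. The following is an illustrative sketch only: the function names `keygen`, `to_ciphertext` and `from_ciphertext` are hypothetical, messages are assumed to be byte strings of a fixed length, and the key is assumed to be used once.

```python
import secrets


def keygen(num_bits):
    """Sample a private key K uniformly at random from {0,1}^num_bits (as an int)."""
    return secrets.randbits(num_bits)


def to_ciphertext(message, key):
    """Step 1: binarize the message and XOR it with the key."""
    return int.from_bytes(message, "big") ^ key


def from_ciphertext(ciphertext, key, num_bytes):
    """Step 5: XOR with the same key and invert the binarization."""
    return (ciphertext ^ key).to_bytes(num_bytes, "big")


message = b"0123456789"                 # 10 bytes, i.e. an 80-bit plaintext
key = keygen(8 * len(message))          # shared in advance over a secure channel
x = to_ciphertext(message, key)         # uniformly distributed 80-bit ciphertext
recovered = from_ciphertext(x, key, len(message))
print(recovered == message)
```

Because the key is uniform and independent of the message, the resulting ciphertext is uniform over the 80-bit strings whatever the plaintext distribution, which is the property that steps 2 and 4 rely on.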

[0042] Minimum Entropy Coupling

Let X and Y be probability distributions over finite sets $\mathcal{X}$ and $\mathcal{Y}$. A coupling γ of X and Y is a joint distribution over $\mathcal{X} \times \mathcal{Y}$ such that, for all $x \in \mathcal{X}$, $\sum_{y} \gamma(x, y) = X(x)$ and such that, for all $y \in \mathcal{Y}$, $\sum_{x} \gamma(x, y) = Y(y)$. In other words, a coupling is a joint distribution over $\mathcal{X} \times \mathcal{Y}$ that marginalizes to X and Y, respectively. In general, there may be many possible couplings for distributions X and Y. As an example, Figure 2 visually depicts two possible couplings for the same marginal distributions. The left coupling has lower entropy than the right coupling (shading reflects the probability mass). Let $\Gamma(X, Y)$ denote the set of all couplings. The goal of minimum entropy coupling (MEC) is to find the element of $\Gamma(X, Y)$ with minimal entropy. In other words, to find $\gamma \in \Gamma(X, Y)$ such that the entropy $H(\gamma) = -\sum_{x,y} \gamma(x, y) \log \gamma(x, y)$ is no larger than that of any other coupling in $\Gamma(X, Y)$.
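The definition can be checked numerically on a small case. The sketch below verifies that two hand-constructed joint distributions are couplings of the same marginals and compares their entropies. The helper names `is_coupling` and `joint_entropy` and the example marginals are hypothetical, and base-2 logarithms are an assumption.

```python
import math


def is_coupling(gamma, x, y, tol=1e-12):
    """Check that the joint table gamma[i][j] is non-negative and marginalizes to x and y."""
    rows_ok = all(abs(sum(row) - px) <= tol for row, px in zip(gamma, x))
    cols_ok = all(
        abs(sum(row[j] for row in gamma) - py) <= tol for j, py in enumerate(y)
    )
    nonneg = all(p >= 0 for row in gamma for p in row)
    return rows_ok and cols_ok and nonneg


def joint_entropy(gamma):
    """H(gamma) = -sum_{x,y} gamma(x, y) * log2 gamma(x, y), in bits."""
    return -sum(p * math.log2(p) for row in gamma for p in row if p > 0)


x = [0.5, 0.5]           # marginal X
y = [0.5, 0.25, 0.25]    # marginal Y

# Independent coupling: gamma(x, y) = X(x) * Y(y).
independent = [[px * py for py in y] for px in x]

# A lower-entropy coupling of the same marginals.
concentrated = [[0.5, 0.0, 0.0],
                [0.0, 0.25, 0.25]]

print(is_coupling(independent, x, y), joint_entropy(independent))    # True 2.5
print(is_coupling(concentrated, x, y), joint_entropy(concentrated))  # True 1.5
```

Here the second coupling attains 1.5 bits, which equals the entropy of Y; no coupling can have lower joint entropy than either of its marginals, so it is a minimum entropy coupling for this example.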

[0043] In general, computing the MEC is an NP-hard problem. That said, there has been substantial recent progress in approximating MECs. Cicalese et al. [2019] and Rossi [2019] recently showed that it is possible to approximate MECs in O(N log N) time with a solution guaranteed to be suboptimal by no more than one bit. Even more recently, Sokota et al. [2022] introduced an iterative minimum entropy coupling approach (iMEC) that uses an approximate MEC algorithm as a subroutine to couple distributions with arbitrarily large supports, so long as one of the distributions is uniform and the other can be specified autoregressively. This approach provably produces couplings, meaning that exact marginalization to the inputs is guaranteed, regardless of the input distributions. iMEC is described in further detail below.
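To give a feel for what an approximate MEC subroutine can look like, the sketch below implements a simple greedy heuristic that repeatedly pairs the largest remaining masses of the two marginals. It is offered as an illustration in the spirit of the greedy approaches cited above, not as a reproduction of any cited algorithm: the name `greedy_coupling` is hypothetical, the argmax is found by linear scan (so the running time is quadratic rather than N log N), and exact rational arithmetic via `fractions.Fraction` is an assumption made so that marginalization holds exactly.

```python
import math
from fractions import Fraction


def greedy_coupling(x, y):
    """Greedily couple two distributions given as lists of Fractions with equal total mass."""
    if sum(x) != sum(y):
        raise ValueError("marginals must have the same total mass")
    rx, ry = list(x), list(y)                      # remaining mass on each side
    gamma = [[Fraction(0)] * len(y) for _ in x]
    while any(r > 0 for r in rx):
        i = max(range(len(rx)), key=rx.__getitem__)  # largest remaining X mass
        j = max(range(len(ry)), key=ry.__getitem__)  # largest remaining Y mass
        mass = min(rx[i], ry[j])
        if mass == 0:
            break  # defensive guard; not reached with exact arithmetic
        gamma[i][j] += mass
        rx[i] -= mass
        ry[j] -= mass
    return gamma


def joint_entropy(gamma):
    """Joint entropy of the coupling in bits."""
    return -sum(float(p) * math.log2(float(p)) for row in gamma for p in row if p > 0)


uniform = [Fraction(1, 4)] * 4                              # e.g. a 2-bit ciphertext block
cover = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]    # hypothetical covertext step
gamma = greedy_coupling(uniform, cover)
print(joint_entropy(gamma))
```

For this input the greedy coupling has a joint entropy of 2 bits, equal to the entropy of the uniform marginal, so it happens to be optimal here; in general a greedy heuristic only approximates the minimum.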

[0044] Iterative Minimum Entropy Coupling

Assume that X is a uniform distribution and let $\mathcal{X}_1 \times \dots \times \mathcal{X}_n = \mathcal{X}$ and $\mathcal{Y}_1 \times \dots \times \mathcal{Y}_m = \mathcal{Y}$ be factorizations over the spaces that X and Y range. iMEC implicitly defines a coupling γ between X and Y using procedures that iteratively call an approximate MEC as a subroutine [Sokota et al., 2022]. These procedures can sample γ(Y | x) and query γ(X | y) for a given x and y respectively. To align with steganography terminology, these operations can be referred to as encoding and decoding. Because these procedures share a similar structure, we describe them as a unified operation as follows:

1. Initialize a uniform distribution $\mu_i$ over $\mathcal{X}_i$ for each $i = 1, \dots, n$.

2. Iterate over $j = 1, \dots, m$: Select $i^* = \arg\max_i H(\mu_i)$ to be the index of the block whose distribution has maximal entropy. Call the approximate MEC subroutine between $\mu_{i^*}$ and the conditional distribution of $Y_j$ given $y_{1:j-1}$. Denote this coupling as $\gamma_j$. If performing encoding, set $y_j \sim \gamma_j(Y_j \mid x_{i^*})$; if performing decoding, $y_j$ is known and no sampling is required. Update $\mu_{i^*}$ to be equal to $\gamma_j(X_{i^*} \mid y_j)$.

3. If performing encoding, return y; if performing decoding, return $\mu_1, \dots, \mu_n$.
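The unified encode/decode loop can be sketched as follows. This is a toy illustration under several stated assumptions rather than the implementation used in the experiments: the names `imec`, `toy_model` and `greedy_coupling` are hypothetical, a naive greedy heuristic stands in for the approximate MEC subroutine, the covertext model is a hand-written three-token autoregressive distribution, and exact `Fraction` arithmetic replaces floating point. On decoding, the sketch returns the most probable value of each posterior block.

```python
import math
import random
from fractions import Fraction


def greedy_coupling(x, y):
    """Stand-in approximate MEC: repeatedly pair the largest remaining masses."""
    rx, ry = list(x), list(y)
    gamma = [[Fraction(0)] * len(y) for _ in x]
    while any(r > 0 for r in rx):
        i = max(range(len(rx)), key=rx.__getitem__)
        j = max(range(len(ry)), key=ry.__getitem__)
        mass = min(rx[i], ry[j])
        if mass == 0:
            break
        gamma[i][j] += mass
        rx[i] -= mass
        ry[j] -= mass
    return gamma


def entropy(dist):
    return -sum(float(p) * math.log2(float(p)) for p in dist if p > 0)


def toy_model(prefix):
    """Hypothetical autoregressive covertext model over tokens {0, 1, 2}."""
    if prefix and prefix[-1] == 0:
        return [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
    return [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]


def imec(num_blocks, block_size, num_steps, model, x=None, y=None, rng=None):
    """Encode ciphertext blocks x into tokens, or decode tokens y into block estimates."""
    encoding = x is not None
    # Step 1: uniform posterior over every ciphertext block.
    mu = [[Fraction(1, block_size)] * block_size for _ in range(num_blocks)]
    tokens = []
    for j in range(num_steps):
        # Step 2: pick the block whose posterior has maximal entropy.
        i_star = max(range(num_blocks), key=lambda i: entropy(mu[i]))
        cond = model(tokens)                       # distribution of token j given the prefix
        gamma = greedy_coupling(mu[i_star], cond)
        if encoding:
            row = gamma[x[i_star]]                 # proportional to gamma_j(Y_j | x_{i*})
            token = rng.choices(range(len(row)), weights=[float(w) for w in row])[0]
        else:
            token = y[j]                           # known when decoding
        tokens.append(token)
        # Posterior update: gamma_j(X_{i*} | y_j).
        mu[i_star] = [gamma[v][token] / cond[token] for v in range(block_size)]
    if encoding:
        return tokens
    # Step 3 (decoding): report the most probable value of each block.
    return [max(range(block_size), key=m.__getitem__) for m in mu]


ciphertext = [2, 1]                                # two blocks of two bits each
stego = imec(2, 4, 8, toy_model, x=ciphertext, rng=random.Random(0))
decoded = imec(2, 4, 8, toy_model, y=stego)
print(stego, decoded)
```

Because the sender and receiver run the same deterministic coupling computations on the same token prefix, their posteriors stay synchronized; only the sender needs the random source.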

[0045] Steganography as Minimum Entropy Coupling

An exemplary embodiment in which steganography is treated as a coupling problem is as follows:

Steganography as Coupling

1. Let γ be a coupling over ciphertext distribution X and covertext distribution C.

2. Given ciphertext x, let the sender communicate stegotext $S \sim \gamma(C \mid X = x)$.

3. Given stegotext S, let the receiver estimate ciphertext $\hat{x} = \arg\max_{x} \gamma(X = x \mid C = S)$.

Viewing steganography as a coupling problem has value because of its strong security guarantees.
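The three steps can be illustrated with an explicit, hand-written coupling. In the sketch below the covertext messages, the coupling table `GAMMA` and the function names are all hypothetical; the ciphertext is a single uniform bit and exact `Fraction` arithmetic is used so that the marginal check is exact.

```python
import random
from fractions import Fraction

# Hypothetical covertext distribution C over three innocuous messages.
COVERTEXTS = ["hello", "hi there", "good morning"]
C = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

# A coupling gamma[x][c] of a uniform one-bit ciphertext X with C.
GAMMA = [
    [Fraction(1, 2), Fraction(0), Fraction(0)],
    [Fraction(0), Fraction(1, 4), Fraction(1, 4)],
]


def encode(x, rng):
    """Sender: sample stegotext S ~ gamma(C | X = x)."""
    row = GAMMA[x]
    idx = rng.choices(range(len(row)), weights=[float(w) for w in row])[0]
    return COVERTEXTS[idx]


def decode(s):
    """Receiver: return argmax_x gamma(X = x | C = s)."""
    col = COVERTEXTS.index(s)
    return max(range(len(GAMMA)), key=lambda x: GAMMA[x][col])


def stegotext_marginal():
    """Distribution of S when X is drawn from its marginal: the column sums of gamma."""
    return [sum(GAMMA[x][c] for x in range(len(GAMMA))) for c in range(len(C))]


rng = random.Random(0)
sent = encode(1, rng)
print(sent, decode(sent))
print(stegotext_marginal() == C)
```

The column sums of the table reproduce C exactly, so an observer who sees only stegotext sees the covertext distribution, while the receiver recovers the bit from which row the observed message belongs to.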

[0046] Proposition 1. Steganography as coupling has perfect steganographic security.

Proof. Consider that the distribution of stegotext S is dictated by $\gamma(C \mid X = x)$ for a given ciphertext x. Thus the marginal distribution of stegotext is given by $\sum_{x} X(x)\,\gamma(C \mid X = x) = \gamma(C)$. By definition of a coupling, we have γ(C) = C. Therefore, we have S = C and KL(C, S) = 0.

[0047] Proposition 2. Performing steganography as coupling with a minimum entropy coupling procedure maximizes the mutual information I(M;S) between the plaintext message and the stegotext, subject to the constraint of perfect steganographic security.

Proof. Consider that I(M; S) = H(M) + H(S) − H(M, S). Now, note that the distribution of M is externally specified and, therefore, H(M) cannot be optimized. Next, recall that perfect steganographic security implies S = C. Thus, the distribution of S is externally specified, implying H(S) cannot be optimized. Finally, recall that the ciphertext encoding is injective and ranges over a discrete distribution. Therefore, H(M, S) = H(X, S). Noting that H(X, S) is exactly the quantity being minimized by minimum entropy coupling yields our result.

[0048] We also observe that, given the sender’s procedure, the receiver’s behaviour is deterministic.

[0049] Remark 1. Given the sender’s encoding procedure, the receiver’s decoding procedure minimizes its error rate.

Proof. Because the sender’s encoding process is dictated by the joint distribution γ, the posterior over ciphertexts is $\gamma(X \mid C = S)$. Therefore, the error rate is $1 - \gamma(X = \hat{x} \mid C = S)$, which is minimized by $\hat{x} = \arg\max_{x} \gamma(X = x \mid C = S)$.

[0050] It follows from the above that, among steganography procedures with perfect security, the one induced by minimum entropy coupling provably maximizes the amount of information transmitted by the sender. Furthermore, this insight is easily extended beyond theoretical results. Because, as discussed in the background, it is always possible to make ciphertext look uniformly random, iMEC [Sokota et al., 2022] can immediately be plugged into the steganography as coupling framework for arbitrary covertext distributions. And while iMEC does not possess proven approximation guarantees as a minimum entropy coupling algorithm, it yields performant couplings in large-scale settings, as we will see in the experiments.

[0051] Experiments

We empirically compare iMEC against arithmetic coding [Ziegler et al., 2019] and Meteor [Kaptchuk et al., 2021] on four different covertext types. We also include a variant of Meteor that employs bin-sorted probabilities [Kaptchuk et al., 2021, Meteor:reorder].

[0052] Experiment setup

Our first covertext distribution consists of uniformly random noise (UNIF) of dimension 40 and a mean channel entropy of $H_C = 5.32$ bits. The second and third covertext distributions are variants of GPT-2 [Radford et al., 2019] with 12 attention modules [Wolf et al., 2020] conditioned on 1024-character strings from the Wikitext-103 dataset [Merity et al., 2016]. The second covertext distribution performs top-k sampling from a re-normalised categorical distribution over the 40 highest-probability outputs. The third covertext distribution instead performs nucleus sampling [Holtzman et al., 2020] from the highest-probability outputs that together comprise 90% of the raw channel entropy [Radford et al., 2019]. The fourth covertext distribution consists of a text-to-speech (TTS) pipeline [Yang et al., 2022] based on Tacotron-2 [Shen et al., 2018] mel-spectrogram encodings, followed by a WaveRNN generative model [Kalchbrenner et al., 2018] pretrained on the LJSpeech dataset [Ito and Johnson, 2017] with an average channel entropy of $H_C = 0.35$ bits. We condition the WaveRNN model using text generated by GPT-2 that we convert into audio data using mel-spectrogram encodings.
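The truncation and re-normalisation used to define the second and third covertext distributions can be sketched as follows. The function names `top_k` and `nucleus` are hypothetical, and the nucleus variant follows the common formulation that thresholds cumulative probability mass; how exactly the 90% criterion was applied in the experiments is not specified here, so this is an assumption.

```python
def top_k(probs, k):
    """Keep the k highest-probability outcomes and re-normalise."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def nucleus(probs, p):
    """Keep the smallest high-probability prefix whose mass reaches p, then re-normalise."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}


# Hypothetical next-token distribution over four tokens.
probs = [0.5, 0.25, 0.125, 0.125]
print(top_k(probs, 2))      # tokens 0 and 1, re-normalised
print(nucleus(probs, 0.8))  # tokens 0, 1 and 2, re-normalised
```

The truncated distribution, not the raw model output, is the covertext distribution that the stegotext must match, so sender and receiver need to apply the same truncation.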

[0053] Our implementation of iMEC makes use of the approximative minimum entropy coupling heuristic suggested by [Kocaoglu et al., 2017, Algorithm 1] as its subprocedure. We also integrated the original implementations of arithmetic coding [Ziegler et al., 2019] and Meteor/Meteor:reorder [Kaptchuk et al., 2021] into our codebase.

[0054] All experiments were performed on an AMD Ryzen Threadripper PRO 3955WX with 16 physical cores and 2x NVIDIA GeForce RTX 3090 GPUs. Note that, apart from neural model forward passes, iMEC encoding and decoding occupies just a single CPU core, while both arithmetic coding and Meteor make use of multiple CPUs, and Meteor:reorder also makes use of GPUs during encoding and decoding.

[0055] All ciphertexts are 80-bit bitstrings sampled uniformly at random. We measure the encoding efficiency of each method by measuring the amount of covertext required to transmit these bitstrings. We tuned the hyper-parameters of each method to yield the best performance on this task. For iMEC, we stop transmission when the entropy of every posterior block is less than 0.1, i.e., $H(\mu_i) < 0.1$ for all $i$, where $\mu_i$ denotes the posterior over ciphertext block $i$. We found the error rate for this threshold to be lower than 1 in $10^5$ bits. For arithmetic coding and Meteor/Meteor:reorder, we performed a grid search over precision values of 8, 16, 32 and 48 bits. We find that both arithmetic coding and Meteor get stuck in a (seemingly) infinite loop at a frequency of about 3%. For simplicity, we discard these runs from our investigation. This may result in slight overestimation of these algorithms’ encoding efficiencies.

[0056] Results

Empirical Security. We first investigate the empirical security of each method. As estimating KL(C, S) is difficult, we instead show KL(S, C), which can easily be estimated from samples. Figure 3 is a graph of Kullback-Leibler divergences between the stegotext distribution and covertext distribution for various methods including embodiments of the invention and comparative examples. Error bars indicate estimates of 95% confidence intervals computed from the central limit theorem over 100 runs.

[0057] As expected from theory, we find that iMEC’s security is on the order of the precision of the numerical data type employed in its implementation, rendering it ε-secure for arbitrarily small ε in practice. On the other hand, arithmetic coding’s divergence is consistently many orders of magnitude above the precision of the data type. For Meteor and Meteor:reorder, we observe good security for a uniform distribution. But, for more realistic distributions, they yield divergences many orders of magnitude above the precision of the data type. That said, consistent with Kaptchuk et al.’s findings, we observe that Meteor does result in better security than arithmetic coding. On the other hand, for Meteor:reorder, we see poorer security than arithmetic coding for realistic distributions, and even observe a divergent KL divergence for WaveRNN.
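A sample-based estimate of KL(S, C) with a normal-approximation confidence interval can be sketched as below. This is an illustrative sketch only: the function names and the toy categorical distributions are hypothetical, the interval is computed over individual samples rather than over runs, and base-2 logarithms are assumed.

```python
import math
import random
import statistics


def sample_log_ratios(stego, cover, n, rng):
    """Draw n stegotext samples and return log2(S(z) / C(z)) for each."""
    outcomes = list(stego)
    weights = [stego[z] for z in outcomes]
    draws = rng.choices(outcomes, weights=weights, k=n)
    return [math.log2(stego[z] / cover[z]) for z in draws]


def kl_estimate(log_ratios):
    """Return (mean, half-width of a 95% interval) using the central limit theorem."""
    n = len(log_ratios)
    mean = statistics.fmean(log_ratios)
    half_width = 1.96 * statistics.stdev(log_ratios) / math.sqrt(n)
    return mean, half_width


cover = {"a": 0.5, "b": 0.25, "c": 0.25}     # hypothetical covertext distribution
biased = {"a": 0.6, "b": 0.2, "c": 0.2}      # hypothetical biased stegotext distribution

rng = random.Random(0)
mean, half_width = kl_estimate(sample_log_ratios(biased, cover, 20000, rng))
print(mean, half_width)
```

The estimator only needs samples from the stegotext distribution together with the ability to evaluate both probabilities on each sample, which is why KL(S, C) is the convenient direction to estimate.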

[0058] Encoding Efficiency We next compare encoding efficiencies and bit rates. We say bit rate to mean the number of bits encoded per token. We say encoding efficiency to mean the bit rate normalized to the average channel entropy. Figure 4 shows graphs of comparative encoding efficiencies (left) and bit rates (right) of methods of the invention and comparative examples. Each method is evaluated at tuned hyperparameters; error bars indicate estimates of 95% confidence intervals computed using the central limit theorem from 1000 runs. Horizontal lines on the right plot correspond to the mean entropy of the channel. We present results for iMEC with varying block sizes over the ciphertext. These block sizes are hyperparameters for iMEC and can be made larger or smaller. We find that iMEC with a block size of 20 bits yields superior encoding efficiency to the baselines across all three of our realistic distributions. We also observe that iMEC with a block size of 16 bits generally yields competitive or superior performance to the other methods, and that 10-bit block sizes also perform competitively. Impressively, at a standard encoding frequency of 24kHz, iMEC’s performance for WaveRNN would allow it to encode a private message of 225 kilobytes in just 30 seconds of TTS voicemail, which is sufficient for sending compressed images. Among the baselines, we observe that arithmetic coding tends to produce higher efficiency than Meteor, again consistent with Kaptchuk et al.’s findings.
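The two definitions above, and the arithmetic behind the voicemail example, can be made concrete with a short sketch. The helper names are hypothetical, and the calculation rests on two labelled assumptions that the text does not state explicitly: that one token is emitted per audio sample at 24kHz, and that one kilobyte is 1000 bytes.

```python
def bit_rate(bits_encoded, tokens_used):
    """Bit rate: number of message bits encoded per stegotext token."""
    return bits_encoded / tokens_used


def encoding_efficiency(rate_bits_per_token, mean_entropy_bits):
    """Encoding efficiency: bit rate normalized by the mean channel entropy."""
    return rate_bits_per_token / mean_entropy_bits


# Assumption: one token per audio sample, so 30 seconds at 24kHz
# corresponds to 720,000 tokens.
tokens = 24_000 * 30
# Assumption: 1 kilobyte = 1000 bytes = 8000 bits.
payload_bits = 225 * 1000 * 8
implied_rate = bit_rate(payload_bits, tokens)
print(implied_rate)  # bits per token implied by the quoted figures
```

Under those assumptions the quoted figures correspond to 2.5 bits per token; dividing that rate by the mean channel entropy would give the corresponding encoding efficiency.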

[0059] Speed Lastly, we examine the speed of each algorithm. Figure 5 is a graph showing comparative speed evaluation from the cover distribution of methods of the invention and comparative examples. Each method is evaluated at optimal hyperparameters; error bars indicate estimates of 95% confidence intervals computed using the central limit theorem from 1000 runs. Horizontal lines indicate the amount of time required for model inference for GPT-2 and WaveRNN.

[0060] While, in the previous section, we observed that increasing iMEC’s block size generally improves encoding efficiency, we see here that this improved efficiency does not come without cost. While 10-bit blocks require an order of magnitude less time than model inference, 16-bit blocks require the same order of magnitude of time as model inference, and 20-bit blocks require an order of magnitude more time than that. The wall-clock times of arithmetic coding and Meteor are generally comparable to that of the 10-bit blocks (though, as noted in the experimental setup, they also use more computational resources than iMEC), while the wall-clock time of Meteor:reorder varies somewhat significantly depending on the task. We believe it is possible (perhaps even likely) that innovations in approximate minimum entropy coupling will allow some of the cost of coupling to be distributed across multiple cores, making the block sizes we experiment with here much cheaper.

[0061] Figure 6 is a graph showing bit error rate as a function of the belief entropy threshold. The error bars shown are the standard deviation of the mean over 100 trajectories. As is suggested by the figure, the error rate can be made arbitrarily small by selecting a sufficiently small threshold value.

[0062] Figure 7 is a graph of bit error rate (X) and non-termination frequencies (circles) for arithmetic coding vs. precision hyperparameter. We show estimates over 100 trajectories. While errors were not observed, non-termination occurred with non-negligible probability.

[0063] A practical embodiment of the invention is depicted in more detail in Figure 8. In an initial stage (I), the sender and receiver obtain a Covertext distribution D, a Context generator G, and a pseudo random number generator (PRNG) Seed f. These items may be generated by one of the communication parties (sender or receiver) and communicated securely to the other or obtained by both communication parties from a third party.

[0064] In the communication stage (II), the sender uses the PRNG Seed f in a (pseudo-) random bitmask generator to generate a random bitmask of the same length as the message M that is to be communicated. In the case of a long message, with some loss of security, a shorter bitmask may be used by repeating it to match the length of the message. The bitmask is applied to the message, e.g. by XOR, so as to result in a randomised message.
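The masking step can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, and Python's `random.Random` (a Mersenne Twister) is used purely as a stand-in for the seeded bitmask generator. It is not cryptographically secure, and a real system would be expected to use a suitable keystream generator instead.

```python
import random


def make_bitmask(seed, length):
    """Derive a pseudo-random mask of `length` bytes from a shared seed.

    Stand-in generator for illustration; not cryptographically secure.
    """
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(length))


def apply_mask(data, mask):
    """XOR `data` with a mask of the same length."""
    if len(data) != len(mask):
        raise ValueError("mask must have the same length as the data")
    return bytes(d ^ m for d, m in zip(data, mask))


message = b"meet at noon"
mask = make_bitmask(seed=1234, length=len(message))
randomised = apply_mask(message, mask)
# XOR is its own inverse, so applying the same mask again restores the message.
restored = apply_mask(randomised, mask)
```

Because both parties derive the mask from the same seed, the same `apply_mask` call serves for masking and for unmasking.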

[0065] The randomised message, covertext distribution D and a source of randomness are applied to the iMEC encoder to generate the stegotext S. The context generator generates a context for the stegotext. For example, the context may be a prompt for a machine learning algorithm or natural language processing algorithm (e.g. GPT-2) to generate text. The context should be known to or determinable by both sender and receiver and desirably changes moderately frequently. For example, the sender and receiver may agree to obtain the context from a commonly accessible news site. Alternatively, the sender and receiver may reuse part of a previous message as the context for the next or generate the context using a pseudorandom algorithm using a common seed.

[0066] The stegotext S is communicated to the receiver via the channel. The receiver uses an iMEC decoder with the covertext distribution D and a Uniform message belief B (uniform posterior over messages) to decode the stegotext S and recover the randomised message. The receiver also uses the PRNG Seed f in a (pseudo-) random bitmask generator to generate the random bitmask in order to unmask the output of the iMEC decoder and obtain the original message M.
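One way to picture how a decoder can start from a uniform message belief and refine it is a single Bayesian update per observed stegotext token. The sketch below is an assumption made for illustration, not the specified iMEC decoder: `update_belief` and the toy conditional table (the probability of each token given each candidate message value) are hypothetical.

```python
from fractions import Fraction


def update_belief(belief, conditional, observed):
    """Bayes update: new_belief[x] is proportional to belief[x] * P(observed | x).

    belief[x]         -- current probability that the hidden value is x
    conditional[x][s] -- probability of emitting token s when the value is x
    observed          -- index of the token actually received
    """
    unnormalised = [b * conditional[x][observed] for x, b in enumerate(belief)]
    total = sum(unnormalised)
    if total == 0:
        raise ValueError("observed token is impossible under every value")
    return [u / total for u in unnormalised]


# Toy example: four candidate values, three possible tokens.
uniform = [Fraction(1, 4)] * 4
conditional = [
    [Fraction(1), Fraction(0), Fraction(0)],  # value 0 always emits token 0
    [Fraction(1), Fraction(0), Fraction(0)],  # value 1 always emits token 0
    [Fraction(0), Fraction(1), Fraction(0)],  # value 2 always emits token 1
    [Fraction(0), Fraction(0), Fraction(1)],  # value 3 always emits token 2
]
after_token_1 = update_belief(uniform, conditional, observed=1)
after_token_0 = update_belief(uniform, conditional, observed=0)
```

In this toy table, token 1 identifies value 2 with certainty, whereas token 0 leaves values 0 and 1 equally likely, so further tokens would be needed to resolve them.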

[0067] The present invention proposes that steganography can be approached as a minimum entropy coupling problem. First, coupling algorithms yield steganography procedures with perfect security; second, among steganography algorithms with perfect security, the one induced by minimum entropy coupling maximizes information throughput. In aggregate, these findings suggest that the steganography problem setting may be viewed most naturally through the lens of minimum entropy coupling. Furthermore, this is also practical. Using approximate and iterative minimum entropy coupling [Kocaoglu et al., 2017, Sokota et al., 2022], steganography can be performed with deep generative model covertext distributions. In empirical evaluations, it is shown that iterative minimum entropy coupling is perfectly secure in practice, up to numerical precision, and exhibits superior efficiency compared to existing methods.
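To make the notion of a coupling concrete, the sketch below builds a joint distribution with two prescribed marginals using a simple greedy heuristic of the kind studied in the approximate minimum entropy coupling literature: it repeatedly pairs the largest remaining mass of each marginal. It is offered as an illustration under that assumption, not as the procedure used in the embodiments; the function names and the toy distributions are hypothetical.

```python
import math
from fractions import Fraction


def greedy_coupling(p, q):
    """Greedily build a joint distribution whose marginals are p and q.

    Each step moves as much mass as possible into the cell indexed by the
    largest remaining entries of p and q, which tends to keep joint entropy low.
    """
    p, q = list(p), list(q)
    joint = {}
    while max(p) > 0 and max(q) > 0:
        i = max(range(len(p)), key=p.__getitem__)
        j = max(range(len(q)), key=q.__getitem__)
        mass = min(p[i], q[j])
        joint[(i, j)] = mass
        # At least one of the two entries becomes exactly zero here.
        p[i] -= mass
        q[j] -= mass
    return joint


def entropy_bits(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(float(x) * math.log2(float(x)) for x in probs if x > 0)


# Toy example: uniform ciphertext over 4 values, non-uniform covertext over 3.
ciphertext = [Fraction(1, 4)] * 4
covertext = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
joint = greedy_coupling(ciphertext, covertext)
joint_entropy = entropy_bits(joint.values())
```

Summing the joint over either index recovers the corresponding marginal, which is the property that lets the stegotext follow the covertext distribution exactly. In this toy case the joint entropy equals the entropy of the uniform ciphertext, the lowest value any coupling of these two marginals can attain.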

[0068] It is noted that the present invention does not provide protection against adversarial interference with the stegotext, which is assumed to arrive at the receiver unperturbed. Detection and/or correction of errors in the channel can be provided by standard channel coding techniques. In addition, perfect undetectability of the stegotext depends on having access to an exact distribution of the covertext (the messages in which the stegotext is to be hidden). Even modern deep generative models do not exactly capture complex distributions (though it is expected that this issue will be somewhat ameliorated over time, due to ongoing research efforts in the deep learning community). Furthermore, in realistic scenarios, the distribution of “normally” occurring content may shift over time and depend on other external context, making it difficult to capture. Nevertheless, in many cases the covertext distribution can be modelled sufficiently accurately to provide a high degree of concealment and the invention is useful in settings in which the covertext distribution can be modelled accurately.

[0069] Stegotext samples

To illustrate the effect of bias, a sample stegotext for both iMEC (block size 10), as well as Meteor:reorder (precision 32) is shown below. The private message length is 20 bytes. Both examples have been mildly post-processed for readability, including by removing special characters and whitespaces.

Context:

Heck horses are dun or grullo (a dun variant) in color, with no white markings. The breed has primitive markings, including a dorsal stripe and horizontal striping on the legs. Heck horses generally stand between 12.2 and 13.2 hands (50 and 54 inches, 127 and 137 cm) tall. The head is large, the withers low, and the legs and hindquarters

iMEC produces the following stegotext:

are short. The neck is wide and thick, a characteristic that can be inherited from the male. The face can be seen as a broad head, with pointed toes. The head and neck are often used as a tool for hunting, though their appearance often depends on their social organization. The legs are

Meteor:reorder produces the following stegotext:

have a narrow and angular shape. The fore and hind legs are longer than the head. The tail is broad and short in a shape similar to the neck or neckbone. The front legs have a sharp protrusion that leads from the head to the head but not from the tail. The hind legs have long pangs (2) and lower

Note how Meteor:reorder’s high bias seemingly lowers the content quality of the output text.

[0070] In summary, steganography is the practice of encoding a plaintext message into another piece of content, called a stegotext, in such a way that an adversary would not realize that hidden communication is occurring. This problem setting possesses two (competing) objectives: 1) To make the stegotext as similar as possible to the “normally” occurring content (known as covertext); 2) To encode as much information as possible about the content of the plaintext into the stegotext. The present disclosure shows that any coupling procedure can be used to achieve perfectly secure steganography (assuming shared private keys for the communicating parties) against computationally unbounded passive adversaries. It is also shown that, among steganography procedures with perfect security, the one induced by minimum entropy coupling maximizes the amount of information transmitted over the channel. By combining these insights with techniques for approximate and iterative minimum entropy coupling, the present invention provides a steganography procedure that can be scaled to arbitrary covertext distributions, without sacrificing security guarantees. Furthermore, it is shown that this procedure is able to encode plaintext messages into language and audio model covertext distributions with greater efficiency than alternative scalable approaches, despite having stricter security constraints.

[0071] The methods of the present invention may be performed by computer systems comprising one or more computers. A computer used to implement the invention may comprise one or more processors, including general purpose CPUs, graphical processing units (GPUs), neural processing units (NPUs), tensor processing units (TPUs) or other specialized processors. Multi-threaded, multi-core and parallel processors are especially advantageous to perform steps involving machine learning algorithms. A computer used to implement the invention may be physical or virtual. A computer used to implement the invention may be a server, a client or a workstation. Multiple computers used to implement the invention may be distributed and interconnected via a network such as a local area network (LAN) or wide area network (WAN). Individual steps of the method may be carried out by a computer system but not necessarily the same computer system. Results of a method of the invention may be displayed to a user or stored in any suitable storage medium. The present invention may be embodied in a non-transitory computer-readable storage medium that stores instructions to carry out a method of the invention. The present invention may be embodied in a computer system comprising one or more processors and memory or storage storing instructions to carry out a method of the invention. The present invention may be incorporated into software updates or add-ons for a pre-existing system or device.

[0072] Having described the invention it will be appreciated that variations may be made to the above described embodiments which are not intended to be limiting. The invention is defined in the appended claims and their equivalents.

[0073] References

M. Blum and N. Hopper. Toward a theory of steganography. 2004.

C. Cachin. An information-theoretic model for steganography. In D. Aucsmith, editor, Information Hiding, pages 306-318, Berlin, Heidelberg, 1998. Springer Berlin Heidelberg. ISBN 978-3-540-49380-8.

C. Cachin. Digital steganography, 2004.

A. Chamberlain. Applications of cryptography, Mar 2017. URL https://blogs.ucl.ac.uk/infosec/2017/03/12/applications-of-cryptography/.

F. Cicalese, L. Gargano, and U. Vaccaro. Minimum-entropy couplings and their applications. IEEE Transactions on Information Theory, 65:3436-3451, 2019.

F. Dai and Z. Cai. Towards near-imperceptible steganographic text. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4303-4308, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1422. URL https://aclanthology.org/P19-1422.

A. Holtzman, J. Buys, L. Du, M. Forbes, and Y. Choi. The Curious Case of Neural Text Degeneration. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020. URL https://openreview.net/forum?id=rygGQyrFvH.

K. Ito and L. Johnson. The LJ Speech dataset. https://keithito.com/LJ-Speech-Dataset, 2017.

N. Kalchbrenner, E. Eisen, K. Simonyan, S. Noury, N. Casagrande, E. Lockhart, F. Stimberg, A. Oord, S. Dieleman, and K. Kavukcuoglu. Efficient Neural Audio Synthesis. In Proceedings of the 35th International Conference on Machine Learning, pages 2410-2419. PMLR, July 2018. URL https://proceedings.mlr.press/v80/kalchbrenner18a.html. ISSN: 2640-3498.

G. Kaptchuk, T. M. Jois, M. Green, and A. D. Rubin. Meteor: Cryptographically secure steganography for realistic distributions. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, CCS ’21, pages 1529-1548, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450384544. doi: 10.1145/3460120.3484550. URL https://doi.org/10.1145/3460120.3484550.

J. Katz and Y. Lindell. Introduction to Modern Cryptography. Chapman and Hall/CRC Press, 2007. ISBN 978-1-58488-551-1.

M. Kocaoglu, A. Dimakis, S. Vishwanath, and B. Hassibi. Entropic Causal Inference. In AAAI, 2017.

M. Kovačević, I. Stanojević, and V. Šenk. On the entropy of couplings. Information and Computation, 242:369-382, 2015. ISSN 0890-5401. doi: https://doi.org/10.1016/j.ic.2015.04.003. URL https://www.sciencedirect.com/science/article/pii/S0890540115000450.

S. Merity, C. Xiong, J. Bradbury, and R. Socher. Pointer Sentinel Mixture Models. CoRR, abs/1609.07843, 2016. URL http://arxiv.org/abs/1609.07843. arXiv: 1609.07843.

P. Moulin and J. O’Sullivan. Information-theoretic analysis of information hiding. IEEE Transactions on Information Theory, 49(3):563-593, 2003. doi: 10.1109/TIT.2002.808134.

A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever. Language Models are Unsupervised Multitask Learners, 2019. URL https://www.semanticscholar.org/paper/Language-Models-are-Unsupervised-Multitask-Learners-Radford-Wu/9405cc0d6169988371b2755e573cc28650d14dfe.

M. Rossi. Greedy additive approximation algorithms for minimum-entropy coupling problem. In 2019 IEEE International Symposium on Information Theory (ISIT), pages 1127-1131, 2019. doi: 10.1109/ISIT.2019.8849717.

B. Ryabko and D. Ryabko. Asymptotically optimal perfect steganographic systems. Problems of Information Transmission, 45:184-190, 06 2009. doi: 10.1134/S0032946009020094.

B. Ryabko and D. Ryabko. Constructing perfect steganographic systems. Information and Computation, 209(9):1223-1230, 2011. ISSN 0890-5401. doi: https://doi.org/10.1016/j.ic.2011.06.004. URL https://www.sciencedirect.com/science/article/pii/S0890540111001064.

P. Sallee. Model-based steganography. volume 2939, pages 154-167, 10 2003. ISBN 978-3-540-21061-0. doi: 10.1007/978-3-540-24624-4_12.

J. Shen, R. Pang, R. J. Weiss, M. Schuster, N. Jaitly, Z. Yang, Z. Chen, Y. Zhang, Y. Wang, R. Skerry-Ryan, R. A. Saurous, Y. Agiomyrgiannakis, and Y. Wu. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4779-4783, Apr. 2018. doi: 10.1109/ICASSP.2018.8461368. ISSN: 2379-190

S. Sokota, C. S. de Witt, M. Igl, L. M. Zintgraf, P. Torr, M. Strohmeier, J. Z. Kolter, S. Whiteson, and J. N. Foerster. Communicating via Markov decision processes. In Proceedings of the 39th International Conference on Machine Learning, ICML’22. JMLR.org, 2022.

A. Somekh-Baruch and N. Merhav. On the error exponent and capacity games of private watermarking systems. IEEE Transactions on Information Theory, 49(3):537-562, 2003. doi: 10.1109/TIT.2002.808132.

A. Somekh-Baruch and N. Merhav. On the capacity game of public watermarking systems. IEEE Transactions on Information Theory, 50(3):511-524, 2004. doi: 10.1109/TIT.2004.824920.

D. Volkhonskiy, I. Nazarov, B. Borisenko, and E. Burnaev. Steganographic generative adversarial networks. Proceedings of NIPS 2016 Workshop on Adversarial Training, 03 2017.

Y. Wang and P. Moulin. Perfectly secure steganography: Capacity, error exponents, and code constructions. IEEE Transactions on Information Theory, 54(6):2706-2722, 2008. doi: 10.1109/TIT.2008.921684.

T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest, and A. M. Rush. Hugging Face’s Transformers: State-of-the-art Natural Language Processing. Technical Report arXiv: 1910.03771, arXiv, July 2020. URL http://arxiv.org/abs/1910.03771. arXiv: 1910.03771 [cs] type: article.

Y.-Y. Yang, M. Hira, Z. Ni, A. Chourdia, A. Astafurov, C. Chen, C.-F. Yeh, C. Puhrsch, D. Pollack, D. Genzel, D. Greenberg, E. Z. Yang, J. Lian, J. Mahadeokar, J. Hwang, J. Chen, P. Goldsborough, P. Roy, S. Narenthiran, S. Watanabe, S. Chintala, V. Quenneville-Belair, and Y. Shi. TorchAudio: Building Blocks for Audio and Speech Processing. Technical Report arXiv:2110.15018, arXiv, Feb. 2022. URL http://arxiv.org/abs/2110.15018. arXiv:2110.15018 [cs, eess] type: article.

Z. Ziegler, Y. Deng, and A. Rush. Neural linguistic steganography. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1210-1215, Hong Kong, China, Nov. 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1115. URL https://aclanthology.org/D19-1115.

Claims

1. A method of encoding a secret message, the method comprising: obtaining a ciphertext corresponding to the secret message using a private key (K); and sampling a stegotext space using the ciphertext to obtain a stegotext encoding the secret message; wherein the sampling is performed by reference to a minimum entropy coupling between the stegotext space and a ciphertext space containing all possible ciphertexts.

2. A method of decoding stegotext, the method comprising: sampling a stegotext space using the stegotext to obtain a ciphertext encoding the secret message; and obtaining a secret message from the ciphertext using a private key (K); wherein the sampling is performed by reference to a minimum entropy coupling between the stegotext space and a ciphertext space containing all possible ciphertexts.

3. A method according to claim 1 or 2 wherein the sampling is performed using an iterative minimum entropy coupling process.

4. A method according to claim 3 wherein the iterative minimum entropy coupling process comprises iteratively using selected approximate minimum entropy coupling.

5. A method according to any one of the preceding claims wherein the stegotext space is represented by a trained machine learning algorithm.

6. A method according to claim 5 wherein the machine learning algorithm is trained using messages from a specific communications channel and further comprising transmitting or receiving the stegotext using the specific communications channel.

7. A method according to claim 5 or 6 wherein the machine learning algorithm utilises a shared context.

8. A method according to any one of the preceding claims wherein the private key is a random or pseudorandom number.

9. A communication method for communicating a secret message between a sender and a receiver, the method comprising: the sender and the receiver obtaining a shared private key and a shared context; the sender obtaining a ciphertext corresponding to the secret message using the shared private key (K); the sender sampling a stegotext space represented by the shared context using the ciphertext to obtain a stegotext encoding the secret message; wherein the sampling is performed by reference to a minimum entropy coupling between the stegotext space and a ciphertext space containing all possible ciphertexts; the sender transmitting the stegotext; the receiver receiving the stegotext; the receiver sampling a stegotext space using the stegotext to obtain the ciphertext encoding the secret message; and obtaining the secret message from the ciphertext using the private key (K).

10. A computer program comprising computer-interpretable code that, when executed by one or more computer systems, instructs the computer system(s) to perform a method according to any one of the preceding claims.

11. A communication device configured to implement a method according to any one of the preceding claims.