WO2022093249A1 - Latent representation scrambler and generative network adjustments - Google Patents

Latent representation scrambler and generative network adjustments

Info

Publication number
WO2022093249A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
generative
scrambler
representation
error
Application number
PCT/US2020/058062
Other languages
French (fr)
Inventor
Manu RASTOGI
Amalendu IYER
Madhu Athreya
Srikanth KUTHURU
Original Assignee
Hewlett-Packard Development Company, L.P.
Application filed by Hewlett-Packard Development Company, L.P. filed Critical Hewlett-Packard Development Company, L.P.
Priority to PCT/US2020/058062 priority Critical patent/WO2022093249A1/en
Publication of WO2022093249A1 publication Critical patent/WO2022093249A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A system and method are disclosed to provide adjustments to a scrambler and a generative network. A scrambled representation is provided from a scrambler network applied to a latent representation of original data. Generated data is provided from a generative network applied to the scrambled representation. The generated data is compared to the original data to obtain a generative error. The scrambled representation is compared to the latent representation to obtain a scrambler error. The generative network is adjusted to increase the generative error and the scrambler network is adjusted to reduce the scrambler error.

Description

LATENT REPRESENTATION SCRAMBLER AND GENERATIVE NETWORK ADJUSTMENTS
Background
[0001] Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.
Brief Description of the Drawings
[0002] Figure 1 is a block diagram illustrating an example method of scrambling a latent representation.
[0003] Figure 2 is a block diagram illustrating an example system that can implement the example method of Figure 1 .
[0004] Figure 3 is a block diagram illustrating an example implementation of the example method of Figure 1.
[0005] Figure 4 is a block diagram of another example system that can implement the example methods of Figures 1 and 3.
Detailed Description
[0006] An example of machine learning includes deep learning, or deep structured learning, which can be based on artificial neural networks with representation learning, such as a set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data. For example, machine learning processes build a mathematical model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. Deep learning architectures such as deep neural networks, deep belief networks, recurrent neural networks and convolutional neural networks have been applied to fields including computer vision, machine vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, drug design, medical image analysis, and material inspection. Learning can be supervised, unsupervised, or semi-supervised. For instance, supervised learning includes the task of learning to map an input to an output based on example input-output pairs and infers a function from labeled training data from a set of training examples. Unsupervised learning or self-supervised learning looks for previously undetected patterns in a data set without pre-existing labels. Semi-supervised learning, a related variant, makes use of supervised and unsupervised techniques.
[0007] Some deep learning models can apply data in a latent space to a generative network. One example of self-supervised learning, presented for illustration here, is Contrastive Predictive Coding, or CPC, in which a network is trained without labeled data. Instead, the network is applied to predict a future observation in latent space, such as a compressed, or lower dimensional, space, given a small input. Autoregressive models can be applied in this latent space to make predictions many steps into the future. Noise-Contrastive Estimation can be used for the loss function in a manner similar to ways it has been used for learning word embeddings in natural language models. The CPC model can be applied to different data modalities such as images, speech, natural language, and reinforcement learning, and the same mechanism learns information on each of these domains.
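For illustration only (not part of the original disclosure), the Noise-Contrastive Estimation loss mentioned above can be sketched as an InfoNCE-style cross-entropy over one positive and several negative latent samples; the tensor shapes, the projection head, and the use of PyTorch are assumptions made for this sketch.
```python
# Hedged sketch of an InfoNCE-style contrastive loss used in CPC-like training.
# All names and shapes are illustrative assumptions, not taken from the application.
import torch
import torch.nn.functional as F

def info_nce_loss(context, positive, negatives, projection):
    """context: (B, C) context vectors; positive: (B, Z) true future latents;
    negatives: (B, N, Z) negative latents; projection: maps C -> Z."""
    pred = projection(context)                                   # predicted future latent, (B, Z)
    pos_score = (pred * positive).sum(dim=-1, keepdim=True)      # similarity to the ground truth, (B, 1)
    neg_score = torch.einsum("bz,bnz->bn", pred, negatives)      # similarity to negatives, (B, N)
    logits = torch.cat([pos_score, neg_score], dim=1)            # (B, 1 + N)
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)  # class 0 = positive
    return F.cross_entropy(logits, labels)
```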
[0008] A feature of CPC is the use of latent space, or a compressed space into which original data, or higher dimensional data, is projected. Original data can be transformed into a latent representation of the original data via an encoder. A contrastive approach is applied in training, e.g., a model attempts to discern between correct and incorrect sequences. In an example using audio data, a network can be used to predict speech given a current context; or, in an example of video data, a network can be applied to predict a future frame in latent space. A loss is framed as a contrastive loss, in which a binary classifier is used to compare the predicted future with a set of samples in which one sample, such as only one sample from a group of samples, is the actual ground truth and the rest are negative samples. The architecture generally includes an encoder network that generates an embedding in a latent space, z_t = g_enc(x_t). Another neural network, such as a generative network that may be autoregressive, is then used to generate a context vector c_t from the sequence of latent vectors produced by the encoder, c_t = g_ar(z_≤t).
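As an illustrative sketch (not from the application), the encoder g_enc and the autoregressive context network g_ar could be realized as follows; the MLP and GRU architectures and the dimensions are assumptions.
```python
# Hedged sketch of g_enc and g_ar; architecture choices and sizes are assumptions.
import torch
import torch.nn as nn

class Encoder(nn.Module):                  # g_enc: x_t -> z_t
    def __init__(self, x_dim=128, z_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))

    def forward(self, x):                  # x: (B, T, x_dim)
        return self.net(x)                 # z: (B, T, z_dim)

class ARContext(nn.Module):                # g_ar: z_<=t -> c_t
    def __init__(self, z_dim=32, c_dim=64):
        super().__init__()
        self.rnn = nn.GRU(z_dim, c_dim, batch_first=True)

    def forward(self, z):
        out, _ = self.rnn(z)               # out[:, t] summarizes z up to and including step t
        return out                         # c: (B, T, c_dim)

# Example usage with random data:
# z = Encoder()(torch.randn(4, 10, 128)); c = ARContext()(z)
```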
[0009] Deep learning systems such as CPC are apt in circumstances that include vast amounts of training data, which can present a number of issues. For example, some applications, such as models that classify images, have a large, public and readily available data set. But other, less apparent applications can include a much smaller data set. And some enterprises may have better access to data than others. For example, enterprises in the social media space or the media streaming space may have enormous amounts of data compared to enterprises in other fields. Other issues include data bandwidth. The current model of training machine learning or deep learning systems includes exporting data to the cloud, or to off-site datacenters with scalable and large amounts of compute and storage resources. On-premises resources for training can be limited, and this can make exporting data the more practical option for effective training in many circumstances.
[0010] The current practice of transferring data, which may include corporate data such as corporate documents or recordings of meetings, to an offsite facility can raise data privacy concerns. Data breaches at the offsite facility, for example, can expose sensitive or competitive information that may be better kept secret.
[0011] Figure 1 illustrates an example method 100 to enhance the privacy of data during training, such as data used for training a self-supervised network, which may employ latent representations of original data. Original data can be collected from sources including computing devices, webcams, microphones, smart speakers, smart cameras, and data repositories, and can include items such as corporate documents, meeting recordings, touchpad clicks, and images. The original data can be encoded into a latent space using an encoder network to form a latent representation of the original data. For example, the encoder can be located on-premises or off-premises, such as in the cloud, where the latent representation of the original data can be aggregated or stored. In several instances, however, latent representations can be used to recover the original data, and breaches of latent representations, once decoded, can expose sensitive information. Method 100 can be applied to obfuscate the latent representation enough to make recreation of the original data difficult while preserving enough information in the latent space to provide effective training data.
[0012] In method 100, a latent representation of original data is scrambled to create a scrambled representation, and generated data is created from the scrambled representation. For example, the scrambled representation is an obfuscated version of the latent representation, and, as used in this disclosure, the generated data is an attempted recreation of the original data from the scrambled representation. A scrambler network can be applied to the latent representation to obtain the scrambled representation. A generative network can be applied to the scrambled representation to obtain the generated data. In one example, the generative network may attempt to recreate or reconstruct the latent representation from the scrambled representation and then attempt to recreate the original data from the recreated latent representation, although other examples are contemplated. The generated data is compared to the original data to obtain a generative error at 102. The scrambled representation is compared to the latent representation to obtain a scrambler error at 104. The generative network is adjusted to increase the generative error and the scrambler network is adjusted to reduce the scrambler error at 106. For example, the generative network is repeatedly adjusted to maximize the generative error while the scrambler network is repeatedly adjusted to minimize the scrambler error. This combination produces a scrambled representation of the latent representation that includes enough usable information to provide useful training data to a network such as a CPC while obfuscating the latent representation enough to preserve the privacy of the original data if, for example, there is a breach of the scrambled representation. A minimal sketch of one such iteration appears below.
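For illustration only, one iteration of method 100 might be implemented as sketched below, assuming the networks are PyTorch modules, the error metrics are passed in as functions (mean square error and a mutual-information proxy are suggested later in this disclosure), and gradient ascent/descent with per-network optimizers is the adjustment mechanism; none of these choices are specified by the application.
```python
# Hedged sketch of one iteration of method 100 (steps 102, 104, 106).
# The optimizers, the gradient-ascent step, and a fixed encoder are assumptions.
import torch

def method_100_step(x, encoder, scrambler, generator,
                    scrambler_error_fn, generative_error_fn,
                    opt_scrambler, opt_generator):
    z = encoder(x)                               # latent representation of the original data
    s = scrambler(z)                             # scrambled representation
    x_hat = generator(s)                         # generated data (attempted recreation of x)

    gen_err = generative_error_fn(x_hat, x)      # 102: compare generated data to original data
    scr_err = scrambler_error_fn(s, z)           # 104: compare scrambled to latent representation

    # 106: adjust the generative network to increase the generative error ...
    opt_generator.zero_grad()
    (-gen_err).backward(retain_graph=True)       # gradient ascent on gen_err
    opt_generator.step()

    # ... and adjust the scrambler network to reduce the scrambler error.
    opt_scrambler.zero_grad()
    scr_err.backward()
    opt_scrambler.step()
    return gen_err.item(), scr_err.item()
```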
[0013] Figure 2 illustrates an example system 200 that can be used to implement method 100. In one example, features of system 200 can be included as part of a self-supervised network. For instance, incoming data can be encoded into the latent space, and another network can sample data from the future and generate a latent representation of negative samples. Still another network can be trained for disambiguating positive latent representations of true samples from negative latent representations, such as several latent representations. In one example, the negative samples can be provided from a set of devices, including computing devices, webcams, microphones, and smart speakers. In the example of system 200, a device 202 can be used to provide original data 204 to an encoder network 206. In one instance, a set of devices or a plurality of devices can be used to provide original data to a plurality of encoder networks. The system 200 further includes a generative network 208, and a scrambler network 210 disposed between the encoder network 206 and the generative network 208.
[0014] The encoder network 206 receives the original data 204 and provides a latent representation of the original data, or latent representation 212. In one example, the encoder network 206 can be an artificial neural network that provides the latent representation 212, typically by dimensionality reduction, such as to ignore signal noise from the original data 204. In other examples, the encoder network 206 can transform the original data 204 from a high dimensional space to a low dimensional space via linear or nonlinear dimensionality reduction techniques such as Principal Component Analysis (PCA). The original data 204, for example, includes useful information and other information. The latent representation 212 includes useable information from the higher dimensional, or uncompressed or relatively less compressed, original data 204.
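As a concrete illustration of the PCA option (an assumption about how the dimensionality reduction could be computed, not a detail from the application), a latent representation could be obtained by projecting centered data onto its leading principal directions:
```python
# Hedged sketch of PCA-style dimensionality reduction for the encoder step.
import torch

def pca_encode(x, k=32):
    """x: (N, D) original data; returns an (N, k) latent representation
    projected onto the top-k principal directions (k is an assumed choice)."""
    x_centered = x - x.mean(dim=0, keepdim=True)
    _, _, vh = torch.linalg.svd(x_centered, full_matrices=False)   # rows of vh are principal directions
    return x_centered @ vh[:k].T                                   # latent representation, lower dimension
```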
[0015] In a typical example of a self-supervised network, an encoder network can provide a latent representation directly to a generative network. In the example of system 200, the encoder network 206 provides the latent representation 212 to the scrambler network 210 to provide a scrambled representation of the latent representation, or a scrambled representation 214. The system 200 can be coupled to the device 202 via a communication connection, such as a computer network connection. In one example, the encoder network 206 is coupled to the device 202 via the communication connection. In another example, the encoder network 206 is located with the device 202, and the encoder network 206 is coupled to the scrambler network 210 via the communication connection. In the example of system 200, the scrambled representation is provided to the generative network 208.
[0016] In the example, the scrambler network 210 can include an artificial neural network to add signal noise to the latent representation 212 to obfuscate the latent representation 212 and provide the scrambled representation 214. The scrambler network 210 can be parameterized by weights. The scrambler network 210 responds to weighted updates, or a first set of weighted updates used to parameterize the scrambler network, to adjust the amount of obfuscation of the latent representation 212. For example, the scrambler network 210 can be adjusted, such as via the weighted updates, to heavily obfuscate the latent representation 212 via signal noise such that usable information from the latent representation 212 is destroyed in the scrambled representation 214. For instance, the latent representation 212 includes useful information for use by the self-supervised network, but a heavily obfuscated latent representation as the scrambled representation 214 can present very little to no useful information for use by the self-supervised network. Also, the scrambler network 210 can be adjusted, such as via the weighted updates, to lightly obfuscate the latent representation 212 via signal noise such that usable information from the latent representation 212 is preserved in the scrambled representation 214. For instance, the latent representation 212 includes useful information for use by the self-supervised network, and a lightly obfuscated latent representation as the scrambled representation 214 can present much useful information for use by the self-supervised network. The scrambler network 210 responds to the weighted updates to provide a selected amount of obfuscation to the latent representation 212 as it provides the scrambled representation 214.
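For illustration only, one way the scrambler network 210 could be realized is as a small network that remixes the latent features and adds noise whose scale is controlled by learned weights; the specific architecture below is an assumption, not taken from the application.
```python
# Hedged sketch of a scrambler network that adds weight-controlled signal noise.
import torch
import torch.nn as nn

class ScramblerNetwork(nn.Module):
    """Obfuscates a latent representation; the learned weights control how
    much noise (and therefore how much obfuscation) is applied."""
    def __init__(self, z_dim=32):
        super().__init__()
        self.mix = nn.Linear(z_dim, z_dim)                   # learned remixing of latent features
        self.log_noise = nn.Parameter(torch.zeros(z_dim))    # learned per-feature noise scale

    def forward(self, z):
        noise = torch.randn_like(z) * self.log_noise.exp()
        return self.mix(z) + noise                           # scrambled representation, same dimension as z
```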
[0017] System 200 includes a scrambler comparator 220 to make a determination as to an amount of difference between the latent representation 212 and the scrambled representation 214. The scrambler comparator 220 receives the latent representation 212 from the encoder network 206 and the scrambled representation 214 from the scrambler network 210. In one example, the scrambler comparator 220 measures an amount of error between the latent representation 212 and the scrambled representation 214 and provides a scrambler error 224. For example, the scrambler comparator 220 can apply an amount of mutual information in the latent representation 212 and the scrambled representation 214 as a metric of the scrambler error 224.
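A differentiable stand-in for a mutual-information-based scrambler error could look like the following; the InfoNCE-style estimator, the normalization, and the temperature are assumptions made for this sketch, since the disclosure names mutual information only as the metric.
```python
# Hedged sketch of a mutual-information proxy for the scrambler error 224.
import torch
import torch.nn.functional as F

def scrambler_error(scrambled, latent, temperature=0.1):
    """Small when each scrambled vector is most similar to its own latent
    vector (i.e., when mutual information is preserved); larger otherwise."""
    z = F.normalize(latent, dim=-1)              # (B, D)
    s = F.normalize(scrambled, dim=-1)           # (B, D)
    logits = s @ z.t() / temperature             # (B, B) similarity of each scrambled row to every latent row
    labels = torch.arange(z.size(0), device=z.device)
    return F.cross_entropy(logits, labels)       # InfoNCE-style bound; minimizing it maximizes shared information
```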
[0018] In the example, the generative network 208 can include an artificial neural network to create higher dimensional data similar to the original data 204 from data in the latent space. For example, the generative network 208 receives the scrambled representation 214, which may be of the same dimension as the latent representation 212, and reconstructs the scrambled representation into reconstruction data, or generated data 216. The generative network 208 can be parameterized by weights. The generative network 208 can learn to map from a latent space to a data distribution of interest in a learning network, including a self-supervised learning network. For example, a generative network can generate candidates and a discriminative network can be used to evaluate the candidates in a type of generative adversarial network. In one example, the generative network 208 is a type of deconvolutional neural network. In an example of a typical self-supervised network, the generative network receives the latent representation directly and can reproduce the original data from the latent representation. In the example of the system 200, the generative network 208 receives the scrambled representation 214 but is unable to faithfully reproduce the original data 204 as the generated data 216. The inability of the generative network 208 to reliably reproduce the original data from the scrambled representation 214 as the generated data is provided via weighted updates, or a second set of weighted updates used to parameterize the generative network, provided to the generative network 208. For example, reconstructing original data from the scrambled representation is more difficult for the generative network 208 than reconstructing original data from the latent representation.
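For illustration, the generative network 208 could be sketched as a simple decoder that maps latent-dimension inputs back to the original data dimension; the MLP below is an assumption (the disclosure mentions a deconvolutional network as one possibility).
```python
# Hedged sketch of a generative network mapping latent-space data back to the data space.
import torch.nn as nn

class GenerativeNetwork(nn.Module):
    """Attempts to reconstruct data in the original dimension from a latent-dimension
    input (here the scrambled representation); a plain MLP stand-in."""
    def __init__(self, z_dim=32, x_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 64), nn.ReLU(),
            nn.Linear(64, x_dim),
        )

    def forward(self, scrambled):
        return self.net(scrambled)      # generated data, same dimension as the original data
```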
[0019] System 200 includes a generative comparator 222 to make a determination as to an amount of difference between the generated data 216 and the original data 204. The generative comparator 222 receives the original data 204 from the device 202 and the generated data 216 from the generative network 208. In one example, the generative comparator 222 measures an amount of error between the original data 204 and the generated data 216 and provides a generative error 226. For example, the generative comparator 222 can apply a mean square error between the original data 204 and the generated data 216 as a metric for the generative error 226.
[0020] System 200 is adjusted to increase the generative error 226 and reduce the scrambler error 224. For example, the generative network 208 is adjusted, such as via the second set of weighted updates, to increase the generative error 226, and the scrambler network 210 is adjusted, such as via the first set of weighted updates, to reduce the scrambler error 224. In one aspect, the system 200 can modify each of the first and second sets of weighted updates to try to increase, which can include maximizing, the generative error 226 and try to reduce, which can include minimizing, the scrambler error 224. In one instance of such a combination, the amount of useable information in the scrambled representation is maximized while reproducing the original data is made difficult for the generative network. For instance, the generative network is adjusted such that reconstructing original data from the scrambled representation is as difficult as possible compared to reconstructing original data from the latent representation.
[0021] The conflicting duality of increasing or maximizing the generative error 226 while reducing or minimizing the scrambler error 224 is addressed with a conditional generative adversarial network, or cGAN 230. The scrambler error 224 and the generative error 226 can be provided to the cGAN. An objective function of the cGAN can be selected to reduce the scrambler error 224 and to increase the generative error 226. For example, the objective function can be selected to minimize the scrambler error 224 and to maximize the generative error 226. Based on the objective function, the generative network 208 is adjusted via an output 218 of the cGAN 230 to increase the generative error 226, such as by an output 218b to adjust the weights that parameterize the generative network 208, such as with a second set of weighted updates provided from the cGAN 230 to the generative network 208. The scrambler network 210 is adjusted via an output 218 of the cGAN 230 to reduce the scrambler error 224, such as by another output 218a to adjust the weights that parameterize the scrambler network 210, such as with a first set of weighted updates provided from the cGAN 230 to the scrambler network 210. A sketch of such alternating weighted updates appears below.
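For illustration only, the two sets of weighted updates 218a and 218b could be modeled as plain gradient steps on the two error terms, with the scrambler error reduced and the generative error increased; how the cGAN 230 actually derives its outputs is not specified in this summary, so the following is a sketch under those assumptions (including the use of mean square error for both terms).
```python
# Hedged sketch of deriving the weighted updates 218a/218b from the two errors.
import torch
import torch.nn.functional as F

def cgan_style_updates(x, encoder, scrambler, generator, lr=1e-3):
    """One pair of updates: 218a moves scrambler weights to reduce the scrambler
    error 224; 218b moves generator weights to increase the generative error 226."""
    z = encoder(x)
    s = scrambler(z)
    x_hat = generator(s)

    scr_err = F.mse_loss(s, z)        # stand-in metric; the disclosure suggests mutual information
    gen_err = F.mse_loss(x_hat, x)    # mean square error, as suggested for the generative error

    # First set of weighted updates (218a): descend the scrambler error.
    grads_s = torch.autograd.grad(scr_err, list(scrambler.parameters()), retain_graph=True)
    updates_218a = [-lr * g for g in grads_s]

    # Second set of weighted updates (218b): ascend the generative error.
    grads_g = torch.autograd.grad(gen_err, list(generator.parameters()))
    updates_218b = [lr * g for g in grads_g]

    with torch.no_grad():
        for p, u in zip(scrambler.parameters(), updates_218a):
            p.add_(u)
        for p, u in zip(generator.parameters(), updates_218b):
            p.add_(u)
    return scr_err.item(), gen_err.item()
```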
[0022] The outputs 218, such as the first set of weighted updates 218a and the second set of weighted updates 218b, can be used to adjust the parameterization of the artificial neural networks in the generative network 208 and the scrambler network 210. For example, the weighted updates 218a, 218b can include changes to the weights used in the artificial neural networks and can be applied to the weights used in the artificial neural networks, and the weighted updates 218a, 218b can include new weights to replace the weights used in the artificial neural networks. Weights can include weights and biases used in the artificial neural networks.
[0023] System 200 can be implemented to include a combination of a hardware device, such as one or more hardware devices, and a computer program, such as one or more computer programs, for controlling a processor and memory to perform a method, such as method 100. For example, the artificial neural networks of system 200 can be implemented as a processor readable medium or processor readable storage device having a set of executable instructions for controlling the processor to perform a method, such as implementation of the encoder network 206, generative network 208, scrambler network 210, comparators 220, 222, and cGAN 230. For example, system 200 can be used to implement method 100 as a computerized method.
[0024] Figure 3 illustrates an example method 300 that can be implemented with system 200 in accordance with method 100. Method 300 can be implemented as a computerized method. Method 300 can receive original data, such as data in a high dimensional space, including voice recordings, video, and documents. In one example, the original data can itself be in a compressed form, such as lossy compression based on transform coding, including the multimedia formats JPEG, MPEG, and MP3. The original data may be received from one device or a plurality of devices.
[0025] Method 300 can apply an encoder to generate a latent representation of the original data at 302. The encoder receives data in the high dimensional space, or relatively higher dimensional space, and reduces dimensionality, in one example, to retain useful information in the latent representation of the original data. The latent representation of the original data is scrambled to provide a scrambled representation of the latent representation, or scrambled representation, at 304. For example, a scrambler network, such as scrambler network 210, can be applied to add noise to the latent representation to obfuscate or muddle the latent representation. The scrambled representation at 304 can include an encrypted latent representation, which alters the scope of the latent representation. In one example, the scrambled representation is in the same dimensional space as the latent representation. In one example, the latent representation is scrambled with an artificial neural network.
[0026] A generated data is constructed from the scrambled representation via a generative network, such as generative network 208, at 306. For example, the scrambled representation is provided to the generative network, which constructs the generated data in a higher dimensionality than the scrambled representation or the latent representation. In one example of the generative network applied to construct generated data, the generative network can be provided with the latent representation and may reconstruct the original data from the latent representation. In one example, the generated data is constructed from the scrambled representation via an artificial neural network.
[0027] The generated data is compared to the original data to obtain a generative error at 308. The scrambled representation is compared to the latent representation to obtain a scrambler error at 310. In one example, the generated data and the original data are in the same dimensional space. The generative error can be determined from how different the generated data is from the original data, such as via mean square error as a metric. The scrambled representation and the latent representation can also be in the same dimensional space, which is a different dimensional space than that of the generated data and the original data. The amount of mutual information in the latent representation and the scrambled representation can be applied as a metric to determine the scrambler error.
[0028] The generative error and the scrambler error are applied to an objective function to increase the generative error and to reduce the scrambler error at 312. The conflicting aims of the objective function, increasing or maximizing the generative error while reducing or minimizing the scrambler error, can be addressed with an adversarial network, such as a cGAN. The cGAN can provide adjusted weights, or a first set of weighted updates, to be applied to the scrambler network 210, or adjust the weights of the neural network applied to provide the scrambled representation from the latent representation. Also, the cGAN can provide adjusted weights, or a second set of weighted updates, to be applied to the generative network 208, or adjust the weights of the neural network applied to construct the generated data from the scrambled representation.
[0029] Figure 4 illustrates an example system 400 that can implement system 200 to apply methods 100 and 300. System 400 can include a computing device 402, such as a plurality of computing devices, having a processor 404 and a memory 406 to store instructions and data that are executable by the processor 404. For example, instructions stored in memory 406 to be executed by processor 404 can include scrambler network 410 as an artificial neural network parameterized by weights, the scrambler network 410 to receive a latent representation and provide a scrambled representation; generative network 412 as an artificial neural network parameterized by weights, the generative network 412 to receive the scrambled representation and construct generated data; and an additional artificial neural network 414, which may include an adversarial network such as a cGAN, to update the weights of the scrambler network 410 and the weights of the generative network 412 so as to preserve enough information between the scrambled representation and the latent representation, and thus between the scrambled representation and the original data, while making it difficult for the generative network 412 to reproduce the original data.
[0030] The processor 404 may include two or more processing cores on a chip or two or more processor chips. In some examples, the computing device 402 can also have additional processing or specialized processors (not shown), such as a graphics processor for general-purpose computing on graphics processor units, to perform processing functions offloaded from the processor 404. The memory 406 may be arranged in a hierarchy and may include one or more levels of cache. Depending on the configuration and type of computing device, memory 406 may be volatile (such as random access memory (RAM)), non-volatile (such as read only memory (ROM), flash memory, etc.), or some combination of the two. Memory can include memory devices and storage devices for storage of readable instructions, data structures, program modules or other data that can be accessed by the computing device 402; a propagating signal by itself does not qualify as storage media. In one example, the computing device 402 can include communication connections that allow the computing device 402 to be used as part of a computer network, which is a collection of computing devices and possibly other devices interconnected by communications channels that facilitate communications and allow sharing of resources and information among interconnected devices. Examples of computer networks include a local area network, a wide area network, the internet, or other networks.
[0031] In one example, one or more of computing devices 402 can be configured as servers in a datacenter to provide distributed computing services such as cloud computing services. A data center can provide pooled resources on which customers or tenants can dynamically provision and scale applications as needed without having to add servers or additional networking. The datacenter can be configured to communicate with local computing devices, such as those used by cloud consumers, including personal computers, mobile devices, embedded systems, or other computing devices that may provide original data or latent representations of the original data. Within the data center, computing devices 402 can be configured as servers, either as stand-alone devices or as individual blades in a rack of one or more other server devices. One or more host processors, such as processors 404, as well as other components including memory 406, on each server run a host operating system that can support multiple virtual machines. A tenant may initially use one virtual machine on a server to run an application. The datacenter may activate additional virtual machines on a server or other servers when demand increases, and the datacenter may deactivate virtual machines as demand drops. A cloud-computing environment can be implemented in a recognized model to run in a data center, including network-connected datacenters.
[0032] Although specific examples have been illustrated and described herein, a variety of alternate and/or equivalent implementations may be substituted for the specific examples shown and described without departing from the scope of the present disclosure. This application is intended to cover any adaptations or variations of the specific examples discussed herein. Therefore, it is intended that this disclosure be limited only by the claims and the equivalents thereof.

Claims

1. A method, comprising: comparing a generated data from a scrambled representation of a latent representation of original data to the original data to obtain a generative error, the scrambled representation provided from a scrambler network applied to the latent representation of the original data and the generated data provided from a generative network applied to the scrambled representation; comparing the scrambled representation of the latent representation of the original data to the latent representation to obtain a scrambler error; and adjusting the generative network to increase the generative error of the scrambled representation and the scrambler network to reduce the scrambler error of the scrambled representation.
2. The method of claim 1 wherein the adjusting the generative network and the scrambler network includes applying weighted updates to the generative network and the scrambler network.
3. The method of claim 2 wherein the weighted updates replace weights that parameterize the generative network and the scrambler network.
4. The method of claim 2 wherein the adjusting the generative network and the scrambler network includes applying a first set of weighted updates to the scrambler network and applying a second set of weighted updates to the generative network.
5. The method of claim 1 wherein the adjusting the generative network includes minimizing the scrambler error while maximizing the generative error.
6. The method of claim 1 wherein the scrambler error is based on mutual information and the generative error is based on mean square error.
7. A system, comprising: a scrambler network to scramble a latent representation of original data and provide a scrambled representation; a generative network to generate a generated data from the scrambled representation; a generative comparator to receive the original data and the generated data and provide a generative error; a scrambler comparator to receive the latent representation and the scrambled representation and provide a scrambler error; and a network to receive the generative error and the scrambler error and adjust the scrambler network and the generative network to increase the generative error and to reduce the scrambler error.
8. The system of claim 7 wherein the scrambler network is based on a first artificial neural network and the generative network is based on a second artificial neural network, the first and second artificial neural networks parameterized by weights, wherein the network adjusts the scrambler network and the generative network via weighted updates applied to the weights.
9. The system of claim 7 wherein the latent representation is received from an encoder network.
10. The system of claim 9 wherein the encoder network is operably coupled via a communication connection to a device that provides the original data.
11. The system of claim 7 wherein the network is an adversarial network based on an artificial neural network.
12. The system of claim 11 wherein the adversarial network is a conditional generative adversarial network.
13. A method, comprising: encoding a latent representation of an original data; scrambling the latent representation via a scrambler network to obtain a scrambled representation; constructing a generative data from the scrambled representation via a generative network, and comparing the generative data to the original data to obtain a generative error; comparing the scrambled representation to the latent representation to obtain a scrambler error; and applying the scrambler error and the generative error to adjust the scrambler network to reduce the scrambler error and to adjust the generative network to increase the generative error.
14. The method of claim 13 wherein the latent representation is in a lower dimensional space than the original data, the generative data is in a same dimensional space as the original data, and the scrambled representation is an encrypted version of the latent representation.
15. The method of claim 13 wherein the scrambler network and the generative network are adjusted via an objective function to maximize the generative error and minimize the scrambler error.
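For illustration only, and not as part of the claims: assuming the two errors are combined into a single objective with a trade-off factor $\lambda$, the adjustment recited in claims 5 and 15 can be written as

$$\min_{\theta_s,\,\theta_g}\;\Big[\,E_{\text{scrambler}}(\theta_s)\;-\;\lambda\,E_{\text{generative}}(\theta_s,\theta_g)\,\Big],$$

where $\theta_s$ and $\theta_g$ denote the weights of the scrambler network and the generative network, so that the weighted updates reduce the scrambler error while increasing the generative error. The single-objective form and the factor $\lambda$ are assumptions for the example; other formulations of the objective function are possible.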

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2020/058062 WO2022093249A1 (en) 2020-10-29 2020-10-29 Latent representation scrambler and generative network adjustments

Publications (1)

Publication Number Publication Date
WO2022093249A1 true WO2022093249A1 (en) 2022-05-05

Family

ID=81383039

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/058062 WO2022093249A1 (en) 2020-10-29 2020-10-29 Latent representation scrambler and generative network adjustments

Country Status (1)

Country Link
WO (1) WO2022093249A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018224690A1 (en) * 2017-06-09 2018-12-13 Deepmind Technologies Limited Generating discrete latent representations of input data items
US20190354806A1 (en) * 2018-05-15 2019-11-21 Hitachi, Ltd. Neural Networks for Discovering Latent Factors from Data
WO2020100136A1 (en) * 2018-11-15 2020-05-22 Uveye Ltd. Method of anomaly detection and system thereof

Similar Documents

Publication Publication Date Title
JP7002404B2 (en) Neural network that discovers latent factors from data
Varga et al. No-reference video quality assessment via pretrained CNN and LSTM networks
WO2020064990A1 (en) Committed information rate variational autoencoders
Gastaldo et al. Supporting visual quality assessment with machine learning
CN111241287A (en) Training method and device for generating generation model of confrontation text
CN109685087B9 (en) Information processing method and device and information detection method
Wang et al. Learning efficient binarized object detectors with information compression
Leroux et al. Privacy aware offloading of deep neural networks
US20230214642A1 (en) Federated Learning with Partially Trainable Networks
Weber et al. Observer dependent lossy image compression
Falck et al. A multi-resolution framework for U-Nets with applications to hierarchical VAEs
Zhang et al. A multi-branch decoder network approach to adaptive temporal data selection and reconstruction for big scientific simulation data
WO2022093249A1 (en) Latent representation scrambler and generative network adjustments
US20220335160A1 (en) System and method for the generation of privacy-preserving embeddings
Duan et al. A study on the generalized normalization transformation activation function in deep learning based image compression
Khodadadi et al. Variable bit allocation method based on meta-heuristic algorithms for facial image compression
Ramakotti et al. An analysis and implementation of a deep learning model for image steganography
CN113627514A (en) Data processing method and device of knowledge graph, electronic equipment and storage medium
Huang et al. Towards generalized deepfake detection with continual learning on limited new data
Wittscher et al. Improving image classification robustness using self‐supervision
Hoffman Cramnet: layer-wise deep neural network compression with knowledge transfer from a teacher network
Telli et al. A new approach to video steganography models with 3D deep CNN autoencoders
US11593652B2 (en) Systems and methods for searching audiovisual data using latent codes from generative networks and models
Singh et al. Applications of Signal Processing
EP4361895A1 (en) Machine learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 20960157
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 20960157
Country of ref document: EP
Kind code of ref document: A1