
GB2616199A - Automatically adjusting data access policies in data analytics

Info

Publication number: GB2616199A
Application number: GB2308825.5A
Authority: GB
Inventors: K Baughman Aaron; Kwatra Shikhar; Ekambaram Vijay; Narotambhai Marvaniya Smitkumar
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2020-11-24
Filing date: 2021-10-14
Publication date: 2023-08-30
Anticipated expiration: 2041-10-14
Also published as: CN116490871A; US20220164457A1; JP2023550445A; DE112021006167T5; GB2616199B; GB202308825D0; WO2022111112A1

Abstract

From a first model parameter, an autoencoder network is generated. A reconstruction error for the autoencoder network is measured, the reconstruction error comprising a difference between an input to the autoencoder network and a corresponding output from the autoencoder network, the input to the autoencoder network comprising a portion of an initial set of data. The reconstruction error and a confidence score corresponding to a complexity level of the autoencoder network are aggregated into a level of difficulty score of the autoencoder network. From the level of difficulty score and an initial data access policy level corresponding to the initial set of data, a derived data access policy level corresponding to the initial data access policy level is generated, the derived data access policy level enforcing access to a transformed set of data generated by applying a transformation to the initial set of data.
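The abstract names the pipeline steps but leaves the formulas open. The following is a minimal Python sketch under stated assumptions, not the patented implementation. The toy "truncating" autoencoder, the mean-squared reconstruction error, the product used to aggregate error and confidence, and the one-step policy relaxation rule are all hypothetical choices made for illustration. The sketch also assumes that a higher access level number means a more restrictive policy.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def make_truncating_autoencoder(k: int) -> Callable[[Vector], Vector]:
    """Hypothetical toy autoencoder: the encoder keeps the first k features
    (standing in for the transformation) and the decoder zero-fills the rest."""
    def autoencoder(x: Vector) -> Vector:
        return list(x[:k]) + [0.0] * (len(x) - k)
    return autoencoder


def reconstruction_error(autoencoder: Callable[[Vector], Vector],
                         data: Sequence[Vector]) -> float:
    """Mean squared difference between each input and its reconstruction."""
    total, count = 0.0, 0
    for x in data:
        y = autoencoder(x)
        total += sum((a - b) ** 2 for a, b in zip(x, y))
        count += len(x)
    return total / count


def difficulty_score(error: float, confidence: float) -> float:
    # Assumed aggregation: scale the error by the confidence in the model.
    return error * confidence


def derived_policy_level(initial_level: int, difficulty: float,
                         threshold: float = 1.0) -> int:
    # Assumed rule: if the original data is hard to reconstruct from the
    # transformed data, relax the access level by one step (never below 0).
    if difficulty >= threshold:
        return max(initial_level - 1, 0)
    return initial_level


data = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
err = reconstruction_error(make_truncating_autoencoder(2), data)
score = difficulty_score(err, confidence=0.8)
level = derived_policy_level(initial_level=3, difficulty=score)
print(err, score, level)
```

Here the transformed data (the first two features) loses enough information that the difficulty score clears the assumed threshold, so the derived level is one step less restrictive than the initial level.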

Claims

1. A computer-implemented method comprising: generating, from a first model parameter, an autoencoder network; measuring a reconstruction error for the autoencoder network, the reconstruction error comprising a difference between an input to the autoencoder network and a corresponding output from the autoencoder network, the input to the autoencoder network comprising a portion of an initial set of data; aggregating, into a level of difficulty score of the autoencoder network, the reconstruction error and a confidence score corresponding to a complexity level of the autoencoder network; and generating, from the level of difficulty score and an initial data access policy level corresponding to the initial set of data, a derived data access policy level corresponding to the initial data access policy level, the derived data access policy level enforcing access to a transformed set of data generated by applying a transformation to the initial set of data.

2. The computer-implemented method of claim 1, further comprising: training, using a training subset of the initial set of data, the autoencoder network.

3. The computer-implemented method of claim 2, wherein the training is performed to minimize a reconstruction error of the autoencoder network.
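Claim 3 trains the autoencoder to minimize reconstruction error. As a hedged illustration only, the sketch below trains a tiny linear autoencoder (one-dimensional code, untied encoder weights `w` and decoder weights `v`) with plain gradient descent on the mean squared reconstruction loss. The architecture, initialization, learning rate, and epoch count are assumptions; the claim does not specify any of them.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def reconstruct(w: Vector, v: Vector, x: Vector) -> Tuple[float, Vector]:
    """Encode x to a scalar code z = w.x, then decode it as v * z."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    return z, [vi * z for vi in v]


def reconstruction_loss(w: Vector, v: Vector, data: Sequence[Vector]) -> float:
    """Squared reconstruction error per sample, averaged over the data."""
    total = 0.0
    for x in data:
        _, y = reconstruct(w, v, x)
        total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return total / len(data)


def train(data: Sequence[Vector], lr: float = 0.05,
          epochs: int = 500) -> Tuple[Vector, Vector]:
    dim, n = len(data[0]), len(data)
    w, v = [0.1] * dim, [0.1] * dim  # assumed small constant initialization
    for _ in range(epochs):
        gw, gv = [0.0] * dim, [0.0] * dim
        for x in data:
            z, y = reconstruct(w, v, x)
            r = [yi - xi for yi, xi in zip(y, x)]  # residual y - x
            for i in range(dim):
                gv[i] += 2 * r[i] * z / n  # dL/dv_i
            s = sum(ri * vi for ri, vi in zip(r, v))
            for j in range(dim):
                gw[j] += 2 * s * x[j] / n  # dL/dw_j via dz/dw_j = x_j
        w = [wj - lr * g for wj, g in zip(w, gw)]
        v = [vi - lr * g for vi, g in zip(v, gv)]
    return w, v


# Rank-one training subset, so a one-dimensional code can reconstruct it.
train_subset = [[t, 2 * t] for t in (-1.0, -0.5, 0.5, 1.0)]
before = reconstruction_loss([0.1, 0.1], [0.1, 0.1], train_subset)
w, v = train(train_subset)
after = reconstruction_loss(w, v, train_subset)
print(before, after)
```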

4. The computer-implemented method of claim 2, wherein the training is performed to minimize a difference between an output of an encoder portion of the autoencoder network and a transformed set of data generated by applying the transformation to the training subset.
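Claim 4 uses a different training target: the encoder output should match the transformed data rather than the input. A minimal sketch, assuming a linear encoder `W` and a hypothetical coarsening transformation that averages adjacent feature pairs. Neither choice comes from the claim. Because this transformation is linear, the encoder can represent it exactly, and gradient descent with a small learning rate steadily reduces the loss.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def transformation(x: Vector) -> Vector:
    # Hypothetical transformation: average adjacent feature pairs.
    return [(x[0] + x[1]) / 2, (x[2] + x[3]) / 2]


def encode(W: Matrix, x: Vector) -> Vector:
    """Linear encoder output W @ x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


def encoder_loss(W: Matrix, data: Sequence[Vector]) -> float:
    """Mean squared difference between encoder output and transformed data."""
    total = 0.0
    for x in data:
        z, t = encode(W, x), transformation(x)
        total += sum((zi - ti) ** 2 for zi, ti in zip(z, t))
    return total / len(data)


def train_encoder(data: Sequence[Vector], code_dim: int = 2,
                  lr: float = 0.01, epochs: int = 2000) -> Matrix:
    dim, n = len(data[0]), len(data)
    W = [[0.0] * dim for _ in range(code_dim)]
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(code_dim)]
        for x in data:
            z, t = encode(W, x), transformation(x)
            for i in range(code_dim):
                r = z[i] - t[i]
                for j in range(dim):
                    grad[i][j] += 2 * r * x[j] / n
        W = [[wij - lr * gij for wij, gij in zip(wr, gr)]
             for wr, gr in zip(W, grad)]
    return W


train_subset = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0],
                [2.0, 0.0, 1.0, 3.0], [1.0, 1.0, 1.0, 1.0]]
before = encoder_loss([[0.0] * 4 for _ in range(2)], train_subset)
W = train_encoder(train_subset)
after = encoder_loss(W, train_subset)
print(before, after)
```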

5. The computer-implemented method of claim 1, further comprising: measuring, for the autoencoder network, the complexity level.
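The claim does not define how complexity is measured or how it maps to a confidence score. One plausible sketch, stated as an assumption, counts the trainable weights and biases of a fully connected autoencoder from its layer widths. It then maps that count to a confidence in [0, 1) with a hypothetical saturating function controlled by a `scale` parameter.

```python
from typing import Sequence


def parameter_count(widths: Sequence[int]) -> int:
    """Weights plus biases of a fully connected network with these layer widths."""
    return sum(a * b + b for a, b in zip(widths[:-1], widths[1:]))


def confidence_score(complexity: int, scale: float = 100.0) -> float:
    # Assumed mapping: confidence rises with complexity and saturates toward 1.
    return complexity / (complexity + scale)


widths = [4, 3, 2, 3, 4]  # encoder 4 -> 3 -> 2, decoder 2 -> 3 -> 4
complexity = parameter_count(widths)
print(complexity, confidence_score(complexity))
```

For these widths the count is 15 + 8 + 9 + 16 = 48 parameters.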

6. The computer-implemented method of claim 1, further comprising: generating, from the level of difficulty score, a set of model parameters, a second model parameter in the set of model parameters comprising a variation from the first model parameter; generating, from the set of model parameters, a set of autoencoder networks; measuring a model-specific reconstruction error of each autoencoder network in the set of autoencoder networks, the model-specific reconstruction error comprising a difference between an input to an autoencoder network in the set of autoencoder networks and a corresponding output from the autoencoder network in the set of autoencoder networks, the input to the autoencoder network in the set of autoencoder networks comprising the portion of the initial set of data; and aggregating, into a level of difficulty score of the set of autoencoder networks, the model-specific reconstruction error of each autoencoder network and a set of confidence scores, each confidence score corresponding to a complexity level of an autoencoder network in the set of autoencoder networks.
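Claim 6 extends the single-model score to a set of models whose parameters vary from the first one. The sketch below is illustrative only. It reuses a hypothetical truncating autoencoder whose model parameter is the code size `k`, and explores neighbouring code sizes when the first difficulty score clears an assumed threshold. It treats confidence as `k / dim` and aggregates with a confidence-weighted mean. All three rules are assumptions, not the claimed formulas.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def make_truncating_autoencoder(k: int) -> Callable[[Vector], Vector]:
    # Hypothetical toy autoencoder whose model parameter is the code size k.
    def autoencoder(x: Vector) -> Vector:
        return list(x[:k]) + [0.0] * (len(x) - k)
    return autoencoder


def reconstruction_error(ae: Callable[[Vector], Vector],
                         data: Sequence[Vector]) -> float:
    """Mean squared difference between inputs and reconstructions."""
    total = sum((a - b) ** 2 for x in data for a, b in zip(x, ae(x)))
    return total / sum(len(x) for x in data)


def parameter_variations(k: int, difficulty: float, max_k: int,
                         threshold: float = 1.0) -> List[int]:
    # Assumed rule: explore neighbouring code sizes only when the first
    # model found reconstruction difficult.
    radius = 1 if difficulty >= threshold else 0
    return [c for c in range(k - radius, k + radius + 1) if 1 <= c <= max_k]


def ensemble_difficulty(data: Sequence[Vector], ks: Sequence[int]) -> float:
    dim = len(data[0])
    errors = [reconstruction_error(make_truncating_autoencoder(k), data)
              for k in ks]
    confidences = [k / dim for k in ks]  # assumed: confidence grows with k
    # Assumed aggregation: confidence-weighted mean of model-specific errors.
    return sum(c * e for c, e in zip(confidences, errors)) / sum(confidences)


data = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
ks = parameter_variations(k=2, difficulty=2.6, max_k=4)
score = ensemble_difficulty(data, ks)
print(ks, score)
```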

7. The computer-implemented method of claim 1, wherein the model parameter comprises a number of hidden layers in an encoder portion of the autoencoder network and a number of hidden layers in a decoder portion of the autoencoder network.

8. The computer-implemented method of claim 1, wherein the model parameter comprises a number of dimensions in an output of an encoder portion of the autoencoder network.
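Claims 7 and 8 say that the model parameter may carry the encoder and decoder hidden-layer counts and the encoder output dimension. A minimal sketch of such a parameter record follows. The `ModelParameter` name is hypothetical, and so is the rule that hidden-layer widths are linearly interpolated between the input size and the code size. The sketch only shows how these values could determine an autoencoder's layer layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelParameter:
    encoder_hidden_layers: int  # claim 7: hidden layers in the encoder portion
    decoder_hidden_layers: int  # claim 7: hidden layers in the decoder portion
    code_dim: int               # claim 8: dimensions of the encoder output


def _interpolate(start: int, end: int, hidden: int) -> List[int]:
    # Assumed rule: hidden widths step linearly from start toward end.
    return [round(start + (end - start) * i / (hidden + 1))
            for i in range(1, hidden + 1)]


def layer_widths(param: ModelParameter, input_dim: int) -> List[int]:
    """Full layer layout: input, encoder hidden, code, decoder hidden, output."""
    encoder = ([input_dim]
               + _interpolate(input_dim, param.code_dim,
                              param.encoder_hidden_layers)
               + [param.code_dim])
    decoder = (_interpolate(param.code_dim, input_dim,
                            param.decoder_hidden_layers)
               + [input_dim])
    return encoder + decoder


param = ModelParameter(encoder_hidden_layers=2, decoder_hidden_layers=1,
                       code_dim=2)
print(layer_widths(param, input_dim=8))
```

Changing any field of `ModelParameter` yields a different network, which is one way the variations in claim 6 could be produced.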

9. A computer program product for automatically adjusting a data access policy, the computer program product comprising: one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions comprising: program instructions to generate, from a first model parameter, an autoencoder network; program instructions to measure a reconstruction error for the autoencoder network, the reconstruction error comprising a difference between an input to the autoencoder network and a corresponding output from the autoencoder network, the input to the autoencoder network comprising a portion of an initial set of data; program instructions to aggregate, into a level of difficulty score of the autoencoder network, the reconstruction error and a confidence score corresponding to a complexity level of the autoencoder network; and program instructions to generate, from the level of difficulty score and an initial data access policy level corresponding to the initial set of data, a derived data access policy level corresponding to the initial data access policy level, the derived data access policy level enforcing access to a transformed set of data generated by applying a transformation to the initial set of data.

10. The computer program product of claim 9, further comprising: program instructions to train, using a training subset of the initial set of data, the autoencoder network.

11. The computer program product of claim 10, wherein the training is performed to minimize a reconstruction error of the autoencoder network.

12. The computer program product of claim 10, wherein the training is performed to minimize a difference between an output of an encoder portion of the autoencoder network and a transformed set of data generated by applying the transformation to the training subset.

13. The computer program product of claim 9, further comprising: program instructions to measure, for the autoencoder network, the complexity level.

14. The computer program product of claim 9, further comprising: program instructions to generate, from the level of difficulty score, a set of model parameters, a second model parameter in the set of model parameters comprising a variation from the first model parameter; program instructions to generate, from the set of model parameters, a set of autoencoder networks; program instructions to measure a model-specific reconstruction error of each autoencoder network in the set of autoencoder networks, the model-specific reconstruction error comprising a difference between an input to an autoencoder network in the set of autoencoder networks and a corresponding output from the autoencoder network in the set of autoencoder networks, the input to the autoencoder network in the set of autoencoder networks comprising the portion of the initial set of data; and program instructions to aggregate, into a level of difficulty score of the set of autoencoder networks, the model-specific reconstruction error of each autoencoder network and a set of confidence scores, each confidence score corresponding to a complexity level of an autoencoder network in the set of autoencoder networks.

15. The computer program product of claim 9, wherein the model parameter comprises a number of hidden layers in an encoder portion of the autoencoder network and a number of hidden layers in a decoder portion of the autoencoder network.

16. The computer program product of claim 9, wherein the model parameter comprises a number of dimensions in an output of an encoder portion of the autoencoder network.

17. The computer program product of claim 9, wherein the stored program instructions are stored in the at least one of the one or more storage media of a local data processing system, and wherein the stored program instructions are transferred over a network from a remote data processing system.

18. The computer program product of claim 9, wherein the stored program instructions are stored in the at least one of the one or more storage media of a server data processing system, and wherein the stored program instructions are downloaded over a network to a remote data processing system for use in a computer readable storage device associated with the remote data processing system.

19. The computer program product of claim 9, wherein the computer program product is provided as a service in a cloud environment.

20. A computer system comprising one or more processors, one or more computer-readable memories, and one or more computer-readable storage devices, and program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, the stored program instructions comprising: program instructions to generate, from a first model parameter, an autoencoder network; program instructions to measure a reconstruction error for the autoencoder network, the reconstruction error comprising a difference between an input to the autoencoder network and a corresponding output from the autoencoder network, the input to the autoencoder network comprising a portion of an initial set of data; program instructions to aggregate, into a level of difficulty score of the autoencoder network, the reconstruction error and a confidence score corresponding to a complexity level of the autoencoder network; and program instructions to generate, from the level of difficulty score and an initial data access policy level corresponding to the initial set of data, a derived data access policy level corresponding to the initial data access policy level, the derived data access policy level enforcing access to a transformed set of data generated by applying a transformation to the initial set of data.