CN110968885A - Model training data storage method and device, electronic equipment and storage medium - Google Patents

Model training data storage method and device, electronic equipment and storage medium

Info

Publication number
CN110968885A
CN110968885A
Authority
CN
China
Prior art keywords
data
model training
segment
segments
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911310374.2A
Other languages
Chinese (zh)
Inventor
赖伟彬
周茜
马泽祥
杨潇峰
朱霍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN201911310374.2A
Publication of CN110968885A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Storage Device Security (AREA)

Abstract

The invention provides a model training data storage method and device, an electronic device, and a storage medium. The method includes: obtaining model training data; segmenting the model training data into at least two data segments; encrypting each of the at least two data segments to obtain at least two encrypted data segments; and storing the at least two encrypted data segments in storage containers, wherein at least one of the encrypted data segments is stored in a different storage container from the other encrypted data segments. By segmenting, encrypting, and separately storing each piece of model training data, the method avoids, to a certain extent, the risk of leaking any single piece of training data; if the data is user privacy data, disclosure of the user's privacy is likewise avoided.

Description

Model training data storage method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of data processing, in particular to a model training data storage method and device, electronic equipment and a storage medium.
Background
Machine learning is the discipline devoted to studying how computation and experience can be used to improve a system's own performance. Its main focus is algorithms that generate "models" from data on a computer, i.e., "learning algorithms". The process of deriving a model from data is called "learning" or "training", and is accomplished by executing a learning algorithm. The data used in this process is called "training data".
Training data is typically stored directly on the training machine (the device that performs the training), so a method is needed to ensure that the training data is not leaked.
Disclosure of Invention
In view of the above, an object of the embodiments of the present invention is to provide a model training data storage method and apparatus, an electronic device, and a storage medium that solve the above problem.
In view of the above object, a first aspect of the embodiments of the present invention provides a model training data storage method, including:
obtaining model training data;
segmenting the model training data into at least two data segments;
encrypting each of the at least two data segments to obtain at least two encrypted data segments;
storing the at least two encrypted data segments in storage containers; wherein at least one of the at least two encrypted data segments is stored in a different storage container from the other encrypted data segments.
Optionally, segmenting the model training data into at least two data segments comprises:
segmenting each piece of model training data into at least two data segments, respectively.
Optionally, segmenting the model training data into at least two data segments includes:
determining the number of data segments;
segmenting the model training data into the corresponding number of data segments according to that number, and assigning each data segment a unique data segment marker.
Optionally, the data segment marker includes a data marker and a segment marker;
the data marker uniquely identifies the model training data, and the segment marker uniquely identifies the data segment among the at least two data segments obtained by segmenting the same model training data.
Optionally, storing the at least two encrypted data segments in storage containers comprises:
storing each encrypted data segment in the corresponding storage container according to the unique segment identifier carried by the encrypted data segment.
Optionally, the unique segment identifier of at least one of the at least two data segments is the same as the unique segment identifier of a data segment obtained by segmenting another piece of model training data;
storing the at least two encrypted data segments in storage containers comprises:
storing encrypted data segments having the same unique segment identifier in the same storage container;
storing encrypted data segments having different unique segment identifiers in different storage containers.
Optionally, encrypting the at least two data segments to obtain at least two encrypted data segments includes:
encrypting the at least two data segments with different keys, respectively, to obtain at least two encrypted data segments, wherein each key carries a unique key marker that corresponds to the data segment marker of the data segment it encrypts;
storing the keys used to encrypt the data segments in a key management center.
Optionally, the model training data storage method further includes:
retrieving the encrypted data segments of the model training data from the storage containers;
retrieving the keys of the encrypted data segments from the key management center;
decrypting the encrypted data segments with the keys to restore them to data segments;
combining the data segments to restore the model training data.
Optionally, retrieving the key of the encrypted data segment from the key management center further includes:
sending a key retrieval request;
receiving a key retrieval permission verification message;
returning the key retrieval credentials;
and, if the permission check passes, receiving the key of the encrypted data segment.
Optionally, the storage container is a device for model training.
Optionally, the step of segmenting the model training data into at least two data segments and the step of encrypting the at least two data segments to obtain at least two encrypted data segments are both performed in transient memory.
In a second aspect of the embodiments of the present invention, there is provided a model training data storage device, including:
an acquisition module for obtaining model training data;
a segmentation module for segmenting the model training data into at least two data segments;
an encryption module for encrypting each of the at least two data segments to obtain at least two encrypted data segments;
a storage module for storing the at least two encrypted data segments in storage containers; wherein at least one of the at least two encrypted data segments is stored in a different storage container from the other encrypted data segments.
Optionally, the segmentation module is configured to:
segment each piece of model training data into at least two data segments, respectively.
Optionally, the segmentation module is configured to:
determine the number of data segments;
segment the model training data into the corresponding number of data segments according to that number, and assign each data segment a unique data segment marker.
Optionally, the data segment marker includes a data marker and a segment marker;
the data marker uniquely identifies the model training data, and the segment marker uniquely identifies the data segment among the at least two data segments obtained by segmenting the same model training data.
Optionally, the storage module is configured to:
store each encrypted data segment in the corresponding storage container according to the unique segment identifier carried by the encrypted data segment.
Optionally, the unique segment identifier of at least one of the at least two data segments is the same as the unique segment identifier of a data segment obtained by segmenting another piece of model training data; the storage module is configured to:
store encrypted data segments having the same unique segment identifier in the same storage container, and store encrypted data segments having different unique segment identifiers in different storage containers.
Optionally, the encryption module is configured to:
encrypt the at least two data segments with different keys, respectively, to obtain at least two encrypted data segments, wherein each key carries a unique key marker that corresponds to the data segment marker of the data segment it encrypts;
and store the keys used to encrypt the data segments in a key management center.
Optionally, the model training data storage device further comprises:
a decryption module for retrieving the encrypted data segments of the model training data from the storage containers, retrieving the keys of the encrypted data segments from the key management center, and decrypting the encrypted data segments with the keys to restore them to data segments;
a synthesis module for combining the data segments to restore the model training data.
Optionally, the decryption module is configured to:
send a key retrieval request;
receive a key retrieval permission verification message;
return the key retrieval credentials;
and, if the permission check passes, receive the key of the encrypted data segment.
Optionally, the storage container is a device for model training.
Optionally, the step of segmenting the model training data into at least two data segments and the step of encrypting the at least two data segments to obtain at least two encrypted data segments are both performed in transient memory.
In a third aspect of the embodiments of the present invention, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the method when executing the program.
In a fourth aspect of embodiments of the present invention, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method.
As can be seen from the foregoing, the model training data storage method and apparatus, the electronic device, and the storage medium provided in the embodiments of the present invention combine data segmentation with encryption during data processing and adopt isolated storage, thereby avoiding the risk of data leakage.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are merely some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of an embodiment of a model training data storage system according to the present invention;
FIG. 2 is a schematic flow chart diagram illustrating an embodiment of a model training data storage method according to the present invention;
FIG. 3 is a schematic flow chart diagram illustrating another embodiment of a model training data storage method according to the present invention;
FIG. 4 is a schematic block diagram illustrating one embodiment of a model training data storage device provided by the present invention;
fig. 5 is a schematic structural diagram of an embodiment of a device for implementing the model training data storage method provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It is to be noted that technical terms or scientific terms used in the embodiments of the present invention should have the ordinary meanings as understood by those having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
In one model training implementation, the training data is stored directly on the training machine, and data leakage is prevented through network isolation alone. In a financial scenario, a large amount of user privacy data circulates through every link of model training; if the only protection is network isolation, data security faces greater challenges, because developers may view or even download the data by technical means, and without management controls a leak is difficult to trace once it occurs. Under data security requirements, sensitive data is expected to remain safe and controllable during the model training stage, so that data leakage is avoided.
For example, when training an identification card recognition model, both the user's identification card picture and the identification card annotations must be stored on the training machine, and the training data on disk is read directly during model training. An algorithm engineer can easily access this data: anyone with machine permissions can view it directly, and may even be able to download it through some channel, posing a great risk of data leakage.
In another implementation, the training data is protected by encryption. However, the encryption strategy is relatively simple: the same encryption algorithm is usually used for all of the data, so if a leak occurs, cracking a single key is enough to decrypt all of the data.
Based on this, an embodiment of the present invention provides a model training data storage system that can improve the security of training data to a certain extent. FIG. 1 is a schematic structural diagram of an embodiment of the model training data storage system provided by the present invention. Referring to FIG. 1, the model training data storage system may include a model training data storage device and at least two storage containers. The model training data storage device and the storage containers are all devices with data processing functions, and data can be exchanged between them. A device with a data processing function may be, for example, a server or a distributed storage device. The model training data storage device and the storage containers may exchange data over a network, which may be wired or wireless. Optionally, in a model training scenario, the model training data storage device may be the training machine (i.e., the apparatus that performs model training), with a data exchange channel between the training machine and the storage containers.
Referring to FIG. 1, when training data needs to be stored, the model training data storage device first divides the model training data into at least two data segments (n data segments in FIG. 1). Each data segment is only part of the model training data, and a single data segment cannot be restored to the model training data; obtaining a single segment therefore reveals nothing of the corresponding training data, and the content can be viewed only after all the data segments are reassembled. The model training data may be segmented in many ways, and no particular method is required. Optionally, equal-size segmentation may be used: the model training data is divided into equal-size segments according to a preset number of segments, or it is divided into segments of a preset size, in which case the last segment is no larger than the preset size.
Next, the divided data segments are encrypted to obtain at least two encrypted data segments (n encrypted data segments in FIG. 1). Optionally, a different key is used for each segment: for data segment 1 the encryption key is key 1, for data segment 2 it is key 2, and so on up to key n for data segment n, with key 1 through key n all distinct. Of course, this is only an example; many encryption methods exist, and different encryption strategies may be adopted.
Finally, the at least two encrypted data segments are stored in storage containers. In other words, the encrypted data segments are not all placed in a single container; at least one encrypted data segment must be stored in a different storage container from the others. For example, as shown in FIG. 1, encrypted data segment 1 is stored in storage container 1, encrypted data segment 2 in storage container 2, and so on, with encrypted data segment n in storage container n. The separated storage only needs to ensure that, when a given storage container is compromised, an attacker cannot obtain all of the encrypted data segments needed to restore the model training data. In some scenarios it is therefore unnecessary to dedicate a storage container to each encrypted data segment: as few as two containers suffice if, of all the encrypted data segments of the same model training data, one segment is stored in, say, storage container 1 while the rest are stored in storage container 2.
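The segment-encrypt-distribute flow of FIG. 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the XOR cipher is only a stand-in for a real cipher (a production system would use an authenticated cipher such as AES-GCM), and each storage container is simplified to a dict.

```python
import os

def split_equal(data: bytes, n: int) -> list[bytes]:
    """Split data into n segments of equal preset size (last may be shorter)."""
    seg = -(-len(data) // n)  # ceil division
    return [data[i * seg:(i + 1) * seg] for i in range(n)]

def xor_cipher(segment: bytes, key: bytes) -> bytes:
    """Placeholder cipher: XOR with a repeating key; XOR twice restores the
    plaintext. A real system would use an authenticated cipher instead."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(segment))

def store_separately(record: bytes, n: int):
    """Segment, encrypt each segment with its own key, and place each
    encrypted segment in a different container (one dict per container)."""
    segments = split_equal(record, n)
    keys = [os.urandom(16) for _ in segments]           # one key per segment
    encrypted = [xor_cipher(s, k) for s, k in zip(segments, keys)]
    containers = [{i: c} for i, c in enumerate(encrypted)]
    return containers, keys
```

With this layout, compromising any single container yields only one encrypted segment, which is useless without the remaining segments and the per-segment keys.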
It should be noted that the model training data here refers to an individual piece of model training data, i.e., sample data used for algorithm model training, whose representation may be a picture, text, video, audio, and so on. By segmenting, encrypting, and separately storing the data in this way, the security of each individual piece of model training data is greatly improved, and the risk of data leakage is avoided to a certain extent.
FIG. 2 is a flow chart illustrating an embodiment of a model training data storage method provided by the present invention.
As shown in fig. 2, the model training data storage method includes:
step 11: model training data is obtained.
Here, the model training data refers to the model training data to be stored; it is an individual piece of model training data, i.e., sample data for algorithm model training, whose representation may be a picture, text, video, audio, and so on. Optionally, the model training data may be labeled or unlabeled.
Step 12: the model training data is segmented into at least two data segments.
Here, data segmentation means dividing a complete piece of model training data into multiple parts according to predetermined rules and strategies, while ensuring that the model training data can later be restored.
Optionally, segmenting the model training data into at least two data segments includes:
segmenting each piece of model training data into at least two data segments, respectively.
In this way, every piece of data is segmented and thus consists of data segments; even if some of the segments that make up a piece of data are leaked or intercepted, it is difficult to restore the complete data, so leakage is prevented.
Optionally, segmenting the model training data into at least two data segments includes:
determining the number of data segments, i.e., how many segments the model training data should be split into; this number may be preset or chosen according to the size of the data (e.g., the larger the data, the more segments; the smaller the data, the fewer);
segmenting the model training data into the corresponding number of data segments according to that number, and assigning each data segment a unique data segment marker.
Here, segmenting the model training data according to the number of segments may use equal-size segmentation, i.e., splitting the data into segments of equal size according to the segment count. Of course, other segmentation methods may also be used; the method is not limited to this example.
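The two choices above, picking a segment count from the data size and then splitting into that many near-equal pieces, can be sketched as follows. The `bytes_per_segment` threshold is an assumed tuning parameter; the text does not specify one.

```python
def choose_segment_count(size_bytes: int,
                         min_segments: int = 2,
                         bytes_per_segment: int = 1 << 20) -> int:
    """Heuristic from the text: larger data gets more segments, never fewer
    than two. bytes_per_segment (here 1 MiB) is an assumed tuning knob."""
    return max(min_segments, -(-size_bytes // bytes_per_segment))  # ceil

def split_by_count(data: bytes, n: int) -> list[bytes]:
    """Split data into n near-equal segments whose sizes differ by at most 1."""
    q, r = divmod(len(data), n)
    sizes = [q + 1] * r + [q] * (n - r)
    out, pos = [], 0
    for s in sizes:
        out.append(data[pos:pos + s])
        pos += s
    return out

parts = split_by_count(b"abcdefghij", 4)  # sizes 3, 3, 2, 2
```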
Optionally, the data segment marker includes a data marker and a segment marker; the data marker uniquely identifies the model training data, and the segment marker uniquely identifies the data segment among the at least two data segments obtained by segmenting the same model training data. Here, "unique segment marker" does not mean the segment marker is globally unique, but that it is unique among all the data segments into which the same piece of data is divided.
For example, referring to FIG. 3, data A has a data marker A, which is its unique data identifier. When data A is divided into n segments, a segment marker may be set for each segment in turn; as shown in FIG. 3, the segment marker of the first data segment of data A is 1, so the data segment marker of that segment is the combination of the data marker of data A and the segment marker of the first segment, i.e., A-segment.1. In this way, the data segment markers of the n data segments of data A are A-segment.1, A-segment.2, …, A-segment.n. Because the data marker of data A is unique, the data segment markers of its segments are also unique, as long as the segment markers differ from one another (i.e., each segment marker is unique among all the segments into which the same data is divided).
Similarly, for data B, using the same identification scheme as for data A, the data segment markers of its n data segments are B-segment.1, B-segment.2, …, B-segment.n.
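The marker scheme above amounts to concatenating the data marker with a per-segment index. A sketch (the `-segment.` separator is an assumption; the text only requires that the combined marker be unique per segment of the same data):

```python
def make_segment_markers(data_marker: str, n: int) -> list[str]:
    """Combine the unique data marker with segment markers 1..n,
    e.g. 'A' -> 'A-segment.1' ... 'A-segment.n' as in FIG. 3."""
    return [f"{data_marker}-segment.{i}" for i in range(1, n + 1)]

markers_a = make_segment_markers("A", 3)  # A-segment.1 .. A-segment.3
markers_b = make_segment_markers("B", 3)
# The combined markers never collide across data pieces,
# because the data marker itself is unique.
```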
Step 13: each of the at least two data segments is encrypted to obtain at least two encrypted data segments.
Here, encryption converts plaintext into ciphertext using an encryption algorithm and an encryption key. The specific encryption method may be chosen as needed and is not limited to any particular one.
Optionally, encrypting the at least two data segments to obtain at least two encrypted data segments includes:
encrypting the at least two data segments with different keys, respectively, to obtain at least two encrypted data segments, wherein each key carries a unique key marker that corresponds to the data segment marker of the data segment it encrypts;
storing the keys used to encrypt the data segments in a key management center (KMI).
Here, the correspondence between a key's marker and the data segment marker of the segment it encrypts means that, given the data segment marker of a segment, the key marker of the corresponding encryption key can be found, and then the key itself can be found from the key marker, so the corresponding encrypted data segment can be decrypted with that key.
Optionally, referring to FIG. 3, the key markers and the segment markers are in one-to-one correspondence.
For example, as shown in FIG. 3, for data segment 1 of data A, the encryption key may be key 1, whose key marker is, say, K1. Whenever data segment 1 of some piece of data needs to be encrypted, one only needs to find key 1 by its key marker K1 and encrypt the segment with it; likewise, when encrypted data segment 1 needs to be decrypted, finding key 1 by the key marker K1 is enough to decrypt it.
Similarly, for data segments 2 through n, the encryption keys may be keys 2 through n, with key markers K2 through Kn. The keys may be stored centrally in the key management center (KMI), and when a key is needed for decryption, the key corresponding to its key marker is retrieved from the KMI.
In this implementation, a different key is used to encrypt each data segment after splitting, which raises the cost of cracking the data and improves data security.
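The key-marker correspondence can be sketched as a toy key management center. `KeyCenter` and the "K" + segment-number tag scheme are illustrative assumptions (the text only requires a one-to-one mapping); a real KMI would add permission checks and auditing before releasing a key.

```python
import os

class KeyCenter:
    """Toy stand-in for the key management center (KMI): one key per key marker."""
    def __init__(self):
        self._keys: dict[str, bytes] = {}

    def create_key(self, key_tag: str) -> bytes:
        key = os.urandom(16)
        self._keys[key_tag] = key
        return key

    def fetch_key(self, key_tag: str) -> bytes:
        # A real KMI would verify the caller's permission before returning this.
        return self._keys[key_tag]

def key_tag_for(segment_marker: str) -> str:
    """Map a data segment marker to its key marker, e.g. 'A-segment.2' -> 'K2'."""
    return "K" + segment_marker.rsplit(".", 1)[1]
```

As in FIG. 3, segments with the same segment number (A-segment.1, B-segment.1, …) map to the same key tag K1.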
It should be noted that after a data segment is encrypted, the resulting encrypted data segment still carries the unique data segment marker, so the encrypted data segment can be identified by that marker.
Optionally, step 12 of segmenting the model training data into at least two data segments and step 13 of encrypting the at least two data segments to obtain at least two encrypted data segments are both performed in transient memory (e.g., the device's RAM); that is, the entire segmentation and encryption process is completed in transient memory, without writing any intermediate data to persistent storage, which further improves data security.
Step 14: the at least two encrypted data segments are stored in storage containers. In other words, the encrypted data segments are not all placed in a single container; at least one encrypted data segment must be stored in a different storage container from the others, i.e., the encrypted data segments of the same model training data are stored separately. The separated storage only needs to ensure that, when a given storage container is compromised, an attacker cannot obtain all of the encrypted data segments needed to restore the model training data.
Optionally, storing the at least two encrypted data segments in storage containers comprises:
storing each encrypted data segment in the corresponding storage container according to the unique segment identifier carried by the encrypted data segment.
The correspondence between each encrypted data segment's unique segment identifier and its storage container makes storing and retrieving the encrypted data segments easier.
Optionally, the unique segment identifier of at least one of the at least two data segments is the same as the unique segment identifier of a data segment obtained by segmenting another piece of model training data;
storing the at least two encrypted data segments in storage containers comprises:
storing encrypted data segments having the same unique segment identifier in the same storage container;
storing encrypted data segments having different unique segment identifiers in different storage containers.
For example, referring to FIG. 3, if the data segment markers carried by the encrypted data segments of data A are A-segment.1, A-segment.2, …, A-segment.n, then A-segment.1 is stored in storage container 1 (e.g., server 1), A-segment.2 in storage container 2 (e.g., server 2), and so on, with A-segment.n in storage container n (e.g., server n).
Similarly, if the data segment markers carried by the encrypted data segments of data B are B-segment.1, B-segment.2, …, B-segment.n, then B-segment.1 is stored in storage container 1 (e.g., server 1), B-segment.2 in storage container 2 (e.g., server 2), and so on, with B-segment.n in storage container n (e.g., server n).
In this way, the encrypted data segments of the same data are stored separately, while the correspondence between segment identifiers and storage container sequence numbers makes it easy to locate each encrypted data segment in its corresponding storage container.
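The identifier-to-container routing just described can be sketched as follows; the helper names (`fragment_id`, `route_to_container`) and the dotted identifier format are illustrative assumptions, not taken from the patent:

```python
# Fragments of data A and data B that share the same segment number land in
# the same storage container; different segment numbers map to different ones.

def fragment_id(data_mark: str, segment_no: int) -> str:
    """Build a unique fragment identifier: data mark plus segment mark."""
    return f"{data_mark}.fragment.{segment_no}"

def route_to_container(frag_id: str) -> int:
    """Map a fragment identifier to a storage container number by its segment mark."""
    return int(frag_id.rsplit(".", 1)[1])

containers: dict[int, list[str]] = {}
for data_mark in ("A", "B"):
    for seg_no in range(1, 4):  # n = 3 segments per datum
        fid = fragment_id(data_mark, seg_no)
        containers.setdefault(route_to_container(fid), []).append(fid)

# Container 1 now holds A.fragment.1 and B.fragment.1, and so on.
```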
Optionally, the storage container is a device used for model training. Since algorithm model training runs on a training machine, storing the data directly on the training machine (or on a dedicated storage machine) reduces network I/O during the training phase and improves overall efficiency.
It can be seen from the foregoing embodiments that the model training data storage method provided in the embodiments of the present invention combines data segmentation with encryption during data processing and adopts an isolated storage manner, thereby avoiding the risk of data leakage. Because what is stored are data segments, user privacy is not exposed even if a single data segment leaks; and because isolated storage is adopted, a complete data packet is difficult to obtain from any one storage container.
In some embodiments, as shown in fig. 2, the model training data storage method further includes:
Step 15: retrieving the encrypted data segment of the model training data from the storage container.
Step 16: retrieving the key of the encrypted data segment from the key management and control center.
Optionally, retrieving the key of the encrypted data segment from the key management and control center includes:
sending a key calling request;
receiving a key calling permission verification message;
returning the key calling authority;
and if the authority verification passes, receiving the key of the encrypted data segment.
Performing authority verification when acquiring a key thus improves security. In addition, the key management and control center can record each key acquisition, so that tracking is realized.
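A minimal sketch of the key-retrieval handshake above — request, permission verification, audited release — assuming a hypothetical `KeyManagementCenter` class; the patent does not specify the actual interface:

```python
class KeyManagementCenter:
    """Toy key management and control center with permission checks and an audit log."""

    def __init__(self):
        self._keys = {}        # key mark -> key bytes
        self._grants = set()   # callers allowed to fetch keys
        self.audit_log = []    # record of every key release, for tracking

    def store_key(self, key_mark: str, key: bytes) -> None:
        self._keys[key_mark] = key

    def grant(self, caller: str) -> None:
        self._grants.add(caller)

    def request_key(self, caller: str, key_mark: str) -> bytes:
        # Permission verification before the key is released.
        if caller not in self._grants:
            raise PermissionError(f"{caller} may not call key {key_mark}")
        self.audit_log.append((caller, key_mark))  # recorded for auditing
        return self._keys[key_mark]

kmc = KeyManagementCenter()
kmc.store_key("A.fragment.1.key", b"secret-key-1")
kmc.grant("trainer-01")
key = kmc.request_key("trainer-01", "A.fragment.1.key")
```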
Step 17: decrypting the encrypted data segment with the key to restore it into the data segment.
Step 18: synthesizing the data segments to restore the model training data, so that the restored model training data can be used for the corresponding model training.
Optionally, the decryption of the encrypted data segments in step 17 and their synthesis into the model training data in step 18 are both performed in transient memory, so that neither intermediate data nor the model training data is persisted to disk, ensuring data security.
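Steps 17 and 18 can be sketched as follows, entirely in memory. A repeating-XOR toy cipher stands in for the real (unspecified) encryption algorithm, and all names are illustrative:

```python
def xor_crypt(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher (encrypting and decrypting are the same operation)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def restore(encrypted_fragments: list[bytes], keys: list[bytes]) -> bytes:
    """Decrypt every fragment with its own key, then concatenate in segment order."""
    segments = [xor_crypt(frag, k) for frag, k in zip(encrypted_fragments, keys)]
    return b"".join(segments)  # synthesis happens purely in memory

training_data = b"feature_vector_rows"
keys = [b"k1", b"k2", b"k3"]
# Split into 3 roughly equal segments and encrypt each with its own key.
step = (len(training_data) + 2) // 3
segments = [training_data[i:i + step] for i in range(0, len(training_data), step)]
encrypted = [xor_crypt(s, k) for s, k in zip(segments, keys)]
```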
Optionally, the whole decryption-and-synthesis process can be packaged into a Software Development Kit (SDK); during model training, the unified data SDK reads the data from the specified storage, and the opened KMI is non-intrusive to the model training, so the cost of use is low.
It should be noted that the method of the embodiment of the present invention may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In the case of such a distributed scenario, one of the multiple devices may only perform one or more steps of the method according to the embodiment of the present invention, and the multiple devices interact with each other to complete the method.
FIG. 4 is a schematic structural diagram of an embodiment of a model training data storage device provided by the present invention.
As shown in fig. 4, the model training data storage device includes:
an obtaining module 21, configured to obtain model training data;
a segmentation module 22 configured to segment the model training data into at least two data segments;
the encryption module 23 is configured to encrypt the at least two data segments respectively to obtain at least two encrypted data segments;
a storage module 24 for storing the at least two encrypted data segments in a storage container; wherein at least one of the at least two encrypted data segments is stored in a different storage container than the other encrypted data segments.
It can be seen from the foregoing embodiments that the model training data storage apparatus provided in the embodiments of the present invention combines data segmentation with encryption during data processing and adopts an isolated storage manner, thereby avoiding the risk of data leakage.
As an alternative embodiment, the segmentation module is configured to:
segmenting each piece of model training data into at least two data segments respectively.
As an alternative embodiment, the segmentation module 22 is configured to:
determining the number of data segmentation segments;
segmenting the model training data into a corresponding number of data segments according to that number, and assigning each data segment a unique data segment mark.
Optionally, the data segment markers include data markers and segment markers;
the data mark is a unique data mark of the model training data, and the segment mark is a unique segment mark of the data segment in the at least two data segments obtained by segmenting the same model training data.
As an alternative embodiment, the storage module 24 is configured to:
storing each encrypted data segment into its corresponding storage container according to the unique segment identifier carried in the encrypted data segment.
Optionally, the unique segment identifier of at least one data segment in the at least two data segments is the same as the unique segment identifier of a data segment obtained by segmenting another piece of model training data; the storage module is configured to:
storing the encrypted data segments having the same unique segment identification in the same storage container, and storing the encrypted data segments having different unique segment identifications in different storage containers.
As an alternative embodiment, the encryption module 23 is configured to:
encrypting the at least two data segments respectively with different keys to obtain at least two encrypted data segments; each key is provided with a unique key mark, and the key mark corresponds to the data segment mark of the data segment that it encrypts; and
storing the keys used to encrypt the data segments in a key management and control center.
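A minimal sketch of this encryption scheme: one fresh key per segment, key marks mirroring the segment marks, and a separate key store destined for the key management and control center. The one-time-pad XOR and the `.key` suffix are illustrative assumptions, not part of the patent:

```python
import secrets

def encrypt_segments(segments: dict[str, bytes]) -> tuple[dict, dict]:
    """Return (encrypted fragments, key store keyed by matching key marks)."""
    encrypted, key_store = {}, {}
    for frag_mark, payload in segments.items():
        key = secrets.token_bytes(len(payload))        # one-time pad per segment
        encrypted[frag_mark] = bytes(p ^ k for p, k in zip(payload, key))
        key_store[frag_mark + ".key"] = key            # key mark <-> fragment mark
    return encrypted, key_store

encrypted, key_store = encrypt_segments({"A.1": b"user", "A.2": b"data"})
# key_store ("A.1.key", "A.2.key") would be handed to the key management and
# control center, kept apart from the ciphertext.
```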
In some embodiments, the model training data storage device further comprises:
a decryption module 25, configured to retrieve the encrypted data segment of the model training data from the storage container; retrieving a key of the encrypted data segment from the key management and control center; decrypting the encrypted data fragment by using the key to restore the encrypted data fragment into the data fragment;
and the synthesis module 26 is configured to synthesize the data segments and restore the data segments to obtain model training data.
As an alternative embodiment, the decryption module 25 is configured to:
sending a key calling request;
receiving a key calling permission verification message;
returning the key calling authority;
and if the authority passes the verification, receiving the key of the encrypted data fragment.
Optionally, the storage container is a device for model training.
Optionally, the step of dividing the model training data into at least two data segments and the step of encrypting the at least two data segments to obtain at least two encrypted data segments are performed in a transient memory.
The apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Fig. 5 shows a more specific hardware structure diagram of the electronic device provided in this embodiment. The apparatus may include: a processor 31, a memory 32, an input/output interface 33, a communication interface 34, and a bus 35. Wherein the processor 31, the memory 32, the input/output interface 33 and the communication interface 34 are communicatively connected to each other within the device via a bus 35.
The processor 31 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present specification.
The memory 32 may be implemented in the form of a ROM (Read-Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 32 may store an operating system and other application programs; when the technical solutions provided by the embodiments of the present specification are implemented by software or firmware, the relevant program code is stored in the memory 32 and called by the processor 31 for execution.
The input/output interface 33 is used for connecting an input/output module to realize information input and output. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 34 is used for connecting a communication module (not shown in the figure) to realize communication interaction between this device and other devices. The communication module can realize communication in a wired manner (e.g., USB, network cable) or in a wireless manner (e.g., mobile network, Wi-Fi, Bluetooth).
Bus 35 includes a path that transfers information between the various components of the device, such as processor 31, memory 32, input/output interface 33, and communication interface 34.
It should be noted that although the above-mentioned device only shows the processor 31, the memory 32, the input/output interface 33, the communication interface 34 and the bus 35, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
Computer-readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only, and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the invention, features in the above embodiments or in different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
In addition, well known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure the invention. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
The embodiments of the invention are intended to embrace all such alternatives, modifications and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements and the like that may be made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (24)

1. A model training data storage method, comprising:
obtaining model training data;
segmenting the model training data into at least two data segments;
respectively encrypting the at least two data fragments to obtain at least two encrypted data fragments;
storing the at least two encrypted data segments in a storage container; wherein at least one of the at least two encrypted data segments is stored in a different storage container than the other encrypted data segments.
2. The method of claim 1, wherein segmenting the model training data into at least two data segments comprises:
and respectively segmenting each piece of model training data into at least two data segments.
3. The method of claim 2, wherein segmenting the model training data into at least two data segments comprises:
determining the number of data segmentation segments;
and segmenting the model training data into a corresponding number of data segments according to the number of the data segmentation segments, and endowing each data segment with a unique data segment mark.
4. The method of claim 3, wherein the data segment tags comprise data tags and segment tags;
the data mark is a unique data mark of the model training data, and the segment mark is a unique segment mark of the data segment in the at least two data segments obtained by segmenting the same model training data.
5. The method of claim 4, wherein storing the at least two encrypted data segments in a storage container comprises:
and storing the corresponding encrypted data fragment into a corresponding storage container according to the unique fragment identifier in the encrypted data fragment.
6. The method according to claim 5, wherein the unique segment identifier of at least one of the at least two data segments is the same as the unique segment identifier of a data segment obtained by segmenting another piece of model training data;
storing the at least two encrypted data segments in a storage container, comprising:
storing the encrypted data segments having the same unique segment identification in the same storage container;
storing the encrypted data segments having different unique segment identifications in different storage containers.
7. The method of claim 3, wherein the encrypting the at least two data segments respectively to obtain at least two encrypted data segments comprises:
respectively encrypting the at least two data fragments by using different keys to obtain at least two encrypted data fragments; each key is provided with a unique key mark, and the key mark is in a corresponding relation with a data fragment mark of a data fragment encrypted by the key mark;
storing the key for encrypting the data segment in a key management center.
8. The method of claim 7, further comprising:
retrieving the encrypted data segment of the model training data from the storage container;
retrieving a key of the encrypted data segment from the key management and control center;
decrypting the encrypted data fragment by using the key to restore the encrypted data fragment into the data fragment;
and synthesizing the data fragments, and restoring to obtain model training data.
9. The method of claim 8, wherein retrieving keys for the encrypted data segments from the key management center further comprises:
sending a key calling request;
receiving a key calling permission verification message;
returning the key calling authority;
and if the authority passes the verification, receiving the key of the encrypted data fragment.
10. The method of claim 1, wherein the storage container is a device for model training.
11. The method of claim 1, wherein the steps of segmenting the model training data into at least two data segments and encrypting the at least two data segments respectively to obtain at least two encrypted data segments are performed in a transient memory.
12. A model training data storage device, comprising:
the acquisition module is used for acquiring model training data;
a segmentation module for segmenting the model training data into at least two data segments;
the encryption module is used for respectively encrypting the at least two data fragments to obtain at least two encrypted data fragments;
a storage module for storing the at least two encrypted data segments in a storage container; wherein at least one of the at least two encrypted data segments is stored in a different storage container than the other encrypted data segments.
13. The apparatus of claim 12, wherein the partitioning module is to:
and respectively segmenting each piece of model training data into at least two data segments.
14. The apparatus of claim 13, wherein the means for segmenting is configured to:
determining the number of data segmentation segments;
and segmenting the model training data into a corresponding number of data segments according to the number of the data segmentation segments, and endowing each data segment with a unique data segment mark.
15. The apparatus of claim 14, wherein the data segment tags comprise data tags and segment tags;
the data mark is a unique data mark of the model training data, and the segment mark is a unique segment mark of the data segment in the at least two data segments obtained by segmenting the same model training data.
16. The apparatus of claim 15, wherein the storage module is configured to:
and storing the corresponding encrypted data fragment into a corresponding storage container according to the unique fragment identifier in the encrypted data fragment.
17. The apparatus according to claim 16, wherein the unique segment identifier of at least one of the at least two data segments is the same as the unique segment identifier of a data segment obtained by segmenting another piece of model training data; the storage module is configured to:
storing the encrypted data segments having the same unique segment identification in the same storage container, and storing the encrypted data segments having different unique segment identifications in different storage containers.
18. The apparatus of claim 14, wherein the encryption module is to:
respectively encrypting the at least two data fragments by using different keys to obtain at least two encrypted data fragments; each key is provided with a unique key mark, and the key mark is in a corresponding relation with a data fragment mark of a data fragment encrypted by the key mark;
storing the key for encrypting the data segment in a key management center.
19. The apparatus of claim 18, further comprising:
a decryption module for retrieving the encrypted data segments of the model training data from the storage container; retrieving a key of the encrypted data segment from the key management and control center; decrypting the encrypted data fragment by using the key to restore the encrypted data fragment into the data fragment;
and the synthesis module is used for synthesizing the data fragments and restoring to obtain model training data.
20. The apparatus of claim 19, wherein the decryption module is to:
sending a key calling request;
receiving a key calling permission verification message;
returning the key calling authority;
and if the authority passes the verification, receiving the key of the encrypted data fragment.
21. The apparatus of claim 12, wherein the storage container is a device for model training.
22. The apparatus of claim 12, wherein the steps of segmenting the model training data into at least two data segments and encrypting the at least two data segments respectively to obtain at least two encrypted data segments are performed in a transient memory.
23. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 11 when executing the program.
24. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 11.
CN201911310374.2A 2019-12-18 2019-12-18 Model training data storage method and device, electronic equipment and storage medium Pending CN110968885A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911310374.2A CN110968885A (en) 2019-12-18 2019-12-18 Model training data storage method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN110968885A true CN110968885A (en) 2020-04-07

Family

ID=70034948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911310374.2A Pending CN110968885A (en) 2019-12-18 2019-12-18 Model training data storage method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110968885A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984631A (en) * 2020-09-02 2020-11-24 深圳壹账通智能科技有限公司 Production data migration method and device, computer equipment and storage medium
CN113657955A (en) * 2021-06-30 2021-11-16 亳州市药通信息咨询有限公司 Chinese-medicinal material supply and demand resource allocation integration system based on big data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103607393A (en) * 2013-11-21 2014-02-26 浪潮电子信息产业股份有限公司 Data safety protection method based on data partitioning
CN105356997A (en) * 2015-08-06 2016-02-24 华南农业大学 Security distributed data management method based on public cloud
CN107430668A (en) * 2015-01-03 2017-12-01 迈克菲股份有限公司 Backed up for personal device and the safe distribution of cloud data
CN109936546A (en) * 2017-12-18 2019-06-25 北京三快在线科技有限公司 Data encryption storage method and device and calculating equipment


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lü Guangming: "Collection of Excellent Teaching Cases for the Master of Applied Statistics", 31 May 2019, Beijing: China Statistics Press *


Similar Documents

Publication Publication Date Title
CN108932297B (en) Data query method, data sharing method, device and equipment
US20160117518A1 (en) File Encryption/Decryption Device And File Encryption/Decryption Method
US20150078550A1 (en) Security processing unit with configurable access control
CN110263505B (en) Picture processing method and device based on block chain
CN107579962A (en) A kind of method and device of source code encryption and decryption
CN101853357A (en) Software protection method
CN107516045A (en) Document protection method and device
US10102386B2 (en) Decrypting content protected with initialization vector manipulation
CN116662941B (en) Information encryption method, device, computer equipment and storage medium
CN112417485B (en) Model training method, system and device based on trusted execution environment
US9449193B2 (en) Information processing apparatus
CN110968885A (en) Model training data storage method and device, electronic equipment and storage medium
CN102799815A (en) Method and device for safely loading program library
CN112434326A (en) Trusted computing method and device based on data flow
CN110032877A (en) Image access method and its system
CN112287376A (en) Method and device for processing private data
CN108985109B (en) Data storage method and device
EP2856377B1 (en) Identification and execution of subsets of a plurality of instructions in a more secure execution environment
CN108021801B (en) Virtual desktop-based anti-leakage method, server and storage medium
CN109919109A (en) Image-recognizing method, device and equipment
JP2012059258A (en) System and method for protecting electronic key
US11934539B2 (en) Method and apparatus for storing and processing application program information
CN111198692A (en) Installation package generation method and device
CN106485158A (en) A kind of transparent encryption method based on hdfs and system
CN110516468B (en) Method and device for encrypting memory snapshot of virtual machine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40026914

Country of ref document: HK

RJ01 Rejection of invention patent application after publication

Application publication date: 20200407
