CN113536327A - Data processing method, device and system - Google Patents



Publication number
CN113536327A
Authority
CN
China
Prior art keywords
data
sensitive data
key
sensitive
stored
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010313781.5A
Other languages
Chinese (zh)
Inventor
安金龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202010313781.5A
Publication of CN113536327A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08 Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0861 Generation of secret information including derivation or calculation of cryptographic keys or passwords
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2141 Access rights, e.g. capability lists, access control lists, access tables, access matrices

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a data processing method, device and system, and relates to the technical field of computers. One embodiment of the method comprises: dividing out a sensitive data island in a data lake; in response to a data write request sent by a production database, traversing the data to be stored corresponding to the data write request; when the traversal result indicates that the data to be stored comprises sensitive data, generating a corresponding key for the sensitive data; encrypting the sensitive data using the key; and storing the encrypted sensitive data in the sensitive data island, and storing the non-sensitive data in the data to be stored in a storage area of the data lake outside the sensitive data island. This embodiment achieves encrypted storage of sensitive data in the data lake, thereby ensuring the security of the sensitive data in the data lake.

Description

Data processing method, device and system
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method, apparatus, and system.
Background
A data lake is a centralized repository that allows all structured and unstructured data to be stored at any scale. Data lakes are increasingly widely used to store data.
In the process of implementing the invention, the inventor found that the prior art has at least the following problem:
because a data lake stores data as a whole, the sensitive data in the data lake is exposed to potential security risks.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data processing method, apparatus, and system, which implement encrypted storage of sensitive data in a data lake, thereby ensuring security of the sensitive data in the data lake.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a data processing method including:
dividing a sensitive data island in a data lake;
responding to a data writing request sent by a production database, and traversing data to be stored corresponding to the data writing request;
when the traversal result indicates that the data to be stored comprises sensitive data, generating a corresponding key for the sensitive data;
encrypting the sensitive data using the key;
storing the encrypted sensitive data to the sensitive data island, and storing non-sensitive data in the data to be stored to a storage area outside the sensitive data island in the data lake.
Preferably, the data processing method further includes: setting a plurality of key generation schemes;
generating a corresponding key for the sensitive data, comprising:
selecting a target key generation scheme from the plurality of key generation schemes;
and generating a corresponding key for the sensitive data by using the target key generation scheme.
Preferably,
after the step of generating the corresponding key for the sensitive data, the method further comprises:
generating a corresponding key feature identifier for the key;
storing the key in association with the key feature identifier, and providing the key feature identifier to an authorized user of the sensitive data;
and providing the corresponding sensitive data to the authorized user according to the key feature identifier of the authorized user.
Preferably,
the key feature identifier comprises sensitive data storage address information;
the step of providing the corresponding sensitive data to the authorized user according to the key feature identifier of the authorized user comprises:
when a query request carrying the key feature identifier is received, looking up the key according to the key feature identifier, and looking up the encrypted sensitive data according to the sensitive data storage address information;
and decrypting the retrieved encrypted sensitive data with the retrieved key, and providing the decrypted sensitive data to the authorized user.
Preferably,
the key feature identifier further comprises decryption key information and sensitive data storage address information;
the step of providing the corresponding sensitive data to the authorized user according to the key feature identifier provided by the authorized user comprises:
when a query request carrying the key feature identifier is received, parsing the sensitive data storage address information included in the key feature identifier, and looking up the encrypted sensitive data according to the sensitive data storage address information;
and sending the encrypted sensitive data to the authorized user, so that the terminal or server where the authorized user is located parses the decryption key information included in the key feature identifier and decrypts the encrypted sensitive data with the parsed decryption key.
Preferably, the data lake is deployed in a cluster;
the step of dividing sensitive data islands comprises: and dividing a plurality of sensitive data storage nodes from the cluster, wherein the plurality of sensitive data storage nodes form the sensitive data island.
Preferably, the data processing method further includes: setting a rule engine, wherein the rule engine comprises a configured sensitive use case and sensitive characteristics obtained through a machine learning model;
the step of traversing the data to be stored corresponding to the data writing request comprises the following steps:
reading the data to be stored through the rule engine, and judging whether part of the read data to be stored meets the configured sensitive use case or the sensitive feature,
if so, determining the sensitive data in the data to be stored, and taking the determined sensitive data in the data to be stored as a traversal result.
In a second aspect, an embodiment of the present invention provides a data processing apparatus, including: a traversal unit, an encryption unit and a storage processing unit, wherein,
the traversal unit is used for responding to a data writing request sent by a production database and traversing data to be stored corresponding to the data writing request;
the encryption unit is used for generating a corresponding key for the sensitive data when the traversal result of the traversal unit comprises the sensitive data in the data to be stored; encrypting the sensitive data using the key;
the storage processing unit is used for dividing a sensitive data island in a data lake; and storing the sensitive data encrypted by the encryption unit into the sensitive data island, and storing non-sensitive data in the data to be stored into a storage area outside the sensitive data island in the data lake.
Preferably,
the encryption unit is further used for setting a plurality of key generation schemes and selecting a target key generation scheme from the plurality of key generation schemes; and generating a corresponding key for the sensitive data by using the target key generation scheme.
Preferably, the data processing apparatus further includes: a providing unit, wherein,
the encryption unit is further configured to generate a corresponding key feature identifier for the key, and to store the key in association with the key feature identifier;
the providing unit is configured to provide the key feature identifier generated by the encryption unit to an authorized user of the sensitive data, and to provide the corresponding sensitive data to the authorized user according to the key feature identifier of the authorized user.
In a third aspect, an embodiment of the present invention provides a data processing system, including: a production database, a data lake, and any one of the data processing apparatuses described above, wherein,
the production database is used for transmitting the data of the production database as the data to be stored;
the data processing device is used for dividing the sensitive data island for the data lake; traversing the data to be stored, and generating a corresponding key for sensitive data when the traversal result comprises the sensitive data in the data to be stored; encrypting the sensitive data using the key; sending the encrypted sensitive data to the sensitive data island, and sending non-sensitive data in the data to be stored to a storage area outside the sensitive data island in the data lake;
the data lake comprises: the sensitive data island and a storage area outside the sensitive data island, wherein,
the sensitive data island is used for storing the encrypted sensitive data;
and the storage area outside the sensitive data island is used for storing non-sensitive data in the data to be stored.
One embodiment of the above invention has the following advantages or benefits: by traversing the data to be stored, generating a key for the sensitive data when the sensitive data is traversed, encrypting the sensitive data through the key, then storing the encrypted sensitive data into a sensitive data island in a data lake, and storing the non-sensitive data in the data to be stored into a storage area outside the sensitive data island in the data lake, the encrypted storage of the sensitive data and the separate storage of the sensitive data and the non-sensitive data in the data lake are realized, and the security of the sensitive data in the data lake is ensured.
Further effects of the optional implementations mentioned above are described below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of a main flow of a data processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a main flow of generating a corresponding key for sensitive data according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a main flow of providing stored sensitive data to a user according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a main flow for providing stored sensitive data to a user according to another embodiment of the present invention;
FIG. 5 is a schematic diagram of a main flow for providing stored sensitive data to a user according to yet another embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating a main flow of traversing data to be stored corresponding to a data write request according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the main elements of a data processing apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of the major elements of a data processing system according to an embodiment of the present invention;
FIG. 9 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 10 is a schematic block diagram of a computer system suitable for use with a server implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
A data lake is a centralized repository that allows all structured and unstructured data to be stored at any scale. It can store data as is (without first structuring the data) and can run different types of analysis on it, from dashboards and visualization to big data processing, real-time analysis and machine learning, to guide better decisions.
Additionally, a data lake may store relational data from line-of-business applications, as well as non-relational data from mobile applications, IoT devices, and social media. When data is captured, no data structure or schema is defined. This means that the data lake can store all the data without careful design up front and without knowing which questions the user may need answered in the future. Users can gain insight into the data using different types of analytics, such as SQL queries, big data analytics, full-text search, real-time analytics, and machine learning.
At present, data lakes are widely used, but there is no suitable method for ensuring the security of the sensitive data stored in them.
Sensitive data refers to data whose exposure threatens the security of user information or of a business system, for example: user identity information such as identity card numbers and passwords; business data generated by a business system that needs to be kept secret; transaction data in an e-commerce system; and user information in a logistics system, such as addresses, contact information and names.
Fig. 1 shows a data processing method according to an embodiment of the present invention. As shown in fig. 1, the data processing method may include the following steps:
S100: dividing out a sensitive data island in a data lake;
S101: in response to a data write request sent by a production database, traversing the data to be stored corresponding to the data write request;
S102: when the traversal result indicates that the data to be stored comprises sensitive data, generating a corresponding key for the sensitive data;
S103: encrypting the sensitive data using the key;
S104: storing the encrypted sensitive data in the sensitive data island, and storing the non-sensitive data in the data to be stored in a storage area of the data lake outside the sensitive data island.
Step S100 need not be executed in every data processing procedure. In a first case, step S100 is generally performed once, when the data lake is first enabled; subsequent data processing (storage of sensitive and non-sensitive data) then starts directly from step S101. In a second case, step S100 may be executed for each data processing procedure, and the sensitive data island divided out in that procedure stores the sensitive data in the data currently to be stored. In a third case, after steps S101 to S104 have been executed multiple times and the remaining space of the sensitive data island is insufficient, step S100 is executed again to divide out a new sensitive data island, which increases the storage space and gives the sensitive data island the ability to expand its capacity.
The data write request in step S101 may be an electrical signal or start-up information sent by the production database, on receipt of which a communication connection is established between the data lake and the production database, so that the data lake receives the data stream transmitted by the production database, that is, the data to be stored. In the scheme provided by the embodiment of the invention, the data stream (the data to be stored) can be traversed directly while the production database is transmitting it to the data lake, and the sensitive data is extracted from the data to be stored as part of the traversal result. Traversing the data to be stored means that, as the data passes through a node on the transmission path, the data passing through that node is read or scanned to obtain the sensitive data it contains.
In step S102, the traversal result may indicate that the data to be stored includes sensitive data in either of two ways. The traversal result may itself contain the sensitive data found in the data to be stored. Alternatively, the traversal result may contain no sensitive data and merely record that sensitive data was found, by means of a feature label (for example, a label of 1 in the traversal result indicates that the traversed data to be stored contains sensitive data). The labeling manner is not limited to numbers, letters and the like, as long as the traversal result reflects whether the data to be stored includes sensitive data. In the second case, the sensitive data within the data to be stored can also be given a feature label, to facilitate its subsequent identification and lookup.
In step S104, the extracted sensitive data may, after being encrypted with the key, be put back into the data stream in place of the original, so that the encrypted sensitive data and the non-sensitive data are stored together; alternatively, the encrypted sensitive data may be stored separately.
The production database refers to a database for generating data, such as a database corresponding to a business system.
In the embodiment shown in fig. 1, by traversing the data to be stored, when the sensitive data is traversed, a key is generated for the sensitive data, the sensitive data is encrypted by the key, the encrypted sensitive data is stored in a sensitive data island in a data lake, and the non-sensitive data in the data to be stored is stored in a storage area outside the sensitive data island in the data lake, so that the sensitive data is encrypted and stored, and the sensitive data and the non-sensitive data are stored separately in the data lake, thereby ensuring the security of the sensitive data in the data lake.
In addition, the sensitive data and the non-sensitive data are stored separately through the process, and the sensitive data island can be specially protected so as to further improve the safety of the sensitive data.
In addition, compared with the data volume of the stored data, the data volume of the data to be stored is much smaller, so that compared with the method for searching the sensitive data from the stored data, the sensitive data is traversed from the data to be stored, and the sensitive data searching efficiency can be improved.
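The flow of steps S100 to S104 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration rather than part of the described embodiment: the `DataLake` class, the `ID_NUMBER` detection pattern, the `island/<n>` address format, and in particular `toy_cipher`, which is a deliberately simple placeholder and not a secure cipher. A real system would use a vetted encryption library instead.

```python
import hashlib
import re
import secrets

# Hypothetical detection rule: an 18-character identity card number.
ID_NUMBER = re.compile(r"\b\d{17}[\dXx]\b")


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Placeholder cipher: XOR with a SHA-256 keystream. NOT secure.

    Applying it twice with the same key restores the input.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


class DataLake:
    def __init__(self) -> None:
        self.sensitive_island = {}  # S100: island address -> ciphertext
        self.general_area = []      # storage area outside the island

    def write(self, record: dict) -> dict:
        """Handle one write request and return {island address: key}."""
        keys = {}
        non_sensitive = {}
        for field, value in record.items():                    # S101: traverse
            if isinstance(value, str) and ID_NUMBER.search(value):
                key = secrets.token_bytes(32)                   # S102: new key
                ciphertext = toy_cipher(key, value.encode())    # S103: encrypt
                address = f"island/{len(self.sensitive_island)}"
                self.sensitive_island[address] = ciphertext     # S104: island
                keys[address] = key
            else:
                non_sensitive[field] = value
        self.general_area.append(non_sensitive)                 # S104: outside
        return keys
```

In this sketch the traversal happens on the incoming record, before anything is stored, which mirrors the point made above that scanning data in transit is cheaper than searching data already in the lake.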
In an embodiment of the present invention, the data processing method may further include: setting a plurality of key generation schemes; accordingly, as shown in FIG. 2, generating a corresponding key for sensitive data may include the steps of:
S201: selecting a target key generation scheme from a plurality of key generation schemes;
S202: generating a corresponding key for the sensitive data by using the target key generation scheme.
In step S201, the target key generation scheme may be selected at random, or the key generation schemes may be selected in turn, round-robin, in the order in which they are stored. The plurality of key generation schemes may include a symmetric key generation scheme, an asymmetric key generation scheme, and a hybrid symmetric-asymmetric key generation scheme. Under any of these schemes, the key can be generated based on the authorized user information corresponding to the data to be stored.
Step S202 may be implemented by generating a single key for all the sensitive data found in the data to be stored traversed in step S101 (the data to be stored generally being data from the same user or the same production database). Alternatively, the sensitive data may be segmented and a corresponding key generated for each segment (the keys for different segments typically being generated by different target key generation schemes).
With this key generation process, it is difficult for an attacker to obtain the key used to encrypt the sensitive data, the key is harder to crack, and the security of the sensitive data is further improved.
In addition, a key management engine can be provided for a plurality of key generation schemes, and the key management engine can provide a modification interface for an authorized user to modify the key generation scheme or add a new key generation scheme. The key generation scheme may be set according to the actual needs of the user. Correspondingly, after the key generation scheme is modified or a new key generation scheme is added, the modified key generation scheme or the new key generation scheme is selected for the subsequent process of generating the key for the sensitive data.
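As a rough sketch of steps S201 and S202 together with the key management engine, the following Python class keeps a registry of key generation schemes and selects one either at random or round-robin in storage order. The class name `KeyManagementEngine`, its method names, and the scheme names used in the test values are hypothetical. The registered schemes are stand-ins (random symmetric keys of different lengths); the asymmetric and hybrid schemes mentioned in the text would be registered the same way.

```python
import random
import secrets
from typing import Callable, Dict

# A key generation scheme is modeled as a zero-argument function returning key bytes.
KeyScheme = Callable[[], bytes]


class KeyManagementEngine:
    def __init__(self) -> None:
        self._schemes: Dict[str, KeyScheme] = {}  # insertion order = storage order
        self._next = 0

    def register(self, name: str, scheme: KeyScheme) -> None:
        """Add a new scheme, or replace an existing one (the modification interface)."""
        self._schemes[name] = scheme

    def select_random(self) -> str:
        """S201, variant 1: pick a target scheme at random."""
        return random.choice(list(self._schemes))

    def select_round_robin(self) -> str:
        """S201, variant 2: pick schemes in turn, in storage order."""
        names = list(self._schemes)
        name = names[self._next % len(names)]
        self._next += 1
        return name

    def generate(self, name: str) -> bytes:
        """S202: generate a key with the selected target scheme."""
        return self._schemes[name]()


engine = KeyManagementEngine()
engine.register("sym-128", lambda: secrets.token_bytes(16))
engine.register("sym-256", lambda: secrets.token_bytes(32))
key = engine.generate(engine.select_round_robin())
```

Because newly registered schemes join the same registry, later selections pick them up automatically, which matches the behavior described for modified or added schemes.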
In an embodiment of the present invention, as shown in fig. 3, after the step of generating the corresponding key for the sensitive data, for providing the stored sensitive data for the user, the following steps may be further included:
S301: generating a corresponding key feature identifier for the key;
S302: storing the key in association with the key feature identifier, and providing the key feature identifier to an authorized user of the sensitive data;
S303: providing the corresponding sensitive data to the authorized user according to the key feature identifier of the authorized user.
The authorized user of the sensitive data may be the production database that sent the sensitive data, the business system that generated the sensitive data, and/or a terminal. The authorized user holds the key feature identifier and can obtain the corresponding sensitive data through it.
The key feature identifier may include a character string converted from the sensitive data storage address information, a key storage sequence number, and other random codes. The storage address information may be converted to a character string using an existing conversion method, for example one based on the ASCII code table.
One way to store the key in association with the key feature identifier is to store the encrypted key storage address and the encrypted key feature identifier in a key management table. The key storage address can then be determined from the key feature identifier, and the key can be found through the decrypted key storage address. The key and the key feature identifier are stored separately, preferably on different nodes in the cluster, so that the key remains secure even if the node holding the key feature identifier is attacked, and an attacker who obtains only the key can hardly locate the sensitive data. The sensitive data storage address, and hence the sensitive data, can be obtained only when both the key feature identifier and the key are obtained. In addition, because of the way the key feature identifier is generated, an administrator can conveniently rebuild the key management table entry for a key after the key management table has been attacked.
Even if an attacker obtains the key feature identifier, it is difficult to parse useful information from it, so the key feature identifier further protects the key and the sensitive data.
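One possible layout for a key feature identifier is sketched below in Python. The three-part format (address codes, key storage sequence number, random code, joined by hyphens) and all function names are assumptions for illustration; the text only states which kinds of information the identifier may contain. The address is converted character by character into three-digit ASCII codes, as one example of an ASCII-table conversion. Note that such a conversion is an encoding, not encryption.

```python
import secrets
from typing import Tuple


def address_to_codes(address: str) -> str:
    """Convert each ASCII character to its three-digit decimal code."""
    return "".join(f"{ord(ch):03d}" for ch in address)


def codes_to_address(codes: str) -> str:
    """Inverse of address_to_codes."""
    return "".join(chr(int(codes[i:i + 3])) for i in range(0, len(codes), 3))


def make_key_feature_id(address: str, key_serial: int) -> str:
    """Assumed layout: <address codes>-<key storage sequence number>-<random code>."""
    random_code = secrets.token_hex(4)
    return f"{address_to_codes(address)}-{key_serial:06d}-{random_code}"


def parse_key_feature_id(identifier: str) -> Tuple[str, int]:
    """Recover the storage address and key sequence number; the random code is ignored."""
    codes, serial, _random_code = identifier.split("-")
    return codes_to_address(codes), int(serial)
```

For example, `parse_key_feature_id(make_key_feature_id("island/node-3/0007", 12))` returns the original address and sequence number, which is what allows a table entry to be rebuilt from the identifier.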
In one embodiment of the invention, the key feature identifier includes sensitive data storage address information; accordingly, as shown in fig. 4, the steps for providing the authorized user with the corresponding sensitive data may include:
S401: when a query request carrying a key feature identifier is received, looking up the key according to the key feature identifier, and looking up the encrypted sensitive data according to the sensitive data storage address information;
S402: decrypting the retrieved encrypted sensitive data with the retrieved key, and providing the decrypted sensitive data to the authorized user.
The key feature identifier may include the sensitive data storage address information in the form of a character string converted from it by a preset conversion rule (for example, a conversion using the correspondence in the ASCII code table).
Through the above process, decryption is completed before the sensitive data is sent to the authorized user, which spares the authorized user from managing the key; that is, encryption and decryption are completed by the same device, which helps ensure the security of encryption and decryption.
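Steps S401 and S402 can be sketched as a single server-side function. The identifier layout (`<hex-encoded address>.<random suffix>`), the in-memory `key_table` and `island` dictionaries, and the `xor_cipher` placeholder are all assumptions for illustration; `xor_cipher` is not a secure cipher and only stands in for whatever algorithm the key generation scheme implies.

```python
from typing import Callable, Dict


def address_from_identifier(identifier: str) -> str:
    """Assumed layout: "<hex-encoded address>.<random suffix>"."""
    return bytes.fromhex(identifier.split(".", 1)[0]).decode("ascii")


def serve_query(
    identifier: str,
    key_table: Dict[str, bytes],
    island: Dict[str, bytes],
    decrypt: Callable[[bytes, bytes], bytes],
) -> bytes:
    """Look up key and ciphertext (S401), then decrypt on the server (S402)."""
    if identifier not in key_table:
        raise PermissionError("unknown key feature identifier")
    key = key_table[identifier]                               # S401: find the key
    ciphertext = island[address_from_identifier(identifier)]  # S401: find the data
    return decrypt(key, ciphertext)                           # S402: decrypt


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Placeholder cipher (repeating-key XOR). NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
```

In this variant the key never leaves the server side; the authorized user only presents the identifier and receives plaintext.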
In an embodiment of the present invention, the key feature identifier further includes decryption key information and sensitive data storage address information; accordingly, as shown in fig. 5, the steps for providing the authorized user with the corresponding sensitive data may include:
S501: when a query request carrying a key feature identifier is received, parsing the sensitive data storage address information included in the key feature identifier, and looking up the encrypted sensitive data according to the sensitive data storage address information;
S502: sending the encrypted sensitive data to the authorized user, so that the terminal or server where the authorized user is located parses the decryption key information included in the key feature identifier and decrypts the encrypted sensitive data with the parsed decryption key.
The decryption key information and the sensitive data storage address information can be character strings converted by the existing character conversion technology, and generally only a client or a server used by an authorized user can convert the decryption key information into a corresponding decryption key. For a symmetric key generation scheme, the decryption key is the same as the key used to encrypt the sensitive data mentioned above; for the asymmetric key generation scheme, the key for encrypting the sensitive data is a generated private key; the decryption key is the corresponding public key.
The process of analyzing the sensitive data storage address information is to convert a character string corresponding to the sensitive data storage address information into a corresponding sensitive data storage address.
The terminal or the server analyzes the decryption key information included in the key feature identifier, namely, converts a character string corresponding to the decryption key information into a corresponding decryption key.
The transmission of the encrypted sensitive data is realized through the process, and the security of the transmission of the sensitive data is ensured.
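The split of work between server and client in steps S501 and S502 might look like the following Python sketch. The identifier layout (`<hex address>.<hex key>`) and every function name are hypothetical, and hex encoding merely stands in for the unspecified character conversion. `xor_cipher` is again an insecure placeholder.

```python
from typing import Callable, Dict


def build_identifier(address: str, decryption_key: bytes) -> str:
    """Assumed layout: "<hex-encoded address>.<hex-encoded decryption key>"."""
    return f"{address.encode('ascii').hex()}.{decryption_key.hex()}"


def server_fetch(identifier: str, island: Dict[str, bytes]) -> bytes:
    """S501: the server parses only the address part and returns ciphertext."""
    address = bytes.fromhex(identifier.split(".")[0]).decode("ascii")
    return island[address]


def client_decrypt(
    identifier: str,
    ciphertext: bytes,
    decrypt: Callable[[bytes, bytes], bytes],
) -> bytes:
    """S502: the client parses the decryption key and decrypts locally."""
    key = bytes.fromhex(identifier.split(".")[1])
    return decrypt(key, ciphertext)


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Placeholder cipher (repeating-key XOR). NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
```

Only ciphertext crosses the network in this variant, at the cost of the identifier itself becoming a secret that the authorized user must protect.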
In one embodiment of the invention, the data lake is deployed in a cluster; accordingly, the step of dividing out the sensitive data island may comprise: dividing a plurality of sensitive data storage nodes from the cluster, the plurality of sensitive data storage nodes forming the sensitive data island. The sensitive data and the non-sensitive data are thus stored physically separately, which further protects the sensitive data.
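A minimal sketch of dividing island nodes out of a cluster and routing writes accordingly is given below. The node names, the choice of the first `island_size` nodes, and the CRC32-based placement are assumptions for illustration only; the text does not specify how nodes are chosen or how data is placed on them.

```python
import zlib
from typing import List, Tuple


def divide_island(nodes: List[str], island_size: int) -> Tuple[List[str], List[str]]:
    """Split the cluster into (sensitive island nodes, remaining nodes)."""
    if not 0 < island_size < len(nodes):
        raise ValueError("island must be a non-empty proper subset of the cluster")
    return nodes[:island_size], nodes[island_size:]


def route(record_id: str, is_sensitive: bool,
          island: List[str], general: List[str]) -> str:
    """Pick a storage node: island nodes for sensitive data, others otherwise."""
    pool = island if is_sensitive else general
    # CRC32 gives a placement that is stable across processes.
    return pool[zlib.crc32(record_id.encode("utf-8")) % len(pool)]


island, general = divide_island(["n1", "n2", "n3", "n4", "n5"], island_size=2)
```

Because the two node pools are disjoint, encrypted sensitive data never lands on a node that also holds non-sensitive data.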
In one embodiment of the invention, a rule engine is provided, wherein the rule engine comprises configured sensitive use cases and sensitive features obtained through a machine learning model; accordingly, as shown in fig. 6, a specific implementation of traversing the data to be stored corresponding to the data write request may include:
S601: reading the data to be stored through the rule engine;
S602: judging whether any part of the read data to be stored matches a configured sensitive use case or sensitive feature; if so, executing S603; otherwise, executing S604;
S603: determining the sensitive data in the data to be stored, taking the determined sensitive data as the traversal result, and ending the current process;
S604: determining that the data to be stored does not include sensitive data.
It is worth noting that the sensitive use cases in the rule engine and the sensitive features obtained through the machine learning model can be modified or extended. Sensitive use cases are, for example, account numbers, identity card numbers, mobile phone numbers, and the like, and can be configured manually by a user. The machine learning model can be an existing support vector machine, Bayesian algorithm, neural network, or the like.
In addition, the step of traversing the data to be stored corresponding to the data write request can be implemented by the rule engine. The various key generation schemes and the generation of the corresponding key for the sensitive data can be implemented by an encryption engine, whereas the scheme presented in figure 4 above may be implemented by a decryption engine. On this basis, the data processing method provided by the embodiment of the present invention may use one general engine (comprising the rule engine, the encryption engine, and the decryption engine) to obtain the sensitive data and to encrypt or decrypt it, calling a different engine for each step or phase.
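The traversal in S601 to S604 might look like the Python sketch below. The two regular expressions, the field names, and the `model_flags_sensitive` function are assumptions introduced for illustration; in particular, the function is only a placeholder for features that would come from a trained machine learning model.

```python
import re
from typing import Callable, Dict

# Manually configured sensitive use cases (illustrative patterns only).
SENSITIVE_USE_CASES = {
    "mobile_number": re.compile(r"1\d{10}"),
    "id_card_number": re.compile(r"\d{17}[\dXx]"),
}


def model_flags_sensitive(field_name: str, value: str) -> bool:
    """Placeholder for sensitive features learned by a model."""
    return "password" in field_name.lower()


def traverse(
    record: Dict[str, str],
    model: Callable[[str, str], bool] = model_flags_sensitive,
) -> Dict[str, str]:
    """Return the sensitive fields of a record; an empty dict corresponds to S604."""
    sensitive = {}
    for name, value in record.items():  # S601: read the data to be stored
        matches_use_case = any(
            pattern.fullmatch(value) for pattern in SENSITIVE_USE_CASES.values()
        )
        if matches_use_case or model(name, value):  # S602: use case or feature
            sensitive[name] = value  # S603: part of the traversal result
    return sensitive


record = {"item": "book", "phone": "13800000000", "login_password": "abc", "qty": "3"}
result = traverse(record)
```

Because the use cases live in a plain dictionary, adding or modifying one is a single-line change, which reflects the remark that use cases can be modified or extended.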
As shown in fig. 7, an embodiment of the present invention provides a data processing apparatus 700, where the data processing apparatus 700 may include: a traversal unit 701, an encryption unit 702, and a storage processing unit 703, wherein,
the traversal unit 701 is configured to traverse data to be stored corresponding to a data write request in response to the data write request sent by the production database;
the encryption unit 702 is configured to generate a corresponding key for the sensitive data when the result traversed by the traversal unit 701 includes the sensitive data in the data to be stored; encrypting the sensitive data by using the secret key;
the storage processing unit 703 is configured to divide out the sensitive data island in the data lake, store the sensitive data encrypted by the encryption unit 702 in the sensitive data island, and store the non-sensitive data in the data to be stored in a storage area outside the sensitive data island in the data lake.
In an embodiment of the present invention, the encryption unit 702 is further configured to set a plurality of key generation schemes, and select a target key generation scheme from the plurality of key generation schemes; and generating a corresponding key for the sensitive data by using the target key generation scheme.
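A minimal sketch of setting several key generation schemes and selecting a target scheme is given below. The scheme names, the key lengths, and the selection rule based on a sensitivity level are hypothetical; this passage does not state how the target scheme is chosen.

```python
import secrets
from typing import Callable, Dict, Tuple

# Registry of configured key generation schemes (names and sizes are illustrative).
KEY_GENERATION_SCHEMES: Dict[str, Callable[[], bytes]] = {
    "random-128": lambda: secrets.token_bytes(16),
    "random-256": lambda: secrets.token_bytes(32),
}


def select_scheme(sensitivity_level: str) -> str:
    """Hypothetical selection rule: longer keys for more sensitive data."""
    return "random-256" if sensitivity_level == "high" else "random-128"


def generate_key(sensitivity_level: str) -> Tuple[str, bytes]:
    """Select the target scheme, then generate a key with it."""
    scheme = select_scheme(sensitivity_level)
    return scheme, KEY_GENERATION_SCHEMES[scheme]()


scheme_name, key = generate_key("high")
```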
In an embodiment of the present invention, the data processing apparatus 700 further includes a providing unit (not shown in the figures), wherein,
the encryption unit 702 is further configured to generate a corresponding key feature identifier for the key, and to store the key in association with the key feature identifier;
the providing unit (not shown in the figures) is configured to provide the key feature identifier generated by the encryption unit to an authorized user of the sensitive data, and to provide the corresponding sensitive data to the authorized user according to the key feature identifier of the authorized user.
In an embodiment of the present invention, the key feature identifier further includes decryption key information and sensitive data storage address information; accordingly,
the providing unit (not shown in the figures) is configured to, when receiving a query request carrying a key feature identifier, parse the sensitive data storage address information included in the key feature identifier and look up the encrypted sensitive data according to the sensitive data storage address information; and to send the encrypted sensitive data to the authorized user, so that the terminal or server where the authorized user is located parses the decryption key information included in the key feature identifier and decrypts the encrypted sensitive data with the parsed decryption key.
The above units can be implemented by corresponding engines. For example, the traversal unit may be implemented by a rule engine (not shown in the figures) configured with sensitive use cases and with sensitive features obtained by a machine learning model. The rule engine reads the data to be stored and determines whether any part of it matches a configured sensitive use case or sensitive feature; if so, it determines the sensitive data in the data to be stored and takes it as the traversal result; otherwise, it determines that the data to be stored does not include sensitive data.
As another example, the encryption unit may be implemented by an encryption engine (not shown in the figures), and the decryption portion of the providing unit (not shown in the figures) may be implemented by a decryption engine (not shown in the figures).
As shown in fig. 8, an embodiment of the present invention provides a data processing system 800, where the data processing system 800 includes: production database 801, data lake 802, and data processing apparatus 700 provided in any of the embodiments described above, wherein,
the production database 801 is used for transmitting data of the production database as data to be stored;
a data processing device 700 for dividing a sensitive data island for a data lake; traversing the data to be stored transmitted by the production database 801, and generating a corresponding key for the sensitive data when the traversal result includes the sensitive data in the data to be stored; encrypting the sensitive data by using the secret key; sending the encrypted sensitive data to a sensitive data island 8021, and sending the non-sensitive data in the data to be stored to a storage area 8022 outside the sensitive data island in the data lake 802;
data lake 802 includes: a sensitive data island 8021 and a storage area 8022 outside the sensitive data island, wherein,
the sensitive data island 8021 is used for storing the encrypted sensitive data;
a storage area 8022 outside the sensitive data islands for storing non-sensitive data of the data to be stored.
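The write path of system 800 can be sketched in Python as follows. The `DataLake` class, the phone-number pattern used as the only sensitivity rule, the slot naming, and the XOR with a random key as long as the data (a one-time-pad stand-in for a real cipher) are all assumptions made to keep the example self-contained.

```python
import re
import secrets
from dataclasses import dataclass, field
from typing import Dict

PHONE = re.compile(r"1\d{10}")  # single illustrative sensitive use case


@dataclass
class DataLake:
    sensitive_island: Dict[str, bytes] = field(default_factory=dict)  # area 8021
    general_area: Dict[str, str] = field(default_factory=dict)        # area 8022


def xor_bytes(key: bytes, data: bytes) -> bytes:
    """XOR with an equal-length random key; stand-in for a real cipher."""
    return bytes(k ^ d for k, d in zip(key, data))


def process_write(record_id: str, record: Dict[str, str],
                  lake: DataLake, key_store: Dict[str, bytes]) -> None:
    """Traverse one record, encrypt sensitive fields, and route each field."""
    for name, value in record.items():
        slot = f"{record_id}/{name}"
        if PHONE.fullmatch(value):
            data = value.encode("utf-8")
            key = secrets.token_bytes(len(data))  # one key per sensitive item
            key_store[slot] = key
            lake.sensitive_island[slot] = xor_bytes(key, data)
        else:
            lake.general_area[slot] = value


lake = DataLake()
key_store: Dict[str, bytes] = {}
process_write("order-1", {"item": "book", "phone": "13800000000"}, lake, key_store)
```

Only ciphertext reaches the island in this sketch; the keys are held in a separate store, so reading the island alone does not reveal the phone number.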
Fig. 9 shows an exemplary system architecture 900 of a data processing method or data processing apparatus to which embodiments of the present invention may be applied.
As shown in fig. 9, system architecture 900 may include end devices 901, 902, 903, network 904, server 905, production database 906, data lake 907, and query server 908. Network 904 is used to provide a medium for communication links between end devices 901, 902, 903 and server 905, between server 905 and production database 906, between server 905 and data lake 907, between production database 906 and data lake 907, and between server 905 and query server 908. Network 904 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 901, 902, 903 to interact with the server 905 over the network 904 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 901, 902, 903, such as shopping applications, web browsers, search applications, instant messaging tools, mailbox clients, and social platform software (examples only).
The terminal devices 901, 902, 903 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 905 may be a server providing various services, such as a server that manages the sensitive data transmitted by the production database 906 to the data lake 907, or a background management server (example only) that supports data query requests sent by users through the terminal devices 901, 902, 903 or the query server 908. The background management server may analyze and otherwise process the data found during traversal, such as the sensitive data, and store the processing result (for example, the encrypted sensitive data) in the data lake.
It should be noted that the data processing method provided by the embodiment of the present invention is generally executed by the server 905, and accordingly, the data processing apparatus is generally disposed in the server 905.
It should be understood that the numbers of terminal devices, networks, servers, production database servers, data lake servers, and query servers in fig. 9 are merely illustrative. There may be any number of terminal devices, networks, servers, production database servers, data lake servers, and query servers, as desired for an implementation.
Referring now to FIG. 10, a block diagram of a computer system 1000 suitable for use as a server in implementing embodiments of the present invention is shown. The computer system shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 10, the computer system 1000 includes a Central Processing Unit (CPU) 1001 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. The RAM 1003 also stores various programs and data necessary for the operation of the system 1000. The CPU 1001, ROM 1002, and RAM 1003 are connected to each other via a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN card or a modem. The communication section 1009 performs communication processing via a network such as the internet. A drive 1010 is also connected to the I/O interface 1005 as necessary. A removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 1010 as necessary, so that a computer program read from it is installed into the storage section 1008 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1009 and/or installed from the removable medium 1011. When executed by the Central Processing Unit (CPU) 1001, the computer program performs the above-described functions defined in the system of the present invention.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a traversal unit, an encryption unit, and a storage processing unit. The names of the units do not form a limitation to the unit itself under certain conditions, for example, a traversal unit may also be described as a unit for traversing data to be stored corresponding to a data write request.
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform the following: dividing a sensitive data island in a data lake; in response to a data write request sent by a production database, traversing the data to be stored corresponding to the data write request; when the traversal result includes sensitive data in the data to be stored, generating a corresponding key for the sensitive data; encrypting the sensitive data with the key; and storing the encrypted sensitive data in the sensitive data island, and storing the non-sensitive data in the data to be stored in a storage area outside the sensitive data island in the data lake.
The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform the following: setting a plurality of key generation schemes; selecting a target key generation scheme from the plurality of key generation schemes; and generating a corresponding key for the sensitive data by using the target key generation scheme.
The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform the following: generating a corresponding key feature identifier for the key; storing the key in association with the key feature identifier, and providing the key feature identifier to an authorized user of the sensitive data; and providing the corresponding sensitive data to the authorized user according to the key feature identifier of the authorized user.
The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform the following, where the key feature identifier includes sensitive data storage address information: when a query request carrying a key feature identifier is received, looking up the key according to the key feature identifier, and looking up the encrypted sensitive data according to the sensitive data storage address information; and decrypting the retrieved encrypted sensitive data with the retrieved key, and providing the decrypted sensitive data to the authorized user.
The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform the following, where the key feature identifier further includes decryption key information and sensitive data storage address information: when a query request carrying a key feature identifier is received, parsing the sensitive data storage address information included in the key feature identifier, and looking up the encrypted sensitive data according to the sensitive data storage address information; and sending the encrypted sensitive data to the authorized user, so that the terminal or server where the authorized user is located parses the decryption key information included in the key feature identifier and decrypts the encrypted sensitive data with the parsed decryption key.
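For the variant in which the key is looked up on the storage side and decrypted data is returned to the authorized user, a minimal Python sketch could look like this. The identifier layout (`random token:address`), the in-memory dictionaries, and the XOR with an equal-length random key (a stand-in for a real cipher) are assumptions for illustration only.

```python
import secrets
from typing import Dict

key_store: Dict[str, bytes] = {}  # key feature identifier -> key
island: Dict[str, bytes] = {}     # storage address -> encrypted sensitive data


def xor_bytes(key: bytes, data: bytes) -> bytes:
    """XOR with an equal-length random key; stand-in for a real cipher."""
    return bytes(k ^ d for k, d in zip(key, data))


def store_sensitive(address: str, plaintext: bytes) -> str:
    """Encrypt, store, and return an identifier that embeds the storage address."""
    key = secrets.token_bytes(len(plaintext))
    identifier = f"{secrets.token_hex(16)}:{address}"
    key_store[identifier] = key  # key stored in association with its identifier
    island[address] = xor_bytes(key, plaintext)
    return identifier


def query(identifier: str) -> bytes:
    """Look up the key by identifier and the ciphertext by address, then decrypt."""
    key = key_store[identifier]
    address = identifier.split(":", 1)[1]
    return xor_bytes(key, island[address])


identifier = store_sensitive("island/rec-7", b"6222020000000000")
recovered = query(identifier)
```

Compared with the client-side variant, the key never leaves the storage side here, but the plaintext does, so the channel back to the authorized user would need its own protection.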
According to the technical solution of the embodiment of the present invention, the data to be stored is traversed; when sensitive data is found, a key is generated for it and the sensitive data is encrypted with that key; the encrypted sensitive data and the non-sensitive data in the data to be stored are then stored in the data lake. Sensitive data in the data lake is thus stored in encrypted form, which protects it.
According to the technical solution of the embodiment of the present invention, the volume of the data to be stored is much smaller than the volume of the data already stored, so identifying sensitive data by traversing the data to be stored is more efficient than searching for it in the stored data.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (13)

1. A data processing method, comprising:
dividing a sensitive data island in a data lake;
responding to a data writing request sent by a production database, and traversing data to be stored corresponding to the data writing request;
when the traversal result indicates that the data to be stored comprises sensitive data, generating a corresponding key for the sensitive data;
encrypting the sensitive data using the key;
storing the encrypted sensitive data to the sensitive data island, and storing non-sensitive data in the data to be stored to a storage area outside the sensitive data island in the data lake.
2. The data processing method of claim 1,
further comprising: setting a plurality of key generation schemes;
generating a corresponding key for the sensitive data, comprising:
selecting a target key generation scheme from the plurality of key generation schemes;
and generating a corresponding key for the sensitive data by using the target key generation scheme.
3. The data processing method of claim 2,
after the step of generating the corresponding key for the sensitive data, further comprising:
generating a corresponding key feature identifier for the key;
correspondingly storing the key and the key characteristic identification, and providing the key characteristic identification for an authorized user of sensitive data;
and providing corresponding sensitive data for the authorized user according to the key characteristic identification of the authorized user.
4. The data processing method of claim 3,
the key characteristic identification comprises sensitive data storage address information;
the step of providing corresponding sensitive data for the authorized user according to the key characteristic identification of the authorized user comprises the following steps:
when receiving a query request with the key feature identifier, searching a key according to the key feature identifier, and searching encrypted sensitive data according to the sensitive data storage address information;
and decrypting the searched encrypted sensitive data by using the searched key, and providing the decrypted sensitive data to the authorized user.
5. The data processing method of claim 3,
the key characteristic identification further comprises decryption key information and sensitive data storage address information;
the step of providing corresponding sensitive data for the authorized user according to the key characteristic identification provided by the authorized user comprises the following steps:
when a query request with the key feature identifier is received, analyzing sensitive data storage address information included in the feature identifier of the target encryption scheme, and searching encrypted sensitive data according to the sensitive data storage address information;
and sending the encrypted sensitive data to the authorized user, so that the terminal or the server side where the authorized user is located analyzes the decryption key information included in the key feature identifier, and decrypting the encrypted sensitive data by using the analyzed decryption key.
6. The data processing method of claim 1, wherein the data lake is deployed in a cluster;
the step of dividing sensitive data islands comprises: and dividing a plurality of sensitive data storage nodes from the cluster, wherein the plurality of sensitive data storage nodes form the sensitive data island.
7. The data processing method according to any one of claims 1 to 5,
further comprising: setting a rule engine, wherein the rule engine comprises a configured sensitive use case and sensitive characteristics obtained through a machine learning model;
the step of traversing the data to be stored corresponding to the data writing request comprises the following steps:
reading the data to be stored through the rule engine, and judging whether part of the read data to be stored meets the configured sensitive use case or the sensitive feature,
if so, determining the sensitive data in the data to be stored, and taking the determined sensitive data in the data to be stored as a traversal result.
8. A data processing apparatus, comprising: a traversal unit, an encryption unit and a storage processing unit, wherein,
the traversal unit is used for responding to a data writing request sent by a production database and traversing data to be stored corresponding to the data writing request;
the encryption unit is used for generating a corresponding key for the sensitive data when the traversal result of the traversal unit comprises the sensitive data in the data to be stored; encrypting the sensitive data using the key;
the storage processing unit is used for dividing a sensitive data island in a data lake; and storing the sensitive data encrypted by the encryption unit into the sensitive data island, and storing non-sensitive data in the data to be stored into a storage area outside the sensitive data island in the data lake.
9. The data processing apparatus of claim 8,
the encryption unit is further used for setting a plurality of key generation schemes and selecting a target key generation scheme from the plurality of key generation schemes; and generating a corresponding key for the sensitive data by using the target key generation scheme.
10. The data processing apparatus of claim 8, further comprising: a providing unit, wherein,
the encryption unit is further configured to generate a corresponding key feature identifier for the key; correspondingly storing the key and the key feature identifier;
the providing unit is used for providing the key characteristic identification generated by the encryption unit to an authorized user of the sensitive data, and providing the corresponding sensitive data for the authorized user according to the key characteristic identification of the authorized user.
11. A data processing system, comprising: production database, data lake and data processing device according to one of claims 8 to 10, wherein,
the production database is used for transmitting the data of the production database as the data to be stored;
the data processing device is used for dividing the sensitive data island for the data lake; traversing the data to be stored, and generating a corresponding key for sensitive data when the traversal result comprises the sensitive data in the data to be stored; encrypting the sensitive data using the key; sending the encrypted sensitive data to the sensitive data island, and sending non-sensitive data in the data to be stored to a storage area outside the sensitive data island in the data lake;
the data lake comprises: the sensitive data island and a storage area outside the sensitive data island, wherein,
the sensitive data island is used for storing the encrypted sensitive data;
and the storage area outside the sensitive data island is used for storing non-sensitive data in the data to be stored.
12. An electronic device for data processing, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
13. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202010313781.5A 2020-04-20 2020-04-20 Data processing method, device and system Pending CN113536327A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010313781.5A CN113536327A (en) 2020-04-20 2020-04-20 Data processing method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010313781.5A CN113536327A (en) 2020-04-20 2020-04-20 Data processing method, device and system

Publications (1)

Publication Number Publication Date
CN113536327A true CN113536327A (en) 2021-10-22

Family

ID=78123673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010313781.5A Pending CN113536327A (en) 2020-04-20 2020-04-20 Data processing method, device and system

Country Status (1)

Country Link
CN (1) CN113536327A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1848944A (en) * 2005-04-05 2006-10-18 华为技术有限公司 IPTV system, enciphered digital programme issuing and watching method
US20110161656A1 (en) * 2009-12-29 2011-06-30 International Business Machines Corporation System and method for providing data security in a hosted service system
US20150248564A1 (en) * 2014-02-28 2015-09-03 International Business Machines Corporation Protecting sensitive data in software products and in generating core dumps
CN109271798A (en) * 2018-09-13 2019-01-25 深圳萨摩耶互联网金融服务有限公司 Sensitive data processing method and system
CN109298840A (en) * 2018-11-19 2019-02-01 平安科技(深圳)有限公司 Data integrating method, server and storage medium based on data lake
CN109726572A (en) * 2018-12-28 2019-05-07 中国移动通信集团江苏有限公司 Data management-control method, device, equipment, computer storage medium and system

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114679301A (en) * 2022-03-01 2022-06-28 北京明朝万达科技股份有限公司 Method and system for accessing data lake data by using security sandbox
CN114679301B (en) * 2022-03-01 2023-10-20 北京明朝万达科技股份有限公司 Method and system for accessing data of data lake by utilizing safe sandbox
CN115174246A (en) * 2022-07-18 2022-10-11 中国银行股份有限公司 Information processing method and system
CN115174246B (en) * 2022-07-18 2024-03-19 中国银行股份有限公司 Information processing method and system
CN116915760A (en) * 2023-09-12 2023-10-20 哈尔滨工程大学三亚南海创新发展基地 Full-network data communication packaging method and system based on http
CN116915760B (en) * 2023-09-12 2023-12-26 哈尔滨工程大学三亚南海创新发展基地 Full-network data communication packaging method and system based on http
CN117556447A (en) * 2023-11-29 2024-02-13 金网络(北京)数字科技有限公司 Data encryption method and device based on classification recognition and storage medium

Similar Documents

Publication Publication Date Title
US10903976B2 (en) End-to-end secure operations using a query matrix
CN113536327A (en) Data processing method, device and system
US20180212753A1 (en) End-To-End Secure Operations Using a Query Vector
CN112131599A (en) Method, device, equipment and computer readable medium for checking data
CN111339206B (en) Block chain-based data sharing method and device
US11442922B2 (en) Data management method, data management apparatus, and non-transitory computer readable medium
WO2024060630A1 (en) Data transmission management method, and data processing method and apparatus
CN107707528B (en) Method and device for isolating user information
CN115481440B (en) Data processing method, device, electronic equipment and medium
CN115599959A (en) Data sharing method, device, equipment and storage medium
CN112966286B (en) Method, system, device and computer readable medium for user login
CN111030930B (en) Decentralized network data fragment transmission method, device, equipment and medium
CN113761566A (en) Data processing method and device
CN111786874B (en) Caller identification method and device
CN113132115B (en) Certificate switching method, device and system
CN112069517B (en) Method and device for managing user rights
CN116112172B (en) Android client gRPC interface security verification method and device
CN111783044B (en) Method and device for sharing login state
CN110784602B (en) Soft telephone communication method, device, terminal and storage medium
CN113420331B (en) Method and device for managing file downloading permission
CN110602076B (en) Identity using method, device and system based on master identity multiple authentication
CN114090893A (en) Data query method, system, device, computer readable medium and electronic equipment
CN117609971A (en) User identity verification method and device
CN115952524A (en) Data writing method, data query device, equipment and medium
CN114428967A (en) Data transmission method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination