WO2017129138A1 - Data protection method and apparatus in data warehouse - Google Patents

Data protection method and apparatus in data warehouse

Info

Publication number
WO2017129138A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
data
encryption
name
partition
Prior art date
Application number
PCT/CN2017/072699
Other languages
French (fr)
Chinese (zh)
Inventor
阳方
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2017129138A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/283Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/602Providing cryptographic facilities or services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2141Access rights, e.g. capability lists, access control lists, access tables, access matrices

Definitions

  • The present disclosure relates to the field of data security, for example, to a data protection method and apparatus in a data warehouse.
  • Hive is a data warehouse infrastructure. It is a mechanism for storing, querying, and analyzing data held in the Hadoop Distributed File System (HDFS). It maps traditional structured data tables to data files on HDFS and provides a simple SQL-like (Structured Query Language) query language, also called Hive Query Language (HQL). Hive converts these SQL statements into MapReduce (a programming model) tasks and runs them, thus achieving large-scale data processing.
  • the resource entity used to carry data in the Hive includes a database, a table, a partition, and a bucket.
  • the database can be regarded as a collection of multiple tables, so the user data can be considered to be stored in three entity objects: a table, a partition, and a bucket.
  • a column is a field in a table and is part of a partition. A column can be distributed across multiple partitions. Therefore, all operations in Hive can be viewed as operations on tables, columns, partitions, or buckets.
  • Hive's data security issues are reflected in the following aspects:
  • Hive users can authorize one another, but authorization is limited to table operations. Because columns, partitions, and buckets cannot be authorized, a user can only be allowed or denied access to a specific table as a whole. Data access cannot be controlled at the column level, partition level, or bucket level, for example by allowing or denying a user access to only certain columns or to the data of certain buckets;
  • The data files corresponding to the data in Hive are stored on HDFS.
  • The storage format used by Hive is the text format (TEXT) or the Record Columnar File (RCFile). Even if a Hive user has not been authorized by other Hive users to query the tables, columns, partitions, or buckets they created, that user can still obtain the data by reading the underlying HDFS files directly, which bypasses Hive's upper-layer control mechanism and poses a serious threat to the data security of the data warehouse.
  • Embodiments of the present disclosure provide a data protection method and apparatus in a data warehouse, which perform access control and column-, partition-, or bucket-level permission control on data warehouse users and combine this with encryption of user data, so that the data in the data warehouse is protected.
  • Embodiments of the present disclosure provide a data protection method in a data warehouse, including:
  • receiving a user request, where the user request carries user identity information and an operation request, and the operation request includes one or more of a table-level operation request, a column-level operation request, a partition-level operation request, and a bucket-level operation request;
  • the new data or the existing data in the data warehouse is encrypted.
  • the determining whether the user identity information and the operation request are legal including:
  • the unauthorized operation includes: creating a table operation, a data import operation, a modify table operation, an encryption operation on the existing data, and a data query operation.
  • encrypting the newly added data in the data warehouse including:
  • the first encryption configuration information includes one or more of a name of a table to be encrypted, a name of a column, a name of a partition, and a name of a bucket, together with an encryption algorithm and a decryption algorithm;
  • if the create table operation carries the first encryption configuration information, storing the first encryption configuration information in the encryption information table;
  • the data import operation carries one or more of a name of a first target table into which data is to be imported, a name of a first target column, a name of a first target partition, and a name of a first target bucket;
  • the first target table includes one or more of the created tables, the first target column includes one or more columns of the created table, the first target partition includes one or more partitions of the created table, and the first target bucket includes one or more buckets of the created table;
  • encrypting the existing data in the data warehouse including:
  • the modify table operation carries second encryption configuration information, where the second encryption configuration information includes one or more of a name of a second target table to be encrypted, a name of a second target column, a name of a second target partition, and a name of a second target bucket, together with an encryption algorithm and a decryption algorithm;
  • if the modify table operation carries the second encryption configuration information, encrypting, with the encryption algorithm carried in the modify table operation, the data in one or more of the second target table, the second target column, the second target partition, and the second target bucket;
  • the second encryption configuration information is stored in the encryption information table.
  • encrypting the existing data in the data warehouse including:
  • the encryption operation on the existing data carries the name of the table to be processed and third encryption configuration information, where the third encryption configuration information includes one or more of a name of a third target table to be encrypted, a name of a third target column, a name of a third target partition, and a name of a third target bucket, together with an encryption algorithm and a decryption algorithm;
  • acquiring the data query operation, and querying, according to the data query operation, the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket to obtain the query data, includes:
  • the data query operation carries one or more of the name of the third target table, the name of the third target column, the name of the third target partition, and the name of the third target bucket;
  • reading, from a distributed file system according to the data query operation, the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket;
  • the encrypted data is decrypted to obtain the query data.
  • the method further includes:
  • if the operation request is an authorization operation, determining whether the one or more of a table, a column, a partition, and a bucket that the user, as carried in the authorization operation, authorizes other users to operate on belong to the user;
  • the method further includes:
  • the data in the encrypted information table is encrypted by using a preset encryption algorithm.
  • the embodiment of the present disclosure further provides a data protection device in a data warehouse, including:
  • a receiving module configured to receive a user request, where the user request carries user identity information and an operation request, and the operation request includes one or more of a table-level operation request, a column-level operation request, a partition-level operation request, and a bucket-level operation request;
  • a first determining module configured to determine whether the user identity information and the operation request are legal
  • a second determining module configured to determine, when the first determining module determines that the identity information and the operation request are both legal, the operation type of the operation request, where the operation type is either an authorization operation or an unauthorized operation;
  • the encryption module is configured to encrypt the newly added data or the existing data of the data warehouse when the second determining module determines that the operation request is an unauthorized operation.
  • the first determining module includes:
  • the first determining unit is configured to determine, according to the user identity information, whether the user corresponding to the user identity information exists in a pre-stored white list, and if yes, the user identity information is legal;
  • the second determining unit is configured to determine whether the permission to execute the operation request is a preset operation authority of the user, and if yes, the operation request is legal.
  • the unauthorized operation includes: creating a table operation, a data import operation, a modify table operation, an encryption operation on the existing data, and a data query operation.
  • the encryption module includes:
  • a first obtaining unit configured to acquire the create table operation, and create a table according to the structural information of the table carried in the create table operation;
  • the third determining unit is configured to determine whether the first encryption configuration information is carried in the creation table operation, where the first encryption configuration information includes a name of a table to be encrypted, a name of a column, a name of a partition, and a name of a bucket.
  • a first storage unit configured to: when the third determining unit determines that the creation table operation carries the first encryption configuration information, storing the first encryption configuration information in the encryption information table;
  • a second obtaining unit configured to acquire the data import operation, where the data import operation carries one or more of the name of a first target table into which data is to be imported, the name of a first target column, the name of a first target partition, and the name of a first target bucket, the first target table including one or more of the created tables, the first target column including one or more columns of the created table, the first target partition including one or more partitions of the created table, and the first target bucket including one or more buckets of the created table;
  • a fourth determining unit configured to determine whether a corresponding encryption algorithm is stored in the encryption information table for one or more of the first target table, the first target column, the first target partition, and the first target bucket;
  • a third acquiring unit configured to acquire, when the fourth determining unit determines that a corresponding encryption algorithm is stored in the encryption information table for one or more of the first target table, the first target column, the first target partition, and the first target bucket, that corresponding encryption algorithm;
  • a first encryption unit configured to acquire the data to be imported into one or more of the first target table, the first target column, the first target partition, and the first target bucket, and to encrypt the data to be imported with the acquired encryption algorithm to obtain first encrypted data;
  • a first writing unit configured to write the first encrypted data correspondingly into one or more of the first target table, the first target column, the first target partition, and the first target bucket.
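  • The import path handled by the fourth determining unit, third acquiring unit, first encryption unit, and first writing unit can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the encryption information table is modelled as a dictionary keyed by (table, column), and `toy_encrypt` is a base64 placeholder standing in for a real algorithm such as DES. It is not the disclosed implementation.

```python
import base64


def toy_encrypt(value: str) -> str:
    """Placeholder cipher (base64 only, NOT secure); stands in for e.g. DES."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


# Hypothetical algorithm registry: algorithm name -> encrypt function.
ALGORITHMS = {"toy": toy_encrypt}


def encrypt_rows_for_import(table, rows, encryption_info):
    """Encrypt only the columns that have an entry in the encryption info table.

    encryption_info maps (table, column) -> algorithm name (assumed layout).
    """
    encrypted_rows = []
    for row in rows:
        out = {}
        for column, value in row.items():
            algorithm = encryption_info.get((table, column))
            if algorithm is None:
                out[column] = value  # no configuration stored: write as-is
            else:
                out[column] = ALGORITHMS[algorithm](str(value))
        encrypted_rows.append(out)
    return encrypted_rows
```

  • In this sketch, columns without a stored algorithm pass through unchanged, which mirrors the check performed before encryption; partition- and bucket-level targets would need additional keys in the lookup.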
  • the encryption module includes:
  • a fourth obtaining unit configured to acquire the modified table operation
  • a fifth determining unit configured to determine whether second encryption configuration information is carried in the modify table operation, where the second encryption configuration information includes one or more of a name of a second target table to be encrypted, a name of a second target column, a name of a second target partition, and a name of a second target bucket, together with an encryption algorithm and a decryption algorithm;
  • a second encryption unit configured to: when the fifth determining unit determines that the modify table operation carries the second encryption configuration information, encrypt, with the encryption algorithm carried in the modify table operation, the data in one or more of the second target table, the second target column, the second target partition, and the second target bucket;
  • the second storage unit is configured to store the second encrypted configuration information in the encrypted information table.
  • the encryption module includes:
  • a fifth obtaining unit configured to acquire the encryption operation on the existing data, where the encryption operation carries the name of the to-be-processed table and third encryption configuration information, the third encryption configuration information including one or more of a name of a third target table to be encrypted, a name of a third target column, a name of a third target partition, and a name of a third target bucket, together with an encryption algorithm and a decryption algorithm, the third target table including one or more of the to-be-processed tables, the third target column including one or more columns of the to-be-processed table, the third target partition including one or more partitions of the to-be-processed table, and the third target bucket including one or more buckets of the to-be-processed table;
  • a query unit configured to acquire the data query operation, and to query, according to the data query operation, the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket to obtain query data;
  • a second writing unit configured to write the query data correspondingly into a temporary table;
  • a third encryption unit configured to encrypt the query data written in the temporary table with the encryption algorithm carried in the encryption operation on the existing data to obtain second encrypted data;
  • a third writing unit configured to write the second encrypted data into the to-be-processed table by overwriting;
  • a third storage unit configured to store the third encryption configuration information in the encryption information table;
  • a deleting unit configured to delete the temporary table.
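  • The temporary-table workflow performed by these units can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the disclosed implementation: tables are in-memory lists of dictionaries, the temporary table name `__tmp_<table>` is hypothetical, the whole rows (rather than only the target columns) are copied to the temporary table, and `toy_encrypt` is a base64 placeholder for a real cipher.

```python
import base64


def toy_encrypt(value: str) -> str:
    """Placeholder cipher (base64 only, NOT secure)."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def encrypt_existing(tables, info_table, table_name, columns, algorithm="toy"):
    """Encrypt existing columns of a table via a temporary table."""
    temp_name = f"__tmp_{table_name}"  # hypothetical naming scheme

    # 1. Query the existing data and write it into the temporary table.
    tables[temp_name] = [dict(row) for row in tables[table_name]]

    # 2. Encrypt the target columns inside the temporary table.
    for row in tables[temp_name]:
        for column in columns:
            row[column] = toy_encrypt(str(row[column]))

    # 3. Overwrite the to-be-processed table with the encrypted data.
    tables[table_name] = tables[temp_name]

    # 4. Record the encryption configuration in the encryption info table.
    for column in columns:
        info_table[(table_name, column)] = algorithm

    # 5. Delete the temporary table.
    del tables[temp_name]
```

  • Recording the configuration only after the overwrite means a later query can look up which columns need decryption; a real system would also have to handle a failure between steps 3 and 5.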
  • the query unit includes:
  • a first obtaining subunit configured to acquire the data query operation, where the data query operation carries one or more of the name of the third target table, the name of the third target column, the name of the third target partition, and the name of the third target bucket;
  • a querying subunit configured to read, from the distributed file system according to the data query operation, the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket;
  • a determining subunit configured to determine whether encrypted data exists in one or more of the third target table, the third target column, the third target partition, and the third target bucket;
  • a second obtaining subunit configured to obtain from the encryption information table, when the determining subunit determines that encrypted data exists in one or more of the third target table, the third target column, the third target partition, and the third target bucket, the decryption algorithm corresponding to the one or more of the table, column, partition, and bucket in which the encrypted data is located;
  • the decryption subunit is configured to decrypt the encrypted data using the decryption algorithm to obtain the query data.
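  • The query path described by these subunits (read the stored data, check whether it is encrypted, look up the decryption algorithm, decrypt) might look like the following Python sketch. The dictionary-based encryption information table, the `DECRYPTORS` registry, and the base64 placeholder `toy_decrypt` are all assumptions introduced for illustration.

```python
import base64


def toy_decrypt(value: str) -> str:
    """Inverse of a base64 placeholder cipher (NOT a real decryption)."""
    return base64.b64decode(value).decode("utf-8")


# Hypothetical decryptor registry: algorithm name -> decrypt function.
DECRYPTORS = {"toy": toy_decrypt}


def query(stored_rows, info_table, table, columns):
    """Return the requested columns, decrypting those marked as encrypted.

    info_table maps (table, column) -> algorithm name (assumed layout).
    """
    result = []
    for row in stored_rows:
        out = {}
        for column in columns:
            algorithm = info_table.get((table, column))
            if algorithm is None:
                out[column] = row[column]  # stored in plaintext
            else:
                out[column] = DECRYPTORS[algorithm](row[column])
        result.append(out)
    return result
```

  • Because decryption happens inside the query path, a user who reads the underlying files directly sees only ciphertext, which is the bypass the disclosure aims to close.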
  • the device further includes:
  • an authorization module configured to: when the second determining module determines that the operation request is an authorization operation, determine whether the one or more of a table, a column, a partition, and a bucket that the user, as carried in the authorization operation, authorizes other users to operate on belong to the user; if so, perform the authorization operation, and if not, cancel the authorization operation.
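  • The ownership check made by the authorization module can be sketched in a few lines of Python. The `owners` mapping, the dotted resource names such as `X.id`, and the `grants` set are hypothetical structures chosen for the example; the disclosure does not specify how ownership is stored.

```python
def authorize(grantor, grantee, resources, owners, grants):
    """Grant access only if every named resource belongs to the grantor.

    owners: resource name -> owning user (assumed layout)
    grants: set of (grantee, resource) pairs, updated in place
    Returns True if the authorization was performed, False if cancelled.
    """
    if any(owners.get(resource) != grantor for resource in resources):
        return False  # at least one resource is not the grantor's: cancel
    grants.update((grantee, resource) for resource in resources)
    return True
```

  • For example, with `owners = {"X.id": "alice", "Y": "bob"}`, a call `authorize("alice", "bob", ["X.id"], owners, grants)` succeeds, while `authorize("alice", "carol", ["Y"], owners, grants)` is cancelled because table Y does not belong to alice.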
  • the encryption module further includes:
  • the fourth encryption unit is configured to encrypt the data in the encrypted information table by using a preset encryption algorithm.
  • the present disclosure also provides a non-transitory computer readable storage medium storing computer executable instructions arranged to perform the above method.
  • the present disclosure also provides an electronic device, including:
  • at least one processor and a memory, where
  • the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to cause the at least one processor to perform any of the methods described above.
  • The data protection method in the data warehouse of the embodiment of the present disclosure can control the access of data warehouse users and prevent access by illegal users by determining the legality of the user identity information and the operation request carried in the received user request;
  • User data can be flexibly manipulated through table-level, column-level, partition-level, and bucket-level operational requests; and data in the data warehouse can be protected by encrypting user data. Therefore, the data protection method in the data warehouse of the embodiment of the present disclosure can protect data in the data warehouse.
  • FIG. 1 is a flow chart showing a data protection method in a data warehouse of a first embodiment of the present disclosure
  • FIG. 2 is a block diagram showing the structure of a data protection device in a data warehouse according to a second embodiment of the present disclosure
  • FIG. 3 is a second structural block diagram of a data protection device in a data warehouse according to a second embodiment of the present disclosure
  • FIG. 4 is a block diagram showing the structure of a data protection device in a data warehouse according to a third embodiment of the present disclosure
  • FIG. 5 is a flowchart showing an application of a data protection device in a data warehouse in a third embodiment of the present disclosure
  • FIG. 6 is a first branch flow chart showing an application of a data protection device in a data warehouse of a third embodiment of the present disclosure
  • Figure 7 is a flow chart showing a second branch of the application of the data protection device in the data warehouse of the third embodiment of the present disclosure.
  • FIG. 8 is a third branch flow chart showing an application of a data protection device in a data warehouse of a third embodiment of the present disclosure
  • Figure 9 is a flowchart showing a fourth branch of the application of the data protection device in the data warehouse of the third embodiment of the present disclosure.
  • FIG. 10 is a block diagram showing the hardware structure of an electronic device according to a fifth embodiment of the present disclosure.
  • Hive's permission control process is unchanged.
  • The original data is encrypted by a third-party encryption tool, and the encrypted data is imported into the Hive table; to query the data, it is exported from the Hive table and the exported data is manually decrypted.
  • The first related method solves the problem only to a limited extent. Because the data is imported and exported repeatedly, the operation is cumbersome and time-consuming. Encrypting the data with a third-party tool deployed outside Hive increases the complexity of the system, and encryption generally increases the data length, so importing the lengthened data reduces the import efficiency of the Hive system. Because the data is encrypted before entering Hive, Hive's MapReduce processing cannot be used for the encryption. Data can only be encrypted in a fixed way (usually per table); the encrypted object cannot be chosen flexibly, for example one or several tables, columns, partitions, or buckets.
  • a third-party network authentication protocol (Kerberos) component is introduced in the Hive privilege control, and the Kerberos component of the third party can be directly accessed in the privilege control module of the Hive, as part of the privilege control module, through Kerberos
  • The cost of deploying third-party Kerberos components is relatively high, and the deployment is very complicated.
  • The steps of generating certificates and configuring the Kerberos components are quite cumbersome.
  • The initial configuration may be tolerable, but whenever user rights are modified or machines are added to or removed from the cluster, certificates must be regenerated and distributed and the system restarted.
  • A Kerberos outage may leave the entire cluster of multiple nodes unable to provide service.
  • The configuration of Kerberos itself is also complicated, and it degrades the performance of jobs running on the Hadoop cluster. Therefore, Kerberos is rarely used in big data.
  • the second related method cannot solve the authorization level problem, and this method cannot protect the data of the underlying HDFS.
  • The data protection methods in the related art therefore have a weak security mechanism, are cumbersome and time-consuming to operate, cannot operate flexibly on tables, columns, partitions, and buckets, offer a low degree of confidentiality, degrade the performance of jobs running on the Hadoop cluster, and incur high deployment costs.
  • Embodiments of the present disclosure provide a data protection method in a data warehouse.
  • A user request input by a user is received, where the user request carries user identity information and an operation request, the operation request including one or more of a table-level operation request, a column-level operation request, a partition-level operation request, and a bucket-level operation request; it is determined whether the user identity information and the operation request are legal; if the identity information and the operation request are both legal, the operation type of the operation request is determined; and if the operation request is an unauthorized operation, the newly added data or the existing data in the data warehouse is encrypted.
  • The data protection method in the data warehouse of the embodiment of the present disclosure can protect the data in the data warehouse by performing access control on data warehouse users, applying permission control at the column, partition, or bucket level, and encrypting the user data.
  • the method includes the following steps.
  • In step 110, a user request entered by the user is received.
  • The user request carries user identity information and an operation request, and the operation request includes one or more of a table-level operation request, a column-level operation request, a partition-level operation request, and a bucket-level operation request.
  • When a user needs to access the Hive system to perform specific data operations, the user inputs a user request to the Hive system. The user request may carry the identity information of the user, such as the user name and the Internet Protocol (IP) address from which the user accesses the Hive system.
  • In step 120, it is determined whether the user identity information and the operation request are legal.
  • step 120 includes:
  • A whitelist is stored in the Hive system. The whitelist stores the users that have data access rights together with their identity information, such as the user name and the IP address from which the user accesses the Hive system.
  • The whitelist is read to determine whether the user identity information carried in the user request is legal, that is, whether the user is recorded in the whitelist. If so, the user identity information is legal and the user is allowed to access the Hive system. If not, the user is denied access to the Hive system.
  • The Hive system also stores the preset operation rights that legitimate users have on data in the Hive system. For example, the preset operation authority of each legitimate user is stored in an operation permission table. After it is determined that the user accessing the Hive system is a legitimate user, the operation permission table is read to determine whether the permission needed to execute the operation request is among the user's preset operation authority, that is, whether the user has the operation authority for the table, column, partition, or bucket corresponding to the operation request (for example, whether the user has permission to create a table). If so, the operation request is legal; if not, the operation request is refused and the process ends.
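  • The two checks in step 120 can be illustrated with a small Python sketch. The data layout is an assumption made for the example: the whitelist is a set of (user name, IP address) pairs, the operation permission table maps a user to a set of (operation, target) pairs, and names such as `UserRequest` and `check_request` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserRequest:
    user: str
    ip: str
    operation: str  # e.g. "create_table", "query"
    target: str     # e.g. "X" (table) or "X.id" (column)


def check_request(request, whitelist, permissions):
    """Apply the identity check first, then the operation-permission check."""
    # Identity check: is this (user, IP) recorded in the whitelist?
    if (request.user, request.ip) not in whitelist:
        return "access denied"
    # Permission check: does the user hold this operation on this target?
    allowed = permissions.get(request.user, set())
    if (request.operation, request.target) not in allowed:
        return "operation refused"
    return "accepted"
```

  • For instance, with `whitelist = {("alice", "10.0.0.5")}` and `permissions = {"alice": {("query", "X.id")}}`, a query by alice on column `X.id` is accepted, a query on `X.name` is refused, and any request from a user outside the whitelist is denied before permissions are consulted.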
  • In step 130, if the identity information and the operation request are both legal, the operation type of the operation request is determined.
  • The operation type of the operation request is either an authorization operation or an unauthorized operation; here "unauthorized operation" means any operation other than an authorization operation, not a forbidden one.
  • An authorization operation is one in which a legitimate user of the Hive system authorizes other users to access a column, a partition, or a bucket belonging to that user.
  • Unauthorized operations can include create table operations, data import operations, modify table operations, encryption operations on existing data, and data query operations.
  • If the operation request is an authorization operation, it is determined whether the table, column, partition, or bucket that the user authorizes other users to operate on belongs to the user; if so, the authorization operation is performed, and if not, the authorization operation is cancelled.
  • Embodiments of the present disclosure provide flexible authorization at the table, column, partition, and bucket levels, allowing Hive users to authorize other users to access a column, a partition, or a bucket.
  • If the operation request is an unauthorized operation, step 140 may be performed.
  • In step 140, if the operation request is an unauthorized operation, the new data or the existing data in the data warehouse is encrypted.
  • The encryption of new data in the data warehouse takes place during table creation and data import.
  • encrypting new data in the data warehouse includes:
  • the first encryption configuration information includes one or more of a name of a table to be encrypted, a name of a column, a name of a partition, and a name of a bucket, together with an encryption algorithm and a decryption algorithm;
  • if the create table operation carries the first encryption configuration information, the first encryption configuration information is stored in the encryption information table;
  • the data import operation carries one or more of a name of a first target table into which data is to be imported, a name of a first target column, a name of a first target partition, and a name of a first target bucket;
  • the first target table includes one or more of the created tables, the first target column includes one or more columns of the created table, the first target partition includes one or more partitions of the created table, and the first target bucket includes one or more buckets of the created table.
  • An empty table X with only a structure can be created according to the structural information of the table carried in the table creation operation, for example, the field name and the data type. Since there is data to be encrypted in the table X, the encryption table configuration information is carried in the table creation operation, and the encryption configuration information may include the name of the column A and the name of the column B to be encrypted, the encryption algorithm, and the decryption. algorithm.
  • the name of column A is id
  • the name of column B is name
  • a table X is created by using SQL statement
  • the data in column A and column B located in table X is encrypted, which can be as follows:
  • column.encode.columns indicates the encrypted column, if it is empty, it means that it will be in the table.
  • column.encode.classname indicates the encryption algorithm (class library); the encryption algorithm can be the Data Encryption Standard (DES).
  • an open plug-in architecture may be adopted, that is, the Hive system includes an encryptor class library and a decryptor class library. The encryptor corresponding to the encryption algorithm may therefore be called to encrypt the data, and the decryptor corresponding to the decryption algorithm may be called to decrypt the data, which makes it convenient to replace the encryption algorithm and the decryption algorithm.
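  • The plug-in arrangement described above can be sketched as a registry that maps an algorithm name to an encryptor and decryptor pair. This is a minimal Python sketch under stated assumptions, not the Hive class library: `CODECS`, `register_codec`, `encrypt_with`, `decrypt_with`, and the `xor-demo` codec are hypothetical names, and the XOR transform is a toy stand-in for a real cipher such as DES.

```python
import base64
from itertools import cycle
from typing import Callable, Dict, Tuple

# A codec is an (encryptor, decryptor) pair; both take text and a key.
Codec = Tuple[Callable[[str, bytes], str], Callable[[str, bytes], str]]

# Hypothetical registry standing in for the encryptor/decryptor class libraries.
CODECS: Dict[str, Codec] = {}


def register_codec(name: str, encryptor, decryptor) -> None:
    """Register a pair under an algorithm name so it can be swapped later."""
    CODECS[name] = (encryptor, decryptor)


def _xor(data: bytes, key: bytes) -> bytes:
    # Toy transform only; it offers no real security.
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def xor_encrypt(text: str, key: bytes) -> str:
    return base64.b64encode(_xor(text.encode("utf-8"), key)).decode("ascii")


def xor_decrypt(token: str, key: bytes) -> str:
    return _xor(base64.b64decode(token), key).decode("utf-8")


register_codec("xor-demo", xor_encrypt, xor_decrypt)


def encrypt_with(name: str, text: str, key: bytes) -> str:
    """Look up the encryptor by algorithm name and apply it."""
    return CODECS[name][0](text, key)


def decrypt_with(name: str, token: str, key: bytes) -> str:
    """Look up the decryptor by algorithm name and apply it."""
    return CODECS[name][1](token, key)
```

Replacing the algorithm then amounts to registering another pair under a new name and storing that name in the encryption configuration, without touching the callers.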
  • the encrypted configuration information corresponding to Table X can be stored in the encrypted information table.
  • the encryption information table includes an encryption algorithm used when encrypting based on a table, a column, a partition, or a bucket.
  • when table X is created, an empty directory corresponding to table X is created in HDFS, and the structure information of table X (field names, field types, etc.), also known as dictionary information, is stored in a physical database (such as MySQL or PostgreSQL).
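  • The create-and-import path for table X can be sketched as follows. This is a simplified in-memory model, not Hive code: `tables` stands in for the table data, `encryption_info` for the encrypted information table, the `city` column is a hypothetical unencrypted column added for contrast, and the XOR helpers are a toy stand-in for the configured encryption algorithm.

```python
import base64
from itertools import cycle


def toy_encrypt(text: str, key: bytes) -> str:
    # Toy XOR transform; a stand-in for the configured encryptor.
    data = bytes(b ^ k for b, k in zip(text.encode("utf-8"), cycle(key)))
    return base64.b64encode(data).decode("ascii")


def toy_decrypt(token: str, key: bytes) -> str:
    data = base64.b64decode(token)
    return bytes(b ^ k for b, k in zip(data, cycle(key))).decode("utf-8")


tables = {}           # table name -> {"fields": [...], "rows": [...]}
encryption_info = {}  # table name -> encryption configuration


def create_table(name, fields, encryption_config=None):
    """Create an empty table; store the encryption config if one is carried."""
    tables[name] = {"fields": list(fields), "rows": []}
    if encryption_config is not None:
        encryption_info[name] = dict(encryption_config)


def import_rows(name, rows, key):
    """Encrypt the configured columns while importing; leave others as is."""
    config = encryption_info.get(name)
    encrypted_columns = set(config["columns"]) if config else set()
    for row in rows:
        stored = {}
        for field in tables[name]["fields"]:
            value = row[field]
            if field in encrypted_columns:
                value = toy_encrypt(str(value), key)
            stored[field] = value
        tables[name]["rows"].append(stored)


KEY = b"demo-key"  # hypothetical key for the sketch
create_table("X", ["id", "name", "city"],
             {"columns": ["id", "name"], "algorithm": "toy-xor"})
import_rows("X", [{"id": 1, "name": "alice", "city": "Shenzhen"}], KEY)
```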
  • existing data in the data warehouse can be encrypted either by modifying the table or by creating a temporary table.
  • encrypting existing data in the data warehouse by modifying the table may include:
  • the modify table operation carries the second encryption configuration information, where the second encryption configuration information includes one or more of a name of the second target table that needs to be encrypted, a name of the second target column, a name of the second target partition, and a name of the second target bucket, together with an encryption algorithm and a decryption algorithm;
  • when the modify table operation carries the second encryption configuration information, the data in one or more of the second target table, the second target column, the second target partition, and the second target bucket is encrypted by using the encryption algorithm carried in the modify table operation;
  • the second encrypted configuration information is stored in the encrypted information table.
  • the encryption configuration information is carried in the modify table operation.
  • the names of the table, the column, the partition, and the bucket to be encrypted may be carried in the modify table operation.
  • the encryptor corresponding to the encryption algorithm may be invoked to encrypt the table, column, partition, or bucket that needs to be encrypted.
  • the SQL statement is as follows:
  • encrypting existing data in the data warehouse by creating a temporary table may include:
  • the encryption operation on the existing data carries the name of the to-be-processed table and the third encryption configuration information
  • the third encryption configuration information includes one or more of a name of the third target table that needs to be encrypted, a name of the third target column, a name of the third target partition, and a name of the third target bucket, together with an encryption algorithm and a decryption algorithm
  • the third target column includes one or more columns of the to-be-processed table
  • the third target partition includes one or more partitions of the to-be-processed table
  • the third target bucket includes one or more buckets of the to-be-processed table;
  • writing in an overwrite manner means clearing the contents of the original file and writing the new content.
  • acquiring the data query operation, and querying the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket according to the data query operation to obtain query data, includes:
  • the data query operation carries one or more of the name of the third target table, the name of the third target column, the name of the third target partition, and the name of the third target bucket;
  • reading the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket from a distributed file system according to the data query operation;
  • the encrypted data is decrypted to obtain the query data.
  • encrypting existing data in the M partition, the N partition, and the H partition of table Y includes: creating a temporary table L that has the same structure as table Y, that is, a temporary table L that has the same data dictionary as table Y but is not encrypted; and reading the data in the M partition, the N partition, and the H partition out of HDFS by means of a data query.
  • the data dictionary is a user-accessible catalog that records database and application metadata.
  • the names of the M partition, the N partition, and the H partition that need to be queried can be obtained from the data query operation; and the data in the M partition, the N partition, and the H partition are read from the HDFS.
  • since encrypted data may exist in the data read, the encrypted data may be decrypted before the data read is written into the temporary table L.
  • the decryption algorithm corresponding to the table, column, partition, or bucket where the encrypted data is located may be looked up in the encrypted information table; when the corresponding decryption algorithm is found, the decryptor corresponding to the decryption algorithm is applied to decrypt the encrypted data and obtain the queried data.
  • the data is correspondingly written into the temporary table L. Since the temporary table L has the same structure as the table Y, the same partition exists in the temporary table L. Therefore, the queried data is correspondingly written into the M partition, the N partition, and the H partition in the temporary table L.
  • the data in the M partition, the N partition, and the H partition in the temporary table L may be encrypted by using an encryption algorithm carried in the existing data encryption operation.
  • the encrypted data in the M partition, the N partition, and the H partition are overwritten into the M partition, the N partition, and the H partition in the table Y, and the previously created temporary table L is deleted.
  • the encryption algorithm and the decryption algorithm corresponding to the M partition, the N partition, and the H partition may be stored in the encrypted information table.
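  • The temporary-table procedure for table Y can be sketched as follows. This is a simplified in-memory model under stated assumptions: a table is a dictionary from partition name to a list of values, `info` stands in for the encrypted information table, partition `P` is a hypothetical partition that is not a target, and the XOR helpers are a toy stand-in for the configured algorithms.

```python
import base64
from itertools import cycle


def toy_encrypt(text: str, key: bytes) -> str:
    # Toy XOR transform; a stand-in for the configured encryptor.
    data = bytes(b ^ k for b, k in zip(text.encode("utf-8"), cycle(key)))
    return base64.b64encode(data).decode("ascii")


def toy_decrypt(token: str, key: bytes) -> str:
    data = base64.b64decode(token)
    return bytes(b ^ k for b, k in zip(data, cycle(key))).decode("utf-8")


def encrypt_existing(table_y, targets, key, info):
    """Encrypt the target partitions of table_y through a temporary table."""
    # 1. Create temporary table L with the same structure, unencrypted.
    temp_l = {partition: [] for partition in table_y}
    # 2. Query the target partitions, decrypting data that is already encrypted.
    for partition in targets:
        values = table_y[partition]
        if partition in info:
            values = [toy_decrypt(v, key) for v in values]
        temp_l[partition] = list(values)
    # 3. Encrypt the data held in L.
    for partition in targets:
        temp_l[partition] = [toy_encrypt(v, key) for v in temp_l[partition]]
    # 4. Overwrite the source partitions and record the algorithm used.
    for partition in targets:
        table_y[partition] = temp_l[partition]
        info[partition] = "toy-xor"
    # 5. Delete temporary table L.
    del temp_l


KEY = b"demo-key"  # hypothetical key for the sketch
table_y = {"M": ["m1", "m2"], "N": ["n1"], "H": ["h1"], "P": ["p1"]}
info = {}
encrypt_existing(table_y, ["M", "N", "H"], KEY, info)
```

Because step 2 decrypts partitions that are already recorded as encrypted, running the procedure a second time with the same key leaves the stored data unchanged rather than encrypting it twice.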
  • the name of the table to be queried, the name of the column, the name of the partition, and the name of the bucket may be obtained from the data query operation.
  • the encrypted information table is searched for a decryption algorithm corresponding to one or more of the table, the column, the partition, and the bucket; if the corresponding decryption algorithm is found, the data is decrypted by using the found decryption algorithm and the decrypted data is returned to the user; if the corresponding decryption algorithm is not found, the data read can be returned to the user as is.
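  • The query path described above can be sketched as a lookup followed by an optional decryption. This is a minimal Python sketch, not Hive code: `storage`, `info`, `DECRYPTORS`, and `run_query` are hypothetical names, data is keyed by (table, column) only for brevity, and the XOR helpers are a toy stand-in for the configured algorithms.

```python
import base64
from itertools import cycle


def toy_encrypt(text: str, key: bytes) -> str:
    # Toy XOR transform; a stand-in for the configured encryptor.
    data = bytes(b ^ k for b, k in zip(text.encode("utf-8"), cycle(key)))
    return base64.b64encode(data).decode("ascii")


def toy_decrypt(token: str, key: bytes) -> str:
    data = base64.b64decode(token)
    return bytes(b ^ k for b, k in zip(data, cycle(key))).decode("utf-8")


# Hypothetical mapping from algorithm name to decryptor.
DECRYPTORS = {"toy-xor": toy_decrypt}


def run_query(storage, info, table, column, key):
    """Return column data, decrypting only if a decryption algorithm is found."""
    values = storage[(table, column)]
    algorithm = info.get((table, column))
    decryptor = DECRYPTORS.get(algorithm)
    if decryptor is None:
        # No entry in the encrypted information table: return the data as read.
        return list(values)
    return [decryptor(v, key) for v in values]


KEY = b"demo-key"  # hypothetical key for the sketch
storage = {
    ("X", "name"): [toy_encrypt("alice", KEY)],
    ("X", "city"): ["Shenzhen"],
}
info = {("X", "name"): "toy-xor"}  # stand-in for the encrypted information table
```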
  • the data in the encrypted information table may be protected by storing the encrypted information table as a hidden table in the metadata, and may also be protected by encrypting the encrypted information table itself.
  • the data in the encrypted information table may be encrypted by using a preset encryption algorithm.
  • the operation permission table and whitelist are also part of the metadata. Encrypting the operation permission table and the whitelist also protects the security of the operation permission table and the whitelist.
  • the metastore is a metadata store. Before a metadata table (that is, the encryption information table, the operation permission table, or the whitelist) is written to the physical database, an encryption algorithm is called to encrypt the contents of all table fields. When the metadata is read from the physical database, the decryption algorithm is called to decrypt the contents of all table fields.
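  • Encrypting metadata fields on write and decrypting them on read can be sketched as a thin wrapper around the physical database. In this sketch an in-memory SQLite database stands in for the physical database (MySQL, PostgreSQL, etc.), the table layout and the names `write_metadata` and `read_metadata` are hypothetical, and the XOR helpers are a toy stand-in for the preset algorithm.

```python
import base64
import sqlite3
from itertools import cycle

KEY = b"metastore-demo-key"  # hypothetical preset key for the sketch


def toy_encrypt(text: str, key: bytes) -> str:
    # Toy XOR transform; a stand-in for the preset encryption algorithm.
    data = bytes(b ^ k for b, k in zip(text.encode("utf-8"), cycle(key)))
    return base64.b64encode(data).decode("ascii")


def toy_decrypt(token: str, key: bytes) -> str:
    data = base64.b64decode(token)
    return bytes(b ^ k for b, k in zip(data, cycle(key))).decode("utf-8")


# In-memory SQLite stands in for the physical metadata database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE encryption_info (tbl TEXT, col TEXT, algorithm TEXT)")


def write_metadata(row):
    """Encrypt every field before the row reaches the physical database."""
    encrypted = tuple(toy_encrypt(value, KEY) for value in row)
    conn.execute("INSERT INTO encryption_info VALUES (?, ?, ?)", encrypted)


def read_metadata():
    """Decrypt every field after the row is read back."""
    cursor = conn.execute("SELECT tbl, col, algorithm FROM encryption_info")
    return [tuple(toy_decrypt(value, KEY) for value in row) for row in cursor]


write_metadata(("X", "id", "toy-xor"))
```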
  • AES: Advanced Encryption Standard.
  • if the Hive system super administrator does not configure access control, user access control cannot be performed and users cannot authorize one another, but the data can still be protected by encryption.
  • decrypted data is displayed only when the owner of the data (the owner of the table, column, partition, or bucket data) runs the query; a user who is not the data owner sees only the encrypted data and cannot see the decrypted data.
  • large-scale data processing in the Hive system is carried out by running tasks of the MapReduce programming model.
  • the intermediate process of a MapReduce run produces some temporary data. Therefore, in some cases, the temporary data generated by the intermediate process of MapReduce can also be protected.
  • some data may need to be temporarily written to HDFS or disk. This data is called temporary data generated by an intermediate process (such as the Map stage), and it is often a small fragment of unencrypted data. To protect temporary data, the encryption algorithm and the decryption algorithm can also be applied to the intermediate process of MapReduce: when the intermediate process writes data to HDFS or disk, the encryption algorithm is called to encrypt the data written, and when the intermediate process reads data from HDFS or disk, the decryption algorithm is called to decrypt the data read.
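  • Protecting intermediate data can be sketched as encrypting records when they are spilled to disk and decrypting them when they are read back. This is a minimal Python sketch, not MapReduce code: `spill_records` and `load_records` are hypothetical names, a local temporary file stands in for HDFS or disk, and the XOR helpers are a toy stand-in for the configured algorithms.

```python
import base64
import os
import tempfile
from itertools import cycle


def toy_encrypt(text: str, key: bytes) -> str:
    # Toy XOR transform; a stand-in for the configured encryptor.
    data = bytes(b ^ k for b, k in zip(text.encode("utf-8"), cycle(key)))
    return base64.b64encode(data).decode("ascii")


def toy_decrypt(token: str, key: bytes) -> str:
    data = base64.b64decode(token)
    return bytes(b ^ k for b, k in zip(data, cycle(key))).decode("utf-8")


def spill_records(records, key):
    """Write intermediate records to a temporary file, one encrypted line each."""
    fd, path = tempfile.mkstemp(suffix=".spill")
    with os.fdopen(fd, "w", encoding="ascii") as handle:
        for record in records:
            handle.write(toy_encrypt(record, key) + "\n")
    return path


def load_records(path, key):
    """Read the temporary file back and decrypt each line."""
    with open(path, encoding="ascii") as handle:
        return [toy_decrypt(line.rstrip("\n"), key) for line in handle]
```

The caller is responsible for deleting the spill file once the intermediate data is no longer needed.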
  • the embodiment of the present disclosure further provides a data protection device in a data warehouse.
  • the data protection device 20 in the data warehouse includes: a receiving module 21, a first determining module 23, a second determining module 25, and an encryption module 27.
  • the receiving module 21 is configured to receive a user request input by the user, where the user request carries user identity information and an operation request, and the operation request includes one or more of a table level operation request, a column level operation request, a partition level operation request, and a bucket level operation request.
  • the first determining module 23 is configured to determine whether the user identity information and the operation request are legal.
  • the second determining module 25 is configured to determine the operation type of the operation request when the first determining module 23 determines that the user identity information and the operation request are both legal, where the operation type includes an authorization operation and a non-authorization operation.
  • the encryption module 27 is configured to encrypt the newly added data or the existing data in the data warehouse when the second determining module 25 determines that the operation request is a non-authorization operation.
  • the first determining module 23 includes: a first determining unit 231 and a second determining unit 232.
  • the first determining unit 231 is configured to determine, according to the user identity information, whether the corresponding user exists in a pre-stored whitelist; if so, the user identity information is legal.
  • the second determining unit 232 is configured to determine whether the authority to execute the operation request is a preset operation authority possessed by the user, and if so, the operation request is legal.
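  • The two legality checks performed by the first determining module can be sketched as a whitelist lookup followed by a permission lookup. This is a minimal Python sketch with hypothetical data: the user names, the operation names, and the `WHITELIST` and `PERMISSIONS` structures are illustrative only.

```python
# Hypothetical pre-stored whitelist and preset operation permissions.
WHITELIST = {"alice", "bob"}
PERMISSIONS = {
    "alice": {"create_table", "import_data"},
    "bob": {"query"},
}


def identity_is_legal(user: str) -> bool:
    """First check: the user must exist in the pre-stored whitelist."""
    return user in WHITELIST


def operation_is_legal(user: str, operation: str) -> bool:
    """Second check: the operation must be among the user's preset permissions."""
    return operation in PERMISSIONS.get(user, set())


def request_is_legal(user: str, operation: str) -> bool:
    """A request proceeds only when both checks pass."""
    return identity_is_legal(user) and operation_is_legal(user, operation)
```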
  • the non-authorization operation includes: a create table operation, a data import operation, a modify table operation, an encryption operation on existing data, and a data query operation.
  • the encryption module 27 includes: a first obtaining unit 271, a third determining unit 272, a first storage unit 273, a second obtaining unit 274, a fourth determining unit 275, a third obtaining unit 276, a first encryption unit 277, and a first writing unit 278.
  • the first obtaining unit 271 is configured to acquire a create table operation, and create a table according to the structure information of the table carried in the create table operation.
  • the third determining unit 272 is configured to determine whether the first encryption configuration information is carried in the create table operation, where the first encryption configuration information includes one or more of a name of a table to be encrypted, a name of a column, a name of a partition, and a name of a bucket.
  • the first storage unit 273 is configured to store the first encryption configuration information in the encryption information table when the third determination unit 272 determines that the creation table operation carries the first encryption configuration information.
  • the second obtaining unit 274 is configured to acquire a data import operation, where the data import operation carries a name of the first target table that needs to import data, a name of the first target column, a name of the first target partition, and a name of the first target bucket; the first target table includes one or more tables in the created table, the first target column includes one or more columns of the created table, the first target partition includes one or more partitions of the created table, and the first target bucket includes one or more buckets of the created table.
  • the fourth determining unit 275 is configured to determine whether an encryption algorithm corresponding to one or more of the first target table, the first target column, the first target partition, and the first target bucket is stored in the encrypted information table.
  • the third obtaining unit 276 is configured to acquire the encryption algorithm corresponding to one or more of the first target table, the first target column, the first target partition, and the first target bucket when the fourth determining unit 275 determines that such an encryption algorithm is stored in the encryption information table.
  • the first encryption unit 277 is configured to acquire the data to be imported into one or more of the first target table, the first target column, the first target partition, and the first target bucket, and to encrypt the data to be imported by using the acquired encryption algorithm to obtain the first encrypted data.
  • the first writing unit 278 is configured to write the first encrypted data correspondingly into one or more of the first target table, the first target column, the first target partition, and the first target bucket.
  • the encryption module 27 further includes: a fourth obtaining unit 279, a fifth determining unit 2710, a second encrypting unit 2711, and a second storing unit 2712.
  • the fourth obtaining unit 279 is configured to acquire a modify table operation.
  • the fifth determining unit 2710 is configured to determine whether the second encryption configuration information is carried in the modify table operation, where the second encryption configuration information includes one or more of a name of the second target table that needs to be encrypted, a name of the second target column, a name of the second target partition, and a name of the second target bucket, together with an encryption algorithm and a decryption algorithm.
  • the second encryption unit 2711 is configured to encrypt, when the fifth determining unit 2710 determines that the modify table operation carries the second encryption configuration information, the data in one or more of the second target table, the second target column, the second target partition, and the second target bucket by using the encryption algorithm carried in the modify table operation.
  • the second storage unit 2712 is configured to store the second encrypted configuration information in the encrypted information table.
  • the encryption module 27 further includes: a fifth obtaining unit 2713, a creating unit 2714, a query unit 2715, a second writing unit 2716, a third encryption unit 2717, a third writing unit 2718, a third storage unit 2719, and a deleting unit 2720.
  • the fifth obtaining unit 2713 is configured to acquire an encryption operation on the existing data, where the encryption operation on the existing data carries the name of the to-be-processed table and the third encryption configuration information; the third encryption configuration information includes one or more of a name of the third target table that needs to be encrypted, a name of the third target column, a name of the third target partition, and a name of the third target bucket, together with an encryption algorithm and a decryption algorithm; the third target table includes one or more tables in the to-be-processed table, the third target column includes one or more columns of the to-be-processed table, the third target partition includes one or more partitions of the to-be-processed table, and the third target bucket includes one or more buckets of the to-be-processed table.
  • the creating unit 2714 is configured to create a temporary table having the same structure as the to-be-processed table.
  • the query unit 2715 is configured to acquire a data query operation and, according to the data query operation, query the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket to obtain query data.
  • the second writing unit 2716 is configured to write the query data correspondingly into the temporary table.
  • the third encryption unit 2717 is configured to encrypt the query data written in the temporary table by using the encryption algorithm carried in the encryption operation on the existing data to obtain the second encrypted data.
  • the third writing unit 2718 is configured to write the second encrypted data into the to-be-processed table in an overlay manner.
  • the third storage unit 2719 is configured to store the third encrypted configuration information in the encrypted information table.
  • the deleting unit 2720 is configured to delete the temporary table.
  • the query unit 2715 includes: a first obtaining subunit, a query subunit, a determining subunit, a second acquiring subunit, and a decrypting subunit.
  • the first obtaining subunit is configured to acquire the data query operation, wherein the data query operation carries a name of the third target table, a name of the third target column, a name of the third target partition, and One or more of the names of the third target buckets.
  • the query subunit is configured to read, according to the data query operation, the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket from the distributed file system.
  • the determining subunit is configured to determine whether encrypted data exists in one or more of the third target table, the third target column, the third target partition, and the third target bucket.
  • the second obtaining subunit is configured to acquire, when the judgment result of the determining subunit is YES, the decryption algorithm corresponding to one or more of the table, the column, the partition, and the bucket in which the encrypted data is located from the encrypted information table.
  • the decryption subunit is configured to decrypt the encrypted data using the decryption algorithm to obtain the query data.
  • the apparatus further includes: an authorization module 29.
  • the authorization module 29 is configured to: when the second determining module determines that the operation request is an authorization operation, determine whether the one or more of the tables, columns, partitions, and buckets that the authorization operation authorizes other users to operate belong to the user; if so, perform the authorization operation; if not, cancel the authorization operation.
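  • The ownership check made by the authorization module can be sketched as follows. This is a minimal Python sketch with hypothetical data: `OWNERS` and `GRANTS` stand in for the ownership and operation permission records, and a target is written as a (table, level, name) tuple for illustration.

```python
# Hypothetical ownership records: (table, level, name) -> owner.
OWNERS = {
    ("Y", "partition", "M"): "alice",
    ("Y", "partition", "N"): "bob",
}

# Grants that have been performed: (grantee, target, operation).
GRANTS = set()


def authorize(granter: str, grantee: str, target: tuple, operation: str) -> bool:
    """Perform the grant only if the target belongs to the granting user."""
    if OWNERS.get(target) != granter:
        # The granter does not own the target, so the authorization is cancelled.
        return False
    GRANTS.add((grantee, target, operation))
    return True
```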
  • the encryption module 27 further includes a fourth encryption unit 2721 as shown in FIG.
  • the fourth encryption unit 2721 is configured to: encrypt the data in the encrypted information table by using a preset encryption algorithm after the first storage unit stores the first encryption configuration information in the encryption information table; encrypt the data in the encrypted information table by using a preset encryption algorithm after the second storage unit stores the second encryption configuration information in the encryption information table; and encrypt the data in the encrypted information table by using a preset encryption algorithm after the third storage unit stores the third encryption configuration information in the encryption information table.
  • Data protection devices include:
  • the access control module 431 is configured to provide a whitelist function.
  • the whitelist function of the access control module 431 is invoked by the rights control module 43.
  • users in the whitelist are allowed to access the Hive system, and users not in the whitelist are not allowed to access the Hive system.
  • the whitelist is configured by the super administrator of the Hive system according to the specified interface. It can be stored in the metadata module 41 in the form of a configuration table (whitelist) or in the form of a configuration file on the local hard disk.
  • the access control module 431 is a plug-in component, and the user can inherit the specified interface and develop a customized whitelist function.
  • the rights control module 43 is configured to provide a Hive system super administrator function on the basis of Hive's original rights control module, invoke the whitelist function provided by the access control module 431 to perform access control on the Hive system, and provide flexible authorization at the column, partition, and bucket level, so that a Hive user can authorize other users to access a column, a partition, or a bucket.
  • the statement parsing module 44 is configured to parse the SQL statement.
  • in the present disclosure, the encrypted information is defined in the SQL statement that creates or modifies the table. Therefore, parsing of the encrypted information is added to the original Hive statement parsing module, and the encrypted information is written as a table to the metadata module 41.
  • the encrypted information can include whether the table is encrypted, the names of the encrypted columns, partitions, and buckets, and the name of the encryptor.
  • the serialization module 45 is configured to provide the function of writing data into the HDFS 47. In the process of serialization, the data length tends to expand. In order to reduce the overhead of MapReduce network transmission, data encryption can be performed when data is written to HDFS 47.
  • the deserialization module 46 is configured to provide the function of reading the table file from the HDFS 47. Since decryption reduces the data length, the data can be decrypted during the deserialization process in order to reduce the overhead of MapReduce network transmission.
  • the HDFS 47 is set to store data files corresponding to data in the Hive system.
  • the data in the HDFS 47 can be read by the deserialization module 46, and the data is written into the HDFS 47 by the serialization module 45.
  • the metadata module 41 is configured to define an encryption information table, a white list table, and an operation authority table, and store the encrypted information of the table, the whitelist information, and the preset operation authority of the user.
  • the encrypted information of the table includes whether the table is encrypted, the names of the encrypted columns, partitions, and buckets, and the name of the encryptor.
  • the present disclosure defines the encryption information table and the whitelist table.
  • the serialization module 45 or the deserialization module 46 determines, according to the encrypted information, whether the data needs to be encrypted or decrypted.
  • the whitelist table stores the user name and the IP address of the user accessing the Hive system.
  • the data encryption and decryption module 42 is a separate plug-in component that is respectively called by the serialization module 45 and the deserialization module 46 to provide a data encryption algorithm and a decryption algorithm to implement data encryption and decryption functions.
  • the present disclosure adopts an AES encryption algorithm and an AES decryption algorithm.
  • AES is also called Rijndael encryption in cryptography, and is a block encryption standard adopted by the US federal government.
  • AES can replace DES.
  • AES has become one of the most popular algorithms for symmetric key encryption.
  • AES features fast encryption, compact coding and high security.
  • the encrypted data generated by the encryption algorithm depends only on the original data and the encryption key; that is, once the original data and the key are fixed, the generated encrypted data is the same in every case.
  • for decryption, the encrypted data can be decrypted as long as the key is obtained, so the algorithm can be used in a multi-host, multi-node computing environment such as MapReduce.
  • because the present disclosure adopts a plug-in architecture, a user can develop a custom encryption or decryption plug-in by inheriting the interface defined by the present disclosure.
  • the key configuration module 421 is a sub-module of the data encryption and decryption module 42 and can be used to synthesize the encryption key and the decryption key.
  • the Hive system allocates each user an encryption key (key) of 32-bit length.
  • the assigned key and the user-defined key are synthesized into the user's encryption key, which is written through the Java Database Connectivity (JDBC) interface
  • step 510 a user request input by the user is received.
  • the user whitelist is read by the metadata module 41. Because the whitelist stores the identity information of the user with the legal access authority, the whitelist can be read from the metadata module 41 in order to facilitate the determination of the identity information of the user who inputs the user request.
  • step 530 it is determined whether the user is in the white list.
  • the rights control module 43 invokes the whitelist function provided by the access control module 431 to determine whether the user is in the whitelist. If the user is in the whitelist, the user is allowed to access the Hive system, and step 540 is performed. If the user is not in the whitelist, the user is not allowed to access the Hive system, and step 5130 is executed to end the process. Therefore, only a user authenticated by the rights control module 43 is allowed to access the Hive system and take part in subsequent operations.
  • step 540 the rights control module 43 obtains the user rights and compares the obtained user rights with the rights required in the operation.
  • the rights control module 43 reads the operation permission table from the metadata module 41.
  • step 550 it is determined whether the operation request in the user request is an authorization operation. If the operation request is an authorization operation, step 560 is performed. If the operation request is not an authorization operation, step 570 is performed.
  • step 560 if the authorized table, column, partition, or bucket belongs to the user, the authorization is successful. If the authorized table, column, partition, or bucket does not belong to the user, the authorization is not successfully performed, and the process ends.
  • step 570 it is determined whether the user has the corresponding operation authority. If yes, step 580 is executed. If not, step 5130 is performed.
  • step 580 the type of operation of the user's operation request is determined.
  • the operation types include creating a table, modifying a table, importing data, querying data, and encrypting and protecting existing data.
  • step 590 is performed, see the first branch flow of FIG.
  • step 5100 is performed, see the second branch flow of FIG.
  • step 5110 is performed, see the third branch flow of FIG.
  • step 5120 is performed, see the fourth branch flow of FIG.
  • steps 5910 to 5950 are performed.
  • step 5910 a create table or modify table statement is written. If an encrypted table, column, partition, or bucket needs to be created, the encrypted table, column, partition, or bucket can be defined in the data dictionary.
  • step 5920 the data dictionary information for the table is written to the physical database by the metadata module 41.
  • step 5930 it is determined whether the statement of the creation table or the modification table carries the encryption configuration information. If yes, step 5940 is performed, and if not, step 5950 is performed.
  • step 5940 the encrypted configuration information is written by the metadata module 41 to the encrypted information table in the physical database.
  • step 5950 the flow ends.
  • steps 51010 to 51070 are performed.
  • step 51010 a MapReduce process is executed.
  • step 51020 the serialization module 45 is invoked to serialize the data to the HDFS 47.
  • the metadata table information is read by the metadata module 41.
  • the metadata table information includes encrypted information.
  • step 51040 it is determined whether the imported target table, column, partition or bucket is encrypted. If yes, step 51050 is performed, and if not, step 51060 is performed.
  • step 51050 during the serialization process, the encryption algorithm provided by the encryption and decryption module 42 is invoked for data encryption.
  • the serialization module 45 invokes the key configuration module 421 to synthesize a key, invokes the metadata module 41 to acquire the name of the column, partition, or bucket to be encrypted, the name of the encryption algorithm, and the name of the decryption algorithm, and then loads the specified encryption algorithm and decryption algorithm.
  • data encryption is performed on the specified data (the columns, partitions, or buckets to be encrypted) with the key.
  • step 51060 data is written to HDFS 47 and MapReduce ends.
  • step 51070 the process ends. After the data is written to HDFS 47, the process in this section ends.
  • steps 51110 to 51180 are performed.
  • step 51110 a MapReduce process is executed.
  • step 51120 the deserialization module 46 is called to read data from the HDFS 47.
  • the metadata information is read by the metadata module 41.
  • the metadata information includes encrypted table information.
  • step 51140 it is determined whether the table, column, partition or bucket of the query is encrypted. If it is encrypted, step 51150 is performed. If there is no encryption, step 51160 is performed.
  • step 51150 during the deserialization process, the decryption algorithm provided by the encryption and decryption module 42 is invoked for data decryption.
  • step 51160 after the data is read, MapReduce ends.
  • step 51170 the query data is returned to the user.
  • step 51180 the flow ends. After the query data is returned to the user, the process ends.
  • steps 5121 to 51211 are performed.
  • step 5121 a query is performed on the source table (columns, partitions, or buckets).
  • step 5122 the data to be queried is obtained.
  • step 5123 an ordinary temporary table is created that has the same data dictionary as the source table but is not encrypted.
  • step 5124 the data queried above is imported into the temporary table.
  • step 5125 MapReduce is executed.
  • step 5126 at the end of the MapReduce flow, the serialization module 45 is invoked to serialize the data to the HDFS 47.
  • step 5127 it is determined whether some tables, columns, partitions, or buckets of the source table need to be encrypted. If necessary, step 5128 is performed. If not, step 5129 is performed.
  • step 5128 during the serialization process, the encryption algorithm provided by the encryption and decryption module 42 is invoked for data encryption.
  • step 5129 the data is written to HDFS 47 and the MapReduce process ends.
  • the encrypted data is written to the source table in an overwrite manner.
  • step S51210 the ordinary temporary table is deleted to save the storage space of the system.
  • step S51211 the flow is ended. After the normal temporary table is deleted, the part of the process ends.
  • the order of creating the temporary table and the querying step of the table are not limited thereto.
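The temporary-table flow for encrypting existing data can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the warehouse is modelled as an in-memory dictionary of tables, `encrypt_existing_table` and `tmp_plain` are hypothetical names, and the string-reversing function is a placeholder for a real encryption algorithm.

```python
import copy
from typing import Callable, Dict, List, Set

Row = Dict[str, str]

# Hypothetical placeholder cipher: reversing a string is NOT real encryption.
def toy_encrypt(value: str) -> str:
    return value[::-1]

def encrypt_existing_table(
    warehouse: Dict[str, List[Row]],
    source: str,
    columns: Set[str],
    encrypt: Callable[[str], str],
    temp_name: str = "tmp_plain",
) -> None:
    """Encrypt the given columns of an existing table via an ordinary temporary table."""
    if temp_name in warehouse:
        raise ValueError(f"temporary table {temp_name!r} already exists")
    # Query the source table and import the result into the temporary table.
    warehouse[temp_name] = copy.deepcopy(warehouse[source])
    try:
        # Encrypt the designated columns while "serializing" the rows.
        encrypted = [
            {col: (encrypt(val) if col in columns else val) for col, val in row.items()}
            for row in warehouse[temp_name]
        ]
        # Write the encrypted data back to the source table, overwriting it.
        warehouse[source] = encrypted
    finally:
        # Delete the temporary table to free storage.
        del warehouse[temp_name]

warehouse = {"users": [{"id": "1", "phone": "555-0100"}]}
encrypt_existing_table(warehouse, "users", {"phone"}, toy_encrypt)
```

The `finally` clause is a design choice of this sketch: it removes the unencrypted temporary copy even if encryption fails partway through.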
  • because the data encryption and decryption module 42 is present, the encryption or decryption algorithm has already been loaded into the system.
  • the encryption function and the decryption function can be defined as User Defined Functions (UDFs) without being defined in the data dictionary, so that the process in which the statement parsing module 44 parses the statement to obtain the encryption information and stores that information in the metabase can be omitted.
  • the table Tbl_encrypt can be used as a place to store data after encryption.
  • the decryption function decode is called at query time, for example: select a, decode(c) from tbl_encrypt, where the content of c is encode(b). Unlike creating or modifying a table, this approach does not need to define the encrypted table, column, partition or bucket in the data dictionary.
  • Instead, the original SQL statement is modified to add calls to the user-defined functions, and the user must manually ensure that the table, column, partition or bucket being decrypted is exactly the one that was encrypted.
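The UDF approach can be tried out in miniature with Python's built-in sqlite3 module, which also lets a caller register scalar functions under SQL names. SQLite here is only a stand-in for Hive, the table `src` is a hypothetical source table, and base64 is a reversible encoding used as a placeholder, not an encryption algorithm.

```python
import base64
import sqlite3

# Placeholder "encryption": base64 is an encoding, NOT a real cipher.
def encode(value: str) -> str:
    return base64.b64encode(value.encode("utf-8")).decode("ascii")

def decode(value: str) -> str:
    return base64.b64decode(value.encode("ascii")).decode("utf-8")

conn = sqlite3.connect(":memory:")
# Register the Python functions as one-argument SQL functions.
conn.create_function("encode", 1, encode)
conn.create_function("decode", 1, decode)

# Hypothetical source table holding plaintext column b.
conn.execute("CREATE TABLE src (a TEXT, b TEXT)")
conn.execute("INSERT INTO src VALUES ('k1', 'secret')")

# tbl_encrypt stores the data after encryption: c holds encode(b).
conn.execute("CREATE TABLE tbl_encrypt (a TEXT, c TEXT)")
conn.execute("INSERT INTO tbl_encrypt SELECT a, encode(b) FROM src")

# What is actually stored, and what a query that calls decode() sees.
stored = conn.execute("SELECT a, c FROM tbl_encrypt").fetchall()
plain = conn.execute("SELECT a, decode(c) FROM tbl_encrypt").fetchall()
```

Nothing in the schema records that column c is encoded, so the pairing of encode on write with decode on read rests entirely on the person writing the SQL.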
  • the present disclosure protects Hive data through user permission control and data encryption. It implements access control and column-, partition-, or bucket-level permission control for users of the Hive system, combined with encryption and decryption of user data at the table, column, partition or bucket level. This protects the data in the Hive data warehouse and mitigates the problems of the security mechanisms in the related art, which are cumbersome and time-consuming to operate and cannot flexibly encrypt tables, columns, partitions and buckets.
  • Embodiments of the present disclosure also provide a non-transitory computer readable storage medium storing computer executable instructions arranged to perform any of the methods described above.
  • Embodiments of the present disclosure provide a hardware structure diagram of an electronic device.
  • the electronic device includes:
  • At least one processor 100 (FIG. 10 takes one processor 100 as an example) and a memory 101; the electronic device may further include a communication interface 102 and a bus 103.
  • the processor 100, the communication interface 102, and the memory 101 can complete communication with each other through the bus 103.
  • Communication interface 102 can be used for information transmission.
  • the processor 100 can call logic instructions in the memory 101 to perform the above method.
  • the logic instructions in the memory 101 described above can be implemented in the form of software functional units and, when sold or used as a stand-alone product, can be stored in a computer readable storage medium.
  • the memory 101 is a computer readable storage medium, and can be used to store a software program, a computer executable program, a program instruction or a module corresponding to the method in the embodiment of the present disclosure.
  • the processor 100 performs functional applications and data processing by running software programs, instructions or modules stored in the memory 101.
  • the memory 101 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application required for at least one function, and the data storage area may store data created according to usage of the terminal, and the like. Further, the memory 101 may include a high speed random access memory, and may also include a nonvolatile memory.
  • the technical solution of the present disclosure may be embodied in the form of a software product stored in a storage medium, including one or more instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method of the embodiments of the present disclosure.
  • the foregoing storage medium may be a non-transitory storage medium, including: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
  • the data protection method and device in the data warehouse provided by the disclosure can protect the data in the data warehouse by applying access control and column-, partition- or bucket-level permission control to data warehouse users, combined with encryption of user data.

Abstract

Disclosed are a data protection method and apparatus in a data warehouse. The method comprises: receiving a user request, the user request carrying user identity information and an operation request, the operation request comprising one or more of: a table-level operation request, a column-level operation request, a partition-level operation request and a bucket-level operation request; judging whether the user identity information and the operation request are legitimate; if both the user identity information and the operation request are legitimate, judging an operation type of the operation request; and if the operation request is a non-authorization operation, encrypting newly added data or existing data in the data warehouse.

Description

Data protection method and apparatus in a data warehouse
Technical field
The present disclosure relates to the field of data security, for example, to a data protection method and apparatus in a data warehouse.
Background
Hive is a data warehouse infrastructure. It is a mechanism for storing, querying, and analyzing data stored in the Hadoop Distributed File System (HDFS): it maps traditional structured data tables to data files on HDFS and provides a simple SQL-like (Structured Query Language) query language. This language is also called the Hive Query Language (HQL). Hive converts SQL statements into tasks of the MapReduce programming model for execution, thereby processing data at large scale. The resource entities that carry data in Hive are databases, tables, partitions, and buckets. A database can be regarded as a collection of tables, so user data can be considered to be stored in three kinds of entity objects: tables, partitions, and buckets. A column is a field of a table and is also a component of a partition, and one column can be distributed across multiple partitions. Therefore, all operations in Hive can be viewed as operations on tables, columns, partitions, or buckets.
As a large-scale data warehouse and offline analysis platform, Hive has been very widely used, but the accompanying data security issues have not received enough attention. Hive's data security issues are reflected in the following aspects:
All users in Hive are peers. There is no system super administrator, so the system cannot be managed globally and access by illegitimate users cannot be restricted;
Users in Hive can grant permissions to one another, but only for operations on tables; permissions cannot be granted on columns, partitions, or buckets. That is, a user can only be allowed or forbidden to access a specific table, so operations on table data cannot be controlled at the column, partition, or bucket level as required, for example by allowing or forbidding a user to access only certain columns or only the data of certain buckets;
The data files corresponding to Hive data are stored on HDFS, usually in text format (TEXT) or Record Columnar File format (RCFile). Even if a Hive user has not been authorized by other Hive users to query the tables, columns, partitions or buckets they created, that user can still obtain the data by reading the underlying HDFS files directly. This bypasses Hive's upper-layer permission control mechanism and poses a serious threat to the security of the data in the data warehouse.
No complete and systematic solution to the above problems has yet been proposed.
Summary
Embodiments of the present disclosure provide a data protection method and apparatus in a data warehouse, which can protect the data in the data warehouse by applying access control and column-, partition-, or bucket-level permission control to data warehouse users, combined with encryption of user data.
An embodiment of the present disclosure provides a data protection method in a data warehouse, including:
receiving a user request, where the user request carries user identity information and an operation request, and the operation request includes one or more of a table-level operation request, a column-level operation request, a partition-level operation request, and a bucket-level operation request;
determining whether the user identity information and the operation request are legitimate;
if both the user identity information and the operation request are legitimate, determining the operation type of the operation request, where the operation type is either an authorization operation or a non-authorization operation; and
if the operation request is a non-authorization operation, encrypting newly added data or existing data in the data warehouse.
Optionally, in the foregoing solution, determining whether the user identity information and the operation request are legitimate includes:
determining, according to the user identity information, whether the user corresponding to the user identity information is present in a pre-stored whitelist, and if so, the user identity information is legitimate; and
determining whether the permission needed to execute the operation request is among the preset operation permissions held by the user, and if so, the operation request is legitimate.
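The two legitimacy checks and the subsequent operation-type decision can be sketched in Python as follows. Everything named here is an assumption made for illustration: the whitelist contents, the `(action, level, target)` shape of a permission, and the use of "grant" as the only authorization operation are not specified by the disclosure.

```python
from typing import Dict, Set, Tuple

# A permission is modelled as (action, level, target); this shape is hypothetical.
Permission = Tuple[str, str, str]

WHITELIST: Set[str] = {"alice", "bob"}
PERMISSIONS: Dict[str, Set[Permission]] = {
    "alice": {
        ("select", "column", "orders.amount"),
        ("grant", "table", "orders"),
    },
    "bob": {("select", "table", "orders")},
}
# Assumption: "grant" is the only authorization operation in this sketch.
AUTHORIZATION_ACTIONS: Set[str] = {"grant"}

def identity_is_legitimate(user: str) -> bool:
    """The identity is legitimate if the user is in the pre-stored whitelist."""
    return user in WHITELIST

def operation_is_legitimate(user: str, permission: Permission) -> bool:
    """The operation is legitimate if it is among the user's preset permissions."""
    return permission in PERMISSIONS.get(user, set())

def classify_request(user: str, action: str, level: str, target: str) -> str:
    """Run both checks, then report the operation type."""
    if not identity_is_legitimate(user):
        return "rejected: unknown user"
    if not operation_is_legitimate(user, (action, level, target)):
        return "rejected: no permission"
    if action in AUTHORIZATION_ACTIONS:
        return "authorization"
    return "non-authorization"
```

Keying permissions by level (table, column, partition, bucket) is what lets the same check express column-level and bucket-level control rather than table-level control only.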
Optionally, in the foregoing solution, the non-authorization operations include: a create-table operation, a data import operation, a modify-table operation, an operation for encrypting existing data, and a data query operation.
Optionally, in the foregoing solution, encrypting the newly added data in the data warehouse includes:
obtaining the create-table operation, and creating a table according to the table structure information carried in the create-table operation;
determining whether the create-table operation carries first encryption configuration information, where the first encryption configuration information includes an encryption algorithm, a decryption algorithm, and one or more of the name of a table, the name of a column, the name of a partition, and the name of a bucket to be encrypted;
if the create-table operation carries the first encryption configuration information, storing the first encryption configuration information in an encryption information table;
obtaining the data import operation, where the data import operation carries one or more of the name of a first target table, the name of a first target column, the name of a first target partition, and the name of a first target bucket into which data is to be imported, the first target table includes one or more of the created tables, the first target column includes one or more columns of the created table, the first target partition includes one or more partitions of the created table, and the first target bucket includes one or more buckets of the created table;
determining whether a corresponding encryption algorithm is stored in the encryption information table for one or more of the first target table, the first target column, the first target partition, and the first target bucket;
if so, obtaining the encryption algorithm corresponding to the one or more of the first target table, the first target column, the first target partition, and the first target bucket;
obtaining the data to be imported into one or more of the first target table, the first target column, the first target partition, and the first target bucket, and encrypting the data to be imported with the obtained encryption algorithm to obtain first encrypted data; and
writing the first encrypted data correspondingly into one or more of the first target table, the first target column, the first target partition, and the first target bucket.
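The create-table and data-import steps above can be sketched at column granularity as follows. This is a simplified illustration, not the disclosed implementation: the in-memory dictionaries, the function names, and the `toy_reverse` algorithm (a string reversal, not real encryption) are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Registry of (encrypt, decrypt) pairs by algorithm name.
# "toy_reverse" is a hypothetical placeholder and NOT real encryption.
ALGORITHMS: Dict[str, Tuple[Callable[[str], str], Callable[[str], str]]] = {
    "toy_reverse": (lambda s: s[::-1], lambda s: s[::-1]),
}

# Encryption information table: (table, column) -> algorithm name.
encryption_info: Dict[Tuple[str, str], str] = {}
tables: Dict[str, List[Dict[str, str]]] = {}

def create_table(
    name: str,
    columns: List[str],
    encrypted_columns: Optional[List[str]] = None,
    algorithm: Optional[str] = None,
) -> None:
    """Create the table and record any encryption configuration it carries."""
    for column in encrypted_columns or []:
        if column not in columns:
            raise ValueError(f"unknown column {column!r}")
        if algorithm not in ALGORITHMS:
            raise ValueError(f"unknown algorithm {algorithm!r}")
    tables[name] = []
    for column in encrypted_columns or []:
        encryption_info[(name, column)] = algorithm

def import_rows(name: str, rows: List[Dict[str, str]]) -> None:
    """Encrypt configured columns on the way in, then write the rows."""
    for row in rows:
        stored = {}
        for column, value in row.items():
            algorithm = encryption_info.get((name, column))
            if algorithm is not None:
                encrypt, _decrypt = ALGORITHMS[algorithm]
                value = encrypt(value)
            stored[column] = value
        tables[name].append(stored)

create_table("users", ["id", "phone"], encrypted_columns=["phone"], algorithm="toy_reverse")
import_rows("users", [{"id": "1", "phone": "555-0100"}])
```

Storing the algorithm name rather than the function keeps the encryption information table serializable, which matters if that table is itself to be stored and encrypted.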
Optionally, in the foregoing solution, encrypting the existing data in the data warehouse includes:
obtaining the modify-table operation;
determining whether the modify-table operation carries second encryption configuration information, where the second encryption configuration information includes an encryption algorithm, a decryption algorithm, and one or more of the name of a second target table, the name of a second target column, the name of a second target partition, and the name of a second target bucket to be encrypted;
if the modify-table operation carries the second encryption configuration information, encrypting the data in one or more of the second target table, the second target column, the second target partition, and the second target bucket with the encryption algorithm carried in the modify-table operation; and
storing the second encryption configuration information in the encryption information table.
Optionally, in the foregoing solution, encrypting the existing data in the data warehouse includes:
obtaining the operation for encrypting existing data, which carries the name of a to-be-processed table and third encryption configuration information, where the third encryption configuration information includes an encryption algorithm, a decryption algorithm, and one or more of the name of a third target table, the name of a third target column, the name of a third target partition, and the name of a third target bucket to be encrypted, the third target table includes one or more of the to-be-processed tables, the third target column includes one or more columns of the to-be-processed table, the third target partition includes one or more partitions of the to-be-processed table, and the third target bucket includes one or more buckets of the to-be-processed table;
creating a temporary table having the same structure as the to-be-processed table;
obtaining the data query operation, and querying, according to the data query operation, the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket to obtain query data;
writing the query data correspondingly into the temporary table;
encrypting the query data written into the temporary table with the encryption algorithm carried in the operation for encrypting existing data, to obtain second encrypted data;
writing the second encrypted data correspondingly into the to-be-processed table in an overwrite manner;
storing the third encryption configuration information in the encryption information table; and
deleting the temporary table.
Optionally, in the foregoing solution, obtaining the data query operation and querying, according to the data query operation, the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket to obtain the query data includes:
obtaining the data query operation, where the data query operation carries one or more of the name of the third target table, the name of the third target column, the name of the third target partition, and the name of the third target bucket;
reading, according to the data query operation, the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket from the distributed file system;
determining whether encrypted data exists in one or more of the third target table, the third target column, the third target partition, and the third target bucket;
if so, obtaining, from the encryption information table, the decryption algorithm corresponding to the one or more of the table, column, partition, and bucket in which the encrypted data is located; and
decrypting the encrypted data with the decryption algorithm to obtain the query data.
Optionally, in the foregoing solution, after determining the operation type of the operation request, the method further includes:
if the operation request is an authorization operation, determining whether the one or more of a table, a column, a partition, and a bucket carried in the authorization operation, on which the user authorizes other users to operate, belong to the user; and
if so, performing the authorization operation; if not, cancelling the authorization operation.
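The ownership check on an authorization operation can be sketched as follows. The owner registry, the `(level, name)` object keys, and the function name are hypothetical; the sketch only illustrates the rule that a user may authorize others solely on objects that belong to that user.

```python
from typing import Dict, List, Set, Tuple

# An object is identified by (level, name); this representation is hypothetical.
DataObject = Tuple[str, str]

OWNERS: Dict[DataObject, str] = {
    ("table", "orders"): "alice",
    ("column", "orders.amount"): "alice",
    ("table", "logs"): "bob",
}
# Recorded grants: (grantee, object).
grants: Set[Tuple[str, DataObject]] = set()

def handle_authorization(grantor: str, grantee: str, objects: List[DataObject]) -> bool:
    """Perform the grant only if every named object belongs to the grantor."""
    if not objects:
        return False
    if any(OWNERS.get(obj) != grantor for obj in objects):
        return False  # cancel the whole authorization operation
    for obj in objects:
        grants.add((grantee, obj))
    return True

granted = handle_authorization("alice", "bob", [("column", "orders.amount")])
refused = handle_authorization("alice", "carol", [("table", "logs")])
```

The sketch cancels the whole operation when any one object fails the check; whether a partial grant should be allowed instead is not stated in the text and is a choice made here for simplicity.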
Optionally, in the foregoing solution, the method further includes:
encrypting the data in the encryption information table with a preset encryption algorithm.
An embodiment of the present disclosure further provides a data protection apparatus in a data warehouse, including:
a receiving module, configured to receive a user request, where the user request carries user identity information and an operation request, and the operation request includes one or more of a table-level operation request, a column-level operation request, a partition-level operation request, and a bucket-level operation request;
a first determining module, configured to determine whether the user identity information and the operation request are legitimate;
a second determining module, configured to determine the operation type of the operation request when the first determining module determines that both the user identity information and the operation request are legitimate, where the operation type is either an authorization operation or a non-authorization operation; and
an encryption module, configured to encrypt newly added data or existing data in the data warehouse when the second determining module determines that the operation request is a non-authorization operation.
Optionally, in the foregoing solution, the first determining module includes:
a first determining unit, configured to determine, according to the user identity information, whether the user corresponding to the user identity information is present in a pre-stored whitelist, and if so, the user identity information is legitimate; and
a second determining unit, configured to determine whether the permission needed to execute the operation request is among the preset operation permissions held by the user, and if so, the operation request is legitimate.
Optionally, in the foregoing solution, the non-authorization operations include: a create-table operation, a data import operation, a modify-table operation, an operation for encrypting existing data, and a data query operation.
Optionally, in the foregoing solution, the encryption module includes:
a first obtaining unit, configured to obtain the create-table operation and create a table according to the table structure information carried in the create-table operation;
a third determining unit, configured to determine whether the create-table operation carries first encryption configuration information, where the first encryption configuration information includes an encryption algorithm, a decryption algorithm, and one or more of the name of a table, the name of a column, the name of a partition, and the name of a bucket to be encrypted;
a first storage unit, configured to store the first encryption configuration information in an encryption information table when the third determining unit determines that the create-table operation carries the first encryption configuration information;
a second obtaining unit, configured to obtain the data import operation, where the data import operation carries one or more of the name of a first target table, the name of a first target column, the name of a first target partition, and the name of a first target bucket into which data is to be imported, the first target table includes one or more of the created tables, the first target column includes one or more columns of the created table, the first target partition includes one or more partitions of the created table, and the first target bucket includes one or more buckets of the created table;
a fourth determining unit, configured to determine whether a corresponding encryption algorithm is stored in the encryption information table for one or more of the first target table, the first target column, the first target partition, and the first target bucket;
a third obtaining unit, configured to obtain, when the fourth determining unit determines that a corresponding encryption algorithm is stored in the encryption information table for one or more of the first target table, the first target column, the first target partition, and the first target bucket, the encryption algorithm corresponding to the one or more of the first target table, the first target column, the first target partition, and the first target bucket;
a first encryption unit, configured to obtain the data to be imported into one or more of the first target table, the first target column, the first target partition, and the first target bucket, and encrypt the data to be imported with the obtained encryption algorithm to obtain first encrypted data; and
a first writing unit, configured to write the first encrypted data correspondingly into one or more of the first target table, the first target column, the first target partition, and the first target bucket.
Optionally, in the foregoing solution, the encryption module includes:
a fourth obtaining unit, configured to obtain the modify-table operation;
a fifth determining unit, configured to determine whether the modify-table operation carries second encryption configuration information, where the second encryption configuration information includes an encryption algorithm, a decryption algorithm, and one or more of the name of a second target table, the name of a second target column, the name of a second target partition, and the name of a second target bucket to be encrypted;
a second encryption unit, configured to encrypt, when the fifth determining unit determines that the modify-table operation carries the second encryption configuration information, the data in one or more of the second target table, the second target column, the second target partition, and the second target bucket with the encryption algorithm carried in the modify-table operation; and
a second storage unit, configured to store the second encryption configuration information in the encryption information table.
Optionally, in the foregoing solution, the encryption module includes:
a fifth obtaining unit, configured to obtain the operation for encrypting existing data, which carries the name of a to-be-processed table and third encryption configuration information, where the third encryption configuration information includes an encryption algorithm, a decryption algorithm, and one or more of the name of a third target table, the name of a third target column, the name of a third target partition, and the name of a third target bucket to be encrypted, the third target table includes one or more of the to-be-processed tables, the third target column includes one or more columns of the to-be-processed table, the third target partition includes one or more partitions of the to-be-processed table, and the third target bucket includes one or more buckets of the to-be-processed table;
a creating unit, configured to create a temporary table having the same structure as the to-be-processed table;
a query unit, configured to obtain the data query operation, and query, according to the data query operation, the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket to obtain query data;
a second writing unit, configured to write the query data correspondingly into the temporary table;
a third encryption unit, configured to encrypt the query data written into the temporary table with the encryption algorithm carried in the operation for encrypting existing data, to obtain second encrypted data;
a third writing unit, configured to write the second encrypted data correspondingly into the to-be-processed table in an overwrite manner;
a third storage unit, configured to store the third encryption configuration information in the encryption information table; and
a deleting unit, configured to delete the temporary table.
Optionally, in the foregoing solution, the query unit includes:
a first obtaining subunit, configured to obtain the data query operation, where the data query operation carries one or more of the name of the third target table, the name of the third target column, the name of the third target partition, and the name of the third target bucket;
a query subunit, configured to read, according to the data query operation, the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket from the distributed file system;
a determining subunit, configured to determine whether encrypted data exists in one or more of the third target table, the third target column, the third target partition, and the third target bucket;
a second obtaining subunit, configured to obtain from the encryption information table, when the determining subunit determines that encrypted data exists in one or more of the third target table, the third target column, the third target partition, and the third target bucket, the decryption algorithm corresponding to the one or more of the table, column, partition, and bucket in which the encrypted data is located; and
a decryption subunit, configured to decrypt the encrypted data with the decryption algorithm to obtain the query data.
Optionally, in the foregoing solution, the apparatus further includes:
an authorization module, configured to, when the second determining module determines that the operation request is an authorization operation, determine whether the one or more of the table, column, partition, and bucket on which the user, as carried in the authorization operation, authorizes other users to operate belong to that user; if so, perform the authorization operation, and if not, cancel the authorization operation.
Optionally, in the foregoing solution, the encryption module further includes:
a fourth encryption unit, configured to encrypt the data in the encryption information table using a preset encryption algorithm.
The present disclosure also provides a non-transitory computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the above method.
The present disclosure also provides an electronic device, including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the at least one processor to perform any of the above methods.
The data protection method in a data warehouse according to the embodiments of the present disclosure judges the legality of the user identity information and the operation request carried in a received user request, which makes it possible to control access by data warehouse users and prevent access by illegal users; operation requests at the table, column, partition, and bucket levels allow user data to be operated on flexibly; and encrypting user data protects the data in the data warehouse. The method can therefore protect the data in the data warehouse.
Brief Description of the Drawings
FIG. 1 is a flowchart of the data protection method in a data warehouse according to the first embodiment of the present disclosure;
FIG. 2 is a first structural block diagram of the data protection apparatus in a data warehouse according to the second embodiment of the present disclosure;
FIG. 3 is a second structural block diagram of the data protection apparatus in a data warehouse according to the second embodiment of the present disclosure;
FIG. 4 is a structural block diagram of the data protection apparatus in a data warehouse according to the third embodiment of the present disclosure;
FIG. 5 is a main flowchart of an application of the data protection apparatus in a data warehouse according to the third embodiment of the present disclosure;
FIG. 6 is a first branch flowchart of the application of the data protection apparatus in a data warehouse according to the third embodiment of the present disclosure;
FIG. 7 is a second branch flowchart of the application of the data protection apparatus in a data warehouse according to the third embodiment of the present disclosure;
FIG. 8 is a third branch flowchart of the application of the data protection apparatus in a data warehouse according to the third embodiment of the present disclosure;
FIG. 9 is a fourth branch flowchart of the application of the data protection apparatus in a data warehouse according to the third embodiment of the present disclosure; and
FIG. 10 is a schematic diagram of the hardware structure of the electronic device according to the fifth embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. Provided they do not conflict, the following embodiments and the technical features in them may be combined with one another in any way.
Related technologies typically protect data either by simply encrypting it with a third-party tool or by authenticating users through an added security authentication (Kerberos) component.
In the first related method, Hive's permission control flow is unchanged. Before data is imported into a Hive table, the raw data is encrypted with a third-party encryption tool, and the encrypted data is then imported into the Hive table. When a query request is received, the data is exported from the Hive table and the exported data is decrypted manually.
The first related method solves the problem only to a limited extent. Because data must be imported and exported repeatedly, the operation is cumbersome and time-consuming. Encryption is performed outside Hive by deploying a third-party encryption tool, which increases system complexity. Moreover, encrypted data is generally longer than the original, and importing data whose length has changed reduces the import efficiency of the Hive system. Because the data is encrypted before it enters Hive, Hive's MapReduce processing cannot be used; data can be encrypted only in a fixed manner (usually a whole table), and the encryption target cannot be chosen flexibly, for example as one or several tables, columns, partitions, or buckets.
In the second related method, a third-party network authentication protocol (Kerberos) component is introduced into Hive permission control. Hive's permission control module can connect directly to the third-party Kerberos component as part of the module, and users are authenticated through the Kerberos component, which prevents malicious users from forging the information of stored users.
In the second related method, deploying a third-party Kerberos component is costly and very complex. The steps by which the Kerberos component generates certificates and configuration are quite tedious. The initial configuration may not be too burdensome, but whenever user permissions are modified or machines are added or removed, certificates must be regenerated and distributed and the system restarted. In addition, a Kerberos outage risks leaving the whole multi-node cluster unable to provide service, and Kerberos's own configuration is complicated and degrades the performance of jobs running on the Hadoop cluster. As a result, Kerberos is seldom used with big data. Furthermore, the second related method cannot solve the problem of authorization granularity and cannot protect the data in the underlying HDFS.
In summary, the data protection methods in the related art suffer from incomplete security mechanisms, cumbersome and time-consuming operation, an inability to operate flexibly on tables, columns, partitions, and buckets, a low degree of confidentiality, poor performance of jobs running on the Hadoop cluster, and high deployment cost.
First Embodiment
An embodiment of the present disclosure provides a data protection method in a data warehouse.
In this method, a user request entered by a user is received, where the user request carries user identity information and an operation request, and the operation request includes one or more of a table-level operation request, a column-level operation request, a partition-level operation request, and a bucket-level operation request; whether the user identity information and the operation request are legal is determined; if both the identity information and the operation request are legal, the operation type of the operation request is determined; and if the operation request is a non-authorization operation, newly added data or existing data in the data warehouse is encrypted.
Accordingly, the data protection method in a data warehouse according to the embodiment of the present disclosure protects the data in the data warehouse by applying access control and column-, partition-, or bucket-level permission control to data warehouse users and by encrypting user data.
As shown in FIG. 1, the method includes the following steps.
In step 110, a user request entered by a user is received.
The user request carries user identity information and an operation request, and the operation request includes one or more of a table-level operation request, a column-level operation request, a partition-level operation request, and a bucket-level operation request.
When a user needs to access the Hive system to perform a particular data operation, the user submits a user request to the Hive system. To make it easy to verify the user's identity and determine whether the user is legitimate, the user request may carry the user's identity information, such as the user name and the Internet Protocol (IP) address from which the user accesses the Hive system.
In step 120, whether the user identity information and the operation request are legal is determined.
Optionally, step 120 includes:
determining, according to the user identity information, whether the user corresponding to the user identity information exists in a pre-stored whitelist, and if so, the user identity information is legal; and
determining whether the permission required to execute the operation request is among the preset operation permissions held by the user, and if so, the operation request is legal.
In the embodiment of the present disclosure, a whitelist is stored in the Hive system. The whitelist records the users that have data access permission together with their identity information, such as the user name and the IP address from which the user accesses the Hive system. When a user request is received, the whitelist is read to determine whether the user identity information carried in the request is legal, that is, whether the user is recorded in the whitelist. If so, the user identity information is legal and the user is allowed to access the Hive system; if not, the user is refused access to the Hive system.
The Hive system also stores the preset operation permissions that legitimate users have on the data in the Hive system. For example, the preset operation permissions of each legitimate user are stored in an operation permission table. Once the user accessing the Hive system has been found to be legitimate, the operation permission table is read to determine whether the permission required to execute the operation request is among the user's preset operation permissions, that is, whether the user is permitted to operate on the table, column, partition, or bucket to which the operation request relates (for example, whether the user has permission to create a table). If so, the operation request is legal; if not, the operation request is refused and the flow ends.
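The two checks of step 120 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the in-memory `WHITELIST` and `PERMISSIONS` structures, the user names, the IP addresses, and the operation labels are all hypothetical stand-ins for the whitelist and the operation permission table.

```python
# Hypothetical stand-in for the whitelist: (user name, IP address) pairs.
WHITELIST = {("alice", "10.0.0.5"), ("bob", "10.0.0.6")}

# Hypothetical stand-in for the operation permission table.
PERMISSIONS = {
    "alice": {"create_table", "import", "query"},
    "bob": {"query"},
}


def is_request_legal(user, ip, operation):
    """Return True only if the identity is whitelisted and the operation is permitted."""
    if (user, ip) not in WHITELIST:
        return False  # identity check failed: access is refused
    # Permission check: the operation must be among the user's preset permissions.
    return operation in PERMISSIONS.get(user, set())
```

Checking identity before permissions mirrors the order described above: a user who is not in the whitelist never reaches the permission lookup.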
In step 130, if both the identity information and the operation request are legal, the operation type of the operation request is determined.
The operation type of an operation request is either an authorization operation or a non-authorization operation. An authorization operation is one in which a legitimate user of the Hive system authorizes other users to access one of that user's columns, partitions, or buckets. Non-authorization operations may include create-table operations, data import operations, modify-table operations, encryption operations on existing data, and data query operations.
When the operation request is an authorization operation, it may be determined whether the one or more of the table, column, partition, and bucket on which the user, as carried in the authorization operation, authorizes other users to operate belong to that user; if so, the authorization operation is performed, and if not, the authorization operation is cancelled. The embodiment of the present disclosure thus provides flexible authorization at the table, column, partition, and bucket levels, allowing a Hive user to authorize other users to access a single column, partition, or bucket.
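The ownership check that gates an authorization operation might look like the sketch below. The `OWNERS` and `GRANTS` structures, the object tuples, and the user names are hypothetical and exist only for illustration; the source does not specify how ownership is recorded.

```python
# Hypothetical ownership records: (table, level, name) -> owning user.
OWNERS = {
    ("sales", "column", "amount"): "alice",
    ("sales", "partition", "ds=2015"): "alice",
    ("logs", "bucket", "0"): "bob",
}
GRANTS = set()  # (grantee, object) pairs that have been authorized


def authorize(grantor, grantee, objects):
    """Grant access only if every listed object belongs to the grantor."""
    if any(OWNERS.get(obj) != grantor for obj in objects):
        return False  # the authorization operation is cancelled
    GRANTS.update((grantee, obj) for obj in objects)
    return True
```

The sketch rejects the whole request if any object is not owned by the grantor, which is one reasonable reading of "cancel the authorization operation".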
When the operation request is a non-authorization operation, step 140 may be performed.
In step 140, if the operation request is a non-authorization operation, newly added data or existing data in the data warehouse is encrypted.
The encryption of newly added data in the data warehouse takes place during table creation and data import. Optionally, encrypting newly added data in the data warehouse includes:
obtaining the create-table operation, and creating a table according to the table structure information carried in the create-table operation;
determining whether the create-table operation carries first encryption configuration information, where the first encryption configuration information includes one or more of the name of a table, the name of a column, the name of a partition, and the name of a bucket to be encrypted, together with an encryption algorithm and a decryption algorithm;
if the create-table operation carries the first encryption configuration information, storing the first encryption configuration information in an encryption information table;
obtaining the data import operation, where the data import operation carries one or more of the name of a first target table, the name of a first target column, the name of a first target partition, and the name of a first target bucket into which data is to be imported, the first target table includes one or more of the created tables, the first target column includes one or more columns of the created table, the first target partition includes one or more partitions of the created table, and the first target bucket includes one or more buckets of the created table;
determining whether the encryption information table stores a corresponding encryption algorithm for one or more of the first target table, the first target column, the first target partition, and the first target bucket;
if so, obtaining the encryption algorithm corresponding to one or more of the first target table, the first target column, the first target partition, and the first target bucket;
obtaining the data to be imported into one or more of the first target table, the first target column, the first target partition, and the first target bucket, and encrypting the data to be imported using the obtained encryption algorithm to obtain first encrypted data; and
writing the first encrypted data to the corresponding one or more of the first target table, the first target column, the first target partition, and the first target bucket.
For example, a table X may be created with the data in its columns A and B encrypted. An empty table X that has only a structure can be created from the table structure information carried in the create-table operation, such as field names and data types. Because table X contains data that needs to be encrypted, the create-table operation carries encryption configuration information for table X, which may include the names of columns A and B to be encrypted, the encryption algorithm, and the decryption algorithm.
For example, with column A named id and column B named name, an SQL statement that creates table X and encrypts the data in its columns A and B may be as follows:
create table encode_test(id INT, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('column.encode.columns'='id,name',
'column.encode.classname'='com.zte.encode.DESRewriter')
STORED AS TEXTFILE;
Here, column.encode.columns indicates the columns to be encrypted; if it is empty, all columns in the table are encrypted. The example above encrypts two columns, id and name. column.encode.classname indicates the encryption algorithm (class library); the algorithm may be the Data Encryption Standard (DES). The embodiments of the present disclosure may adopt an open plug-in architecture, in which the Hive system includes an encryptor class library and a decryptor class library. The encryptor corresponding to an encryption algorithm can therefore be called to encrypt data, the decryptor corresponding to a decryption algorithm can be called to decrypt data, and the encryption and decryption algorithms are easy to replace.
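The open plug-in idea can be illustrated with a small class-name registry. Everything in the sketch is an assumption made for illustration: the `example.*` class names, the `REGISTRY` mapping, and the two toy rewriters are hypothetical, and neither base64 nor string reversal is real encryption; a deployment would plug in an actual cipher such as DES.

```python
import base64


class Base64Rewriter:
    """Toy stand-in for an encryptor/decryptor plug-in; base64 is NOT encryption."""

    def encode(self, text):
        return base64.b64encode(text.encode("utf-8")).decode("ascii")

    def decode(self, text):
        return base64.b64decode(text.encode("ascii")).decode("utf-8")


class ReverseRewriter:
    """Second toy plug-in, present only to show that algorithms can be swapped."""

    def encode(self, text):
        return text[::-1]

    def decode(self, text):
        return text[::-1]


# Hypothetical registry keyed by class name, mirroring the role of
# the column.encode.classname property.
REGISTRY = {
    "example.Base64Rewriter": Base64Rewriter,
    "example.ReverseRewriter": ReverseRewriter,
}


def load_rewriter(classname):
    """Instantiate the plug-in registered under the configured class name."""
    return REGISTRY[classname]()
```

Replacing the algorithm then amounts to changing the configured class name; the calling code that invokes `encode` and `decode` stays the same.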
So that the data needing encryption can be encrypted at import time, the encryption configuration information for table X can be stored in the encryption information table. The encryption information table records the encryption algorithm used when encrypting by table, column, partition, or bucket.
After the table is created, data can be imported into the empty table. For example, when data is imported into column A of table X, the data import operation carries the name of column A. The encryption algorithm corresponding to column A is read from the encryption information table, the data to be imported is encrypted with that algorithm, and the encrypted data is written into column A of table X.
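The create-then-import flow for table X can be sketched in a few lines. The sketch rests on several assumptions: `TABLES` and `ENCRYPTION_INFO` are hypothetical in-memory stand-ins for the warehouse and the encryption information table, only column-level configuration is modeled, and `toy_encrypt` uses base64 purely as a runnable placeholder for a real cipher.

```python
import base64


def toy_encrypt(value):
    # Placeholder for a real cipher such as DES; base64 offers no secrecy.
    return base64.b64encode(str(value).encode("utf-8")).decode("ascii")


ENCRYPTION_INFO = {}  # (table, column) -> encryption function
TABLES = {}           # table name -> list of stored rows


def create_table(name, encode_columns=()):
    """Create an empty table and record which columns need encryption."""
    TABLES[name] = []
    for column in encode_columns:
        ENCRYPTION_INFO[(name, column)] = toy_encrypt


def import_rows(name, rows):
    """Encrypt configured columns on the way in; store other columns unchanged."""
    for row in rows:
        stored = {}
        for column, value in row.items():
            encrypt = ENCRYPTION_INFO.get((name, column))
            stored[column] = encrypt(value) if encrypt else value
        TABLES[name].append(stored)


create_table("X", encode_columns=["id"])
import_rows("X", [{"id": 7, "name": "alice"}])
```

Because the lookup happens per column at import time, columns without an entry in the encryption information table are written as plain values.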
Because the data files corresponding to Hive data are stored on HDFS, creating table X amounts to creating an empty directory corresponding to table X in HDFS and storing the table's structural information (field names, field types, and so on), also called dictionary information, in a physical database (such as MySQL or PostgreSQL).
Existing data in the data warehouse can be encrypted either by modifying the table or by creating a temporary table.
In the first approach, encrypting existing data in the data warehouse by modifying the table may include:
obtaining the modify-table operation;
determining whether the modify-table operation carries second encryption configuration information, where the second encryption configuration information includes one or more of the name of a second target table, the name of a second target column, the name of a second target partition, and the name of a second target bucket to be encrypted, together with an encryption algorithm and a decryption algorithm;
if the modify-table operation carries the second encryption configuration information, encrypting the data in one or more of the second target table, the second target column, the second target partition, and the second target bucket using the encryption algorithm carried in the modify-table operation; and
storing the second encryption configuration information in the encryption information table.
The encryption configuration information is carried in the modify-table operation. When the data of a table, column, partition, or bucket is to be encrypted, the modify-table operation may carry the name of at least one of that table, column, partition, and bucket, together with the encryption algorithm and the decryption algorithm. When the modify-table operation is executed, the encryptor corresponding to the encryption algorithm is called to encrypt the table, column, partition, or bucket concerned. The same operation can change the algorithm used for encryption, setting the encryption algorithm of the relevant table, column, partition, or bucket to the required one. For example, to change the encryption algorithm of partition DS='2015' to DES, the SQL statement is as follows:
ALTER TABLE C SET PARTITION(DS='2015') 'partition.encode.classname'='com.zte.encode.DESRewriter'. The encryption configuration information carried in the modify-table operation may be saved in the encryption information table.
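A rough sketch of the modify-table approach at partition level follows. It is an illustration under assumptions rather than the disclosed implementation: `TABLE_C`, `ENCRYPTION_INFO`, the `alter_encrypt` helper, and the `"toy-base64"` label are hypothetical, and base64 again stands in for a real cipher.

```python
import base64


def toy_encrypt(text):
    # Placeholder for a real cipher; base64 only keeps the sketch runnable.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


# Hypothetical table C, stored as partition name -> list of values.
TABLE_C = {"DS=2014": ["a", "b"], "DS=2015": ["c", "d"]}
ENCRYPTION_INFO = {}  # partition name -> algorithm label


def alter_encrypt(table, partitions, algorithm="toy-base64"):
    """Encrypt the named partitions in place and record the configuration."""
    for partition in partitions:
        table[partition] = [toy_encrypt(v) for v in table[partition]]
        ENCRYPTION_INFO[partition] = algorithm


alter_encrypt(TABLE_C, ["DS=2015"])
```

Only the partition named in the operation is touched; the other partition keeps its original values and gets no entry in the encryption information table.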
In the second approach, encrypting existing data in the data warehouse by creating a temporary table may include:
obtaining the encryption operation on existing data, where the encryption operation on existing data carries the name of a to-be-processed table and third encryption configuration information, the third encryption configuration information includes one or more of the name of a third target table, the name of a third target column, the name of a third target partition, and the name of a third target bucket to be encrypted, together with an encryption algorithm and a decryption algorithm, the third target table includes one or more of the to-be-processed tables, the third target column includes one or more columns of the to-be-processed table, the third target partition includes one or more partitions of the to-be-processed table, and the third target bucket includes one or more buckets of the to-be-processed table;
creating a temporary table with the same structure as the to-be-processed table;
obtaining the data query operation, and querying, according to the data query operation, the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket to obtain query data;
writing the query data to the corresponding locations in the temporary table;
encrypting the query data written to the temporary table using the encryption algorithm carried in the encryption operation on existing data, to obtain second encrypted data;
writing the second encrypted data to the corresponding locations in the to-be-processed table in overwrite mode;
storing the third encryption configuration information in the encryption information table; and
deleting the temporary table.
Here, overwrite mode means clearing the contents of the original file and writing the new contents.
Optionally, obtaining the data query operation and querying, according to the data query operation, the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket to obtain query data includes:
obtaining the data query operation, where the data query operation carries one or more of the name of the third target table, the name of the third target column, the name of the third target partition, and the name of the third target bucket;
reading, according to the data query operation, the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket from the distributed file system;
determining whether encrypted data exists in one or more of the third target table, the third target column, the third target partition, and the third target bucket;
if so, obtaining from the encryption information table a decryption algorithm corresponding to one or more of the table, column, partition, and bucket in which the encrypted data is located; and
decrypting the encrypted data using the decryption algorithm to obtain the query data.
For example, encrypting the existing data in partitions M, N, and H of table Y includes: creating a temporary table L with the same structure as table Y, that is, a temporary table L that has the same data dictionary as table Y but is not encrypted; and reading the data in partitions M, N, and H out of HDFS by means of a data query. A data dictionary is a user-accessible catalog that records database and application metadata.
When table Y is queried, the names of partitions M, N, and H to be queried can be obtained from the data query operation, and the data in partitions M, N, and H is read from HDFS. The data read may include data that is already encrypted, in which case the encrypted data can be decrypted before the data is written into temporary table L. For decryption, the encryption information table is searched for the decryption algorithm corresponding to the table, column, partition, or bucket in which the encrypted data is located; when the algorithm is found, the corresponding decryptor is applied to the encrypted data to obtain the queried data.
Once the queried data of partitions M, N, and H has been obtained, it is written to the corresponding locations in temporary table L. Because temporary table L has the same structure as table Y, the same partitions exist in temporary table L, so the queried data is written into partitions M, N, and H of temporary table L.
At this point, the data in partitions M, N, and H of temporary table L can be encrypted with the encryption algorithm carried in the encryption operation on existing data. The encrypted data of partitions M, N, and H is then written in overwrite mode into partitions M, N, and H of table Y, and the previously created temporary table L is deleted. So that users can later operate on the data in partitions M, N, and H of table Y, the encryption and decryption algorithms corresponding to partitions M, N, and H can be stored in the encryption information table.
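The temporary-table procedure for table Y can be condensed into the following sketch. The names `encrypt_via_temp_table`, `table_y`, and `info`, and the extra partition K, are hypothetical, tables are modeled as plain dictionaries rather than HDFS files, and base64 is a runnable placeholder for the carried encryption algorithm.

```python
import base64


def toy_encrypt(text):
    # Placeholder cipher; base64 is not real encryption.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def toy_decrypt(text):
    return base64.b64decode(text.encode("ascii")).decode("utf-8")


def encrypt_via_temp_table(table, targets, info):
    """Re-encrypt the target partitions of `table` through a temporary copy."""
    temp = {}  # temporary table L with the same partition layout
    for part in targets:
        rows = table[part]
        if part in info:  # already encrypted: decrypt before staging
            rows = [toy_decrypt(r) for r in rows]
        temp[part] = list(rows)
    for part in targets:
        # Overwrite mode: the old partition contents are replaced entirely.
        table[part] = [toy_encrypt(r) for r in temp[part]]
        info[part] = "toy-base64"  # record the algorithm for later queries
    temp.clear()  # delete the temporary table


table_y = {"M": ["m1"], "N": [toy_encrypt("n1")], "H": ["h1"], "K": ["k1"]}
info = {"N": "toy-base64"}
encrypt_via_temp_table(table_y, ["M", "N", "H"], info)
```

Partition N starts out encrypted, so it is decrypted while staging and ends up encrypted exactly once, while partition K, which is not a target, is left untouched.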
After data in the Hive system has been encrypted in this way, a later user query proceeds as follows. One or more of the names of the table, column, partition, and bucket to be queried are obtained from the data query operation. The data in the table, column, partition, or bucket to be queried is read from HDFS. The encryption information table is then searched for the decryption algorithm corresponding to the table, column, partition, or bucket that was read. If a corresponding decryption algorithm is found, the data is decrypted with it and the decrypted data is returned to the user; if none is found, the data is returned to the user as read.
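A minimal sketch of this query path is shown below, under stated assumptions: storage and the encryption information table are plain dictionaries, `DECRYPTORS` is a hypothetical registry that maps an algorithm name to a decryptor, and base64 decoding stands in for a real decryption algorithm (it is a reversible encoding, not encryption).

```python
import base64

# Hypothetical decryptor registry; base64 is a placeholder, not real encryption.
DECRYPTORS = {"b64-demo": base64.b64decode}

def query(storage, encryption_info, targets):
    """Return the data of each target, decrypted when an algorithm is recorded."""
    result = {}
    for name in targets:
        raw = storage[name]
        algorithm = encryption_info.get(name)
        if algorithm is None:
            # No decryption algorithm recorded: return the data as read.
            result[name] = list(raw)
        else:
            decrypt = DECRYPTORS[algorithm]
            result[name] = [decrypt(value) for value in raw]
    return result

storage = {"M": [base64.b64encode(b"alice")], "N": [b"bob"]}
encryption_info = {"M": "b64-demo"}
rows = query(storage, encryption_info, ["M", "N"])
```

Only partition M has an entry in the encryption information table, so only its rows pass through the decryptor.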
In the Hive system, the data in the encryption information table can be protected either by keeping the table in the metadata as a hidden table or by encrypting the table itself. After the first, second, or third encryption configuration information is saved in the encryption information table, the data in the table may be encrypted with a preset encryption algorithm.
The operation permission table and the whitelist are also part of the metadata, and encrypting them likewise protects the data they contain.
The configuration parameter hive.metastore.encode.class (the metadata encryption and decryption class) can be set to a specified encryption and decryption algorithm, for example the Advanced Encryption Standard (AES) encryption and decryption algorithms: com.zte.encode.AESRewriter. The metastore is the metadata store. Before a metadata table (that is, the encryption information table, the operation permission table, or the whitelist) is written to the physical database, the encryption algorithm is called on the full content of every table field. After the metadata is read from the physical database, the decryption algorithm is called on the full content of every table field.
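The write-time encryption and read-time decryption of metadata fields might look like the sketch below. The interface of com.zte.encode.AESRewriter is not given in the text, so `XorRewriter` with its `encode` and `decode` methods is a hypothetical stand-in, the XOR transform is a toy placeholder for AES, and a Python list stands in for the physical database.

```python
class XorRewriter:
    """Hypothetical stand-in for a configured class such as an AES rewriter."""

    def __init__(self, key: bytes):
        self.key = key

    def _xor(self, data: bytes) -> bytes:
        return bytes(b ^ self.key[i % len(self.key)] for i, b in enumerate(data))

    def encode(self, text: str) -> str:
        return self._xor(text.encode("utf-8")).hex()

    def decode(self, text: str) -> str:
        return self._xor(bytes.fromhex(text)).decode("utf-8")


class MetadataStore:
    """Encrypts every field value before writing and decrypts after reading."""

    def __init__(self, rewriter):
        self.rewriter = rewriter
        self.physical = []  # stands in for the physical database

    def write(self, row: dict) -> None:
        self.physical.append({k: self.rewriter.encode(v) for k, v in row.items()})

    def read_all(self) -> list:
        return [{k: self.rewriter.decode(v) for k, v in row.items()}
                for row in self.physical]


store = MetadataStore(XorRewriter(b"demo-key"))
store.write({"table": "Y", "partition": "M", "encryptor": "AES"})
```

Callers of `write` and `read_all` see plain values, while the rows held in `physical` contain only encoded field contents.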
In this embodiment of the present disclosure, if the Hive system super administrator has not configured permission control, user access control cannot be performed and users cannot authorize one another, but data can still be protected by encryption. In that case, when encrypted data is queried, the data is decrypted for display only if the query comes from the owner of the data (the owner of the table, column, partition, or bucket data). Any other user sees only the encrypted data, not the decrypted data.
Large-scale data processing in the Hive system is carried out by MapReduce (a programming model) tasks at run time. While MapReduce runs, its intermediate stages produce temporary data, and in some cases this data can also be protected. During execution, MapReduce may need to write part of the data temporarily to HDFS or to disk. This is the temporary data generated by an intermediate stage (such as the Map stage), and it is often a small fragment of the unencrypted data. To protect it, the encryption and decryption algorithms can also be applied to the intermediate stages of MapReduce: when an intermediate stage writes data to HDFS or to disk, the encryption algorithm is called on the data written, and when an intermediate stage reads data from HDFS or from disk, the decryption algorithm is called on the data read.
The configuration parameter hive.intermediate.compression.codec (the intermediate data encoding) can be set to a specified encryption and decryption algorithm, for example the AES encryption and decryption algorithms (com.zte.encode.AESRewriter), so that intermediate data is protected by encryption.
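As a rough illustration of protecting intermediate data, the sketch below encrypts records when they are spilled to a temporary file and decrypts them when they are read back. The length-prefixed file layout, the function names, and the XOR placeholder cipher are all assumptions made for the example; they are not taken from the disclosure or from Hive.

```python
import os
import tempfile

# Toy stand-in for a real cipher such as AES; XOR is NOT secure (assumption).
def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def spill(path, records, key):
    """Write intermediate records to disk, encrypting each one on the way out."""
    with open(path, "wb") as f:
        for record in records:
            encrypted = xor_cipher(record, key)
            f.write(len(encrypted).to_bytes(4, "big") + encrypted)

def read_spill(path, key):
    """Read intermediate records back, decrypting each one on the way in."""
    records = []
    with open(path, "rb") as f:
        while header := f.read(4):
            size = int.from_bytes(header, "big")
            records.append(xor_cipher(f.read(size), key))
    return records

key = b"demo-key"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "map_output.tmp")
    spill(path, [b"k1\tv1", b"k2\tv2"], key)
    with open(path, "rb") as f:
        on_disk = f.read()
    restored = read_spill(path, key)
```

The bytes held in `on_disk` never contain the plain records, and `restored` matches what the intermediate stage originally produced.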
Second embodiment
An embodiment of the present disclosure further provides a data protection apparatus in a data warehouse. As shown in FIG. 2, the data protection apparatus 20 includes a receiving module 21, a first determining module 23, a second determining module 25, and an encryption module 27.
The receiving module 21 is configured to receive a user request input by a user, where the user request carries user identity information and an operation request, and the operation request includes one or more of a table-level operation request, a column-level operation request, a partition-level operation request, and a bucket-level operation request.
The first determining module 23 is configured to determine whether the user identity information and the operation request are legal.
The second determining module 25 is configured to determine the operation type of the operation request when the first determining module 23 determines that both the user identity information and the operation request are legal, where the operation type is either an authorization operation or a non-authorization operation.
The encryption module 27 is configured to encrypt newly added data or existing data in the data warehouse when the second determining module 25 determines that the operation request is a non-authorization operation.
Optionally, as shown in FIG. 3, the first determining module 23 includes a first determining unit 231 and a second determining unit 232.
The first determining unit 231 is configured to determine, based on the user identity information, whether the corresponding user is in a pre-stored whitelist; if so, the user identity information is legal.
The second determining unit 232 is configured to determine whether the permission needed to execute the operation request is among the preset operation permissions held by the user; if so, the operation request is legal.
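The two checks performed by the first determining module can be sketched as below. The whitelist is modeled as a set of user names and the operation permission table as a mapping from user to permitted operations; both representations and all names are assumptions for illustration.

```python
def identity_is_legal(user, whitelist):
    """First determining unit: the user must appear in the pre-stored whitelist."""
    return user in whitelist

def operation_is_legal(user, operation, permissions):
    """Second determining unit: the operation must be among the user's preset permissions."""
    return operation in permissions.get(user, set())

def request_is_legal(user, operation, whitelist, permissions):
    return (identity_is_legal(user, whitelist)
            and operation_is_legal(user, operation, permissions))

whitelist = {"alice", "bob"}
permissions = {"alice": {"create_table", "query"}, "bob": {"query"}}
```

With this data, alice may create tables, bob may only query, and a user absent from the whitelist is rejected regardless of any permissions recorded for that user.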
Optionally, the non-authorization operations include a create-table operation, a data import operation, a modify-table operation, an operation for encrypting existing data, and a data query operation.
Optionally, as shown in FIG. 3, the encryption module 27 includes a first obtaining unit 271, a third determining unit 272, a first storage unit 273, a second obtaining unit 274, a fourth determining unit 275, a third obtaining unit 276, a first encryption unit 277, and a first writing unit 278.
The first obtaining unit 271 is configured to obtain a create-table operation and to create a table according to the table structure information carried in the create-table operation.
The third determining unit 272 is configured to determine whether the create-table operation carries first encryption configuration information, where the first encryption configuration information includes one or more of the names of the table, column, partition, and bucket to be encrypted, together with an encryption algorithm and a decryption algorithm.
The first storage unit 273 is configured to store the first encryption configuration information in the encryption information table when the third determining unit 272 determines that the create-table operation carries the first encryption configuration information.
The second obtaining unit 274 is configured to obtain a data import operation, where the data import operation carries one or more of the name of a first target table, the name of a first target column, the name of a first target partition, and the name of a first target bucket into which data is to be imported. The first target table includes one or more of the created tables, the first target column includes one or more columns of the created table, the first target partition includes one or more partitions of the created table, and the first target bucket includes one or more buckets of the created table.
The fourth determining unit 275 is configured to determine whether a corresponding encryption algorithm is stored in the encryption information table for one or more of the first target table, the first target column, the first target partition, and the first target bucket.
The third obtaining unit 276 is configured to obtain the encryption algorithm corresponding to one or more of the first target table, the first target column, the first target partition, and the first target bucket when the fourth determining unit 275 determines that such an encryption algorithm is stored in the encryption information table.
The first encryption unit 277 is configured to obtain the data to be imported into one or more of the first target table, the first target column, the first target partition, and the first target bucket, and to encrypt that data with the obtained encryption algorithm to produce first encrypted data.
The first writing unit 278 is configured to write the first encrypted data to the corresponding one or more of the first target table, the first target column, the first target partition, and the first target bucket.
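The import path handled by units 274 to 278 can be sketched as follows. This is a simplified, partition-level illustration: `ENCRYPTORS` is a hypothetical registry mapping algorithm names to encryptors, base64 encoding is a placeholder for a real encryption algorithm, and the table and the encryption information table are plain dictionaries.

```python
import base64

# Hypothetical encryptor registry; base64 is a placeholder, not real encryption.
ENCRYPTORS = {"b64-demo": base64.b64encode}

def import_data(table, encryption_info, target_partition, rows):
    """Import rows into a target partition, encrypting them if an algorithm is recorded."""
    algorithm = encryption_info.get(target_partition)
    if algorithm is not None:
        encrypt = ENCRYPTORS[algorithm]
        rows = [encrypt(row) for row in rows]  # the "first encrypted data"
    table.setdefault(target_partition, []).extend(rows)
    return table

table = {}
encryption_info = {"M": "b64-demo"}  # recorded when the table was created
import_data(table, encryption_info, "M", [b"alice"])
import_data(table, encryption_info, "N", [b"bob"])
```

Rows imported into partition M are stored encrypted because the encryption information table has an entry for M; rows imported into N are stored as supplied.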
Optionally, as shown in FIG. 3, the encryption module 27 further includes a fourth obtaining unit 279, a fifth determining unit 2710, a second encryption unit 2711, and a second storage unit 2712.
The fourth obtaining unit 279 is configured to obtain a modify-table operation.
The fifth determining unit 2710 is configured to determine whether the modify-table operation carries second encryption configuration information, where the second encryption configuration information includes one or more of the name of a second target table, the name of a second target column, the name of a second target partition, and the name of a second target bucket to be encrypted, together with an encryption algorithm and a decryption algorithm.
The second encryption unit 2711 is configured to, when the fifth determining unit 2710 determines that the modify-table operation carries the second encryption configuration information, encrypt the data in one or more of the second target table, the second target column, the second target partition, and the second target bucket with the encryption algorithm carried in the modify-table operation.
The second storage unit 2712 is configured to store the second encryption configuration information in the encryption information table.
Optionally, as shown in FIG. 3, the encryption module 27 further includes a fifth obtaining unit 2713, a creating unit 2714, a query unit 2715, a second writing unit 2716, a third encryption unit 2717, a third writing unit 2718, a third storage unit 2719, and a deleting unit 2720.
The fifth obtaining unit 2713 is configured to obtain an operation for encrypting existing data, where the operation carries the name of a to-be-processed table and third encryption configuration information. The third encryption configuration information includes one or more of the name of a third target table, the name of a third target column, the name of a third target partition, and the name of a third target bucket to be encrypted, together with an encryption algorithm and a decryption algorithm. The third target table includes one or more of the to-be-processed tables, the third target column includes one or more columns of the to-be-processed table, the third target partition includes one or more partitions of the to-be-processed table, and the third target bucket includes one or more buckets of the to-be-processed table.
The creating unit 2714 is configured to create a temporary table with the same structure as the to-be-processed table.
The query unit 2715 is configured to obtain a data query operation and, according to it, to query the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket to obtain query data.
The second writing unit 2716 is configured to write the query data to the corresponding location in the temporary table.
The third encryption unit 2717 is configured to encrypt the query data written to the temporary table with the encryption algorithm carried in the operation for encrypting existing data, to obtain second encrypted data.
The third writing unit 2718 is configured to write the second encrypted data to the corresponding location in the to-be-processed table in overwrite mode.
The third storage unit 2719 is configured to store the third encryption configuration information in the encryption information table.
The deleting unit 2720 is configured to delete the temporary table.
Optionally, the query unit 2715 includes a first obtaining subunit, a query subunit, a determining subunit, a second obtaining subunit, and a decryption subunit.
The first obtaining subunit is configured to obtain the data query operation, where the data query operation carries one or more of the name of the third target table, the name of the third target column, the name of the third target partition, and the name of the third target bucket.
The query subunit is configured to read, according to the data query operation, the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket from the distributed file system.
The determining subunit is configured to determine whether encrypted data exists in one or more of the third target table, the third target column, the third target partition, and the third target bucket.
The second obtaining subunit is configured to, when the determining subunit determines that encrypted data exists, obtain from the encryption information table the decryption algorithm corresponding to the one or more of the table, column, partition, and bucket in which the encrypted data resides.
The decryption subunit is configured to decrypt the encrypted data with the decryption algorithm to obtain the query data.
Optionally, as shown in FIG. 3, the apparatus further includes an authorization module 29.
The authorization module 29 is configured to, when the second determining module determines that the operation request is an authorization operation, determine whether the one or more of the tables, columns, partitions, and buckets on which the user named in the authorization operation authorizes other users to operate belong to that user. If so, the authorization operation is executed; if not, the authorization operation is cancelled.
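The ownership check performed by the authorization module can be sketched as below. Objects are identified by hypothetical string names such as "Y.M" for partition M of table Y, and the owner and grant records are plain dictionaries; none of these representations come from the disclosure.

```python
def handle_grant(grantor, grantee, obj, owners, grants):
    """Execute a grant only if the object belongs to the user issuing it."""
    if owners.get(obj) != grantor:
        return False  # the authorization operation is cancelled
    grants.setdefault(obj, set()).add(grantee)
    return True

owners = {"Y.M": "alice", "Y.N": "bob"}
grants = {}
granted = handle_grant("alice", "carol", "Y.M", owners, grants)
rejected = handle_grant("alice", "carol", "Y.N", owners, grants)
```

The first call succeeds because alice owns partition M of table Y; the second is cancelled because partition N belongs to bob.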
Optionally, as shown in FIG. 3, the encryption module 27 further includes a fourth encryption unit 2721. The fourth encryption unit 2721 is configured to encrypt the data in the encryption information table with a preset encryption algorithm after the first storage unit stores the first encryption configuration information in the encryption information table, after the second storage unit stores the second encryption configuration information in the encryption information table, and after the third storage unit stores the third encryption configuration information in the encryption information table.
Third embodiment
FIG. 4 is a structural block diagram of a data protection apparatus in a data warehouse according to a third embodiment of the present disclosure. The data protection apparatus includes:
The access control module 431 is configured to provide a whitelist function, which is invoked by the permission control module 43. Only users on the whitelist are allowed to access the Hive system; users not on the whitelist are denied access. The whitelist is configured by the Hive system super administrator through a specified interface and may be stored in the metadata module 41 as a configuration table (the whitelist table) or on the local disk as a configuration file. The access control module 431 is a plug-in component: users can inherit the specified interface to develop a custom whitelist function.
The permission control module 43 is configured to build on Hive's original permission control module by providing a Hive system super administrator function, invoking the whitelist function provided by the access control module 431 to control access to the Hive system, and providing flexible authorization at the column, partition, and bucket levels, so that a Hive user can authorize other users to access a single column, partition, or bucket.
The statement parsing module 44 is configured to parse SQL statements. In the present disclosure, the encryption information is defined in the SQL statement that creates or modifies a table, so parsing of the encryption information is added to the original Hive statement parsing module, and the encryption information is written to the metadata module 41 as a table. The encryption information may include whether the table is encrypted, the names of the encrypted columns, partitions, and buckets, and the name of the encryptor.
The serialization module 45 is configured to write data to HDFS 47. Data length tends to grow during serialization, so to reduce the overhead of MapReduce network transmission, data may be encrypted at the point where it is written to HDFS 47.
The deserialization module 46 is configured to read table files from HDFS 47. Because decryption reduces the data length, data may be decrypted during deserialization to reduce the overhead of MapReduce network transmission.
HDFS 47 is configured to store the data files that correspond to the data in the Hive system. Data in HDFS 47 is read through the deserialization module 46 and written through the serialization module 45.
The metadata module 41 is configured to define the encryption information table, the whitelist table, and the operation permission table, and to store the encryption information of tables, the whitelist information, and the preset operation permissions held by users. The encryption information of a table includes whether the table is encrypted, the names of the encrypted columns, partitions, and buckets, and the name of the encryptor. To minimize the impact on users' SQL programs, that is, so that existing SQL programs need no modification and users run the same SQL whether or not encryption is enabled, the present disclosure persists the encryption information to the metadata module 41 as part of the table data dictionary when the encryption information table, the whitelist table, and the operation permission table are defined. When a data operation occurs, any change that has to be made is handled by the serialization module 45 or the deserialization module 46, which determine from the encryption information whether the data needs to be encrypted or decrypted. The whitelist table stores the user names, the IP addresses from which users access the Hive system, and similar information.
The data encryption and decryption module 42 is an independent plug-in component that is called by the serialization module 45 and the deserialization module 46. It provides the data encryption and decryption algorithms and implements the data encryption and decryption functions. The present disclosure uses the AES encryption and decryption algorithms. AES, also known in cryptography as the Rijndael cipher, is a block encryption standard adopted by the US federal government. It can replace DES and has become one of the most popular symmetric-key encryption algorithms. AES offers fast encryption, compact encoding, and very high security. In addition, the encrypted data produced by the algorithm depends only on the original data and the encryption key: once the original data and the key are fixed, the encrypted data is the same under all circumstances. The same holds for decryption: the encrypted data can be decrypted as long as the key is available. The algorithm can therefore be used in multi-host, multi-node computing environments such as MapReduce. Because the present disclosure adopts a plug-in architecture, users can also develop custom encryption or decryption plug-ins by inheriting the interface defined by the present disclosure.
The key configuration module 421 is a submodule of the data encryption and decryption module 42 and can be used to synthesize the encryption key and the decryption key. During encryption, the Hive system assigns each user a 32-bit encryption key. This assigned key is combined with a user-defined key to form the user's encryption key, which is written to the metadata module 41 through the Java Database Connectivity (JDBC) interface.
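The text does not say how the system-assigned key and the user-defined key are combined. The sketch below assumes one possible approach, hashing the concatenation of the two with SHA-256; this choice, the function names, and the use of random bytes for the assigned key are assumptions for illustration only and are not the disclosed method.

```python
import hashlib
import secrets

def assign_system_key() -> bytes:
    """Assumed: the system-assigned key is 32 random bits (4 bytes)."""
    return secrets.token_bytes(4)

def synthesize_key(system_key: bytes, user_key: bytes) -> bytes:
    """Assumed combination rule: SHA-256 over the two keys concatenated."""
    return hashlib.sha256(system_key + user_key).digest()

system_key = assign_system_key()
combined = synthesize_key(system_key, b"user-defined-secret")
```

Whatever the real rule is, it has to be deterministic, as this one is, so that every node in a MapReduce job derives the same key from the same two inputs.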
Based on the structural block diagram of FIG. 4, when a user accesses the system, the modules shown in FIG. 4 operate according to the flowchart shown in FIG. 5.
In step 510, the user issues a request, that is, the user inputs a user request.
In step 520, the user whitelist is read through the metadata module 41. The whitelist stores the identity information of users with legal access rights, so it is read from the metadata module 41 in order to check the identity information of the user who input the request.
In step 530, it is determined whether the user is on the whitelist. The permission control module 43 invokes the whitelist function provided by the access control module 431 to make this determination. If the user is on the whitelist, the user is allowed to access the Hive system and step 540 is performed. If the user is not on the whitelist, the user is not allowed to access the Hive system and step 5130 is performed, which ends the flow. Only users authenticated by the permission control module 43 are therefore allowed to access the Hive system and take part in the subsequent operations.
In step 540, the permission control module 43 obtains the user's permissions and compares them with the permissions required by the operation. The permission control module 43 reads the operation permission control table from the metadata module 41.
In step 550, it is determined whether the operation request in the user request is an authorization operation. If it is, step 560 is performed; if it is not, step 570 is performed.
In step 560, if the table, column, partition, or bucket being authorized belongs to the user, the authorization succeeds. If it does not belong to the user, the authorization fails and the flow ends.
In step 570, it is determined whether the user has the permission required for the operation. If so, step 580 is performed; if not, step 5130 is performed.
In step 580, the operation type of the user's operation request is determined. The operation types include creating a table, modifying a table, importing data, querying data, and encrypting existing data.
When the operation request is to create or modify a table, step 590 is performed; see the first branch flow in FIG. 6.
When the operation request is a data import, step 5100 is performed; see the second branch flow in FIG. 7.
When the operation request is a data query, step 5110 is performed; see the third branch flow in FIG. 8.
When the operation request is to encrypt existing data, step 5120 is performed; see the fourth branch flow in FIG. 9.
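The decision flow of FIG. 5 can be summarized in a short dispatch function. This is a sketch under assumptions: requests are dictionaries with hypothetical `type` and `object` fields, the whitelist, permissions, and owners are in-memory structures, and the returned strings merely name the step that would run next.

```python
BRANCHES = {
    "create_table": "step 590",
    "alter_table": "step 590",
    "import": "step 5100",
    "query": "step 5110",
    "encrypt_existing": "step 5120",
}

def handle_request(user, request, whitelist, permissions, owners):
    """Return the next step for a request, following the checks of steps 530 to 580."""
    if user not in whitelist:                                 # step 530
        return "step 5130"                                    # end of flow
    if request["type"] == "grant":                            # step 550
        owned = owners.get(request["object"]) == user         # step 560
        return "grant succeeded" if owned else "grant failed"
    if request["type"] not in permissions.get(user, set()):   # step 570
        return "step 5130"
    return BRANCHES.get(request["type"], "step 5130")         # step 580

whitelist = {"alice"}
permissions = {"alice": {"query", "import"}}
owners = {"Y": "alice"}
```

For example, a query from alice is routed to step 5110, while the same query from a user outside the whitelist ends at step 5130.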
As shown in FIG. 6, when the operation request is to create or modify a table, steps 5910 to 5950 are performed.
In step 5910, the create-table or modify-table statement is executed. If an encrypted table, column, partition, or bucket needs to be created, it can be defined in the data dictionary.
In step 5920, the data dictionary information of the table is written to the physical database through the metadata module 41.
In step 5930, it is determined whether the create-table or modify-table statement carries encryption configuration information. If so, step 5940 is performed; if not, step 5950 is performed.
In step 5940, the encryption configuration information is written through the metadata module 41 to the encryption information table in the physical database.
In step 5950, the flow ends.
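As a rough sketch of steps 5920 to 5940, the following uses an in-memory dictionary in place of the physical database. The names `metadata_db` and `create_or_modify_table`, and the shape of the `encryption` argument, are assumptions made for illustration; the description does not specify a storage layout.

```python
# Stand-in for the physical database reached through the metadata module.
metadata_db = {"tables": {}, "encryption_info": {}}


def create_or_modify_table(name, columns, encryption=None):
    """Record a table definition and, if present, its encryption settings.

    Returns True if the encryption information table holds an entry
    for this table after the call.
    """
    # Step 5920: write the table's data dictionary information.
    metadata_db["tables"][name] = list(columns)
    # Steps 5930 and 5940: store encryption configuration only if supplied.
    if encryption is not None:
        metadata_db["encryption_info"][name] = dict(encryption)
    return name in metadata_db["encryption_info"]
```

A statement that carries no encryption configuration only updates the data dictionary; the encryption information table is left untouched.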
As shown in FIG. 7, when the operation request is a data import, steps 51010 to 51070 are performed.
In step 51010, the MapReduce process is executed.
In step 51020, the serialization module 45 is invoked to serialize data to the HDFS 47.
In step 51030, the metadata table information is read through the metadata module 41. The metadata table information includes the encryption information.
In step 51040, it is determined whether the target table, column, partition, or bucket of the import is encrypted. If so, step 51050 is performed; if not, step 51060 is performed.
In step 51050, during serialization, the encryption algorithm provided by the encryption and decryption module 42 is invoked to encrypt the data. The serialization module 45 invokes the key configuration module 421 to synthesize a key, invokes the metadata module 41 to obtain the names of the columns, partitions, or buckets to be encrypted together with the names of the encryption algorithm and the decryption algorithm, loads the specified encryption and decryption algorithms, and encrypts the specified data (the columns, partitions, or buckets to be encrypted) with the key.
In step 51060, the data is written to the HDFS 47 and the MapReduce process ends.
In step 51070, the flow ends. Once the data has been written to the HDFS 47, this part of the flow is complete.
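The column-level behaviour of steps 51040 and 51050 can be illustrated with a minimal sketch. It assumes a toy XOR cipher as a stand-in for whatever algorithm the encryption and decryption module 42 provides (XOR is not secure and is used only because it needs no third-party library). The registry `ALGORITHMS`, the function `serialize_row`, and the layout of `enc_info` are hypothetical.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy reversible cipher; a placeholder for a real algorithm."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


# Algorithms looked up by name, as (encrypt, decrypt) pairs.
ALGORITHMS = {"toy_xor": (xor_bytes, xor_bytes)}


def serialize_row(row: dict, enc_info: dict, key: bytes) -> dict:
    """Serialize a row to bytes, encrypting only the configured columns."""
    encrypt = ALGORITHMS[enc_info["algorithm"]][0] if enc_info else None
    out = {}
    for col, value in row.items():
        raw = str(value).encode("utf-8")
        # Step 51040: is this column marked as encrypted in the metadata?
        if enc_info and col in enc_info["columns"]:
            raw = encrypt(raw, key)  # step 51050
        out[col] = raw
    return out
```

Columns that are not listed in the encryption information pass through serialization unchanged, which matches the branch from step 51040 directly to step 51060.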
As shown in FIG. 8, when the operation request is a data query, steps 51110 to 51180 are performed.
In step 51110, the MapReduce process is executed.
In step 51120, the deserialization module 46 is invoked to read data from the HDFS 47.
In step 51130, the metadata information is read through the metadata module 41. The metadata information includes the encryption table information.
In step 51140, it is determined whether the queried table, column, partition, or bucket is encrypted. If it is encrypted, step 51150 is performed; if not, step 51160 is performed.
In step 51150, during deserialization, the decryption algorithm provided by the encryption and decryption module 42 is invoked to decrypt the data.
In step 51160, after the data has been read, the MapReduce process ends.
In step 51170, the query data is returned to the user.
In step 51180, the flow ends. Once the query data has been returned to the user, the flow is complete.
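The read path of steps 51140 and 51150 mirrors the import path. The sketch below again assumes a toy XOR cipher in place of the real decryption algorithm; `toy_xor` and `deserialize_row` are illustrative names, not part of the disclosure.

```python
from itertools import cycle


def toy_xor(data: bytes, key: bytes) -> bytes:
    """Toy reversible cipher standing in for the configured algorithm."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def deserialize_row(stored: dict, encrypted_columns: set, key: bytes) -> dict:
    """Turn stored bytes back into text, decrypting encrypted columns."""
    row = {}
    for col, raw in stored.items():
        # Step 51140: only columns recorded as encrypted are decrypted.
        if col in encrypted_columns:
            raw = toy_xor(raw, key)  # step 51150
        row[col] = raw.decode("utf-8")
    return row
```

The caller receives plaintext for every column, so the decryption stays transparent to the query that triggered it.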
As shown in FIG. 9, when the operation request is to encrypt existing data, steps 5121 to 51211 are performed.
In step 5121, a query is executed on the source table (or on its columns, partitions, or buckets).
In step 5122, the data to be queried is obtained.
In step 5123, an ordinary, unencrypted temporary table is created with the same data dictionary as the source table.
In step 5124, the data queried above is imported into the temporary table.
In step 5125, MapReduce is executed.
In step 5126, at the end of the MapReduce process, the serialization module 45 is invoked to serialize data to the HDFS 47.
In step 5127, it is determined whether any tables, columns, partitions, or buckets of the source table need to be encrypted. If so, step 5128 is performed; if not, step 5129 is performed.
In step 5128, during serialization, the encryption algorithm provided by the encryption and decryption module 42 is invoked to encrypt the data.
In step 5129, the data is written to the HDFS 47 and the MapReduce process ends. The encrypted data is written to the source table in overwrite mode.
In step 51210, the ordinary temporary table is deleted to save system storage space.
In step 51211, the flow ends. Once the ordinary temporary table has been deleted, this part of the flow is complete.
The order of the step of creating the temporary table and the step of querying the table is not limited to the order described here.
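The copy, encrypt, overwrite, and clean-up sequence of FIG. 9 can be sketched with in-memory tables. Everything here is an assumption for illustration: tables are lists of dictionaries with byte values, the temporary table name is derived by a hypothetical `__tmp` suffix, and a toy XOR cipher replaces the real encryption algorithm.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy reversible cipher; not secure, for illustration only."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def encrypt_existing(tables: dict, source: str, columns: set, key: bytes) -> None:
    """Encrypt selected columns of an existing table in place."""
    tmp = f"{source}__tmp"
    # Steps 5121 to 5124: copy the queried rows into an unencrypted
    # temporary table with the same structure as the source table.
    tables[tmp] = [dict(row) for row in tables[source]]
    try:
        # Steps 5125 to 5129: re-serialize from the temporary table,
        # encrypting the selected columns, and overwrite the source table.
        tables[source] = [
            {c: xor_bytes(v, key) if c in columns else v for c, v in row.items()}
            for row in tables[tmp]
        ]
    finally:
        # Step 51210: delete the temporary table to free storage.
        del tables[tmp]
```

Deleting the temporary table in a `finally` clause is a design choice of this sketch, so that the unencrypted copy does not linger if the rewrite fails.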
In the block diagram shown in FIG. 4, because the data encryption and decryption module 42 is present, the encryption or decryption algorithm is already loaded into the system. The encryption function and the decryption function can be defined as User Defined Functions (UDFs) without any further definition in the data dictionary, so the steps of parsing the statement with the statement parsing module 44 to obtain the encryption information and of storing the encryption information in the metadata database are not needed.
For example, the statement select a, encode(b) from tbl encrypts the content of field b of table tbl through the encode interface of the encryption algorithm, and the results a and encode(b) are imported into a predefined table tbl_encrypt, which serves as the storage location for the encrypted data. When a query request is received, the decryption function decode is called, for example: select a, decode(c) from tbl_encrypt, where the content of c is encode(b). This approach does not require defining encrypted tables, columns, partitions, or buckets in the data dictionary, as is done when creating or modifying a table. The original SQL statement can be modified to add calls to the user-defined functions, but the user must manually ensure that the tables, columns, partitions, or buckets used for encryption exactly match those used for decryption.
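The UDF approach can be tried out with a small, self-contained sketch. Note the substitutions: SQLite (through Python's standard `sqlite3` module) stands in for Hive, whose UDFs are written differently, and a toy XOR cipher with the hypothetical key `KEY` stands in for the real algorithm. Only the table names `tbl` and `tbl_encrypt`, the fields a, b, and c, and the function names encode and decode come from the text.

```python
import sqlite3
from itertools import cycle

KEY = b"demo-key"  # hypothetical key, for illustration only


def encode(text):
    """Toy encryption UDF: XOR the UTF-8 bytes with KEY (not secure)."""
    return bytes(b ^ k for b, k in zip(text.encode("utf-8"), cycle(KEY)))


def decode(blob):
    """Toy decryption UDF: reverse the XOR and return text."""
    return bytes(b ^ k for b, k in zip(blob, cycle(KEY))).decode("utf-8")


conn = sqlite3.connect(":memory:")
# Register the Python functions so SQL statements can call them by name.
conn.create_function("encode", 1, encode)
conn.create_function("decode", 1, decode)

conn.execute("CREATE TABLE tbl (a INTEGER, b TEXT)")
conn.execute("CREATE TABLE tbl_encrypt (a INTEGER, c BLOB)")
conn.execute("INSERT INTO tbl VALUES (1, 'secret')")

# Encrypt field b while copying into the predefined table tbl_encrypt.
conn.execute("INSERT INTO tbl_encrypt SELECT a, encode(b) FROM tbl")

# What is stored is ciphertext; the query decrypts it on the way out.
stored = conn.execute("SELECT c FROM tbl_encrypt").fetchone()[0]
rows = conn.execute("SELECT a, decode(c) FROM tbl_encrypt").fetchall()
```

As the text points out, nothing in the metadata records that column c is encrypted, so the query author is responsible for calling decode on exactly the columns that were written with encode.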
In summary, the present disclosure protects Hive data through user permission control and data encryption. By applying access control and column-, partition-, or bucket-level permission control to users of the Hive system, combined with encryption and decryption of user data at the table, column, partition, or bucket level, the data in the Hive data warehouse is protected. This alleviates the problems in the related art of inadequate security mechanisms, cumbersome and time-consuming operation, the inability to flexibly encrypt tables, columns, partitions, and buckets, low confidentiality, low efficiency, and high deployment cost.
Fourth embodiment
An embodiment of the present disclosure further provides a non-transitory computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform any of the methods described above.
Fifth embodiment
An embodiment of the present disclosure provides a schematic diagram of the hardware structure of an electronic device. Referring to FIG. 10, the electronic device includes:
At least one processor 100 (one processor 100 is taken as an example in FIG. 10) and a memory 101; the electronic device may further include a communications interface 102 and a bus 103. The processor 100, the communications interface 102, and the memory 101 can communicate with one another through the bus 103. The communications interface 102 can be used for information transmission. The processor 100 can call logic instructions in the memory 101 to perform the methods described above.
In addition, when the logic instructions in the memory 101 are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium.
As a computer-readable storage medium, the memory 101 can be used to store software programs and computer-executable programs, such as the program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 100 executes functional applications and data processing by running the software programs, instructions, or modules stored in the memory 101.
The memory 101 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to use of the terminal, and the like. In addition, the memory 101 may include a high-speed random access memory and may also include a non-volatile memory.
The technical solution of the present disclosure may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes one or more instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present disclosure. The storage medium may be a non-transitory storage medium, including a USB flash drive, a removable hard disk, a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, an optical disc, or any other medium that can store program code, or it may be a transitory storage medium.
Industrial applicability
The data protection method and apparatus in a data warehouse provided by the present disclosure can protect the data in the data warehouse by applying access control and column-, partition-, or bucket-level permission control to users of the data warehouse and by encrypting user data.

Claims (19)

  1. A data protection method in a data warehouse, comprising:
    receiving a user request, wherein the user request carries user identity information and an operation request, and the operation request includes one or more of a table-level operation request, a column-level operation request, a partition-level operation request, and a bucket-level operation request;
    determining whether the user identity information and the operation request are legitimate;
    if both the user identity information and the operation request are legitimate, determining an operation type of the operation request, wherein the operation type includes an authorization operation and a non-authorization operation; and
    if the operation request is the non-authorization operation, encrypting newly added data or existing data in the data warehouse.
  2. The method of claim 1, wherein the determining whether the user identity information and the operation request are legitimate comprises:
    determining, according to the user identity information, whether the user corresponding to the user identity information exists in a pre-stored whitelist, and if so, the user identity information is legitimate; and
    determining whether the permission to perform the operation request is a preset operation permission held by the user, and if so, the operation request is legitimate.
  3. The method of claim 2, wherein the non-authorization operation comprises: a create-table operation, a data import operation, a modify-table operation, an encryption operation on existing data, and a data query operation.
  4. The method of claim 3, wherein encrypting the newly added data in the data warehouse comprises:
    obtaining the create-table operation, and creating a table according to the table structure information carried in the create-table operation;
    determining whether the create-table operation carries first encryption configuration information, wherein the first encryption configuration information includes one or more of a name of a table, a name of a column, a name of a partition, and a name of a bucket to be encrypted, an encryption algorithm, and a decryption algorithm;
    if the create-table operation carries the first encryption configuration information, storing the first encryption configuration information in an encryption information table;
    obtaining the data import operation, wherein the data import operation carries one or more of a name of a first target table, a name of a first target column, a name of a first target partition, and a name of a first target bucket into which data is to be imported, the first target table includes one or more of the created tables, the first target column includes one or more columns of the created table, the first target partition includes one or more partitions of the created table, and the first target bucket includes one or more buckets of the created table;
    determining whether a corresponding encryption algorithm is stored in the encryption information table for one or more of the first target table, the first target column, the first target partition, and the first target bucket;
    if so, obtaining the encryption algorithm corresponding to one or more of the first target table, the first target column, the first target partition, and the first target bucket;
    obtaining the data to be imported into one or more of the first target table, the first target column, the first target partition, and the first target bucket, and encrypting the data to be imported using the obtained encryption algorithm to obtain first encrypted data; and
    writing the first encrypted data correspondingly into one or more of the first target table, the first target column, the first target partition, and the first target bucket.
  5. The method of claim 4, wherein encrypting the existing data in the data warehouse comprises:
    obtaining the modify-table operation;
    determining whether the modify-table operation carries second encryption configuration information, wherein the second encryption configuration information includes one or more of a name of a second target table, a name of a second target column, a name of a second target partition, and a name of a second target bucket to be encrypted, an encryption algorithm, and a decryption algorithm;
    if the modify-table operation carries the second encryption configuration information, encrypting the data in one or more of the second target table, the second target column, the second target partition, and the second target bucket using the encryption algorithm carried in the modify-table operation; and
    storing the second encryption configuration information in the encryption information table.
  6. The method of claim 4, wherein encrypting the existing data in the data warehouse comprises:
    obtaining the encryption operation on existing data, wherein the encryption operation on existing data carries a name of a to-be-processed table and third encryption configuration information, the third encryption configuration information includes one or more of a name of a third target table, a name of a third target column, a name of a third target partition, and a name of a third target bucket to be encrypted, an encryption algorithm, and a decryption algorithm, the third target table includes one or more of the to-be-processed tables, the third target column includes one or more columns of the to-be-processed table, the third target partition includes one or more partitions of the to-be-processed table, and the third target bucket includes one or more buckets of the to-be-processed table;
    creating a temporary table having the same structure as the to-be-processed table;
    obtaining the data query operation, and querying, according to the data query operation, the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket to obtain query data;
    writing the query data correspondingly into the temporary table;
    encrypting the query data written into the temporary table using the encryption algorithm carried in the encryption operation on existing data to obtain second encrypted data;
    writing the second encrypted data correspondingly into the to-be-processed table in overwrite mode;
    storing the third encryption configuration information in the encryption information table; and
    deleting the temporary table.
  7. The method of claim 6, wherein the obtaining the data query operation, and querying, according to the data query operation, the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket to obtain query data comprises:
    obtaining the data query operation, wherein the data query operation carries one or more of the name of the third target table, the name of the third target column, the name of the third target partition, and the name of the third target bucket;
    reading, according to the data query operation, the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket from a distributed file system;
    determining whether encrypted data exists in one or more of the third target table, the third target column, the third target partition, and the third target bucket;
    if so, obtaining from the encryption information table a decryption algorithm corresponding to one or more of the table, column, partition, and bucket in which the encrypted data is located; and
    decrypting the encrypted data using the decryption algorithm to obtain the query data.
  8. The method of claim 1, wherein after determining the operation type of the operation request, the method further comprises:
    if the operation request is the authorization operation, determining whether one or more of the table, column, partition, and bucket on which the user corresponding to the user identity information carried in the authorization operation authorizes other users to operate belong to the user; and
    if so, performing the authorization operation; if not, cancelling the authorization operation.
  9. The method of any one of claims 4 to 6, further comprising:
    encrypting the data in the encryption information table using a preset encryption algorithm.
  10. A data protection apparatus in a data warehouse, comprising:
    a receiving module, configured to receive a user request, wherein the user request carries user identity information and an operation request, and the operation request includes one or more of a table-level operation request, a column-level operation request, a partition-level operation request, and a bucket-level operation request;
    a first determining module, configured to determine whether the user identity information and the operation request are legitimate;
    a second determining module, configured to determine an operation type of the operation request when the first determining module determines that both the user identity information and the operation request are legitimate, wherein the operation type includes an authorization operation and a non-authorization operation; and
    an encryption module, configured to encrypt newly added data or existing data in the data warehouse when the second determining module determines that the operation request is the non-authorization operation.
  11. The apparatus of claim 10, wherein the first determining module comprises:
    a first determining unit, configured to determine, according to the user identity information, whether the user corresponding to the user identity information exists in a pre-stored whitelist, and if so, the user identity information is legitimate; and
    a second determining unit, configured to determine whether the permission to perform the operation request is a preset operation permission held by the user, and if so, the operation request is legitimate.
  12. The apparatus of claim 11, wherein the non-authorization operation comprises: a create-table operation, a data import operation, a modify-table operation, an encryption operation on existing data, and a data query operation.
  13. The apparatus of claim 12, wherein the encryption module comprises:
    a first obtaining unit, configured to obtain the create-table operation and create a table according to the table structure information carried in the create-table operation;
    a third determining unit, configured to determine whether the create-table operation carries first encryption configuration information, wherein the first encryption configuration information includes one or more of a name of a table, a name of a column, a name of a partition, and a name of a bucket to be encrypted, an encryption algorithm, and a decryption algorithm;
    a first storage unit, configured to store the first encryption configuration information in an encryption information table when the third determining unit determines that the create-table operation carries the first encryption configuration information;
    a second obtaining unit, configured to obtain the data import operation, wherein the data import operation carries one or more of a name of a first target table, a name of a first target column, a name of a first target partition, and a name of a first target bucket into which data is to be imported, the first target table includes one or more of the created tables, the first target column includes one or more columns of the created table, the first target partition includes one or more partitions of the created table, and the first target bucket includes one or more buckets of the created table;
    a fourth determining unit, configured to determine whether a corresponding encryption algorithm is stored in the encryption information table for one or more of the first target table, the first target column, the first target partition, and the first target bucket;
    a third obtaining unit, configured to obtain the encryption algorithm corresponding to one or more of the first target table, the first target column, the first target partition, and the first target bucket when the fourth determining unit determines that a corresponding encryption algorithm is stored in the encryption information table for one or more of the first target table, the first target column, the first target partition, and the first target bucket;
    a first encryption unit, configured to obtain the data to be imported into one or more of the first target table, the first target column, the first target partition, and the first target bucket, and to encrypt the data to be imported using the obtained encryption algorithm to obtain first encrypted data; and
    a first writing unit, configured to write the first encrypted data correspondingly into one or more of the first target table, the first target column, the first target partition, and the first target bucket.
  14. The apparatus of claim 13, wherein the encryption module comprises:
    a fourth obtaining unit, configured to obtain the modify-table operation;
    a fifth determining unit, configured to determine whether the modify-table operation carries second encryption configuration information, wherein the second encryption configuration information includes one or more of a name of a second target table, a name of a second target column, a name of a second target partition, and a name of a second target bucket to be encrypted, and an encryption or decryption algorithm;
    a second encryption unit, configured to encrypt the data in one or more of the second target table, the second target column, the second target partition, and the second target bucket using the encryption algorithm carried in the modify-table operation when the fifth determining unit determines that the modify-table operation carries the second encryption configuration information; and
    a second storage unit, configured to store the second encryption configuration information in the encryption information table.
  15. The apparatus of claim 13, wherein the encryption module comprises:
    a fifth obtaining unit, configured to obtain the encryption operation on existing data, wherein the encryption operation on existing data carries a name of a to-be-processed table and third encryption configuration information, the third encryption configuration information includes one or more of a name of a third target table, a name of a third target column, a name of a third target partition, and a name of a third target bucket to be encrypted, and an encryption or decryption algorithm, the third target table includes one or more of the to-be-processed tables, the third target column includes one or more columns of the to-be-processed table, the third target partition includes one or more partitions of the to-be-processed table, and the third target bucket includes one or more buckets of the to-be-processed table;
    a creating unit, configured to create a temporary table having the same structure as the to-be-processed table;
    a query unit, configured to obtain the data query operation and to query, according to the data query operation, the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket to obtain query data;
    a second writing unit, configured to write the query data correspondingly into the temporary table;
    a third encryption unit, configured to encrypt the query data written into the temporary table using the encryption algorithm carried in the encryption operation on existing data to obtain second encrypted data;
    a third writing unit, configured to write the second encrypted data correspondingly into the to-be-processed table in overwrite mode;
    a third storage unit, configured to store the third encryption configuration information in the encryption information table; and
    a deleting unit, configured to delete the temporary table.
  16. The apparatus according to claim 15, wherein the query unit comprises:
    a first obtaining subunit, configured to obtain the data query operation, wherein the data query operation carries one or more of the name of the third target table, the name of the third target column, the name of the third target partition, and the name of the third target bucket;
    a query subunit, configured to read, according to the data query operation, the data in one or more of the third target table, the third target column, the third target partition, and the third target bucket from a distributed file system;
    a determining subunit, configured to determine whether encrypted data exists in one or more of the third target table, the third target column, the third target partition, and the third target bucket;
    a second obtaining subunit, configured to, when the determining subunit determines that the encrypted data exists in one or more of the third target table, the third target column, the third target partition, and the third target bucket, obtain from the encryption information table a decryption algorithm corresponding to one or more of the table, column, partition, and bucket in which the encrypted data is located; and
    a decryption subunit, configured to decrypt the encrypted data using the decryption algorithm to obtain the query data.
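By way of example only, the determining, second obtaining, and decryption subunits above could be approximated as below. The sketch assumes rows have already been read from storage, treats the encryption information table as a list of hypothetical `{"table", "columns", "algorithm"}` records, handles only column-level targets, and again uses an insecure XOR placeholder (`toy_decrypt`) in place of a real decryption algorithm.

```python
import base64


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR each byte with a repeating key. Placeholder only, NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def toy_encrypt(value: str, key: bytes = b"demo-key") -> str:
    return base64.b64encode(toy_cipher(value.encode("utf-8"), key)).decode("ascii")


def toy_decrypt(token: str, key: bytes = b"demo-key") -> str:
    return toy_cipher(base64.b64decode(token), key).decode("utf-8")


# Hypothetical registry mapping algorithm names to decryption functions.
DECRYPTORS = {"toy-xor": toy_decrypt}


def query_with_decryption(rows, table_name, columns, encryption_info):
    """Return the requested columns, decrypting those recorded as encrypted."""
    # Determining subunit: which requested columns hold encrypted data?
    # Second obtaining subunit: look up the matching algorithm for each.
    algorithm_for = {}
    for entry in encryption_info:
        if entry["table"] != table_name:
            continue
        for col in entry["columns"]:
            if col in columns:
                algorithm_for[col] = entry["algorithm"]

    result = []
    for row in rows:
        out = {}
        for col in columns:
            value = row[col]
            if col in algorithm_for:
                # Decryption subunit: apply the recorded algorithm.
                value = DECRYPTORS[algorithm_for[col]](value)
            out[col] = value
        result.append(out)
    return result
```

Columns with no record in the encryption information are returned unchanged, which mirrors the claim's condition that decryption is performed only when encrypted data is found to exist.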
  17. The apparatus according to claim 10, further comprising:
    an authorization module, configured to, when the second determining module determines that the operation request is an authorization operation, determine whether one or more of the tables, columns, partitions, and buckets on which the user corresponding to the user identity information carried in the authorization operation authorizes other users to operate belong to that user; if so, perform the authorization operation, and if not, cancel the authorization operation.
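The ownership check recited for the authorization module might, purely as an illustrative sketch, look like the following. The `owner_of` and `grants` dictionaries and the function name `handle_grant` are assumptions introduced for this example; the disclosure does not prescribe how ownership or grants are stored.

```python
def handle_grant(owner_of, grants, grantor, grantee, objects):
    """Perform a grant only if every named object belongs to the grantor.

    `owner_of` maps object names (tables, columns, partitions, buckets)
    to their owning user; `grants` maps object names to sets of grantees.
    Returns True if the grant was performed, False if it was cancelled.
    """
    # Cancel if nothing was named or if any object is not the grantor's.
    if not objects or any(owner_of.get(obj) != grantor for obj in objects):
        return False
    for obj in objects:
        grants.setdefault(obj, set()).add(grantee)
    return True
```

Here the grant is treated as all-or-nothing: if any one of the named objects does not belong to the granting user, no grant is recorded at all. That is one possible reading of "cancel the authorization operation".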
  18. The apparatus according to any one of claims 13 to 15, wherein the encryption module further comprises:
    a fourth encryption unit, configured to encrypt the data in the encryption information table using a preset encryption algorithm.
  19. A non-transitory computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the method according to any one of claims 1 to 9.
PCT/CN2017/072699 2016-01-26 2017-01-25 Data protection method and apparatus in data warehouse WO2017129138A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610053324.0 2016-01-26
CN201610053324.0A CN106997368A (en) 2016-01-26 2016-01-26 Data guard method and device in a kind of data warehouse

Publications (1)

Publication Number Publication Date
WO2017129138A1 true WO2017129138A1 (en) 2017-08-03

Family

ID=59397402

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/072699 WO2017129138A1 (en) 2016-01-26 2017-01-25 Data protection method and apparatus in data warehouse

Country Status (2)

Country Link
CN (1) CN106997368A (en)
WO (1) WO2017129138A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532787A (en) * 2019-07-26 2019-12-03 苏州浪潮智能科技有限公司 It is a kind of for strengthening the method and apparatus of the safety of the confidential resources in cluster
CN110636043A (en) * 2019-08-16 2019-12-31 中国人民银行数字货币研究所 File authorization access method, device and system based on block chain
CN110717153B (en) * 2019-09-30 2021-08-24 新华三大数据技术有限公司 Authority verification method and device
CN111177753B (en) * 2019-12-24 2021-03-23 广州极点三维信息科技有限公司 Encryption processing method, device and equipment for Java content warehouse data
CN111324799B (en) * 2020-02-05 2021-05-04 星辰天合(北京)数据科技有限公司 Search request processing method and device
CN111191268B (en) * 2020-04-10 2020-08-07 支付宝(杭州)信息技术有限公司 Storage method, device and equipment capable of verifying statement

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101866360A (en) * 2010-06-28 2010-10-20 北京用友政务软件有限公司 Data warehouse authentication method and system based on object multidimensional property space
CN102917006A (en) * 2012-08-31 2013-02-06 杭州斯凯网络科技有限公司 Method and device for achieving uniform control management of computing resource and object authority
US20130246476A1 (en) * 2007-10-11 2013-09-19 Varonis Systems Inc. Visualization of access permission status
CN105144159A (en) * 2013-02-13 2015-12-09 脸谱公司 HIVE table links

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100568251C (en) * 2006-03-23 2009-12-09 沈明峰 The guard method of security files under cooperative working environment
US9148285B2 (en) * 2013-01-21 2015-09-29 International Business Machines Corporation Controlling exposure of sensitive data and operation using process bound security tokens in cloud computing environment

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110457307A (en) * 2018-05-03 2019-11-15 阿里巴巴集团控股有限公司 Metadata management system, user's cluster creation method, device, equipment and medium
CN110457307B (en) * 2018-05-03 2023-10-24 阿里巴巴集团控股有限公司 Metadata management system, user cluster creation method, device, equipment and medium
CN110222046A (en) * 2019-04-28 2019-09-10 阿里巴巴集团控股有限公司 Processing method, device, server and the storage medium of table data
CN110222046B (en) * 2019-04-28 2023-11-03 北京奥星贝斯科技有限公司 List data processing method, device, server and storage medium
CN110188573A (en) * 2019-05-27 2019-08-30 深圳前海微众银行股份有限公司 Subregion authorization method, device, equipment and computer readable storage medium
CN110188573B (en) * 2019-05-27 2024-06-04 深圳前海微众银行股份有限公司 Partition authorization method, partition authorization device, partition authorization equipment and computer readable storage medium
CN113468552A (en) * 2021-05-31 2021-10-01 珠海大横琴科技发展有限公司 Data processing method and device

Also Published As

Publication number Publication date
CN106997368A (en) 2017-08-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17743753

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17743753

Country of ref document: EP

Kind code of ref document: A1