CN115146245A - Hive series data encryption method and system with dynamically managed key authority - Google Patents

Publication number
CN115146245A
Authority
CN
China
Prior art keywords
key
user
hive
authority
hql
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211085531.6A
Other languages
Chinese (zh)
Other versions
CN115146245B (en)
Inventor
卢薇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Bizhi Technology Co ltd
Original Assignee
Hangzhou Bizhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Bizhi Technology Co ltd
Priority to CN202211085531.6A
Publication of CN115146245A
Application granted
Publication of CN115146245B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/06 Network architectures or network communication protocols for network security for supporting key management in a packet data network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/083 Network architectures or network communication protocols for network security for authentication of entities using passwords

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a Hive column-level data encryption method and system with dynamically managed key authority. The method comprises the following steps: S1, applying for Hive table permissions; S2, submitting HQL; S3, acquiring the user's authentication credentials; S4, Kerberos authentication; S5, submitting an HQL task; S6, creating a table and a table key; S7, updating table metadata information; S8, submitting a MapReduce job; S9, acquiring the table key; S10, acquiring the user's table field permissions; S11, field decryption based on serialization; and S12, field encryption based on deserialization. The scheme realizes automatic data encryption and decryption and key management based on user permissions, meets the security requirements of actual production environments, and improves encryption efficiency and data security.

Description

Hive column-level data encryption method and system with dynamically managed key authority
Technical Field
The invention relates to the field of computer communication and data security, and in particular to a Hive column-level data encryption method and system with dynamically managed key authority.
Background
As enterprise data volumes grow rapidly from the TB level to the PB level, the big data component stack with Hadoop at its core has become a key technology for building modern data warehouses and data analysis platforms. Hive is an indispensable tool in this stack, mainly because it provides an SQL (Structured Query Language) dialect, called HiveQL or HQL, that can query massive data stored in the Hadoop Distributed File System (HDFS) and in HBase (a column-oriented non-relational database system), thereby lowering the barrier to migrating a traditional data warehouse built on an SQL-based relational database to Hadoop. Unlike traditional relational databases, which already provide a mature set of data security solutions, Hive addresses data security in an open manner. Although Hive provides role-based permission control, for enterprise P0-level data security requirements such as fine-grained permission control, desensitization, and encryption it provides only open interfaces, and each enterprise must implement a specific solution as needed. Realizing Hive column-level data encryption with dynamically managed key authority has therefore become a technical problem that many enterprises need to solve.
Data encryption guarantees the security of data storage, and permission-based data decryption guarantees the security of data access. In most production scenarios, Hive data is stored in HDFS. For storage security, HDFS provides transparent encryption, which is a file-level data encryption scheme. "Transparent" means that the encryption and decryption of files is imperceptible to the HDFS client. HDFS transparent encryption can therefore only prevent an unauthorized party from compromising data confidentiality by, for example, stealing a disk; it cannot provide data access security based on user permissions for applications in the layer above Hive. In addition, compared with file-level encryption, most enterprises prefer column-level (or field-level) encryption of table data in Hive, in which only the sensitive columns of a table are encrypted, so as to limit the negative impact of encryption and decryption on business performance. At present, the Hive column-level encryption commonly used by enterprises is mostly implemented with Hive user-defined functions (UDFs): encryption and decryption functions are first implemented using the UDF interface provided by Hive, and table fields are then encrypted or decrypted by applying these functions to the encrypted columns.
Existing data encryption schemes suffer from two types of problems: 1) encryption and decryption functions must be added manually when writing business HQL, which on the one hand reduces development efficiency and on the other hand breaks "select" queries, so that a "select" that originally would not trigger a MapReduce task now does; 2) the key is stored inside the encryption and decryption functions and cannot be acquired dynamically, so it is easy to crack and poses a serious security risk. The invention therefore remedies the efficiency and security shortcomings of existing Hive column-level data encryption schemes through two technical innovations: 1) automatic data encryption and decryption, and 2) key management based on user permissions.
As described above, existing Hive data encryption schemes are implemented either with HDFS transparent encryption or with Hive user-defined functions. The first scheme can only encrypt files at the HDFS layer; once data is obtained through the HDFS client it is in plaintext, so user-level confidentiality protection cannot be provided at the application layer above Hive. This scheme can serve only as an underlying, auxiliary scheme for Hive data encryption. The second scheme requires manual work in 1) developing custom encryption and decryption UDFs, 2) registering the UDFs as Hive permanent functions, and 3) binding the encryption and decryption UDFs to sensitive fields in business HQL. The second scheme can realize Hive column-level data encryption, but it has obvious defects in both efficiency and security. In terms of efficiency, the encryption and decryption UDFs require manual involvement, which affects HQL development and execution efficiency and rules out automatic data encryption and decryption. In terms of security, the key is embedded in the encryption and decryption functions, so it is easy to crack and poses a serious security risk.
Disclosure of Invention
In view of the problems in the prior art, the invention aims to innovate in both efficiency and security. It provides a Hive column-level data encryption scheme with dynamically managed key authority that realizes 1) automatic data encryption and decryption and 2) key management based on user permissions, and that meets the security requirements of actual production environments.
To achieve the above object, the invention provides a Hive column-level data encryption method with dynamically managed key authority, which includes the following steps:
S1, Hive table permission application: the user applies to the user authority management unit for table creation permission; the user authority management unit creates a corresponding Kerberos authentication principal and keytab for the user and, at the same time, applies to the key management service for access control and key creation permission for the user's Kerberos authentication principal; this step corresponds to dynamic key authority management;
S2, submitting HQL: the user submits a DDL statement for creating a table to the HQL submission agent unit;
S3, obtaining the user's authentication credentials: the HQL submission agent unit acquires the user's Kerberos authentication principal and keytab file from the user authority management unit;
S4, Kerberos authentication: the HQL submission agent unit performs Kerberos authentication with the user's Kerberos authentication principal and keytab file;
S5, submitting an HQL task: after Kerberos authentication succeeds, the table creation DDL statement is submitted to Hive as an HQL task;
S6, creating a table and a table key: Hive converts the DDL statement into a DDL task, connects to the key management service unit during table creation, and creates a table key using the table key name given in the DDL statement;
S7, updating table metadata information: before table creation finishes, Hive writes the metadata information of the table to the Hive metadata base;
S8, submitting a MapReduce job: after the table has been created, when other users submit table query and table insertion HQL tasks following steps S1 to S5, Hive generates a Fetch task and a MapRed task respectively, converts them into MapReduce jobs, and submits the jobs to Yarn for execution;
S9, acquiring the table key: the leaf node of the MapReduce job operator tree for a table query corresponds to a table scan operator; the table scan operator must perform serialization when reading row data of the table, and before serializing, the serializer acquires the table key from the key management service according to the user information and the table key name;
S10, acquiring the user's table field permissions: if the table key is acquired successfully, the serializer acquires the user's field permissions from the user authority management unit;
S11, field decryption based on serialization: the serializer decrypts only those fields for which both the key and the permission are available; in all other cases the fields are passed through as stored, without processing;
S12, field encryption based on deserialization: the leaf node of the MapReduce job operator tree for a table insertion corresponds to a file output operator; the file output operator must perform deserialization before row data is written to the table file; when deserializing, the serializer repeats step S9 to acquire the table key and then encrypts the fields of the table that require encryption with the table key.
Further, in step S1, before submitting HQL, the user needs to apply for Hive table permissions to ensure that the HQL can be executed successfully; the table authority application module in the user authority management unit is responsible for processing the user's table permission applications and maintaining the user's table permission information.
Further, in step S2, the DDL statement includes the table name, field types, the list of encrypted fields, the key management server address, the table key name, the encryption algorithm, and the serialization and deserialization mode; the user sends the HQL to the HQL submission agent, and the HQL submission agent submits the HQL task.
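As a rough illustration of the information carried by the DDL statement in step S2, the following Python sketch assembles a table creation statement. The property names `hive.kms.uri` and `hive.encrypt.keyname` and the serializer class name are taken from this description, and the table and column names follow the example table used in the detailed description. The property names `hive.encrypt.columns` and `hive.encrypt.algorithm`, the column types, the key name, the algorithm string, and the server address are hypothetical placeholders, not part of the invention as disclosed.

```python
def build_create_table_ddl(table, columns, encrypted_columns, kms_uri,
                           key_name, algorithm, serde_class):
    """Assemble a CREATE TABLE statement that carries the encryption settings."""
    unknown = [c for c in encrypted_columns if c not in dict(columns)]
    if unknown:
        raise ValueError(f"encrypted columns not in table: {unknown}")
    column_sql = ", ".join(f"{name} {type_}" for name, type_ in columns)
    properties = {
        "hive.kms.uri": kms_uri,                               # named in this description
        "hive.encrypt.keyname": key_name,                      # named in this description
        "hive.encrypt.columns": ",".join(encrypted_columns),   # hypothetical name
        "hive.encrypt.algorithm": algorithm,                   # hypothetical name
    }
    props_sql = ", ".join(f"'{k}'='{v}'" for k, v in properties.items())
    return (
        f"CREATE TABLE {table} ({column_sql}) "
        f"ROW FORMAT SERDE '{serde_class}' "
        f"TBLPROPERTIES ({props_sql})"
    )


ddl = build_create_table_ddl(
    table="safe_test",
    columns=[("id", "string"), ("name", "string"), ("idcard", "string"),
             ("email", "string"), ("religion", "string")],  # types are assumed
    encrypted_columns=["idcard", "name", "email"],
    kms_uri="kms://http@kms.example.com:9600/kms",  # placeholder address
    key_name="safe_test_key",                       # placeholder key name
    algorithm="AES/CTR/NoPadding",                  # placeholder algorithm
    serde_class="com.startdt.datablack.hive.ql.captoserpe",
)
print(ddl)
```

Keeping all encryption settings in the statement itself means that the table creation step and the serializer can later read them back from the table metadata rather than from code.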
Further, in step S3, before submitting the HQL task, the HQL submission agent obtains the user's Kerberos authentication principal and keytab file from the user authority management unit; the Kerberos credential management module and the user authentication credential management module of the user authority management unit are responsible for maintaining this information.
Further, in step S4, the HQL submission agent performs Kerberos authentication with the obtained Kerberos authentication principal and keytab file, and submits the HQL task to Hive after authentication succeeds; after Kerberos authentication succeeds, the user information maintained in the Hive session is the information corresponding to the Kerberos authentication principal.
Further, in step S5, after the HQL submission agent submits the HQL task to Hive, Hive generates a corresponding task according to the HQL type: for table creation, Hive generates a DDL table creation task; for table insertion, Hive generates a MapRed task; for table query, Hive generates a Fetch task.
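The mapping from HQL type to task type in step S5 can be pictured with the minimal Python sketch below. It only mirrors the three cases named above by looking at the first keyword of the statement; Hive itself decides this through its parser and query compiler, and the function name `task_type_for` is hypothetical.

```python
def task_type_for(hql: str) -> str:
    """Return the task type named in step S5 for one HQL statement."""
    words = hql.strip().split(None, 1)
    if not words:
        raise ValueError("empty HQL statement")
    keyword = words[0].upper()
    if keyword == "CREATE":
        return "DDL"      # table creation
    if keyword == "INSERT":
        return "MapRed"   # table insertion
    if keyword == "SELECT":
        return "Fetch"    # table query
    raise ValueError(f"statement type not covered by this sketch: {keyword}")


for statement in ("create table safe_test (id string)",
                  "insert into safe_test values ('1')",
                  "select id from safe_test"):
    print(statement, "->", task_type_for(statement))
```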
Further, in step S6, for the DDL table creation task, the HQL specifies the serialization mode, the serialization attributes, and the table attribute information. Hive converts the HQL into a DDL task, and the createTable and createTableLike methods of the DDLTask class in the Hive Java source code are rewritten. The rewritten Hive table creation process adds a step that connects to the key management server and creates the table key; table creation continues only if the table key is created successfully, and otherwise stops. With the DDL table creation task rewritten in the Hive source code, a table automatically creates its table key in the key management service unit during creation and updates its table metadata information in the Hive metadata base, where the metadata information records the fields that require desensitization.
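The rule that table creation proceeds only when the table key has been created can be sketched as follows. This is a simplified Python model, not the rewritten `createTable` method itself: `InMemoryKms`, `KeyCreationError`, and `create_table` are hypothetical names, the key management service is replaced by an in-memory stand-in, and the Hive metadata base is modeled as a plain dictionary.

```python
import os


class KeyCreationError(Exception):
    """Raised when the key management service refuses to create a key."""


class InMemoryKms:
    """Hypothetical stand-in for the key management service unit."""

    def __init__(self, principals_allowed_to_create):
        self._allowed = set(principals_allowed_to_create)
        self.keys = {}

    def create_key(self, principal, key_name):
        if principal not in self._allowed:
            raise KeyCreationError(f"{principal} may not create keys")
        if key_name in self.keys:
            raise KeyCreationError(f"key {key_name} already exists")
        self.keys[key_name] = os.urandom(16)


def create_table(kms, metastore, principal, table, key_name, encrypted_fields):
    """Create the table key first; record table metadata only on success."""
    try:
        kms.create_key(principal, key_name)
    except KeyCreationError:
        return False  # table creation stops; nothing is written to the metastore
    metastore[table] = {
        "key_name": key_name,
        "encrypted_fields": list(encrypted_fields),
    }
    return True


kms = InMemoryKms({"admin@startdt.com"})
metastore = {}
print(create_table(kms, metastore, "admin@startdt.com", "safe_test",
                   "safe_test_key", ["idcard", "name", "email"]))
print(create_table(kms, metastore, "worker1@startdt.com", "other_table",
                   "other_key", []))
```

Ordering key creation before the metadata update means a table can never exist in the metadata base without a usable table key.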
Further, in step S6, after the table is successfully created, Hive automatically connects to the Hive metadata base and updates the table's metadata information; in step S8, once the table has been created and the user submits table insertion and table query HQL tasks, Hive generates the corresponding MapRed and Fetch tasks, translates them into MapReduce jobs, and submits the jobs to Yarn for execution.
Further, in step S9, both in serialization when reading the table and in deserialization when writing the table, the table key must be acquired from the key management service according to the attributes hive.kms.uri and hive.encrypt.keyname. First, the serializer sends a key acquisition request to the key management service unit. Then, the user authority management module obtains the user authentication principal carried in the request and, according to the user key access control, judges whether the user has the permission to create keys; the key management module checks, according to the key name in the request, whether the user has access permission for that key. If so, the key is acquired successfully; otherwise, key acquisition fails.
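The key acquisition flow of step S9 can be approximated by the Python sketch below, which assumes a simplified two-stage check: the requesting principal must be under access control at all, and must additionally be listed for the requested key name. The class `KeyManagementService`, the function `acquire_table_key`, the server address, and the key name are hypothetical; only the property names `hive.kms.uri` and `hive.encrypt.keyname` come from the text.

```python
class KeyManagementService:
    """Hypothetical in-memory stand-in for the key management service unit."""

    def __init__(self, allowed_principals, key_acl, keys):
        self._allowed_principals = set(allowed_principals)  # access control list
        self._key_acl = key_acl  # key name -> principals allowed to read it
        self._keys = keys        # key name -> key bytes

    def get_key(self, principal, key_name):
        """Return the key bytes, or None if either check fails."""
        if principal not in self._allowed_principals:
            return None
        if principal not in self._key_acl.get(key_name, set()):
            return None
        return self._keys.get(key_name)


def acquire_table_key(kms_by_uri, table_properties, principal):
    """Resolve the KMS and key name from table properties, then request the key."""
    kms = kms_by_uri[table_properties["hive.kms.uri"]]
    return kms.get_key(principal, table_properties["hive.encrypt.keyname"])


uri = "kms://http@kms.example.com:9600/kms"  # placeholder address
kms = KeyManagementService(
    allowed_principals={"worker1@startdt.com", "worker2@startdt.com"},
    key_acl={"safe_test_key": {"worker1@startdt.com"}},
    keys={"safe_test_key": b"\x00" * 16},  # dummy key material
)
props = {"hive.kms.uri": uri, "hive.encrypt.keyname": "safe_test_key"}
print(acquire_table_key({uri: kms}, props, "worker1@startdt.com") is not None)
print(acquire_table_key({uri: kms}, props, "worker2@startdt.com") is not None)
```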
In another aspect, the invention also provides a Hive column-level data encryption system with dynamically managed key authority, which is used to implement the above Hive column-level data encryption method and comprises a user authority management unit, a key management service unit, and an HQL submission agent unit. The user authority management unit comprises three modules: a table authority application module, a Kerberos credential management module, and a user authentication credential management module. The table authority application module is responsible for processing users' table permission application requests and maintaining users' table permission information. The Kerberos credential management module is responsible for managing the Kerberos keytab files of the access users. The key management service unit comprises a user access authority management module and a key management module. The user access authority management module manages access control for the Kerberos authentication principal corresponding to each user. For Kerberos authentication principals that have access control permission, the key management module manages the security of key creation and key acquisition by these principals. The HQL submission agent unit is responsible for receiving the HQL submitted by the user; it then obtains the user's Kerberos authentication principal and keytab file from the user authority management unit, performs Kerberos authentication, and, after authentication succeeds, submits the HQL task to Hive for execution.
In its architectural design, the technical scheme of the invention defines the interaction logic and functional responsibilities among user authority management, the key management service, the HQL submission agent, and serialization/deserialization, and provides a Hive table creation scheme and a serializer for automatic field encryption and decryption based on user permissions. The beneficial effects of the invention are: 1) the technical scheme can meet the Hive column-level data encryption requirements of an enterprise; 2) the invention realizes key management and column data encryption and decryption based on user permissions, providing safer data protection; 3) the Hive table creation scheme provided by the invention makes the table key creation process transparent; 4) the serializer com.startdt.datablack.hive.ql.captoserpe provided by the invention realizes automatic field encryption and decryption based on user permissions during serialization/deserialization.
Drawings
Fig. 1 is a schematic diagram of the design architecture of a Hive column-level data encryption method and system with dynamically managed key authority according to an embodiment of the present invention;
Fig. 2 is a schematic diagram comparing Hive before and after modification in a Hive column-level data encryption method and system with dynamically managed key authority according to an embodiment of the present invention;
Fig. 3 is a flowchart of key creation in a Hive column-level data encryption method and system with dynamically managed key authority according to an embodiment of the present invention;
Fig. 4 is a flowchart of Hive table key acquisition in a Hive column-level data encryption method and system with dynamically managed key authority according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly, e.g., as meaning a fixed connection, a removable connection, or an integral connection; the connection can be mechanical or electrical; elements may be connected directly or indirectly through intervening media, or two elements may communicate internally. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
The following describes in detail a specific embodiment of the present invention with reference to fig. 1 to 4. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
Fig. 1 shows the overall architecture of the Hive column-level data encryption method and system with dynamically managed key authority according to the present invention. Built on Hive, Yarn, and HDFS, the encryption system comprises a user authority management unit, a key management service unit, and an HQL submission agent unit, together with the functions of table creation rewriting and of field encryption and decryption based on serialization/deserialization.
In the Hive column-level data encryption system with dynamically managed key authority, the user authority management unit comprises three modules: a table authority application module, a Kerberos credential management module, and a user authentication credential management module. The table authority application module is responsible for processing users' table permission application requests and maintaining users' table permission information. The Kerberos credential management module is responsible for managing the Kerberos keytab files of the access users; each access user has a corresponding Kerberos authentication principal, and each principal corresponds to one keytab file. Kerberos authentication requires both the principal and the keytab file. The user authentication credential management module maintains the mapping between users and Kerberos credentials. The Key Management Service (KMS) unit comprises a user access authority management module and a key management module. The user access authority management module manages access control for the Kerberos authentication principal corresponding to each user. For Kerberos authentication principals that have access control permission, the key management module manages the security of key creation and key acquisition by these principals. The HQL submission agent unit is responsible for receiving the HQL submitted by the user; it then obtains the user's Kerberos authentication principal and keytab file from the user authority management unit, performs Kerberos authentication, and, after authentication succeeds, submits the HQL task to Hive for execution.
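The role of the HQL submission agent unit can be illustrated with the Python sketch below. It is a simplified model under stated assumptions: the class `HqlSubmissionAgent` is hypothetical, the credential mapping is a dictionary, and both Kerberos authentication and the Hive connection are injected as stub callables. In a real deployment the authentication callable might, for example, wrap `kinit -kt <keytab> <principal>`, but that choice is not specified by the text.

```python
class HqlSubmissionAgent:
    """Hypothetical model: look up credentials, authenticate, then submit."""

    def __init__(self, credential_store, authenticate, submit_to_hive):
        self._credentials = credential_store  # user -> (principal, keytab path)
        self._authenticate = authenticate     # callable(principal, keytab) -> bool
        self._submit = submit_to_hive         # callable(principal, hql) -> result

    def submit(self, user, hql):
        if user not in self._credentials:
            raise PermissionError(f"no Kerberos credential registered for {user}")
        principal, keytab = self._credentials[user]
        if not self._authenticate(principal, keytab):
            raise PermissionError(f"Kerberos authentication failed for {principal}")
        # The HQL runs under the principal's identity, not a caller-supplied one.
        return self._submit(principal, hql)


agent = HqlSubmissionAgent(
    # The keytab path is a placeholder.
    credential_store={"worker1": ("worker1@startdt.com", "/path/to/worker1.keytab")},
    authenticate=lambda principal, keytab: keytab.endswith(".keytab"),        # stub
    submit_to_hive=lambda principal, hql: f"submitted as {principal}: {hql}",  # stub
)
print(agent.submit("worker1", "select id from safe_test"))
```

Because users never handle the keytab themselves, the agent is the only component that needs access to credential files.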
The table creation rewrite module is one improvement made by the present invention. The invention rewrites the table creation of the DDL task in the Hive source code so that a table automatically creates its table key in the key management service unit during creation and updates its table metadata information in the Hive metadata base, where the metadata information records the fields that require desensitization, the key management server address, the table key name, the encryption algorithm, and the serialization and deserialization modes. Field encryption and decryption based on serialization and deserialization is another improvement made by the present invention. The invention provides specific implementations of serialization and deserialization.
When the HQL submission agent unit submits table insertion and table query tasks to Hive, Hive generates the corresponding MapRed and Fetch tasks, which are converted into MapReduce jobs and submitted to Yarn for execution. A MapReduce job is essentially an operator tree in which the table is read by a table scan operator and written by a file output operator. Operators exchange data as Row objects; a Row object is structured data representing one row of records in the table. The table scan operator reads the table files stored in HDFS row by row, as follows:
first, the file byte stream is converted into a Writable object by means of an InputFormat object;
then, serialization converts the Writable object into a Row object that the table scan operator can process.
The serialization process is as follows:
first, the user information of the current session is acquired;
then, according to the user information, the table key is obtained from the key management service unit and the user's table field permissions are obtained from the user authority management unit;
finally, the field data of each column of the Writable object is decrypted according to the acquired table key and the user's field permissions; where the table key and/or the field permission is missing, the field is not decrypted and is output as ciphertext; the result is converted into a Row object.
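The per-field decision in the process above can be sketched in Python as follows. The function `read_row` is a hypothetical model of the read path only: records are plain dictionaries rather than Writable and Row objects, and the cipher is injected as a callable, with `fake_decrypt` serving as a stand-in that is not real cryptography.

```python
def read_row(stored_fields, encrypted_fields, table_key, permitted_fields, decrypt):
    """Build the row for one stored record, decrypting only where allowed."""
    row = {}
    for name, value in stored_fields.items():
        can_decrypt = (
            name in encrypted_fields       # the field is stored as ciphertext
            and table_key is not None      # the table key was acquired
            and name in permitted_fields   # the user holds the field permission
        )
        row[name] = decrypt(value, table_key) if can_decrypt else value
    return row


def fake_decrypt(value, key):
    """Stand-in for a real cipher: strips an 'enc(...)' wrapper."""
    return value[len("enc("):-1]


stored = {"id": "1", "name": "enc(Alice)", "idcard": "enc(1234)", "religion": "none"}
encrypted = {"idcard", "name", "email"}
permitted = {"id", "name", "religion"}  # a user with partial field permissions
row = read_row(stored, encrypted, b"table-key", permitted, fake_decrypt)
print(row)
```

A user without permission for a field still receives a value for it, but only the stored ciphertext, so the query itself does not fail.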
The file output operator inserts Row objects into the table file stored in HDFS. In this process, the Row object is first deserialized into a Writable object, and the Writable object is then converted into a byte stream by means of an OutputFormat object and appended to the table file on HDFS.
The deserialization process is as follows:
first, the user information of the current session is acquired;
then, according to the user information and the table key name, the table key is acquired from the key management service unit;
finally, the fields of the table that require encryption are encrypted with the table key, and the result is converted into a Writable object.
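The write path can be sketched in the same spirit. In the Python model below, `toy_encrypt` and `toy_decrypt` are deliberately simple placeholders (XOR with a hash of the key, then Base64) that are NOT secure and merely stand in for whatever encryption algorithm the table specifies; `write_record` is a hypothetical name, and refusing to write when no table key is available is an assumption of this sketch rather than something the text states.

```python
import base64
import hashlib
from itertools import cycle


def toy_encrypt(plaintext: str, key: bytes) -> str:
    """Placeholder cipher (XOR with a hash of the key). NOT secure."""
    stream = cycle(hashlib.sha256(key).digest())
    data = bytes(b ^ s for b, s in zip(plaintext.encode("utf-8"), stream))
    return base64.b64encode(data).decode("ascii")


def toy_decrypt(ciphertext: str, key: bytes) -> str:
    """Inverse of toy_encrypt, shown only to demonstrate the round trip."""
    stream = cycle(hashlib.sha256(key).digest())
    data = bytes(b ^ s for b, s in zip(base64.b64decode(ciphertext), stream))
    return data.decode("utf-8")


def write_record(row, encrypted_fields, table_key):
    """Encrypt the designated fields of a row before it is written out."""
    if table_key is None:
        # Assumption of this sketch: never fall back to writing plaintext.
        raise PermissionError("table key unavailable; refusing to write")
    return {
        name: toy_encrypt(value, table_key) if name in encrypted_fields else value
        for name, value in row.items()
    }


record = write_record({"id": "1", "email": "a@b.c"}, {"idcard", "name", "email"}, b"k")
print(record)
print(toy_decrypt(record["email"], b"k"))
```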
In summary, in the technical scheme of the present invention, the Hive column-level data encryption method with dynamically managed key authority includes the following steps:
S1, applying for Hive table permissions; the user applies to the user authority management unit for table creation permission; the user authority management unit creates a corresponding Kerberos authentication principal and keytab for the user and, at the same time, applies to the key management service for access control and key creation permission for the user's Kerberos authentication principal; this step corresponds to dynamic key authority management;
S2, submitting HQL; the user submits a DDL statement for creating a table to the HQL submission agent unit, where the DDL statement includes information such as the table name, field types, the list of encrypted fields, the key management server address, the table key name, the encryption algorithm, and the serialization/deserialization mode;
S3, obtaining the user's authentication credentials; the HQL submission agent unit acquires the user's Kerberos authentication principal and keytab file from the user authority management unit;
S4, Kerberos authentication; the HQL submission agent unit performs Kerberos authentication with the user's Kerberos authentication principal and keytab file;
S5, submitting an HQL task; after Kerberos authentication succeeds, the table creation DDL statement can be submitted to Hive as an HQL task;
S6, creating a table and a table key; Hive converts the DDL statement into a DDL task, which connects to the key management service unit during table creation and creates a table key using the table key name given in the DDL statement;
S7, updating table metadata information; before table creation finishes, Hive writes the metadata information of the table to the Hive metadata base;
S8, submitting a MapReduce job; after the table has been created, when other users submit table query and table insertion HQL tasks following steps S1 to S5, Hive generates a Fetch task and a MapRed task respectively, converts them into MapReduce jobs, and submits the jobs to Yarn for execution;
S9, acquiring the table key; the leaf node of the MapReduce job operator tree for a table query corresponds to a table scan operator; the table scan operator must perform serialization when reading row data of the table, and before serializing, the serializer acquires the table key from the key management service according to the user information and the table key name;
S10, acquiring the user's table field permissions; if the key is acquired successfully, the serializer acquires the user's field permissions from the user authority management unit;
S11, field decryption based on serialization; the serializer decrypts only those fields for which both the key and the permission are available; in all other cases the fields are passed through as stored, without processing;
S12, field encryption based on deserialization; the leaf node of the MapReduce job operator tree for a table insertion corresponds to a file output operator; the file output operator must perform deserialization before row data is written to the table file; when deserializing, the serializer likewise acquires the table key through step S9 and then encrypts the fields of the table that require encryption with the table key.
The detailed technical content in each step is as follows:
In step S1, before submitting HQL, the user must apply for Hive table authority so that the HQL can execute successfully. Three authorities related to field encryption and decryption are used as examples here: table creation, table query, and table insertion. The table authority application module in the user authority management unit processes the table authority applications of users and maintains their table authority information.
As shown in Table 1 below, the user admin@startdt.com has creation and insertion authority for the table "safe_test"; the user worker1@startdt.com has partial-field query authority for "safe_test", limited to the fields {id, name, religion}; and the user worker2@startdt.com has full-field query authority for "safe_test". The authorities in Table 1 are the result of each user applying to, and being approved by, the table authority application module.
TABLE 1
[Table 1 appears as an image in the original publication.]
The structure of the table "safe_test" is shown in Table 2 below, which lists the field names and field types; the three fields {idcard, phone, email} must be encrypted.
TABLE 2
[Table 2 appears as an image in the original publication.]
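The authorities described for Table 1 can be pictured as a small lookup structure. The following is a minimal sketch, not the patented module: the class name `TablePermission`, the helper `query_fields`, and the exact column list are assumptions made for illustration (the full column list of Table 2 is only available as an image).

```python
from dataclasses import dataclass

# Columns of "safe_test" that are named in the text; the complete list and
# the column types are in Table 2 (an image), so this list is an assumption.
ALL_FIELDS = ["id", "name", "religion", "idcard", "phone", "email"]


@dataclass(frozen=True)
class TablePermission:
    """Hypothetical record of one user's approved authority on one table."""
    can_create: bool = False
    can_insert: bool = False
    query_fields: frozenset = frozenset()  # fields readable in plaintext


# Mirrors the approved authorities described for Table 1.
PERMISSIONS = {
    ("admin@startdt.com", "safe_test"): TablePermission(
        can_create=True, can_insert=True),
    ("worker1@startdt.com", "safe_test"): TablePermission(
        query_fields=frozenset({"id", "name", "religion"})),
    ("worker2@startdt.com", "safe_test"): TablePermission(
        query_fields=frozenset(ALL_FIELDS)),
}


def query_fields(user, table):
    """Return the fields the user may query; empty if nothing was approved."""
    perm = PERMISSIONS.get((user, table))
    return perm.query_fields if perm else frozenset()
```

Keeping the queryable fields as a set per (user, table) pair makes the later per-column decision during table reads a simple membership test.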
In step S2, the user sends the HQL to the HQL submission agent, which submits the HQL task on behalf of the user. The agent prevents user credential information from being misused and also blocks some illegitimate users, which helps to avoid network attacks.
In step S3, before submitting the HQL task, the HQL submission agent must obtain the Kerberos authentication principal and keytab file of the user from the user authority management unit. The Kerberos credential management module and the user authentication credential management module of that unit maintain the Kerberos authentication principal and keytab file information of each user. As shown in Table 3, three authentication principals are created in Kerberos credential management; the corresponding keytab files and the krb5.conf file are already placed on the physical server that hosts the HQL submission agent, and the table gives their paths together with the Linux user corresponding to each principal. Here, a Linux user means a Linux user on the physical hosts where the HQL submission agent, Hive, Yarn, and HDFS services run; this matters because big data components built around Hadoop use the Linux user as the authentication object by default.
TABLE 3
[Table 3 appears as an image in the original publication.]
Table 4 shows the mapping between the three user accounts in Table 1 and the three authentication principals in Table 3. For example, when the user admin@startdt.com submits HQL to the HQL submission agent, the Kerberos authentication principal acquired by the agent is simba@STARTDT.COM, together with the corresponding keytab file.
TABLE 4
[Table 4 appears as an image in the original publication.]
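The credential lookup of step S3 can be sketched as a mapping from user account to Kerberos principal and keytab path. The account-to-principal pairs below follow the correspondences stated in the text; the keytab directory `KEYTAB_DIR` and the helper name `credentials_for` are hypothetical, since the real paths are listed only in Table 3.

```python
# Account-to-principal mapping as described for Table 4.
ACCOUNT_TO_PRINCIPAL = {
    "admin@startdt.com": "simba@STARTDT.COM",
    "worker1@startdt.com": "tenant1@STARTDT.COM",
    "worker2@startdt.com": "tenant2@STARTDT.COM",
}

# Hypothetical keytab directory; the actual paths are given in Table 3.
KEYTAB_DIR = "/etc/security/keytabs"


def credentials_for(account):
    """Return (principal, keytab_path) for a user account.

    Raises KeyError for accounts without a registered principal, which is
    one way the agent could turn away unknown users before reaching Hive.
    """
    principal = ACCOUNT_TO_PRINCIPAL[account]
    short_name = principal.split("@", 1)[0]
    return principal, f"{KEYTAB_DIR}/{short_name}.keytab"
```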
In step S4, the HQL submission agent performs Kerberos authentication with the acquired Kerberos authentication principal and keytab file, and submits the HQL task to Hive once authentication succeeds. After Kerberos authentication succeeds, the user information maintained in the Hive session is that of the Kerberos authentication principal. For example, if the HQL submission agent authenticates with the principal simba@STARTDT.COM, then Hive, Yarn, and HDFS all treat the Linux user simba corresponding to simba@STARTDT.COM as the current user, and the access control check of the key management service is likewise performed against the simba user.
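As an illustration of step S4, a keytab-based login is commonly performed with the MIT Kerberos `kinit` client. The sketch below only builds the command line and derives the short user name; the function names are hypothetical, the keytab path in the usage note is a placeholder, and stripping the realm is a simplification of how Hadoop maps principals to local users.

```python
def kinit_command(principal, keytab_path):
    """Build the argv for a keytab-based `kinit` login (not executed here)."""
    return ["kinit", "-kt", keytab_path, principal]


def linux_user_of(principal):
    """Short name that the Hadoop components would treat as the current user.

    Simplification: the realm is stripped; real clusters apply configurable
    principal-to-user mapping rules.
    """
    return principal.split("@", 1)[0]
```

An agent could pass the result to `subprocess.run(kinit_command(...), check=True)` and submit the HQL only if the call succeeds.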
In step S5, after the HQL submission agent submits the HQL task to Hive, Hive generates a corresponding task according to the HQL type. For example, for table creation, Hive generates a DDL create table task; for table insertion, Hive generates a MapRed task; for a table query, Hive generates a Fetch task.
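The mapping from HQL type to task type in step S5 can be illustrated with a toy classifier. This is a deliberately naive sketch based on the leading keyword; it is not how the Hive compiler decides, and the function name `task_type` is hypothetical.

```python
def task_type(hql):
    """Map an HQL statement to the task kind named in step S5 (toy version)."""
    words = hql.strip().lower().split()
    if not words:
        raise ValueError("empty HQL")
    # CREATE [EXTERNAL|TEMPORARY] TABLE ... -> DDL create table task
    if words[0] == "create" and "table" in words[1:4]:
        return "DDL"
    if words[0] == "insert":
        return "MapRed"   # table insertion
    if words[0] == "select":
        return "Fetch"    # table query
    raise ValueError(f"unsupported HQL: {hql!r}")
```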
In step S6, for the DDL table creation task, the HQL must specify the serialization method, the serialization attributes, and the table attributes. The following code is the table-creation HQL for the table "safe_test" of Table 2, in which these items are specified.
[The table-creation HQL listing appears as an image in the original publication.]
Table 5 below explains in detail the serialization method and attributes used in the above code. They specify the key management server address, the key name, the encrypted columns, the encryption algorithm, and the serialization method. The serialization method must be com.startdt.datablack.hive.ql.CryptoSerde, which is implemented by the invention; the other configuration parameters supply the information that CryptoSerde needs during field encryption and decryption, and can be set flexibly according to actual conditions.
TABLE 5
[Table 5 appears as an image in the original publication.]
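Because the original listing and Table 5 are only available as images, the following is a hypothetical reconstruction of what such a table-creation statement might look like, rendered by a small helper. Only the attribute names `hive.kms.uri` and `hive.encrypt.keyname` and the key name "safetest" come from the text; the other property names, the column types, the KMS address, the algorithm string, and the split between SerDe and table properties are all assumptions.

```python
def render_create_table(table, columns, serde_class, serde_props, table_props):
    """Render a Hive CREATE TABLE statement that uses a custom SerDe."""
    def props(p):
        return ",\n  ".join(f"'{k}' = '{v}'" for k, v in p.items())

    cols = ",\n  ".join(f"{name} {typ}" for name, typ in columns)
    return (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        f"ROW FORMAT SERDE '{serde_class}'\n"
        f"WITH SERDEPROPERTIES (\n  {props(serde_props)}\n)\n"
        f"TBLPROPERTIES (\n  {props(table_props)}\n)"
    )


ddl = render_create_table(
    "safe_test",
    # Column types are assumptions; Table 2 is an image in the source.
    [("id", "INT"), ("name", "STRING"), ("religion", "STRING"),
     ("idcard", "STRING"), ("phone", "STRING"), ("email", "STRING")],
    # SerDe class as named in the text (the source spells it inconsistently).
    "com.startdt.datablack.hive.ql.CryptoSerde",
    {   # Hypothetical property names for the encrypted columns and algorithm.
        "hive.encrypt.columns": "idcard,phone,email",
        "hive.encrypt.algorithm": "AES",
    },
    {   # Attribute names from the text; the URI value is a placeholder.
        "hive.kms.uri": "kms://http@kms.example.com:9600/kms",
        "hive.encrypt.keyname": "safetest",
    },
)
```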
Hive converts the table-creation HQL into a DDL task. The invention rewrites the createTable and createTableLike methods of the DDLTask class in the Hive Java source code; the flows before and after the rewrite are shown in FIG. 2. As the figure shows, the modified Hive table creation flow adds a procedure that connects to the key management server and creates the table key (steps S3-S5 on the right side of the figure). Table creation continues only if the table key is created successfully; otherwise the table creation flow is terminated.
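The ordering constraint of the modified flow (create the key first, abort table creation if that fails) can be sketched as follows. This is an illustration in Python rather than the rewritten Java methods; `create_key` is an injected callable standing in for the key management server call, and the dictionary standing in for the metadata database is an assumption.

```python
class TableCreationAborted(Exception):
    """Raised when the table key cannot be created."""


def create_table(table, key_name, create_key, metastore):
    """Create the table key first; register the table only if that succeeds."""
    if not create_key(key_name):
        raise TableCreationAborted(f"could not create key {key_name!r}")
    # Reached only after the key exists: record the table metadata.
    metastore[table] = {"key_name": key_name}
    return metastore[table]
```

Checking the key before touching the metadata means a failed key request leaves no half-created table behind.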
FIG. 3 shows the process of creating a table key from the table attributes hive.kms.uri and hive.encrypt.keyname. The DDL create table task first issues a create-key request to the key management service as simba@STARTDT.COM. The user access authority management module then extracts the user authentication principal carried in the request and decides, according to the user key access authority control in Table 6 below, whether the user has the authority to create a key.
TABLE 6
[Table 6 appears as an image in the original publication.]
Because simba@STARTDT.COM has the authority to create keys, the request is forwarded to the key management module; otherwise, the user access authority management module would reject the request directly and the table creation task would be aborted. The key management module creates the key under the key name in the request and sets the access control of the key; as shown in FIG. 3, simba, tenant1, and tenant2 have access to the key "safetest". This two-layer design of key access control better protects the key and keeps key authority flexibly configurable. Finally, if the table key is created successfully, the DDL create table task continues until the table is created.
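The two-layer check on key creation can be sketched with a small in-memory class. This is a minimal illustration, not the key management service of the invention: the class and attribute names are hypothetical, and the key material is simply random bytes.

```python
import secrets


class KeyManagementService:
    """Toy in-memory model of the two-layer key access control."""

    def __init__(self, create_acl):
        self.create_acl = set(create_acl)  # layer 1: who may create keys
        self.keys = {}                     # key name -> key bytes
        self.key_acl = {}                  # layer 2: key name -> allowed users

    def create_key(self, user, key_name, allowed_users):
        # Layer 1: reject principals without key creation authority.
        if user not in self.create_acl:
            raise PermissionError(f"{user} may not create keys")
        if key_name in self.keys:
            raise ValueError(f"key {key_name!r} already exists")
        self.keys[key_name] = secrets.token_bytes(16)
        # Layer 2: per-key access list, set when the key is created.
        self.key_acl[key_name] = set(allowed_users) | {user}
        return key_name
```

For example, `KeyManagementService({"simba"}).create_key("simba", "safetest", {"tenant1", "tenant2"})` yields a key whose access list is simba, tenant1, and tenant2, matching the situation described for FIG. 3.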
In step S7, once the table has been created successfully in step S6, Hive automatically connects to the Hive metadata database and updates the metadata information of the table.
In step S8, once the table has been created and users submit table insertion and table query HQL tasks, Hive generates the corresponding MapRed and Fetch tasks, which are finally translated into MapReduce jobs and submitted to Yarn for execution. A MapReduce job is essentially an operator tree: reading a table goes through a table scan operator, writing a table goes through a file output operator, and operators exchange data as Row objects (structured data that represents one row of the table). The table scan operator reads the table files stored on HDFS row by row; in this process, the file byte stream is first converted into Writable objects by an InputFormat object, and the Writable objects are then converted by serialization into Row objects that the table scan operator can process. The file output operator inserts Row objects into the table file stored on HDFS; in this process, each Row object is deserialized into a Writable object, which the OutputFormat object then converts into a byte stream that is appended to the table file on HDFS.
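The two data paths described above (byte stream to Writable to Row on reads, and the reverse on writes) can be mimicked with plain text lines. This is only a structural sketch: tab-separated lines stand in for Writable objects, dictionaries stand in for Row objects, and all function names and the two-column layout are assumptions.

```python
FIELDS = ["id", "name"]  # illustrative two-column layout


def to_row(line):
    """Role of the serializer on reads: Writable (text line) -> Row (dict)."""
    return dict(zip(FIELDS, line.split("\t")))


def to_writable(row):
    """Role of the serializer on writes: Row (dict) -> Writable (text line)."""
    return "\t".join(row[f] for f in FIELDS)


def read_rows(file_bytes):
    """Table scan path: the line splitting plays the InputFormat role."""
    return [to_row(line) for line in file_bytes.decode("utf-8").splitlines()]


def write_rows(rows):
    """File output path: the joining/encoding plays the OutputFormat role."""
    return "".join(to_writable(r) + "\n" for r in rows).encode("utf-8")
```

The serializer sits at the only point where every column value of every row passes by, which is why field encryption and decryption are attached there.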
In step S9, both serialization when reading a table and deserialization when writing a table require the table key to be acquired from the key management service according to the attributes hive.kms.uri and hive.encrypt.keyname. FIG. 4 shows the flow for acquiring the table key. First, the serializer issues a get-key request to the key management service as tenant1@STARTDT.COM or tenant2@STARTDT.COM. The user access authority management module then extracts the user authentication principal carried in the request and decides, according to the user key access authority control in Table 6, whether the user has the authority to acquire keys. Because tenant1@STARTDT.COM and tenant2@STARTDT.COM have the authority to acquire keys, the request is forwarded to the key management module; otherwise, the request would be rejected directly and key acquisition would fail. The key management module then checks, by the key name in the request, whether the user has access to that key; if so, the key is acquired successfully, otherwise acquisition fails. Because tenant1@STARTDT.COM and tenant2@STARTDT.COM both have access to the key named "safetest", they acquire the key successfully.
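The get-key flow amounts to two successive checks. In the sketch below the contents of both access lists are assumptions inferred from the prose (Table 6 itself is an image), the key bytes are a placeholder, and `get_key` is a hypothetical name.

```python
# Layer 1: principals allowed to request keys at all (assumed Table 6 content).
GET_KEY_ACL = {"simba", "tenant1", "tenant2"}
# Layer 2: per-key access list, as described for the key "safetest".
KEY_ACL = {"safetest": {"simba", "tenant1", "tenant2"}}
# Placeholder key material for illustration only.
KEYS = {"safetest": b"0123456789abcdef"}


def get_key(user, key_name):
    """Return the key bytes, or None when either access check fails."""
    if user not in GET_KEY_ACL:                      # user-level check
        return None
    if user not in KEY_ACL.get(key_name, set()):     # key-level check
        return None
    return KEYS.get(key_name)
```

Returning `None` rather than raising lets the read path fall back to passing ciphertext through, which matches the behavior described for users without the key.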
In step S10, for a table query, serialization during table reading also requires the table field authority of the user from the user authority management unit. As shown in Table 1 above, the user worker1@startdt.com has partial-field query authority for the table "safe_test", limited to the fields {id, name, religion}, and the user worker2@startdt.com has full-field query authority for "safe_test". worker1@startdt.com corresponds to tenant1@STARTDT.COM, and worker2@startdt.com corresponds to tenant2@STARTDT.COM. Because the two users have different field query authority, they obtain different results when querying the table "safe_test".
In step S11, serialization converts Writable objects into Row objects that the table scan operator can process. During serialization, using the table key and the user table field authority acquired in steps S9 and S10, the field data of each column of the Writable object is decrypted; where the table key or the field authority is missing, the field data is not decrypted and the underlying stored content is output as is. The result is finally converted into a Row object. In one embodiment, the data of "safe_test" from Table 2 is shown in Table 7 below:
TABLE 7
[Table 7 appears as an image in the original publication.]
Because {idcard, phone, email} are encrypted fields, only users who hold both the table key and the field authority see the real data of these three fields in a table query; all other users see only the ciphertext of the three fields. Under the authority design of the invention, the result obtained by the tenant1 user when querying the table "safe_test" through the Hive client Beeline is as follows:
[The query result for tenant1 appears as an image in the original publication.]
while the result of the tenant2 user querying the table "safe_test" is as follows:
[The query result for tenant2 appears as an image in the original publication.]
Comparing the two results shows that tenant1, which holds authority for only the three fields {id, name, religion}, sees ciphertext in the three columns {idcard, phone, email}; tenant2, which holds query authority for all fields, sees every column in plaintext, consistent with the following data.
[The plaintext data appears as an image in the original publication.]
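The per-column decision made on the read path can be sketched as follows. The cipher here is a toy XOR stream built from SHA-256 so that the example runs with the standard library only; it is not secure and is not the algorithm of the invention. The sample values are made up (Table 7 is an image), and the function names are hypothetical.

```python
import base64
import hashlib

ENCRYPTED_COLUMNS = {"idcard", "phone", "email"}


def _keystream(key, n):
    """Derive n pseudo-random bytes from the key (toy construction)."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def toy_encrypt(key, text):
    data = text.encode("utf-8")
    mixed = bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))
    return base64.b64encode(mixed).decode("ascii")


def toy_decrypt(key, token):
    data = base64.b64decode(token)
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data)))).decode("utf-8")


def read_row(stored, key, allowed_fields):
    """Decrypt a column only if it is encrypted, the key was obtained,
    and the user holds field authority; otherwise pass stored content on."""
    row = {}
    for name, value in stored.items():
        if name in ENCRYPTED_COLUMNS and key is not None and name in allowed_fields:
            row[name] = toy_decrypt(key, value)
        else:
            row[name] = value
    return row
```

With the same stored row, a user limited to {id, name, religion} gets the stored ciphertext back for idcard, phone, and email, while a user with all six fields gets plaintext throughout.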
In step S12, deserialization converts the Row object processed by the file output operator into a Writable object, which the OutputFormat object then converts into a byte stream that is appended to the table file on HDFS. During deserialization, the table key is acquired as in step S9, the fields of the table that require encryption are encrypted with the table key, and the result is finally converted into a Writable object. After the table "safe_test" of Table 7 is stored on HDFS, the content of the table file on HDFS is as shown by the data above: the three columns that require encryption are all stored as ciphertext in the HDFS table file.
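The write path can be sketched in the same spirit. Here the cipher is injected as a callable so that the sketch stays independent of any particular algorithm; refusing to write when no key is available is an assumption, since the text does not state what happens in that case, and `write_row` is a hypothetical name.

```python
def write_row(row, key, encrypted_columns, encrypt):
    """Encrypt the configured columns of a Row before it is written.

    `encrypt(key, value)` is supplied by the caller; every column outside
    `encrypted_columns` is stored unchanged.
    """
    if key is None:
        # Assumption: fail closed rather than store plaintext.
        raise PermissionError("table key unavailable; refusing to write plaintext")
    return {
        name: encrypt(key, value) if name in encrypted_columns else value
        for name, value in row.items()
    }
```

In a test, any reversible stand-in (for example a function that reverses the string) is enough to confirm that exactly the configured columns are transformed; a real deployment would plug in the configured encryption algorithm.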
The invention innovates in both efficiency and security. It provides a Hive series data encryption scheme with dynamically managed key authority that realizes 1) automatic data encryption and decryption and 2) key management based on user authority, meeting the security requirements of real production environments.
The invention has four key points: 1) its architecture meets the Hive series data encryption requirements within an enterprise; 2) it realizes key management and series data encryption and decryption based on user authority, providing safer data protection; 3) its Hive table creation scheme makes the creation of the table key transparent; and 4) its serializer com.startdt.datablack.hive.ql.CryptoSerde realizes automatic field encryption and decryption based on user authority during serialization and deserialization.
In this description, a reference to the terms "embodiment", "example", and the like means that a particular feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. Such terms do not necessarily refer to the same embodiment or example. Furthermore, those skilled in the art may combine the various embodiments or examples described in this specification, and their features, provided they do not contradict one another.
Although embodiments of the present invention have been shown and described, it is understood that the above embodiments are illustrative and are not to be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments without departing from the scope of the present invention.

Claims (10)

1. A Hive series data encryption method with dynamically managed key authority, characterized by comprising the following steps:
S1, Hive table authority application: the user applies to the user authority management unit for table creation authority; the user authority management unit creates a corresponding Kerberos authentication principal and keytab for the user, and at the same time applies to the key management service for access control and key creation authority for the Kerberos authentication principal of the user, thereby realizing dynamic management of key authority;
S2, submitting HQL: the user submits a DDL statement for creating a table to the HQL submission agent unit;
S3, obtaining the user authentication credential: the HQL submission agent unit acquires the Kerberos authentication principal and keytab file of the user from the user authority management unit;
S4, Kerberos authentication: the HQL submission agent unit performs Kerberos authentication according to the Kerberos authentication principal and keytab file of the user;
S5, submitting an HQL task: after Kerberos authentication succeeds, the table-creation DDL statement is submitted to Hive as an HQL task;
S6, creating a table and a table key: Hive converts the DDL statement into a DDL task, connects to the key management service unit during table creation, and creates a table key using the table key name in the DDL statement;
S7, updating table metadata information: before table creation completes, Hive writes the metadata information of the table to the Hive metadata database;
S8, submitting MapReduce jobs: after the table has been created, other users submit table query and table insertion HQL tasks according to steps S1-S5; Hive generates a Fetch task and a MapRed task for these HQL tasks respectively and converts them into MapReduce jobs that are submitted to Yarn for execution;
S9, acquiring the table key: the leaf nodes of the MapReduce job operator tree for a table query are table scan operators; a table scan operator must perform serialization when reading row data from the table, and before serialization the serializer acquires the table key from the key management service according to the user information and the table key name;
S10, acquiring user table field authority: if the table key is acquired successfully, the serializer obtains the field authority of the user from the user authority management unit;
S11, field decryption based on serialization: the serializer decrypts only the fields for which the user holds both the key and the field authority; in all other cases the stored content is passed through unchanged;
S12, field encryption based on deserialization: the leaf nodes of the MapReduce job operator tree for a table insertion are file output operators; a file output operator must perform deserialization before writing row data to the table file; during deserialization the serializer repeats step S9 to acquire the table key, and then encrypts the fields of the table that require encryption with the table key.
2. The Hive series data encryption method with dynamically managed key authority according to claim 1, wherein in step S1, before submitting HQL, the user must apply for Hive table authority so that the HQL can execute successfully; the table authority application module in the user authority management unit processes the table authority application of the user, maintains the table authority information of the user, and at the same time applies to the key management service unit for access control and key creation authority for the Kerberos authentication principal of the user; because key creation and access control can be configured dynamically through the user authority management unit, dynamic management of key authority is realized.
3. The Hive series data encryption method with dynamically managed key authority according to claim 2, wherein in step S2, the DDL statement includes a table name, a field type, an encrypted field list, a key management server address, a table key name, an encryption algorithm, and the serialization and deserialization method; the user sends the HQL to the HQL submission agent, and the HQL submission agent submits the HQL task.
4. The Hive series data encryption method with dynamically managed key authority according to claim 3, wherein in step S3, before submitting the HQL task, the HQL submission agent obtains the Kerberos authentication principal and keytab file of the user from the user authority management unit; the Kerberos credential management module and the user authentication credential management module of the user authority management unit maintain this information.
5. The Hive series data encryption method with dynamically managed key authority according to claim 4, wherein in step S4, the HQL submission agent performs Kerberos authentication according to the acquired Kerberos authentication principal and keytab file, and submits the HQL task to Hive after authentication succeeds; after Kerberos authentication succeeds, the user information maintained in the Hive session is the information corresponding to the Kerberos authentication principal.
6. The Hive series data encryption method with dynamically managed key authority according to claim 5, wherein in step S5, after the HQL submission agent submits the HQL task to Hive, Hive generates a corresponding task according to the HQL type; for table creation, Hive generates a DDL create table task; for table insertion, Hive generates a MapRed task; for a table query, Hive generates a Fetch task.
7. The Hive series data encryption method with dynamically managed key authority according to claim 6, wherein in step S6, for the DDL create table task, the serialization attributes and table attribute information are specified in the HQL; Hive converts the table-creation HQL into a DDL task, and the createTable and createTableLike methods of the DDLTask class in the Hive Java source code are rewritten; the rewritten Hive table creation flow adds a procedure that connects to the key management server and creates the table key; execution continues only if the table key is created successfully, otherwise the table creation flow is terminated.
8. The Hive series data encryption method with dynamically managed key authority according to claim 7, wherein after the table is created successfully in step S6, Hive automatically connects to the Hive metadata database and updates the metadata information of the table; in step S8, once the table has been created and users submit table insertion and table query HQL tasks, Hive generates the corresponding MapRed and Fetch tasks, which are finally translated into MapReduce jobs and submitted to Yarn for execution.
9. The Hive series data encryption method with dynamically managed key authority according to claim 8, wherein in step S9, both serialization when reading a table and deserialization when writing a table require the table key to be acquired from the key management service according to the attributes hive.kms.uri and hive.encrypt.keyname; first, the serializer issues a get-key request to the key management service unit; then, the user authority management module extracts the user authentication principal carried in the request and judges, according to the user key access authority control in the table, whether the user has the authority for creating the key; the key management module then checks, by the key name in the request, whether the user has access to the key; if so, the key is acquired successfully, otherwise key acquisition fails.
10. A Hive series data encryption system with dynamically managed key authority, the system being configured to implement the Hive series data encryption method with dynamically managed key authority according to any one of claims 1 to 9, and comprising a user authority management unit, a key management service unit, and an HQL submission agent unit; the user authority management unit comprises three modules, namely a table authority application module, a Kerberos credential management module, and a user authentication credential management module; the table authority application module processes table authority application requests of users and maintains the table authority information of users; the Kerberos credential management module manages the Kerberos keytab files of accessing users; the key management service unit comprises a user access authority management module and a key management module; the user access authority management module manages access control of the Kerberos authentication principal corresponding to a user; for a Kerberos authentication principal with access control authority, the key management module manages the security of key creation and key acquisition by that principal; the HQL submission agent unit receives HQL submitted by a user, obtains the Kerberos authentication principal and keytab file of the user from the user authority management unit, performs Kerberos authentication, and after authentication succeeds submits the HQL task to Hive for execution.
CN202211085531.6A 2022-09-06 2022-09-06 Hive series data encryption method and system with dynamically managed key authority Active CN115146245B (en)


Publications (2)

Publication Number Publication Date
CN115146245A true CN115146245A (en) 2022-10-04
CN115146245B CN115146245B (en) 2022-11-18

Family

ID=83416156




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant