CN115146245B - Hive series data encryption method and system with dynamically managed key authority - Google Patents

Info

Publication number
CN115146245B
Authority
CN
China
Prior art keywords
key
user
hive
authority
hql
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211085531.6A
Other languages
Chinese (zh)
Other versions
CN115146245A (en)
Inventor
卢薇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Bizhi Technology Co ltd
Original Assignee
Hangzhou Bizhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Bizhi Technology Co ltd
Priority to CN202211085531.6A
Publication of CN115146245A
Application granted
Publication of CN115146245B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/06 Network architectures or network communication protocols for network security for supporting key management in a packet data network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/083 Network architectures or network communication protocols for network security for authentication of entities using passwords

Abstract

The invention discloses a Hive column-level data encryption method and system with dynamically managed key authority. The method comprises the following steps: S1, applying for Hive table authority; S2, submitting HQL; S3, acquiring the user authentication credential; S4, Kerberos authentication; S5, submitting the HQL task; S6, creating the table and the table key; S7, updating table metadata information; S8, submitting the MapReduce job; S9, acquiring the table key; S10, acquiring the user's table field authority; S11, field decryption based on serialization; and S12, field encryption based on deserialization. The scheme realizes automatic data encryption and decryption and key management based on user authority, meets the security requirements of actual production environments, and improves encryption efficiency and data security.

Description

Hive column-level data encryption method and system with dynamically managed key authority
Technical Field
The invention relates to the field of computer communication and data security, and in particular to a Hive column-level data encryption method and system with dynamically managed key authority.
Background
As enterprise data volumes grow rapidly from the TB level to the PB level, the big data components built around Hadoop have become a key technology for building modern data warehouses and data analysis platforms. Hive is an indispensable tool among them, mainly because it provides an SQL (Structured Query Language) dialect, called HiveQL or HQL, that can query massive data stored in the Hadoop Distributed File System (HDFS) and in HBase (a column-oriented non-relational database system), thereby lowering the barrier to migrating a traditional data warehouse built on an SQL-based relational database to Hadoop. Unlike traditional relational databases, which already provide a mature set of data security solutions, Hive addresses data security in an open manner. Although Hive provides role-based authority control, for enterprise P0-level data security requirements such as fine-grained authority control, desensitization and encryption it only provides open interfaces, and each enterprise must implement a concrete solution as required. Realizing Hive column-level data encryption with dynamically managed key authority has therefore become a technical problem that many enterprises need to solve.
Data encryption guarantees the security of data storage, and authority-based data decryption guarantees the security of data access. In most production scenarios, Hive data is stored in HDFS. For storage security, HDFS provides a transparent encryption function, which is a file-level data encryption scheme. "Transparent encryption" means that the encryption and decryption of files is transparent and imperceptible to the HDFS client. HDFS transparent encryption can therefore only prevent an illegitimate user from breaking data confidentiality by, for example, stealing a disk; it cannot provide data access security based on user authority for applications in the layers above Hive. In addition, compared with file-level data encryption, most enterprises prefer column-level (or field-level) encryption of table data in Hive, that is, encrypting only the sensitive columns of a table, so as to reduce the negative impact of data encryption and decryption on business performance. At present, the Hive column-level encryption commonly used by enterprises is mostly implemented with Hive user-defined functions (UDFs). The encryption and decryption functions are first implemented using the UDF interface provided by Hive, and table fields in Hive are then encrypted or decrypted by binding an encryption or decryption function to each encrypted column.
Existing data encryption schemes suffer from two types of problems: 1) encryption and decryption functions must be added manually when writing the business HQL, which on the one hand reduces business development efficiency and on the other hand breaks "select" queries that originally could run without triggering a MapReduce task; 2) the key is stored inside the encryption and decryption functions, cannot be acquired dynamically, is easy to crack, and poses a serious security risk. The invention therefore makes up for the efficiency and security deficiencies of existing Hive column-level data encryption schemes through two technical innovations: 1) automatic data encryption and decryption and 2) key management based on user authority.
As described above, existing Hive data encryption schemes are implemented either on HDFS transparent encryption or on Hive user-defined functions. The first scheme can only encrypt files at the HDFS layer; once the data has been obtained by the HDFS client it is in plaintext, so no user-level confidentiality protection can be provided at the application layer above Hive. This scheme can only serve as an underlying, auxiliary scheme for Hive data encryption. The second scheme requires manual work in 1) developing the custom encryption and decryption UDFs, 2) registering the UDFs as permanent Hive functions, and 3) binding the encryption and decryption UDFs to the sensitive fields in the business HQL. The second scheme can realize Hive column-level data encryption, but it has obvious defects in both efficiency and security. In terms of efficiency, the encryption and decryption UDFs require manual involvement, which affects the development and execution efficiency of the HQL and prevents automatic data encryption and decryption. In terms of security, the key is embedded in the encryption and decryption functions, so it is easy to crack and poses a serious security risk.
Disclosure of Invention
In view of the problems in the prior art, the invention aims at technical innovation in both efficiency and security, and provides a Hive column-level data encryption scheme with dynamically managed key authority that realizes 1) automatic data encryption and decryption and 2) key management based on user authority, and meets the security requirements of actual production environments.
To achieve the above object, the invention provides a Hive column-level data encryption method with dynamically managed key authority, which comprises the following steps:
S1, Hive table authority application: the user applies to the user authority management unit for table creation authority; the user authority management unit creates a corresponding Kerberos authentication principal and keytab for the user, and at the same time applies to the key management service for access control and key creation authority for the user's Kerberos authentication principal; this step corresponds to dynamic key authority management;
S2, submitting HQL: the user submits a DDL statement for creating a table to the HQL submission agent unit;
S3, acquiring the user authentication credential: the HQL submission agent unit acquires the user's Kerberos authentication principal and keytab file from the user authority management unit;
S4, Kerberos authentication: the HQL submission agent unit performs Kerberos authentication with the user's Kerberos authentication principal and keytab file;
S5, submitting the HQL task: after Kerberos authentication succeeds, the table creation DDL statement is submitted to Hive as an HQL task;
S6, creating the table and the table key: Hive converts the DDL statement into a DDL task, connects to the key management service unit during table creation, and creates the table key using the table key name in the DDL statement;
S7, updating table metadata information: before table creation finishes, Hive updates the metadata information of the table in the Hive metadata database;
S8, submitting the MapReduce job: after the table has been created and other users have submitted table query and table insertion HQL tasks according to steps S1-S5, Hive generates a Fetch task and a MapRed task for these HQL tasks respectively, converts them into MapReduce jobs and submits them to Yarn for execution;
S9, acquiring the table key: the leaf nodes of the operator tree of the MapReduce job for a table query correspond to table scan operators; a table scan operator must perform a serialization operation when reading the row data of the table, and before serialization the serializer acquires the table key from the key management service according to the user information and the table key name;
S10, acquiring the user's table field authority: if the table key is acquired successfully, the serializer acquires the user's field authority from the user authority management unit;
S11, field decryption based on serialization: the serializer decrypts only those fields for which both the key and the field authority are available; in all other cases the fields are passed on as stored, without processing;
S12, field encryption based on deserialization: the leaf node of the operator tree of the MapReduce job for a table insertion corresponds to the file output operator; the file output operator performs a deserialization operation before the row data is written into the table file; during deserialization the serializer repeats step S9 to acquire the table key, and then encrypts the fields to be encrypted in the table with the table key.
Further, in step S1, the user must apply for Hive table authority before submitting the HQL, to ensure that the HQL can be executed successfully; the table authority application module in the user authority management unit is responsible for processing the user's table authority applications and maintaining the user's table authority information.
Further, in step S2, the DDL statement includes the table name, the fields and field types, the list of encrypted fields, the key management server address, the table key name, the encryption algorithm, and the serialization and deserialization mode; the user sends the HQL to the HQL submission agent, and the HQL submission agent submits the HQL task.
Further, in step S3, before submitting the HQL task, the HQL submission agent obtains the user's Kerberos authentication principal and keytab file from the user authority management unit; the Kerberos credential management module and the user authentication credential management module of the user authority management unit are responsible for maintaining this information.
Further, in step S4, the HQL submission agent performs Kerberos authentication with the obtained Kerberos authentication principal and keytab file, and submits the HQL task to Hive after authentication succeeds; after Kerberos authentication succeeds, the user information maintained in the Hive session is the information corresponding to the Kerberos authentication principal.
Further, in step S5, after the HQL submission agent submits the HQL task to Hive, Hive generates a corresponding task according to the HQL type; for table creation, Hive generates a DDL table creation task; for table insertion, Hive generates a MapRed task; for table query, Hive generates a Fetch task.
Further, in step S6, for the DDL table creation task, the column list, serialization attributes and table attribute information are specified in the HQL. Hive converts the HQL into a DDL task, and the createTable and createTableLike methods of the DDLTask class in the Hive Java source code are rewritten; the rewritten Hive table creation process adds a step that connects to the key management server to create the table key; table creation continues only if the table key is created successfully, otherwise the table creation process is terminated; by rewriting the table creation of the DDL task in the Hive source code, the table automatically creates its table key at the key management service unit during creation and updates the table metadata information in the Hive metadata database, wherein the fields requiring desensitization are maintained in the metadata information.
Further, in step S6, after the table is created successfully, Hive automatically connects to the Hive metadata database and updates the metadata information of the table; in step S8, once the table has been created successfully and the user submits table insertion and table query HQL tasks, Hive generates the corresponding MapRed and Fetch tasks, finally translates them into MapReduce jobs and submits them to Yarn for execution.
Further, in step S9, both in the serialization process when reading the table and in the deserialization process when writing the table, the table key must be acquired from the key management service according to the attributes hive.kms.uri and hive.encrypt.keyname; first, the serializer sends a key acquisition request to the key management service unit; then, the user authority management module obtains the user authentication principal carried in the request and judges, according to the user key access control in the table, whether the user has the authority to create the key; the key management module checks, according to the key name in the request, whether the user has access authority for the key; if so, the key is acquired successfully, otherwise key acquisition fails.
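The two checks in step S9 can be illustrated with a minimal Python sketch. It is not the patented implementation or a real KMS client: the class `KeyManagementService`, the helper `fetch_table_key`, the exception name and the in-memory dictionaries are all hypothetical stand-ins; only the property name `hive.encrypt.keyname` comes from the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


class KeyAccessDenied(Exception):
    """Raised when a principal may not use the KMS or the requested key."""


@dataclass
class KeyManagementService:
    """Hypothetical in-memory stand-in for the key management service unit."""
    # Principals under KMS access control (first check).
    allowed_principals: Set[str] = field(default_factory=set)
    # Key name -> principals with access authority for that key (second check).
    key_acl: Dict[str, Set[str]] = field(default_factory=dict)
    # Key name -> key material.
    keys: Dict[str, bytes] = field(default_factory=dict)

    def get_key(self, principal: str, key_name: str) -> bytes:
        # Check 1: is the authenticated principal known to the KMS at all?
        if principal not in self.allowed_principals:
            raise KeyAccessDenied(f"{principal} has no KMS access")
        # Check 2: does the principal hold access authority for this key name?
        if principal not in self.key_acl.get(key_name, set()):
            raise KeyAccessDenied(f"{principal} may not read key {key_name}")
        return self.keys[key_name]


def fetch_table_key(kms: KeyManagementService, principal: str,
                    table_properties: Dict[str, str]) -> Optional[bytes]:
    """Serializer-side helper: return the table key, or None if acquisition fails."""
    try:
        return kms.get_key(principal, table_properties["hive.encrypt.keyname"])
    except KeyAccessDenied:
        return None
```

Returning `None` on failure is an assumption made here so that a caller can fall back to passing fields on as stored, as step S11 describes.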
In another aspect, the invention also provides a Hive column-level data encryption system with dynamically managed key authority, which is used to implement the above Hive column-level data encryption method with dynamically managed key authority and comprises a user authority management unit, a key management service unit and an HQL submission agent unit. The user authority management unit comprises three modules: a table authority application module, a Kerberos credential management module and a user authentication credential management module. The table authority application module is responsible for processing the user's table authority application requests and maintaining the user's table authority information. The Kerberos credential management module is responsible for managing the Kerberos keytab files of the accessing users. The key management service unit comprises a user access authority management module and a key management module. The user access authority management module manages access control for the Kerberos authentication principals corresponding to users. For Kerberos authentication principals that have access control authority, the key management module manages the security of key creation and key acquisition by these principals. The HQL submission agent unit is responsible for receiving the HQL submitted by the user, obtaining the user's Kerberos authentication principal and keytab file from the user authority management unit, performing Kerberos authentication, and submitting the HQL task to Hive for execution after authentication succeeds.
In its architectural design, the technical scheme of the invention defines the interaction logic and functional responsibilities among user authority management, the key management service, the HQL submission agent and serialization/deserialization, and provides a Hive table creation scheme and a serializer that automates field encryption and decryption based on user authority. The beneficial effects of the invention are: 1) the technical scheme can meet the Hive column-level data encryption requirements of an enterprise; 2) the invention realizes key management and column data encryption and decryption based on user authority, and provides safer data protection; 3) the Hive table creation scheme provided by the invention makes the table key creation process transparent; 4) the serializer com.startdt.datablack.hive.ql.CyptoSerde realizes automatic field encryption and decryption based on user authority in the serialization/deserialization process.
Drawings
Fig. 1 is a schematic diagram of the design architecture of a Hive column-level data encryption method and system with dynamically manageable key authority according to an embodiment of the present invention;
Fig. 2 is a schematic diagram comparing the Hive table creation process before and after modification in a Hive column-level data encryption method and system with dynamically manageable key authority according to an embodiment of the present invention;
Fig. 3 is a flowchart of creating a key in a Hive column-level data encryption method and system with dynamically manageable key authority according to an embodiment of the present invention;
Fig. 4 is a flowchart of acquiring a Hive table key in a Hive column-level data encryption method and system with dynamically manageable key authority according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly, e.g., as meaning a fixed connection, a removable connection, or an integral connection; a mechanical or an electrical connection; a direct connection, an indirect connection through intervening media, or an internal connection between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
The following describes in detail a specific embodiment of the present invention with reference to fig. 1 to 4. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
Fig. 1 shows the overall architecture of the Hive column-level data encryption method and system with dynamically managed key authority according to the invention. Built on Hive, Yarn and HDFS, the encryption system comprises a user authority management unit, a key management service unit and an HQL submission agent unit, together with the table creation rewrite and the field encryption and decryption based on serialization/deserialization.
In the Hive column-level data encryption system with dynamically managed key authority, the user authority management unit comprises three modules: a table authority application module, a Kerberos credential management module and a user authentication credential management module. The table authority application module is responsible for processing the user's table authority application requests and maintaining the user's table authority information. The Kerberos credential management module is responsible for managing the Kerberos keytab files of the accessing users; each accessing user has a corresponding Kerberos authentication principal, and each principal corresponds to one keytab file. Kerberos authentication requires both the principal and the keytab file. The user authentication credential management module maintains the mapping between users and Kerberos credentials. The Key Management Service (KMS) unit comprises a user access authority management module and a key management module. The user access authority management module manages access control for the Kerberos authentication principals corresponding to users. For Kerberos authentication principals that have access control authority, the key management module manages the security of key creation and key acquisition by these principals. The HQL submission agent unit is responsible for receiving the HQL submitted by the user, obtaining the user's Kerberos authentication principal and keytab file from the user authority management unit, performing Kerberos authentication, and submitting the HQL task to Hive for execution after authentication succeeds.
The table creation rewrite module is one of the improvements of the present invention. The invention rewrites the table creation of the DDL task in the Hive source code, so that during creation the table automatically creates its table key at the key management service unit and updates the table metadata information in the Hive metadata database, wherein the metadata information maintains the fields requiring desensitization, the key management server address, the table key name, the encryption algorithm, and the serialization and deserialization modes. Field encryption and decryption based on serialization and deserialization is the other improvement of the present invention. The invention provides specific implementations of serialization and deserialization.
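The control flow of the rewritten table creation (create the table key first, abort if that fails, then record the metadata) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the rewritten `DDLTask` code: `create_table`, `KeyCreationError`, the `ddl` dictionary layout and the dictionary standing in for the Hive metadata database are hypothetical; only the property name `hive.encrypt.keyname` appears in the text.

```python
from typing import Callable, Dict


class KeyCreationError(Exception):
    """Raised by the (hypothetical) KMS client when a table key cannot be created."""


def create_table(ddl: Dict,
                 create_key: Callable[[str, str], None],
                 metastore: Dict[str, Dict]) -> None:
    """Create the table key, then register the table metadata.

    `create_key(principal, key_name)` stands in for the call to the key
    management service; `metastore` stands in for the Hive metadata database.
    """
    props = ddl["table_properties"]
    key_name = props["hive.encrypt.keyname"]
    try:
        # Table creation continues only if the table key is created successfully.
        create_key(ddl["principal"], key_name)
    except KeyCreationError as exc:
        # Otherwise the table creation process is terminated; nothing is recorded.
        raise RuntimeError(f"table {ddl['table']} not created: {exc}") from exc
    # The metadata keeps what the serializer needs later: encrypted fields,
    # KMS address, key name, algorithm and the serialization mode.
    metastore[ddl["table"]] = {
        "columns": ddl["columns"],
        "serde": ddl["serde"],
        "properties": dict(props),
    }
```

Passing the key-creation call in as a function keeps the sketch independent of any particular KMS client library.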
When the HQL submission agent unit submits table insertion and table query tasks to Hive, Hive generates the corresponding MapRed and Fetch tasks, which are converted into MapReduce jobs and submitted to Yarn for execution. A MapReduce job is essentially an operator tree, in which the table is read by a table scan operator and written by a file output operator. Operators exchange data as Row objects; a Row object is structured data representing one row of records in the table. The table scan operator reads the table files stored in HDFS row by row, as follows:
first, the file byte stream is converted into Writable objects by means of an InputFormat object;
then, serialization converts the Writable objects into Row objects that the table scan operator can process.
The serialization flow is as follows:
first, the user information of the current session is acquired;
then, according to the user information, the table key is acquired from the key management service unit and the user's table field authority is acquired from the user authority management unit;
finally, the field data of each column of the Writable object is decrypted according to the acquired table key and the user's field authority; where the table key and/or the field authority is missing, the field data is not decrypted and is output as ciphertext; the result is converted into a Row object.
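The read-side rule above (decrypt a field only when both the table key and the field authority are available, otherwise pass the stored value through) can be sketched in Python. The sketch follows the document's wording, in which the read path is called serialization; in Hive's own SerDe interface the method that turns a Writable into a row object is named `deserialize`. The functions `toy_xor` and `read_row` are hypothetical, and the XOR routine is only a reversible placeholder for whatever encryption algorithm the table declares; it offers no real security.

```python
from itertools import cycle
from typing import Dict, Optional, Set


def toy_xor(data: bytes, key: bytes) -> bytes:
    """Reversible placeholder cipher (NOT secure); stands in for the real algorithm."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def read_row(stored_row: Dict[str, str],
             encrypted_fields: Set[str],
             table_key: Optional[bytes],
             permitted_fields: Set[str]) -> Dict[str, str]:
    """Turn one stored record into a row, decrypting only what the user may see.

    Encrypted fields are assumed to be stored as hex-encoded ciphertext.
    """
    row = {}
    for name, value in stored_row.items():
        can_decrypt = (
            name in encrypted_fields
            and table_key is not None      # table key was acquired (step S9)
            and name in permitted_fields   # user holds the field authority (step S10)
        )
        if can_decrypt:
            row[name] = toy_xor(bytes.fromhex(value), table_key).decode("utf-8")
        else:
            # No key or no authority: pass the stored content on unchanged.
            row[name] = value
    return row
```

With this shape, a user without the key still receives a complete row, but the sensitive columns remain ciphertext.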
The file output operator inserts Row objects into a table file stored in HDFS. In this process, the Row objects are first deserialized into Writable objects, and the Writable objects are then converted into a byte stream by an OutputFormat object and appended to the table file on HDFS.
The deserialization flow is as follows:
first, the user information of the current session is acquired;
then, according to the user information and the table key name, the table key is acquired from the key management service unit;
finally, the fields of the table that require encryption are encrypted with the table key, and the result is converted into a Writable object.
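The write-side flow can be sketched in the same way. Again this is a hedged illustration rather than the actual serializer: `write_row` and `toy_xor` are hypothetical names, the XOR routine is a non-secure placeholder for the declared encryption algorithm, and raising an error when no key is available is an assumption, since the text does not state how a failed key acquisition is handled on write.

```python
from itertools import cycle
from typing import Dict, Set


def toy_xor(data: bytes, key: bytes) -> bytes:
    """Reversible placeholder cipher (NOT secure); stands in for the real algorithm."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def write_row(row: Dict[str, str],
              encrypted_fields: Set[str],
              table_key: bytes) -> Dict[str, str]:
    """Prepare one row for storage, encrypting the fields marked for encryption."""
    if not table_key:
        # Assumption: without a table key the row is rejected rather than
        # written in plaintext.
        raise ValueError("a table key is required before rows can be written")
    stored = {}
    for name, value in row.items():
        if name in encrypted_fields:
            # Store ciphertext as hex text so it fits a string column.
            stored[name] = toy_xor(value.encode("utf-8"), table_key).hex()
        else:
            stored[name] = value
    return stored
```

Only the fields listed for encryption are touched, which matches the column-level goal of limiting the cost of encryption to the sensitive columns.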
In summary, in the technical scheme of the present invention, the Hive column-level data encryption method with dynamically manageable key authority comprises the following steps:
S1, applying for Hive table authority; the user applies to the user authority management unit for table creation authority; the user authority management unit creates a corresponding Kerberos authentication principal and keytab for the user, and at the same time applies to the key management service for access control and key creation authority for the user's Kerberos authentication principal; this step corresponds to dynamic key authority management;
S2, submitting HQL; the user submits a DDL statement for creating a table to the HQL submission agent unit, wherein the DDL statement includes information such as the table name, the fields and field types, the list of encrypted fields, the key management server address, the table key name, the encryption algorithm and the serialization/deserialization mode;
S3, acquiring the user authentication credential; the HQL submission agent unit acquires the user's Kerberos authentication principal and keytab file from the user authority management unit;
S4, Kerberos authentication; the HQL submission agent unit performs Kerberos authentication with the user's Kerberos authentication principal and keytab file;
S5, submitting the HQL task; after Kerberos authentication succeeds, the table creation DDL statement is submitted to Hive as an HQL task;
S6, creating the table and the table key; Hive converts the DDL statement into a DDL task, connects to the key management service unit during table creation, and creates the table key using the table key name in the DDL statement;
S7, updating table metadata information; before table creation finishes, Hive also updates the metadata information of the table in the Hive metadata database;
S8, submitting the MapReduce job; after the table has been created and other users have submitted table query and table insertion HQL tasks according to steps S1-S5, Hive generates a Fetch task and a MapRed task for these HQL tasks respectively, converts them into MapReduce jobs and submits them to Yarn for execution;
S9, acquiring the table key; the leaf nodes of the operator tree of the MapReduce job for a table query correspond to table scan operators; a table scan operator must perform a serialization operation when reading the row data of the table, and before serialization the serializer acquires the table key from the key management service according to the user information and the table key name;
S10, acquiring the user's table field authority; if the key is acquired successfully, the serializer acquires the user's field authority from the user authority management unit;
S11, field decryption based on serialization; the serializer decrypts only those fields for which both the key and the field authority are available; in all other cases the fields are passed on as stored, without processing;
S12, field encryption based on deserialization; the leaf node of the operator tree of the MapReduce job for a table insertion corresponds to the file output operator; the file output operator must perform a deserialization operation before writing the row data into the table file; during deserialization the serializer also acquires the table key through step S9, and then encrypts the fields to be encrypted in the table with the table key.
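Steps S2 to S5, the part of the flow handled by the HQL submission agent, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the class `HqlSubmissionAgent` and its fields are hypothetical, and the Kerberos login and the Hive submission are injected as plain functions instead of calling any real Kerberos or Hive client library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class HqlSubmissionAgent:
    """Hypothetical stand-in for the HQL submission agent unit."""
    # User -> (Kerberos principal, keytab path), as kept by user authority management.
    credentials: Dict[str, Tuple[str, str]]
    # Performs the Kerberos login; returns True when authentication succeeds.
    authenticate: Callable[[str, str], bool]
    # Submits the HQL to Hive under the authenticated principal.
    submit_to_hive: Callable[[str, str], str]

    def submit(self, user: str, hql: str) -> str:
        # S3: look up the user's principal and keytab file.
        if user not in self.credentials:
            raise PermissionError(f"no Kerberos credential registered for {user}")
        principal, keytab = self.credentials[user]
        # S4: Kerberos authentication with principal and keytab.
        if not self.authenticate(principal, keytab):
            raise PermissionError(f"Kerberos authentication failed for {principal}")
        # S5: only authenticated requests reach Hive.
        return self.submit_to_hive(principal, hql)
```

Because the HQL is always submitted under the principal looked up by the agent, the user information that Hive sees in the session is the one tied to the Kerberos credential, not something the caller supplies.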
The detailed technical content in each step is as follows:
In step S1, before submitting the HQL, the user needs to apply for Hive table authority to ensure that the HQL can be executed successfully. Three authorities related to field encryption and decryption are used as examples here: table creation, table query and table insertion. The table authority application module in the user authority management unit is responsible for processing the user's table authority applications and maintaining the user's table authority information.
As shown in Table 1 below, the user admin@startdt.com has creation and insertion authority for the table "safe_test"; the user worker1@startdt.com has partial-field query authority for the table "safe_test", the queryable fields being {id, name, region}; and the user worker2@startdt.com has full-field query authority for the table "safe_test". Note that the authorities in Table 1 are the result of the users applying to the table authority application module and being approved.
TABLE 1
[Table 1 image: user table permissions]
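The permissions described for Table 1 could be modelled roughly as follows. This is only a sketch: the `TablePermission` class, the `ALL_FIELDS` marker, and the `can_query_field` helper are hypothetical names chosen for illustration and are not taken from the patent.

```python
from dataclasses import dataclass

ALL_FIELDS = "*"  # hypothetical marker meaning "full-field query authority"


@dataclass(frozen=True)
class TablePermission:
    can_create: bool = False
    can_insert: bool = False
    # Fields the user may query in plaintext; {"*"} means every field.
    query_fields: frozenset = frozenset()


# Permissions for table "safe_test" as described in the text around Table 1.
PERMISSIONS = {
    ("admin@startdt.com", "safe_test"): TablePermission(can_create=True, can_insert=True),
    ("worker1@startdt.com", "safe_test"): TablePermission(
        query_fields=frozenset({"id", "name", "region"})
    ),
    ("worker2@startdt.com", "safe_test"): TablePermission(
        query_fields=frozenset({ALL_FIELDS})
    ),
}


def can_query_field(user: str, table: str, column: str) -> bool:
    """Return True if the user holds query authority for the given column."""
    perm = PERMISSIONS.get((user, table))
    if perm is None:
        return False
    return ALL_FIELDS in perm.query_fields or column in perm.query_fields
```

With this model, worker1 may query `name` but not `idcard`, while worker2 may query every field.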
The structure of the table "safe_test" is shown in Table 2 below, which lists the field names and field types; the three fields {idcard, phone, email} need to be encrypted.
TABLE 2
[Table 2 image: field names and field types of table "safe_test"]
In step S2, the user sends the HQL to the HQL submission agent, and the HQL submission agent submits the HQL task. On the one hand, the HQL submission agent prevents user credential information from being forged; on the other hand, it can block some illegitimate users and thereby avoid network attacks.
In step S3, before submitting the HQL task, the HQL submission agent needs to obtain the user's Kerberos authentication principal and keytab file from the user authority management unit. The Kerberos credential management module and the user authentication credential management module of the user authority management unit are responsible for maintaining the user's Kerberos authentication principal and keytab file information. As shown in Table 3, three authentication principals are created in the Kerberos credential management module; the corresponding keytab files and the krb5.conf file are already placed on the physical server where the HQL submission agent runs, and the table gives their paths as well as the Linux users corresponding to the three authentication principals. Here, a Linux user means a Linux user on the physical hosts where the HQL submission agent, Hive, Yarn, and HDFS services run, mainly because big data components built around Hadoop use the Linux user as the authentication object by default.
TABLE 3
[Table 3 image: Kerberos authentication principals, keytab and krb5.conf paths, and corresponding Linux users]
Table 4 shows the mapping between the three user accounts in Table 1 and the three authentication principals in Table 3. For example, when the user mapped to the simba principal submits HQL to the HQL submission agent, the Kerberos authentication principal and keytab file obtained by the agent are simba@startdt.com and /opt/startdt/keytab/simba.keytab, respectively.
TABLE 4
[Table 4 image: mapping between user accounts and Kerberos authentication principals]
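The credential lookup of step S3 can be pictured as a simple mapping from user account to principal and keytab. The sketch below is illustrative only: the worker1/worker2 mappings and the simba keytab path follow the text, but the admin entry is an assumption inferred from context, the tenant1/tenant2 keytab paths are hypothetical, and `lookup_credential` and `UnknownUserError` are invented names.

```python
from typing import NamedTuple


class KerberosCredential(NamedTuple):
    principal: str
    keytab_path: str


class UnknownUserError(Exception):
    """Raised when an account has no registered Kerberos credential."""


ACCOUNT_TO_CREDENTIAL = {
    # Assumption: the account name for simba is not legible in the source text.
    "admin@startdt.com": KerberosCredential(
        "simba@startdt.com", "/opt/startdt/keytab/simba.keytab"
    ),
    # Hypothetical keytab paths, modelled on the simba path.
    "worker1@startdt.com": KerberosCredential(
        "tenant1@startdt.com", "/opt/startdt/keytab/tenant1.keytab"
    ),
    "worker2@startdt.com": KerberosCredential(
        "tenant2@startdt.com", "/opt/startdt/keytab/tenant2.keytab"
    ),
}


def lookup_credential(account: str) -> KerberosCredential:
    """Return the principal and keytab for an account, rejecting unknown users."""
    try:
        return ACCOUNT_TO_CREDENTIAL[account]
    except KeyError:
        raise UnknownUserError(account) from None
```

Rejecting unknown accounts at this point is one way the submission agent could block illegitimate users before any task reaches Hive.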
In step S4, the HQL submission agent performs Kerberos authentication with the obtained Kerberos authentication principal and keytab file, and submits the HQL task to Hive after the authentication passes. After Kerberos authentication passes, the user information maintained in the Hive session is that of the Kerberos authentication principal. For example, if the HQL submission agent authenticates with the principal simba@startdt.com, then Hive, Yarn, and HDFS all use the Linux user simba corresponding to simba@startdt.com as the current user, and the access control check of the key management service is also performed against the simba user.
In step S5, after the HQL submission agent submits the HQL task to Hive, Hive generates a corresponding task according to the HQL type. For example, for table creation, Hive generates a DDL create-table task; for table insertion, Hive generates a MapRed task; for a table query, Hive generates a Fetch task.
In step S6, for the DDL create-table task, the serialization mode, serialization attributes, and table attributes need to be specified in the HQL. The following code is the table creation HQL for the table "safe_test" in Table 2, in which the serialization mode, serialization attributes, and table attributes are specified.
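As a rough illustration of step S4, the agent could authenticate by invoking `kinit` with the keytab and then treat the principal's primary component as the Linux user. The sketch below only builds the command and environment and does not run them; the krb5.conf location is a hypothetical default, and the short-name rule is a simplification (Hadoop normally derives it from its `hadoop.security.auth_to_local` rules).

```python
import os


def build_kinit_command(principal: str, keytab_path: str) -> list:
    """Build an MIT Kerberos kinit call: -k uses a keytab, -t names the keytab file."""
    return ["kinit", "-k", "-t", keytab_path, principal]


def build_kinit_env(krb5_conf_path: str = "/etc/krb5.conf") -> dict:
    """Copy the environment and point MIT Kerberos at a krb5.conf (path is hypothetical)."""
    env = dict(os.environ)
    env["KRB5_CONFIG"] = krb5_conf_path
    return env


def short_name(principal: str) -> str:
    """Strip the realm and any instance part: 'simba/host@REALM' -> 'simba'."""
    primary = principal.split("@", 1)[0]
    return primary.split("/", 1)[0]
```

The command and environment could then be passed to `subprocess.run(cmd, env=env, check=True)` on a host where Kerberos is configured.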
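The mapping from HQL type to task type in step S5 can be summarised with a toy classifier. This is only an illustration: Hive itself decides the task type in its compiler, not by keyword matching, and `classify_hql` is a hypothetical helper that covers just the three statement kinds discussed here.

```python
def classify_hql(hql: str) -> str:
    """Return the task type named in the text for a create, insert, or select statement."""
    words = hql.strip().lower().split()
    if not words:
        raise ValueError("empty HQL statement")
    if words[:2] == ["create", "table"]:
        return "DDL"      # table creation -> DDL create-table task
    if words[0] == "insert":
        return "MapRed"   # table insertion -> MapRed task
    if words[0] == "select":
        return "Fetch"    # table query -> Fetch task
    raise ValueError("statement type not covered by this sketch")
```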
[Code image: table creation HQL for table "safe_test"]
Table 5 below explains in detail the serialization settings and attributes in the above code, which specify the key management server address, the key name, the encrypted columns, the encryption algorithm, and the serialization mode. The serialization mode must specify com.
TABLE 5
Figure 93583DEST_PATH_IMAGE006
The Hive converts the creation table HQL into a DDL task, rewrites createTable and createTableLike methods in DDLTask corresponding to DDLTask in Hive Java source code, and the flow chart before and after rewriting is shown in figure 2. As can be seen from the figure, the modified Hive table creation process adds a process of creating a table key by the join key management server (steps S3 to S5 on the right), and the process can be continued only if the table key is successfully created, otherwise, the table creation process is terminated.
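The modified creation flow, in which the table is only registered once its key exists, might look like the sketch below. It is not the patent's DDLTask code: `create_encrypted_table`, `TableCreationAborted`, the `create_key` callback, and the dictionary standing in for the metadata database are all hypothetical; only the two attribute names come from the claims.

```python
# Attribute names taken from the claims; everything else here is illustrative.
REQUIRED_PROPERTIES = ("hive.kms.uri", "hive.encrypt.keyname")


class TableCreationAborted(Exception):
    """Raised when the table key cannot be created, so the table is not created."""


def create_encrypted_table(table_name, table_properties, create_key, metastore):
    """Create the table key first; register the table only if that succeeds.

    create_key(kms_uri, key_name) -> bool stands in for the key management service.
    metastore is a plain dict standing in for the Hive metadata database.
    """
    missing = [p for p in REQUIRED_PROPERTIES if p not in table_properties]
    if missing:
        raise TableCreationAborted(f"missing table properties: {missing}")
    kms_uri = table_properties["hive.kms.uri"]
    key_name = table_properties["hive.encrypt.keyname"]
    if not create_key(kms_uri, key_name):
        raise TableCreationAborted(f"could not create key {key_name!r}")
    metastore[table_name] = dict(table_properties)
    return metastore[table_name]
```

Ordering the key creation before the metadata update means a failed key request leaves no half-created table behind.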
Fig. 3 shows a process of creating a table key from the table attributes hive. First, the DDL create table task makes a create key request with simba @ start tdt. Then, the user authority management module obtains the user authentication subject carried in the request, and judges whether the user has the authority to create the key according to the user key access authority control in the following table 6.
TABLE 6
[Table 6 image: user key access authority control]
Because simba@startdt.com has the authority to create keys, the request is forwarded to the key management module; otherwise, the user authority management module rejects the request directly and the create-table task is aborted. The key management module creates the key according to the key name in the request and sets the key's access control; as shown in Fig. 3, simba, tenant1, and tenant2 have access to the key "safe". This two-layer design of key access control better protects the security of the key and keeps the key authority flexibly configurable. Finally, if the table key is created successfully, the DDL create-table task continues to execute until the table is created.
After the table is successfully created in step S6, Hive automatically connects to the Hive metadata database and updates the table's metadata information (step S7).
In step S8, after the table is successfully created, when a user submits table-insert and table-query HQL tasks, Hive generates the corresponding MapRed and Fetch tasks and finally translates them into MapReduce jobs that are submitted to Yarn for execution. A MapReduce job is essentially an operator tree, in which tables are read through table scan operators and written through file output operators, and data passes between operators as Row objects (a kind of structured data representing one row of records in a table). The table scan operator reads the table files stored on HDFS row by row: the file byte stream is first converted into Writable objects by the InputFormat object, and the Writable objects are then converted through serialization into Row objects that the table scan operator can process. The file output operator inserts Row objects into the table files stored on HDFS: each Row object is deserialized into a Writable object, which the OutputFormat object then converts into a byte stream and appends to the table file on HDFS.
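The two-layer check described above (first whether the principal may perform the operation at all, then whether it may access the specific key) can be sketched as a small in-memory service. Everything here is an assumption made for illustration: the class and method names are invented, the operation permissions are hypothetical because Table 6 is not legible in the source, and random bytes stand in for real key material.

```python
import secrets


class KeyManagementService:
    """Toy in-memory stand-in for the key management service."""

    def __init__(self, operation_acl):
        # Layer 1: principal -> set of allowed operations ("create", "get").
        self._operation_acl = operation_acl
        self._keys = {}      # key name -> key bytes
        self._key_acl = {}   # Layer 2: key name -> principals allowed to use it

    def _check_operation(self, principal, operation):
        if operation not in self._operation_acl.get(principal, set()):
            raise PermissionError(f"{principal} may not {operation} keys")

    def create_key(self, principal, key_name, readers=()):
        self._check_operation(principal, "create")
        if key_name in self._keys:
            raise ValueError(f"key {key_name!r} already exists")
        self._keys[key_name] = secrets.token_bytes(16)
        self._key_acl[key_name] = {principal, *readers}

    def get_key(self, principal, key_name):
        self._check_operation(principal, "get")
        if principal not in self._key_acl.get(key_name, set()):
            raise PermissionError(f"{principal} has no access to key {key_name!r}")
        return self._keys[key_name]


# Hypothetical operation permissions standing in for Table 6.
kms = KeyManagementService({
    "simba@startdt.com": {"create", "get"},
    "tenant1@startdt.com": {"get"},
    "tenant2@startdt.com": {"get"},
})
kms.create_key("simba@startdt.com", "safe",
               readers=["tenant1@startdt.com", "tenant2@startdt.com"])
```

Keeping the two checks separate lets an administrator revoke a principal's right to fetch any key without touching the per-key lists, which is one reading of the "flexible configurability" the text mentions.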
In step S9, in both the serialization process when reading a table and the deserialization process when writing a table, the table key must be acquired from the key management service according to the attributes hive.kms.uri and hive.encrypt.keyname. Fig. 4 shows the table key acquisition flow. First, the serializer issues a get-key request as tenant1@startdt.com or tenant2@startdt.com. Then the user authority management module reads the user authentication principal carried in the request and judges, according to the user key access authority control in Table 6, whether the user has the authority to obtain the key. Because tenant1@startdt.com and tenant2@startdt.com have the authority to obtain keys, the request is forwarded to the key management module; otherwise, the user authority management module rejects the request directly and key acquisition fails. The key management module then checks, according to the key name in the request, whether the user has access to that key; if so, the key is obtained successfully, and otherwise key acquisition fails. Since tenant1@startdt.com and tenant2@startdt.com have the right to obtain the key named "safetest", they obtain the key successfully.
In step S10, for a table query, serialization during table reading also requires the user's table field authority, which is obtained from the user authority management unit. As shown in Table 1 above, the user worker1@startdt.com has partial-field query authority for the table "safe_test" (the queryable fields are {id, name, region}), and the user worker2@startdt.com has full-field query authority for the table "safe_test". worker1@startdt.com corresponds to tenant1@startdt.com, and worker2@startdt.com corresponds to tenant2@startdt.com. Because the two users have different table field query authority, they obtain different results when querying the table "safe_test".
In step S11, serialization converts the Writable object into a Row object that the table scan operator can process. During serialization, each column of the Writable object is decrypted based on the table key and the user table field authority acquired in steps S9 and S10; where the table key and/or the field authority is missing, the field data is not decrypted but output exactly as stored, and the result is finally converted into a Row object. In one embodiment, the data of "safe_test" in Table 2 is shown in Table 7 below:
TABLE 7
Figure 95354DEST_PATH_IMAGE008
Because the fields { idcard, phone and email } are encrypted fields, only users with table keys and field rights can see the real data of the three fields during table query, otherwise, only the ciphertext data of the three fields can be seen. According to the permission design of the invention, the result of the tenant1 user through the Hive client Beeline query table "safe _ test" is as follows:
[Image: Beeline query result for the tenant1 user]
The result of the tenant2 user querying the table "safe_test" is as follows:
[Image: Beeline query result for the tenant2 user]
Comparing the two results shows that, for tenant1, which holds only the three field permissions {id, name, region}, the three columns {idcard, phone, email} in the query result are all ciphertext; for tenant2, which has query authority on all fields, every column in the query result is in plaintext and is consistent with the following data.
[Image in the original publication: data listing for table "safe_test"]
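The read-side rule of steps S10 and S11 (decrypt a column only when it is encrypted, the table key was obtained, and the user holds authority for that field) can be sketched as below. The function name, the `decrypt` callback, and the sample values are hypothetical; the real data of Table 7 is not available in the source, and the "enc:" prefix is only a stand-in for ciphertext.

```python
def read_row(stored_row, encrypted_columns, table_key, allowed_fields, decrypt):
    """Return the row as the querying user should see it.

    stored_row:        column -> value exactly as stored in the table file
    encrypted_columns: columns that were encrypted on write
    table_key:         key bytes, or None if key acquisition failed
    allowed_fields:    columns the user has query authority for
    decrypt:           callable (key, ciphertext) -> plaintext
    """
    visible = {}
    for column, value in stored_row.items():
        needs_decrypt = column in encrypted_columns
        if needs_decrypt and table_key is not None and column in allowed_fields:
            visible[column] = decrypt(table_key, value)
        else:
            # No key or no field authority: pass the stored content through.
            visible[column] = value
    return visible


# Made-up sample row; "enc:" marks ciphertext for the stand-in cipher below.
stored = {"id": "1", "name": "Tom", "idcard": "enc:1234",
          "phone": "enc:5550100", "email": "enc:tom@example.com", "region": "east"}
encrypted = {"idcard", "phone", "email"}
fake_decrypt = lambda key, value: value[len("enc:"):]

tenant1_view = read_row(stored, encrypted, b"key", {"id", "name", "region"}, fake_decrypt)
tenant2_view = read_row(stored, encrypted, b"key", set(stored), fake_decrypt)
```

Here `tenant1_view` keeps the three encrypted columns as ciphertext, while `tenant2_view` shows every column in plaintext, mirroring the comparison in the text.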
In step S12, deserialization converts the Row object processed by the file output operator into a Writable object, which the OutputFormat object then converts into a byte stream and appends to the table file on HDFS. During deserialization, the table key is acquired as in step S9, the fields that require encryption are encrypted with the table key, and the result is finally converted into a Writable object. After the table "safe_test" in Table 7 is stored on HDFS, the content of the table file on HDFS is as shown by the data above. The three columns that require encryption are all stored as ciphertext in the HDFS table file.
The invention innovates technically in terms of both efficiency and security and provides a Hive series data encryption scheme with dynamically managed key authority, which realizes 1) automatic data encryption and decryption and 2) key management based on user authority, and meets the security requirements of real production environments.
The invention has four key points: 1) the architecture of the technical scheme meets the Hive series data encryption requirements within an enterprise; 2) the invention realizes key management and series data encryption and decryption based on user authority, providing safer data protection; 3) the Hive table creation scheme provided by the invention makes the table key creation process transparent; and 4) the serializer com.startdt.datablack.hive.ql.cyprotesrde provided by the invention realizes automatic field encryption and decryption based on user authority during serialization/deserialization.
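The write-side encryption of step S12 can be illustrated with a self-contained round trip. Note the loud assumption: the XOR keystream below is a toy stand-in built from the Python standard library so that the example runs anywhere; it is not secure and is not the encryption algorithm the table attributes would select. All function names are hypothetical.

```python
import base64
import hashlib
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and nonce (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_field(key: bytes, plaintext: str) -> str:
    """Toy cipher: XOR with a keystream, then base64. NOT secure; illustration only."""
    nonce = os.urandom(8)
    data = plaintext.encode("utf-8")
    cipher = bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))
    return base64.b64encode(nonce + cipher).decode("ascii")


def decrypt_field(key: bytes, token: str) -> str:
    """Inverse of encrypt_field."""
    raw = base64.b64decode(token)
    nonce, cipher = raw[:8], raw[8:]
    data = bytes(a ^ b for a, b in zip(cipher, _keystream(key, nonce, len(cipher))))
    return data.decode("utf-8")


def write_row(row: dict, encrypted_columns: set, table_key: bytes) -> dict:
    """Encrypt only the columns marked for encryption; store the rest unchanged."""
    return {
        column: encrypt_field(table_key, value) if column in encrypted_columns else value
        for column, value in row.items()
    }
```

A row written with `write_row` stores `id`, `name`, and `region` as given and the three sensitive columns as opaque tokens, which `decrypt_field` can recover only with the same table key.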
In the description herein, references to the terms "embodiment," "example," and the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such terms do not necessarily refer to the same embodiment or example. Furthermore, those skilled in the art may combine the various embodiments or examples described in this specification, and their features, provided they do not contradict one another.
Although embodiments of the present invention have been shown and described, it is understood that the above embodiments are illustrative and are not to be construed as limiting the present invention, and that those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments without departing from the scope of the present invention.

Claims (10)

1. A Hive series data encryption method capable of dynamically managing key authority is characterized by comprising the following steps:
S1, Hive table authority application: a user applies to a user authority management unit for table creation authority; the user authority management unit creates a corresponding Kerberos authentication principal and keytab for the user, and synchronously applies to a key management service for access control and key authority for the user's Kerberos authentication principal, so as to realize dynamic management of the key authority;
S2, submitting HQL: the user submits a DDL statement for creating a table to an HQL submission agent unit;
S3, obtaining the user authentication credential: the HQL submission agent unit acquires the user's Kerberos authentication principal and keytab file from the user authority management unit;
S4, Kerberos authentication: the HQL submission agent unit performs Kerberos authentication according to the user's Kerberos authentication principal and keytab file;
S5, submitting the HQL task: after the Kerberos authentication passes, the DDL statement for creating the table is submitted to Hive as an HQL task;
S6, creating the table and the table key: Hive converts the DDL statement into a DDL task, connects to a key management service unit during table creation, and creates a table key using the table key name in the DDL statement;
S7, updating table metadata information: before table creation is finished, Hive updates the metadata information of the table in the Hive metadata database;
S8, submitting the MapReduce job: after the table is built, other users submit table-query and table-insert HQL tasks according to steps S1-S5; Hive generates a Fetch task and a MapRed task, respectively, for these HQL tasks, converts them into MapReduce jobs, and submits them to Yarn for execution;
S9, acquiring the table key: the leaf nodes of the MapReduce operator tree for a table query are table scan operators; a table scan operator needs to perform a serialization operation when reading the row data of a table, and before serialization a serializer acquires the table key from the key management service according to the user information and the table key name;
S10, acquiring the user's table field authority: if the table key is acquired successfully, the serializer acquires the user's field authority from the user authority management unit;
S11, field decryption based on serialization: the serializer decrypts only those fields for which both the key and the authority are available; in all other cases the fields are passed through directly as stored, without processing;
S12, field encryption based on deserialization: the leaf nodes of the MapReduce operator tree for a table insertion are file output operators; a file output operator performs a deserialization operation before writing row data to the table file; during deserialization the serializer repeats step S9 to acquire the table key and encrypts the fields to be encrypted in the table according to the table key.
2. The Hive series data encryption method capable of dynamically managing key authority according to claim 1, wherein in step S1, a user needs to apply for Hive table authority before submitting HQL to ensure that the HQL can be executed successfully; a table authority application module in the user authority management unit is responsible for processing the user's table authority applications and maintaining the user's table authority information, and synchronously applies to the key management service unit for access control and key creation authority for the user's Kerberos authentication principal; because key creation and access control can be configured dynamically through the user authority management unit, dynamic management of the key authority is realized.
3. The Hive series data encryption method capable of dynamically managing key authority according to claim 2, wherein in step S2, the DDL statement includes a table name, field types, an encrypted field list, a key management server address, a table key name, an encryption algorithm, and the serialization and deserialization mode; and the user sends the HQL to the HQL submission agent, and the HQL submission agent submits the HQL task.
4. The Hive series data encryption method capable of dynamically managing key authority according to claim 3, wherein in step S3, the HQL submission agent obtains the user's Kerberos authentication principal and keytab file from the user authority management unit before submitting the HQL task; the Kerberos credential management module and the user authentication credential management module of the user authority management unit are responsible for maintaining this information.
5. The Hive series data encryption method capable of dynamically managing key authority according to claim 4, wherein in step S4, the HQL submission agent performs Kerberos authentication according to the obtained Kerberos authentication principal and keytab file, and submits the HQL task to Hive after the authentication passes; after the Kerberos authentication passes, the user information maintained in the Hive session is the information corresponding to the Kerberos authentication principal.
6. The Hive series data encryption method capable of dynamically managing key authority according to claim 5, wherein in step S5, after the HQL submission agent submits the HQL task to Hive, Hive generates a corresponding task according to the HQL type; for table creation, Hive generates a DDL create-table task; for table insertion, Hive generates a MapRed task; for a table query, Hive generates a Fetch task.
7. The Hive series data encryption method capable of dynamically managing key authority according to claim 6, wherein in step S6, for the DDL create-table task, the serialization attributes and table attribute information are specified in the HQL; Hive converts the create-table HQL into a DDL task, and the createTable and createTableLike methods of the DDLTask class in the Hive Java source code are rewritten; the rewritten Hive table creation flow adds a process of connecting to the key management server to create the table key; execution continues only if the table key is created successfully, and otherwise the table creation process is aborted.
8. The Hive series data encryption method capable of dynamically managing key authority according to claim 7, wherein after the table is successfully created in step S6, Hive automatically connects to the Hive metadata database and updates the metadata information of the table; in step S8, after the table is successfully created, when the user submits table-insert and table-query HQL tasks, Hive generates the corresponding MapRed and Fetch tasks, finally translates them into MapReduce jobs, and submits the MapReduce jobs to Yarn for execution.
9. The Hive series data encryption method capable of dynamically managing key authority according to claim 8, wherein in step S9, in the serialization process when reading a table and the deserialization process when writing a table, the table key is acquired from the key management service according to the attributes hive.kms.uri and hive.encrypt.keyname; first, the serializer issues a key acquisition request to the key management service unit; then, the user authority management module acquires the user authentication principal carried in the request, and judges whether the user has the authority to create the key according to the user key access authority control in the table; and the key management module checks whether the user has access authority for the key according to the key name in the request; if so, the key is obtained successfully, and otherwise key acquisition fails.
10. A Hive series data encryption system with dynamically managed key authority, the system being configured to implement the Hive series data encryption method capable of dynamically managing key authority according to any one of claims 1 to 9, and the system comprising a user authority management unit, a key management service unit, and an HQL submission agent unit; the user authority management unit comprises three modules, namely a table authority application module, a Kerberos credential management module, and a user authentication credential management module; the table authority application module is responsible for processing the user's table authority application requests and maintaining the user's table authority information; the Kerberos credential management module is responsible for managing the Kerberos keytab files of accessing users; the key management service unit comprises a user access authority management module and a key management module; the user access authority management module manages the access control of the Kerberos authentication principal corresponding to a user; for a Kerberos authentication principal with access control authority, the key management module manages the security of that principal's creation and acquisition of keys; the HQL submission agent unit is responsible for receiving the HQL submitted by a user, then obtaining the user's Kerberos authentication principal and keytab file from the user authority management unit and performing Kerberos authentication, and, after the authentication passes, submitting the HQL task to Hive for execution.
CN202211085531.6A 2022-09-06 2022-09-06 Hive series data encryption method and system with dynamically managed key authority Active CN115146245B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211085531.6A CN115146245B (en) 2022-09-06 2022-09-06 Hive series data encryption method and system with dynamically managed key authority

Publications (2)

Publication Number Publication Date
CN115146245A CN115146245A (en) 2022-10-04
CN115146245B true CN115146245B (en) 2022-11-18

Family

ID=83416156

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211085531.6A Active CN115146245B (en) 2022-09-06 2022-09-06 Hive series data encryption method and system with dynamically managed key authority

Country Status (1)

Country Link
CN (1) CN115146245B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant