CN107409040B - Code analysis tool for recommending data encryption without affecting program semantics - Google Patents

Publication number
CN107409040B
Authority
CN
China
Prior art keywords
application code
encryption
data elements
encryption scheme
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201680012395.4A
Other languages
Chinese (zh)
Other versions
CN107409040A (en)
Inventor
A·S·曼彻帕利
于浩海
M·J·兹韦灵
K·瓦斯瓦尼
P·安拓诺波洛斯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC filed Critical Microsoft Technology Licensing LLC
Publication of CN107409040A publication Critical patent/CN107409040A/en
Application granted granted Critical
Publication of CN107409040B publication Critical patent/CN107409040B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218: Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227: Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/008: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08: Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/088: Usage controlling of secret information, e.g. techniques for restricting cryptographic keys to pre-authorized uses, different access levels, validity of crypto-period, different key- or password length, or different strong and weak cryptographic algorithms
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08: Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0894: Escrow, recovery or storing of secret information, e.g. secret key escrow or cryptographic key storage
    • H04L9/0897: Escrow, recovery or storing of secret information, e.g. secret key escrow or cryptographic key storage involving additional devices, e.g. trusted platform module [TPM], smartcard or USB
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/14: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols using a plurality of keys or algorithms

Abstract

Systems, methods, and computer program products are described herein that analyze code of an application, identify whether data elements (e.g., columns) referenced by the code can be encrypted based on the analysis, and recommend an encryption scheme for those data elements that can be encrypted. The recommended encryption scheme for a given data element may be the highest level of encryption that can be applied to it without affecting the semantics of the application code. The output generated based on the analysis may include not only the mapping of each data element to the recommended encryption scheme, but also an explanation of the reason for each recommendation. Such an explanation may include, for example, identification of the application code that resulted in the recommendation for each data element.

Description

Code analysis tool for recommending data encryption without affecting program semantics
Background
Data leakage is of increasing concern as more and more data is stored digitally. For example, for applications employing cloud services for managing sensitive and business critical information, data leakage may be a major deterrent. On public clouds, applications must be protected from potentially malicious cloud administrators, malicious co-tenants, and other entities that may gain access to data through various legitimate means. Since the computing and storage platform itself cannot be trusted, any data that appears in clear text anywhere on the cloud platform (on disk, in memory, or on the wire) must be considered vulnerable to leakage or malicious corruption. In industries such as finance, banking, and healthcare, compliance requirements mandate strong protection against these types of threats. However, existing security solutions such as Transparent Data Encryption (TDE) and Transport Layer Security (TLS) only protect data at rest and in transit; data in use (i.e., during computation) remains vulnerable.
One way to address this problem is to use a Partially Homomorphic Encryption (PHE) scheme. These are encryption schemes that permit restricted types of computation to be performed directly on encrypted data. Other mechanisms also exist, such as secure hardware, which allows some computation to be performed on encrypted data. However, these solutions are not general purpose, as they cannot perform all types of operations on encrypted data. Thus, users of platforms employing these mechanisms must analyze their applications and decide whether a portion of their data can be encrypted while preserving application semantics. Making such a determination can be very difficult because each data element referenced by the application may be subject to a complex set of constraints and dependencies that may limit the type of encryption that may be applied to it. Moreover, inaccurate determinations in this regard may result in a failure to apply the strongest possible encryption to certain data elements, as well as the failure of application logic due to the inability to perform certain operations on encrypted data.
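As a concrete illustration of what "restricted types of computation on encrypted data" can mean, the following minimal sketch uses textbook (unpadded) RSA with tiny parameters, which happens to be homomorphic for multiplication only. This scheme is not named in this document and is chosen purely as an assumption for illustration; it is insecure, and every identifier below is hypothetical.

```python
# Toy illustration of a partially homomorphic property (multiplication only).
# Textbook RSA with tiny primes: insecure, for demonstration purposes only.
P, Q = 61, 53
N = P * Q                    # public modulus
PHI = (P - 1) * (Q - 1)      # Euler totient of N
E = 17                       # public exponent, coprime with PHI
D = pow(E, -1, PHI)          # private exponent (modular inverse, Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N with the public key."""
    return pow(m, E, N)


def decrypt(c: int) -> int:
    """Decrypt a ciphertext with the private key."""
    return pow(c, D, N)


def multiply_encrypted(c1: int, c2: int) -> int:
    """What an untrusted server could do: combine ciphertexts without the private key."""
    return (c1 * c2) % N


c_product = multiply_encrypted(encrypt(7), encrypt(6))
result = decrypt(c_product)  # equals 7 * 6 because the product is smaller than N
```

The server-side function never sees 7, 6, or their product in plaintext. The same ciphertexts cannot be added or sorted, which is the sense in which such a scheme is only partially homomorphic.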
A related system for solving the data security problem is CryptDB (Raluca Ada Popa et al., "CryptDB: Protecting Confidentiality with Encrypted Query Processing," Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP), Cascais, Portugal, October 2011). In general, CryptDB is a database that executes queries on encrypted data using PHE schemes. CryptDB requires developers to specify the strongest encryption scheme that can be applied to each database column. Without any such designation, CryptDB assumes that the column can be maintained in plaintext form. For the reasons discussed above, determining the strongest encryption scheme that may be applied to each database column may be very difficult, and errors in this regard may result in insufficiently protected columns and in the failure of application code that operates on columns that have been encrypted.
Disclosure of Invention
Systems, methods, and computer program products are described herein that analyze code of an application and, based on the analysis, identify whether data elements (e.g., columns) referenced by the code can be encrypted and, for those data elements that can be encrypted, recommend an encryption scheme. The recommended encryption scheme for a given data element may be the highest level of encryption that can be applied to it without affecting the semantics of the application code. In conducting the analysis, embodiments described herein may consider factors such as: (1) the encryption schemes available; (2) whether any available encryption schemes enable operations to be performed on the encrypted data, and which operations are permitted; (3) whether operations performed on certain encrypted data elements can be deferred to clients outside of the database server, in accordance with a deferred evaluation scheme; (4) constraints that have been placed on the data elements (e.g., by means of type definitions, by means of operations performed on the data elements, etc.); and (5) dependencies between data elements (e.g., dependencies resulting from database schemas or from application code). The output generated based on the analysis may include not only the mapping of each data element to the recommended encryption scheme, but also an explanation of the reason for each recommendation. Such an explanation may include, for example, identification of the application code that resulted in the recommendation for each data element.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Moreover, it is noted that claimed subject matter is not limited to the specific embodiments and/or to the specific examples described in other portions of this document. These embodiments are presented herein for illustrative purposes only. Additional embodiments will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein.
Drawings
The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate embodiments of the present invention and, together with the description, further serve to explain the principles of the invention and to enable a person skilled in the pertinent art to make and use the invention.
Fig. 1 is a block diagram of a system in which an untrusted data hosting platform uses one or more Partially Homomorphic Encryption (PHE) schemes to perform a restricted type of computation directly on encrypted data.
Fig. 2 is a block diagram of a system in which an untrusted data hosting platform uses one or more PHE schemes to perform a limited type of computation directly on encrypted data and also uses deferred evaluation to defer at least some of the computation performed on the encrypted data to a trusted client outside of a database server.
Fig. 3 is a block diagram of a code analysis tool that analyzes code of an application, identifies data elements referenced by the application code based on the analysis, identifies whether such data elements can be encrypted, and also identifies the strongest encryption scheme that can be applied to these data elements without changing the application semantics, according to an embodiment.
FIG. 4 depicts a flowchart of a method performed by a code analysis tool, according to an embodiment.
FIG. 5 depicts a flowchart of a method for performing deferred evaluation analysis as part of generating encryption recommendations for an application, according to an embodiment.
Fig. 6 depicts a flowchart of a method for performing a top-down traversal of an Abstract Syntax Tree (AST) as part of performing a deferred evaluation analysis, according to an embodiment.
Fig. 7 depicts a flowchart of a method for performing encryption recommendation analysis for an application, according to an embodiment.
Fig. 8 depicts a flowchart of a method for determining the strongest encryption scheme that may be applied to an expression and modifying the associated sets and mappings accordingly as part of performing the encryption recommendation analysis, in accordance with an embodiment.
FIG. 9 depicts a flowchart of a method for performing a recommendation explanation analysis to help explain the encryption recommendations generated for an application, in accordance with an embodiment.
FIG. 10 is a block diagram of an example processor-based computer system that can be used to implement various embodiments.
The features and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify corresponding elements throughout. In the drawings, like reference numbers generally indicate identical, functionally similar, and/or structurally similar elements. The drawing in which an element first appears is indicated by the leftmost digit(s) in the corresponding reference number.
Detailed Description
I. Introduction
The following detailed description refers to the accompanying drawings that illustrate exemplary embodiments of the invention. The scope of the invention is not, however, limited to these embodiments, but by the appended claims. Accordingly, embodiments other than those shown in the drawings (such as modified versions of the illustrated embodiments) may still be encompassed by the present invention.
References in the specification to "one embodiment," "an example embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the relevant art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
Embodiments described herein relate to protecting digitally stored data using encryption. For example, embodiments described herein relate to using encryption to protect data hosted on an untrusted platform. While many conventional encryption schemes preserve data confidentiality, such encryption schemes typically do not permit any computation to be performed on the encrypted data. This significantly reduces the benefits of hosting applications on the cloud platform. As discussed above in the background section, one way to address this problem is to use a Partially Homomorphic Encryption (PHE) scheme that permits a limited type of computation directly on encrypted data.
By way of example, FIG. 1 is a block diagram of a system 100 in which an untrusted platform performs a restricted type of computation directly on encrypted data using one or more PHE schemes. As shown in FIG. 1, system 100 includes a plurality of end-user terminals 104_1-104_N, each of which executes a corresponding instance of an enterprise website or application. Various customers 102_1-102_N may interact with these enterprise website or application instances to access customer-related data. For example, in one implementation of system 100, the enterprise is a bank and customers 102_1-102_N are customers of the bank, each of whom may interact with an instance of the bank's website or application to access his/her account data. The customer-related data is stored on an untrusted platform (e.g., a cloud platform) that includes database server 108. In particular, database server 108 stores the customer-related data in a table 120 that includes at least some encrypted columns. Database server 108 may include an instance of SQL Server, published by Microsoft Corporation of Redmond, Washington.
Queries generated by various instances of the enterprise website or application are routed to the middle tier application server 106 via one or more networks. For example, customer 102_1 may submit a request, via the instance of the enterprise website or application executing on end-user terminal 104_1, that causes query 132 to be generated. Query 132 is sent to the data source interaction architecture 112 executing on the middle tier application server 106. The data source interaction architecture 112 may include a class library that can be used to interact with data sources such as databases and XML files. For example, the data source interaction architecture 112 may include ADO.NET. It should be noted that the middle tier application server 106 shown in FIG. 1 is optional; when it is not present, a separate instance of the data source interaction architecture 112 may instead be installed on each of end-user terminals 104_1-104_N.
The data source interaction architecture 112 includes an encryption/decryption layer 114 that analyzes the query 132. If encryption/decryption layer 114 determines that query 132 includes parameters corresponding to the encrypted columns in table 120, encryption/decryption layer 114 encrypts the parameter data using the key from key store 110, thereby generating a query with encryption parameters 134. The query with encryption parameters 134 is sent to database server 108 via one or more networks.
At database server 108, query processor 124 processes the query with encryption parameters 134, where such processing includes accessing relevant data from table 120, including any encrypted data. In some cases, this processing includes utilizing one or more PHE schemes to perform computations directly on encrypted data retrieved from table 120. In this example, query processor 124 processes the query with encryption parameters 134 to produce query results with encrypted data 136. It should be noted that in system 100, neither the encrypted data stored in table 120 nor the keys ever appear in the clear on database server 108. Database server 108 may store information such as the encryption type, key identifier (ID), and encryption keys used on a per-column basis in one or more metadata tables 122.
The query results with encrypted data 136 are then sent back to the data source interaction architecture 112 on the middle tier application server 106. The encryption/decryption layer 114 within the data source interaction architecture 112 uses the appropriate key from key store 110 to decrypt the encrypted data included in the query results with encrypted data 136, thereby generating query results 138. The middle tier application server 106 then returns query results 138 to customer 102_1 in the form of readable (i.e., plaintext) information. In particular, query results 138 are returned to the instance of the enterprise website or application executing on end-user terminal 104_1, and are thereby presented to customer 102_1.
As can be seen from the foregoing description, system 100 includes an untrusted platform that performs some of the computations directly on encrypted data using one or more PHE schemes. However, before an enterprise can utilize the platform to support applications, it must first determine whether certain portions of the application data (e.g., some columns within table 120) can be encrypted while preserving application semantics. Making such a determination can be very difficult because each data element referenced by the application can be subject to complex constraints and dependency sets that may limit the types of encryption that can be applied to the data elements. Moreover, inaccurate determinations in this regard may result in the inability to apply the strongest possible encryption to certain data elements, as well as the failure of application logic due to the inability to perform certain operations on encrypted data (e.g., a query may fail if the query processor 124 is unable to perform the required calculations on certain encrypted data).
Assuming that the enterprise can determine the highest level of encryption that can be applied to each column, the enterprise can then make schema changes to the columns that are to contain encrypted data and encrypt the sensitive data using its keys from the key store. Once this is done, the enterprise is ready to allow customer transactions.
Because the encrypted data stored within table 120 can only be manipulated in encrypted form on database server 108, the types of database operations (e.g., SQL operations) that can be performed on the encrypted columns are limited and depend on the type of encryption used. For example, a column encrypted using a deterministic encryption scheme supports operations such as equality comparisons, grouping, and equality-based joins, but does not support other operations such as concatenation, null checks (isnull), truncation, and sorting. This constraint may be mitigated if the computations that must be performed on the encrypted data can be deferred to a trusted client outside of the database server, where the encryption key is located. This "deferred evaluation" approach to query processing may enable a greater number of columns within table 120 to be encrypted. To enable deferred evaluation, intelligence must be added to database server 108 so that it can determine whether a given query can be satisfied by pushing some of the computations to the client.
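The reason a deterministically encrypted column supports equality-style operations but not sorting can be sketched as follows. The sketch uses an HMAC as a stand-in for a deterministic cipher (an HMAC is a keyed hash, not a reversible encryption scheme, so this is an assumption made only to show the equality-preserving property); the key, column values, and helper names are all hypothetical.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"demo-key-held-only-by-the-client"  # hypothetical key, never sent to the server


def det_token(value: str) -> str:
    """Deterministic stand-in for encryption: equal inputs always give equal tokens."""
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


cities = ["Oslo", "Lima", "Oslo", "Cairo", "Lima", "Oslo"]
encrypted_column = [det_token(c) for c in cities]  # what the server would store

# Equality filter: the client encrypts the parameter and the server compares tokens.
matches = [i for i, t in enumerate(encrypted_column) if t == det_token("Oslo")]

# Grouping: the server can count rows per token without learning the city names.
group_sizes = sorted(Counter(encrypted_column).values())

# Sorting: token order is in general unrelated to plaintext order, so sorting by
# token does not implement ORDER BY on the underlying values.
order_by_token = [c for _, c in sorted(zip(encrypted_column, cities))]
```

Equality filters, grouping, and equality-based joins only need to know whether two values are the same, which the tokens preserve. Sorting, truncation, and concatenation need the values themselves, which is why they must either run on plaintext or be deferred to a trusted client.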
Fig. 2 depicts an example system 200 in which an untrusted platform (a) performs a restricted type of computation directly on encrypted data using one or more PHE schemes and (b) also uses deferred evaluation to defer at least some of the computation performed on the encrypted data to a trusted client outside of a database server. As shown in fig. 2, system 200 has an architecture similar to system 100. In particular, system 200 includes a plurality of end-user terminals 204_1-204_N, each of which executes a corresponding instance of an enterprise website or application. Various customers 202_1-202_N may interact with these enterprise website or application instances to access customer-related data. The customer-related data is stored on an untrusted platform (e.g., a cloud platform) that includes database server 208. In particular, database server 208 stores the customer-related data in a table 220 that includes at least some encrypted columns.
Queries generated by various instances of the enterprise website or application are routed to the middle tier application server 206 via one or more networks. For example, customer 202_1 may submit a request, via the instance of the enterprise website or application executing on end-user terminal 204_1, that causes query 232 to be generated. Query 232 is transmitted to the data source interaction architecture 212 executing on the middle tier application server 206. As with system 100, middle tier application server 206 is optional. When it is not present, a separate instance of the data source interaction architecture 212 may instead be installed on each of end-user terminals 204_1-204_N.
The data source interaction architecture 212 includes an encryption/decryption layer 214 that analyzes the query 232. If encryption/decryption layer 214 determines that query 232 includes parameters corresponding to the encrypted columns in table 220, encryption/decryption layer 214 encrypts the parameter data using the key from key store 210, thereby generating a query with encryption parameters 234. The query with encryption parameters 234 is sent to database server 208 via one or more networks.
At database server 208, query processor 224 processes the query with encryption parameters 234, where such processing includes accessing relevant data from table 220, including any encrypted data. In some cases, this processing includes performing computations directly on the encrypted data retrieved from table 220 using one or more PHE schemes. The deferred evaluation generator 226 within the query processor 224 also operates to determine whether certain aspects of the query processing involving the computation of encrypted data may be deferred or pushed to the middle tier application server 206. In this case, the query processor 224 will send back the encrypted data, the deferred evaluation steps, and the sequence for performing these steps, all represented in FIG. 2 by reference numeral 236, to the data source interaction architecture 212 on the middle tier application server 206.
An encryption/decryption layer 214 within the data source interaction architecture 212 decrypts the encrypted data using the appropriate key from the key store 210. The expression evaluator and execution engine 216 within the data source interaction architecture 212 then applies the deferred evaluation steps received from the database server 208, in the specified order, to the decrypted data to generate query results 238. The middle tier application server 206 then returns query results 238 to customer 202_1 in the form of readable (i.e., plaintext) information. In particular, query results 238 are returned to the instance of the enterprise website or application executing on end-user terminal 204_1, and are thereby presented to customer 202_1.
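The client-side half of deferred evaluation (decrypt first, then apply the deferred steps in the order supplied by the server) can be sketched as below. This is only an illustration under stated assumptions: the single-byte XOR "cipher" is a placeholder with no security value, and the step names, the response layout, and all function names are hypothetical rather than taken from this document.

```python
from typing import Callable, Dict, List

KEY = 0x5A  # hypothetical single-byte key held only by the trusted client


def xor_bytes(data: bytes) -> bytes:
    """Placeholder cipher (XOR with one byte); NOT secure, illustration only."""
    return bytes(b ^ KEY for b in data)


def encrypt(text: str) -> bytes:
    return xor_bytes(text.encode("utf-8"))


def decrypt(blob: bytes) -> str:
    return xor_bytes(blob).decode("utf-8")


# Operations the client knows how to run on plaintext rows (hypothetical names).
OPERATIONS: Dict[str, Callable[[List[str]], List[str]]] = {
    "sort": lambda rows: sorted(rows),
    "truncate3": lambda rows: [r[:3] for r in rows],
}


def apply_deferred(encrypted_rows: List[bytes], steps: List[str]) -> List[str]:
    """Decrypt the rows, then apply the deferred steps in the order given."""
    rows = [decrypt(r) for r in encrypted_rows]
    for step in steps:
        rows = OPERATIONS[step](rows)
    return rows


# Simulated server response: encrypted rows plus an ordered list of deferred steps.
response_rows = [encrypt(n) for n in ["Zhang", "Adams", "Moreau"]]
response_steps = ["sort", "truncate3"]
result = apply_deferred(response_rows, response_steps)
```

Keeping the steps as an ordered list matters because the operations generally do not commute: truncating before sorting can produce a different row order than sorting before truncating.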
As can be seen from the foregoing description, system 200 includes an untrusted platform that advantageously utilizes one or more PHE schemes to perform certain calculations directly on encrypted data and also utilizes deferred evaluation to increase the number of database columns that may be encrypted. However, as with system 100, before an enterprise can utilize the platform to support applications, it must first be determined whether portions of the application data can be encrypted while preserving application semantics. As noted above with reference to system 100, because each data element referenced by an application may be subject to complex constraints and dependency sets that may limit the types of encryption that may be applied thereto, making such a determination may be very difficult. In addition, when making this determination with respect to system 200, the enterprise must also consider that certain operations may be delayed to the client while others may not, thereby increasing the complexity of the analysis and the likelihood of error.
In the foregoing description of systems 100 and 200, reference is made to using PHE schemes to perform a limited set of computations directly on encrypted data. As will be appreciated by those skilled in the relevant art, there are other mechanisms, such as secure hardware, that allow some computation to be performed on encrypted data. However, like PHE schemes, these solutions are not general purpose because they cannot perform all types of operations on encrypted data. Thus, parties deploying an application to a platform that utilizes one of these alternative solutions also need to determine whether portions of the application data can be encrypted while preserving application semantics.
Systems, methods, and computer program products that address the above-mentioned issues are described herein. In particular, the systems, methods, and computer program products described herein analyze code of an application and, based on the analysis, identify whether data elements (e.g., columns) referenced by the code can be encrypted and, for those data elements that can be encrypted, recommend an encryption scheme. The recommended encryption scheme for a given data element may represent the highest level of encryption that can be applied to it without affecting the semantics of the application code. In conducting the analysis, embodiments described herein may consider factors such as: (1) the encryption schemes available; (2) whether any available encryption schemes enable operations to be performed on the encrypted data, and which operations are permitted; (3) whether operations performed on certain encrypted data elements can be deferred to a client external to the database server, in accordance with a deferred evaluation scheme; (4) constraints that have been imposed on the data elements (e.g., by means of type definitions, by means of operations performed on the data elements, etc.); and (5) dependencies between data elements (e.g., dependencies resulting from database schemas or from application code). The output generated based on the analysis may include not only the mapping of each data element to the recommended encryption scheme, but also an explanation of the reason for each recommendation. Such an explanation may include, for example, identification of the application code that resulted in the recommendation for each data element.
By automatically generating such recommendations, prior to deploying the application on a platform hosting encrypted data, embodiments described herein may advantageously provide an indication to a developer or publisher of the application of which data elements may be encrypted and the highest level of encryption that may be applied thereto without affecting the program semantics. Such data hosting platforms may include, but are not limited to, a data hosting platform that may perform limited types of computations on encrypted data (e.g., the platform of system 100) or a data hosting platform that may perform limited types of computations on encrypted data and also support deferred evaluation (e.g., the platform of system 200). Further, because embodiments described herein may also identify aspects of an application that cause a particular encryption scheme recommendation, such embodiments provide useful guidance to application developers and publishers on how to best modify the application in order to increase the level of encryption that may be applied to one or more data elements.
In the following sections, embodiments of the above-described systems, methods, and computer program products are described more fully. In particular, section II describes systems, methods, and computer program products that analyze application code and recommend encryption of data elements referenced therein based on such analysis, wherein the encryption recommendation does not affect program semantics. Section III describes an example processor-based computer system that can be used to implement various embodiments. Section IV describes some additional exemplary embodiments. Section V provides some conclusion comments.
II. Code analysis for recommending encryption of data without affecting program semantics
FIG. 3 is a block diagram of a code analysis tool 300 that may be used to guide application developers and publishers in selecting security policies for protecting application data. In particular, the analysis performed by code analysis tool 300 identifies data elements referenced by application code, identifies whether such data elements can be encrypted, and also identifies the strongest encryption scheme that can be applied to such data elements without changing the application semantics. As will be discussed herein, the analysis may also identify or explain why a data element cannot be encrypted using a stronger encryption scheme than the one recommended. For example, the analysis may identify certain expressions within the application code that cannot be supported if the data is encrypted using a stronger encryption type than the recommended encryption type.
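The core idea of picking the strongest scheme that still supports every operation applied to a data element, and of reporting the expressions that rule out anything stronger, can be sketched as follows. This is a simplified illustration, not the method of the embodiments: the scheme names, their permitted operation sets, the column names, and the pre-extracted (operation, expression) pairs are all assumptions, and constraints, dependencies between data elements, and deferred evaluation are ignored.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encrypted schemes, strongest first, with the operations each
# permits directly on encrypted data. Real platforms define their own sets.
ENCRYPTED_SCHEMES: List[Tuple[str, FrozenSet[str]]] = [
    ("RANDOMIZED", frozenset()),
    ("DETERMINISTIC", frozenset({"equality", "group_by", "equi_join"})),
]

Use = Tuple[str, str]  # (operation kind, source expression that performs it)


def recommend_for_column(uses: List[Use]) -> Tuple[str, List[str]]:
    """Return (scheme, expressions that rule out the next-stronger scheme)."""
    blocking: List[str] = []
    for scheme, allowed in ENCRYPTED_SCHEMES:
        unsupported = [expr for op, expr in uses if op not in allowed]
        if not unsupported:
            return scheme, blocking
        blocking = unsupported  # these expressions are why this scheme was rejected
    return "PLAINTEXT", blocking


# Hypothetical result of scanning application code for operations on each column.
workload: Dict[str, List[Use]] = {
    "Patients.SSN": [("equality", "WHERE SSN = @ssn")],
    "Patients.Notes": [],
    "Patients.LastName": [
        ("equality", "WHERE LastName = @name"),
        ("order_by", "ORDER BY LastName"),
    ],
}
recommendations = {col: recommend_for_column(uses) for col, uses in workload.items()}
```

In this sketch, `Patients.LastName` falls back to plaintext solely because of its `ORDER BY` expression, which is the kind of explanation a developer could act on, for example by moving the sort to the client.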
The analysis performed by code analysis tool 300 may be used to greatly simplify the process of migrating applications to untrusted data hosting platforms. For example, if the encryption level that the analysis recommends for a particular data element (e.g., a particular column in a database table) is deemed to be sufficiently strong, the application developer or publisher may apply the recommended encryption level to that data element. Otherwise, the application developer or publisher may use the analysis to identify portions of the application that should be altered to allow more data elements to be encrypted. However, it should be noted that the features of code analysis tool 300 are not limited to analysis of applications to be deployed to untrusted data hosting platforms, and the features described herein may also be applied to other types of applications.
As shown in FIG. 3, the inputs to code analysis tool 300 include application code 310 and database schema information 320. In this particular example, application code 310 includes a number of queries 312 and a number of stored procedures 314. However, this example is not intended to be limiting, and other types of application code (e.g., source code) may be analyzed in accordance with the techniques described herein.
Queries 312 include queries generated by one or more instances of an application over one or more time periods, and may also be referred to as a "workload." In one embodiment, queries 312 comprise SQL queries (e.g., queries expressed in a language such as T-SQL), although this example is in no way limiting and any type of query may be included in queries 312. Queries 312 may be obtained in a variety of ways. For example, during normal operation of one or more instances of an application, queries 312 may be captured at the client side (e.g., at a machine where the application instance is being executed) or at the server side (e.g., at a database server where the queries are received for processing). Queries 312 may be captured over a time period that is long enough to ensure that all types of queries, substantially all types of queries, or at least a majority of the types of queries that may be generated by the application are represented therein. Queries 312 may also be provided to code analysis tool 300 in a variety of ways. By way of example and not limitation, queries 312 may be provided as part of an electronic document (e.g., an XML document) or file.
Stored procedures 314 include procedures that may be invoked by an application, with input parameters passed to the procedure and output parameters passed back to the application. As will be appreciated by those skilled in the art, each of stored procedures 314 includes a set of queries that together perform certain operations as a logical unit. Each of stored procedures 314 can be created by an application developer (e.g., by using a CREATE PROCEDURE command in SQL) and forms part of a database schema. Like queries 312, stored procedures 314 may also be provided to code analysis tool 300 in a variety of ways. By way of example and not limitation, stored procedures 314 may be provided as part of an electronic document (e.g., an XML document) or file.
Database schema information 320 includes information about the logical organization or structure of the database to which queries 312 and stored procedures 314 are directed. Database schema information 320 may include, for example and without limitation, descriptions of database tables, columns, views, relationships, indexes, primary/foreign key constraints, and the like. Like queries 312 and stored procedures 314, database schema information 320 may be provided as part of an electronic document or file, or in some other manner.
Code analysis tool 300 analyzes application code 310 and database schema information 320 in a manner to be described herein, and generates encryption recommendations and associated interpretations 330 based on such analysis. In the embodiment illustrated in FIG. 3, the analysis performed by code analysis tool 300 includes at least three components: a deferred evaluation analysis performed by deferred evaluation analysis logic 302, an encryption recommendation analysis performed by encryption recommendation analysis logic 304, and an interpreted recommendation analysis performed by interpreted recommendation analysis logic 306. Each of these components will be described in detail herein. Each of these components may be implemented in software (through execution of instructions by one or more general-purpose or special-purpose processors), in hardware using analog and/or digital circuitry, or as a combination of software and hardware. Although each of these components is illustrated in FIG. 3 as a discrete entity, it should be understood that the logic associated with these components may be an integrated part of some larger component (e.g., where the components are implemented in software, the software instructions associated with each component may form part of the same source code, library, executable file, etc.).
As noted above, code analysis tool 300 outputs an encryption recommendation and an associated interpretation 330. The encryption recommendation and associated interpretation 330 include an identification of one or more data elements referenced by application code 310; for each data element so identified, an indication of whether the data element can be encrypted; and, for each data element that can be encrypted, a recommended highest level of encryption that may be applied thereto. The identified data elements may include, for example, columns in one or more database tables, although this example is not intended to be limiting and other types of data elements may be identified. The level of encryption that may be applied may vary depending on implementation, but may include, for example and without limitation, randomized encryption, deterministic encryption, or plaintext (i.e., not encrypted). The encryption recommendation and associated interpretation 330 may also include an indication that two or more data elements referenced by application code 310 must be encrypted using the same encryption algorithm and the same encryption key.
For one or more of the data elements for which encryption recommendations are provided, the encryption recommendation and associated interpretation 330 also includes an explanation of the reason for encrypting each data element using a particular encryption scheme. Such an interpretation may include the identification and/or location of one or more expressions included in application code 310 that cannot be supported if the data element is encrypted using a stronger encryption scheme than the recommended encryption scheme.
The encryption recommendation and associated interpretation 330 may be output in a variety of forms depending on the implementation. For example, the encrypted recommendation and associated interpretation 330 may be output in a form that can be easily viewed by the user (e.g., a printed format for viewing on paper, or a display format for viewing on a display or on the Web), in a data format such as XML, or programmatically through a data reading interface. Code analysis tool 300 may also act as a data provider for one or more other systems, analyzing application code 310 and database schema information 320 in the manner described herein, and then passing the results of such analysis to other components for further processing.
In one embodiment, code analysis tool 300 comprises a software program that can be installed on a computer (e.g., the processor-based computer system described below in Section III) and then executed in an offline mode. In an alternative embodiment, code analysis tool 300 comprises a software program that executes on a server computer such that its features are accessible by one or more client computers connected thereto via one or more networks (e.g., the Internet). According to the latter embodiment, a user of code analysis tool 300 may provide input to and/or receive output from code analysis tool 300 via the network described above. Code analysis tool 300 may also utilize other means for obtaining input. For example, in an embodiment, code analysis tool 300 may capture queries 312 as they are communicated to or processed by a database server. As another example, code analysis tool 300 may obtain stored procedures 314 and/or database schema information 320 by directly accessing the database (e.g., via database connection information maintained by a database server or provided by a user). Other methods for obtaining input may also be used.
In an embodiment, code analysis tool 300 comprises a stand-alone software tool. In an alternative embodiment, code analysis tool 300 comprises a portion of a set of software tools that are incorporated into a particular software product or platform. For example, code analysis tool 300 may include one of a collection of tools incorporated into a version of SQL SERVER published by Microsoft Corporation (e.g., as part of Management Studio). According to such embodiments, the features of code analysis tool 300 may be accessed via one or more Graphical User Interfaces (GUIs) provided as part of a particular software product or platform. For example, such a GUI may include a GUI through which a user may provide application code 310 and/or database schema information 320; or a GUI through which a user may specify a means for accessing such input (e.g., a database connection); and a GUI for displaying or otherwise providing output (e.g., encryption recommendations and associated interpretations 330).
FIG. 4 depicts a flow diagram 400 that further illustrates the general manner of operation of code analysis tool 300. The method of flowchart 400 is described herein by way of illustration only and not by way of limitation. As shown in FIG. 4, the method of flowchart 400 begins at step 402, where code analysis tool 300 receives input application code 310 and database schema information 320.
At step 404, code analysis tool 300 analyzes application code 310 and database schema information 320 to identify a recommended encryption scheme for each of one or more data elements (e.g., one or more columns) referenced by the application code.
At step 406, code analysis tool 300 further analyzes application code 310 and database schema information 320 to generate an explanation regarding the reason for identifying the recommended encryption scheme for each of the one or more data elements.
At step 408, the code analysis tool 300 outputs an identification of each of the one or more data elements and the recommended encryption scheme, and an interpretation associated with each of the one or more data elements. Such output may include, for example, an encrypted recommendation and an associated interpretation 330.
The first three of the following four subsections describe the manner in which deferred evaluation analysis logic 302, encryption recommendation analysis logic 304, and interpreted recommendation analysis logic 306 operate, respectively. In these three subsections, various analyses are sometimes described in the context of databases that support column-level encryption. However, the analysis is not limited to this context and may also be used in other contexts. Also, in these three subsections, it is assumed that the application of interest consists of a set of queries and stored procedures. However, this example is not limiting, and the analysis described herein may operate on other types of application code.
It should be noted that in some alternative embodiments, one or more of deferred evaluation analysis logic 302 and interpreted recommendation analysis logic 306 may be disabled or may not be included within code analysis tool 300. For example, if code analysis tool 300 is applied to an application before it is deployed to an untrusted platform that does not support deferred evaluation, then no deferred evaluation analysis need be performed. Thus, the analysis may be disabled, or the logic for performing such analysis may be eliminated altogether. Also, there may be implementations or scenarios in which it is undesirable to provide an explanation for the encryption recommendation. In this case, the interpreted recommendation analysis may be disabled, or the logic for performing such analysis may be eliminated altogether.
A. Deferred evaluation analysis
The deferred evaluation analysis is performed by deferred evaluation analysis logic 302. Deferred evaluation analysis identifies expressions within application code 310 that may be deferred to clients by database servers (e.g., by database server 208 supporting deferred evaluation).
The basic premise of deferred evaluation is to defer the evaluation of an expression until the result of the evaluation is needed. Typically, an expression is evaluated over primitive values (e.g., integers and strings). In some cases, it may not be possible to evaluate an expression because some of the inputs to the expression are not yet available. This may occur, for example, when input values are read over a network, when they are computed in different threads, or when they are encrypted and encryption keys are not available. In each of these cases, it is generally desirable for the runtime to "continue" evaluation as far as possible even without these input values, and then "insert" these values when they become available (e.g., when a network read completes, another thread completes, or an encryption key becomes available).
By way of illustration, consider a table called "Customer" with a column called "Name". The table is assumed to be as follows:
Name
Alice
Bob
Also, consider the query SELECT '<Customer Name=' + Name + '>' FROM Customer. Typically, when applied to the above table, the query will return:
Column
<Customer Name=Alice>
<Customer Name=Bob>
Now assume that the column Name in the table Customer is encrypted. Then, the table Customer may be as follows:
Name
0x2432fea324cd
0x64ace45824de
The query cannot be properly evaluated if the database server does not have the necessary encryption key.
A database server configured to perform deferred evaluation may return the following result set to the client:
Name
(+,'<Customer Name=',(+,0x2432fea324cd,'>'))
(+,'<Customer Name=',(+,0x64ace45824de,'>'))
Each cell in the result set is essentially a "deferred evaluation expression" comprising one or more primitive values, one or more operators, and at least one encrypted value. A client having access to the relevant encryption key can now decrypt the encrypted value and generate the correct result.
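Client-side evaluation of such a deferred evaluation expression can be sketched as follows. This is a minimal illustration only: the nested-tuple encoding (operator, left, right), the `Encrypted` wrapper, and the lookup-table `decrypt` function that stands in for real decryption are all assumptions made for this sketch and are not taken from the embodiments described herein.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encrypted:
    """Hypothetical wrapper marking an encrypted value inside a deferred expression."""
    ciphertext: int


# Toy stand-in for decryption: a lookup table plays the role of the key.
# A real client would invoke an actual cipher here.
TOY_KEY = {0x2432fea324cd: "Alice", 0x64ace45824de: "Bob"}


def decrypt(value: Encrypted) -> str:
    return TOY_KEY[value.ciphertext]


def evaluate(expr):
    """Recursively evaluate a deferred expression on the client."""
    if isinstance(expr, Encrypted):
        return decrypt(expr)
    if isinstance(expr, tuple):
        op, left, right = expr
        if op != "+":
            raise ValueError(f"unsupported operator: {op}")
        return evaluate(left) + evaluate(right)
    return expr  # primitive value, returned unchanged


# One cell of the result set shown above.
cell = ("+", "<Customer Name=", ("+", Encrypted(0x2432fea324cd), ">"))
result = evaluate(cell)
```

Only the client holds the (toy) key, so the server can build and return `cell` without ever seeing the decrypted name.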
In theory, the evaluation of all kinds of expressions can be deferred to the client. However, deferring the evaluation of certain operations is often undesirable for various reasons (e.g., performance, latency, etc.). For example, deferring evaluation of computation and/or data intensive operations (such as join, filter, sort, etc.) may result in poor performance because the inputs to these operations must be communicated to the client and additional resources must be provisioned on the client to perform these operations. Thus, the query processor at the database server may restrict deferred evaluations to certain classes or types of simple expressions. For example, in some platforms, deferred evaluation may be limited to scalar expressions, i.e., to computations on scalar values, rather than relational operations (such as join and filter).
Because they modify database tables, expressions with side effects (e.g., inserts, deletes, and updates) may not be considered good candidates for deferred evaluation. Such expressions can be most efficiently processed by the database server. However, it is possible to defer the evaluation of even these expressions. For example, the database server may potentially store the deferred expression in a table (e.g., for INSERT) and then evaluate the expression when it is retrieved by a future SELECT operation.
As previously indicated, platforms that support deferred evaluation can typically support encryption for a greater number of columns, as they are no longer constrained by the capabilities of the available encryption schemes. In other words, more data may be encrypted since deferred evaluation enables more operations to be performed on the encrypted data. However, given that deferred evaluation itself may have its own set of constraints (e.g., only scalar expressions may be deferred), application developers and publishers still need guidance on which data elements (e.g., columns) can be encrypted in order for the application workload to function correctly. Thus, the analysis performed by deferred evaluation analysis logic 302 ensures that the encryption recommendations provided by code analysis tool 300 take into account the fact that evaluation of certain expressions within application code 310 may be deferred to the client.
An exemplary method by which deferred evaluation analysis logic 302 may perform deferred evaluation analysis will now be described with reference to flowchart 500 of FIG. 5. The method of flowchart 500 is described herein by way of illustration only. Other methods, including variations of the method of flowchart 500, may be used to identify expressions within application code 310 that may be deferred by a database server to a client. For purposes of illustration, the method of flowchart 500 will now be described with continued reference to code analysis tool 300 of FIG. 3. However, the method is not limited to this embodiment.
As shown in FIG. 5, the method of flowchart 500 begins at step 502, after which control flows to step 504. At step 504, deferred evaluation analysis logic 302 converts the components of application code 310 into an Abstract Syntax Tree (AST) representation. For example, deferred evaluation analysis logic 302 may convert each query within queries 312 into its own AST and may convert each query within stored procedures 314 into its own AST.
At step 506, deferred evaluation analysis logic 302 associates a deferred evaluation flag (in one example, the flag "Isdeferred") with each expression in each AST. Each deferred evaluation flag indicates whether the expression associated with it can be deferred by the database server to the client. Initially, the analysis assumes that all expressions can be deferred to the client, which is indicated by setting the value of the deferred evaluation flag to a true value (e.g., "true") for all expressions.
At step 508, deferred evaluation analysis logic 302 performs a top-down pass on each of the ASTs to discover new variables that must be evaluated early (i.e., to identify variables that cannot be evaluated via deferred evaluation) and sets the deferred evaluation flag to a false value (e.g., "false") for each such variable. One manner of performing step 508 is described below with reference to flowchart 600 of FIG. 6.
At decision step 510, deferred evaluation analysis logic 302 determines whether any new variables that must be evaluated early were found during the previous iteration of step 508. If deferred evaluation analysis logic 302 determines that at least one new variable that must be evaluated early was found during the previous iteration of step 508, deferred evaluation analysis logic 302 performs another iteration of step 508. However, if deferred evaluation analysis logic 302 determines that no new variable that must be evaluated early was found during the last iteration of step 508, the deferred evaluation analysis ends, as shown at step 512. The analysis result includes the value of the deferred evaluation flag for each expression in each AST of application code 310.
Flowchart 600 depicts one method of performing a top-down traversal of each AST in step 508 of flowchart 500 according to an example embodiment. As shown in FIG. 6, the method of flowchart 600 begins at step 602, where deferred evaluation analysis logic 302 sets the deferred evaluation flag to a false value for every expression in an AST whose evaluation cannot be deferred. Which expressions cannot be deferred will vary from runtime to runtime. For example, consider a hypothetical runtime that supports deferred evaluation for addition and subtraction but not multiplication. Now consider the expressions:
x=(a+b)-c; and
y=(a+b)*c.
Obviously, the evaluation of the first expression may be deferred, but the evaluation of the second expression may not.
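The check for this hypothetical runtime can be sketched as follows. The sketch is deliberately conservative and simplified: it treats a whole expression as deferrable only if every operator in it is supported, and the tuple encoding and the names `DEFERRABLE_OPS` and `can_defer` are assumptions made for illustration.

```python
# Operators that the hypothetical runtime is able to defer (taken from the example).
DEFERRABLE_OPS = {"+", "-"}


def can_defer(expr) -> bool:
    """Return True if every operator in the expression tree is deferrable."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return op in DEFERRABLE_OPS and can_defer(left) and can_defer(right)
    return True  # leaf: a variable or a constant


x_expr = ("-", ("+", "a", "b"), "c")  # x = (a + b) - c
y_expr = ("*", ("+", "a", "b"), "c")  # y = (a + b) * c
```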
In an embodiment, deferred evaluation analysis logic 302 sets the deferred evaluation flag to a false value for one or more of the following expression types: all expressions with side effects (e.g., insert/delete/update statements); expressions that are considered too costly to push to the client (e.g., filters, joins, etc.); control flow statements (e.g., if, while, etc.); and expressions assigned to variables that must be evaluated early. However, this is merely an example, and in alternative embodiments, one or more of these expression types may be candidates for deferred evaluation.
At step 604, deferred evaluation analysis logic 302 propagates the value of the deferred evaluation flag from each parent expression to each child expression in the AST. If the parent expression of an expression cannot be deferred, then the child expression cannot be deferred either. Thus, such propagation includes propagating the value of the deferred evaluation flag from a parent expression whose flag has a false value to a child expression whose flag has a true value. Note that this propagation is monotonic: once the deferred evaluation flag is set to a false value for an expression (either by propagation from the parent expression or because the expression itself cannot be deferred, as determined in step 602), it is never set to a true value again.
At step 606, deferred evaluation analysis logic 302 updates the set of variables that must be evaluated early based on the results of steps 602 and 604. As discussed above with respect to flowchart 500, if a newly discovered variable is added to the set, another iteration of step 508 (e.g., the method of flowchart 600) is performed. Multiple iterations over the ASTs are performed at least in part because different ASTs may include the same variable: if traversing one AST determines that a particular variable must be evaluated early, that determination must be taken into account when traversing the other ASTs that include the same variable.
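The iteration of flowcharts 500 and 600 can be sketched as a fixed-point computation. The following is a simplified illustration under stated assumptions, not the described implementation: the `Node` class, the `NON_DEFERRABLE` kinds, and in particular the rule that a variable read by a non-deferred expression must be evaluated early are all choices made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    kind: str                                   # e.g. "assign", "var", "add", "if"
    children: List["Node"] = field(default_factory=list)
    name: Optional[str] = None                  # variable name for "var"/"assign"
    is_deferred: bool = True                    # deferred evaluation flag (step 506)


# Expression kinds assumed never deferrable in this sketch.
NON_DEFERRABLE = {"insert", "update", "delete", "filter", "join", "if", "while"}


def walk(node: Node):
    yield node
    for child in node.children:
        yield from walk(child)


def propagate(node: Node, parent_deferred: bool) -> None:
    """Step 604: push false flags from parent to child; flags never return to true."""
    if not parent_deferred:
        node.is_deferred = False
    for child in node.children:
        propagate(child, node.is_deferred)


def top_down_pass(root: Node, early_vars: Set[str]) -> Set[str]:
    """One pass over one AST; returns the variables found to need early evaluation."""
    for n in walk(root):                        # step 602
        if n.kind in NON_DEFERRABLE:
            n.is_deferred = False
        if n.kind == "assign" and n.name in early_vars:
            n.is_deferred = False
    propagate(root, True)                       # step 604
    # Step 606 (assumption): variables read by non-deferred expressions are early.
    return {n.name for n in walk(root) if n.kind == "var" and not n.is_deferred}


def deferred_evaluation_analysis(asts: List[Node]) -> Set[str]:
    early_vars: Set[str] = set()
    while True:                                 # steps 508 and 510
        found: Set[str] = set()
        for ast in asts:
            found |= top_down_pass(ast, early_vars)
        if found <= early_vars:                 # nothing new: analysis ends (step 512)
            return early_vars
        early_vars |= found


# x = a + b in one AST; "if x" in another AST forces x to be evaluated early.
ast1 = Node("assign", [Node("add", [Node("var", name="a"), Node("var", name="b")])], name="x")
ast2 = Node("if", [Node("var", name="x")])
early = deferred_evaluation_analysis([ast1, ast2])
```

In this usage, the first iteration discovers `x` only while traversing `ast2`; a second iteration is needed to revisit `ast1` and clear the flags on the assignment to `x`, which illustrates why several passes over the ASTs may be required.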
B. Cryptographic recommendation analysis
Encryption recommendation analysis logic 304 performs the encryption recommendation analysis. For each of the one or more data elements referenced by application code 310, the encryption recommendation analysis produces an indication of whether the data element can be encrypted and, if the data element can be encrypted, a recommended encryption scheme. The recommended encryption scheme for a particular data element may include the strongest level of encryption that may be applied to that data element without altering the application code. As used herein, the term "encryption scheme" refers to one or more of an encryption algorithm (e.g., AES), an encryption type (e.g., randomized, deterministic, etc.), and a key used to encrypt a certain data element (e.g., expression, column, etc.).
In one embodiment, the encryption recommendation analysis uses the output of the analysis performed by deferred evaluation analysis logic 302 to generate recommendations. As noted above, such output may include a flag or other indicator associated with each expression in application code 310 that indicates whether the expression may be subject to deferred evaluation. However, in certain alternative embodiments (e.g., embodiments in which deferred evaluation analysis logic 302 is disabled or not present), the encryption recommendation analysis need not take into account this deferred evaluation information.
In one particular embodiment, the encrypted recommendation analysis generates encrypted recommendations for data elements according to the following rules:
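[Table of encryption recommendation rules, reproduced as images (Figure BDA0001389546940000191 and Figure BDA0001389546940000201) in the original publication.]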
However, this is only one example, and the encryption schemes that may be recommended for different data elements may vary from implementation to implementation.
A description of one method by which the encrypted recommendation analysis logic 304 may perform encrypted recommendation analysis is now described with reference to the flow diagram 700 of FIG. 7. The method of flowchart 700 is described herein by way of illustration only. Other methods, including variations of the method of flowchart 700, may be used to generate encrypted recommendations for data elements within application code 310. For purposes of illustration, the method of flowchart 700 will now be described with continued reference to code analysis tool 300 in FIG. 3. However, the method is not limited to this embodiment.
Pursuant to the method of flowchart 700, encryption recommendation analysis logic 304 maintains a mapping from the set of expressions in application code 310 to the strongest encryption schemes that can be applied to those expressions. During step 702, upon initialization, the encryption recommendation analysis logic 304 assigns each data element (e.g., each column and variable) in the application code 310 to its own set and maps each of those sets to the strongest encryption scheme available.
During step 704, the encryption recommendation analysis logic 304 performs a bottom-up traversal for each AST in the AST representation of the application code 310 to access each expression included therein.
At step 706, for each expression accessed, encryption recommendation analysis logic 304 determines the strongest encryption scheme that may be applied to the data on which the expression operates. This determination may be performed based on whether the expression can be deferred, based on constraints associated with the expression (e.g., some expressions may operate on certain types of encrypted data, while other expressions cannot), and on other factors. Based on the results of this determination, encryption recommendation analysis logic 304 modifies the existing sets and mapping as needed.
One exemplary method of performing step 706 of flowchart 700 for each accessed expression is now described with reference to flowchart 800 of FIG. 8. Pursuant to this example, encryption recommendation analysis logic 304 may recommend or select one of three encryption levels for each data element in application code 310 from a predefined set of encryption schemes. These encryption levels are, from strongest to weakest: randomized encryption, deterministic encryption, and plaintext (i.e., no encryption). However, this is merely an example, and one skilled in the relevant art will appreciate that in other implementations, the recommendation may be based on a different set of encryption schemes.
As shown in FIG. 8, the method of flowchart 800 begins at step 802, after which control flows to decision step 804. At decision step 804, encryption recommendation analysis logic 304 determines whether the accessed expression can be deferred. This step may entail, for example, checking the deferred evaluation flag associated with the expression by deferred evaluation analysis logic 302. Further to this example, if the deferred evaluation flag is equal to a true value (e.g., "true"), the expression may be subject to deferred evaluation, and if the deferred evaluation flag is equal to a false value (e.g., "false"), the expression may not be subject to deferred evaluation.
If encryption recommendation analysis logic 304 determines that the accessed expression can be deferred, then no changes are made to the mapping of the set containing the expression, and the method ends at step 814 (after which another expression can be accessed if the bottom-up traversal of each AST of the application code has not ended). If, however, encryption recommendation analysis logic 304 determines that the accessed expression cannot be deferred, then control flows to decision step 806.
At decision step 806, encryption recommendation analysis logic 304 determines whether evaluating the accessed expression requires an equality check on its sub-expressions. If encryption recommendation analysis logic 304 determines that evaluating the accessed expression requires an equality check on its sub-expressions, then encryption recommendation analysis logic 304 imposes the following constraints, as shown in step 808: the encryption scheme for the two sub-expressions must be the same and must not be stronger than deterministic encryption, after which the method ends at step 814.
However, if encryption recommendation analysis logic 304 determines during decision step 806 that the accessed expression cannot be evaluated with both of its sub-expressions deterministically encrypted (i.e., it is not such an equality check), then control flows to decision step 810.
At decision step 810, encryption recommendation analysis logic 304 determines whether the accessed expression is of a type that must be evaluated by a plaintext operation. If encryption recommendation analysis logic 304 determines that the accessed expression is not of a type that must be evaluated by a plaintext operation, the method ends at step 814.
However, if encryption recommendation analysis logic 304 determines during decision step 810 that the accessed expression is of a type that must be evaluated by a plaintext operation, then encryption recommendation analysis logic 304 maps the set containing the expression to plaintext, as shown at step 812, after which the method ends at step 814.
The pseudo code of one implementation of the above encryption recommendation analysis method is represented as follows:
[Pseudo code listing, reproduced as images (Figure BDA0001389546940000221 and Figure BDA0001389546940000231) in the original publication.]
As can be seen from the pseudo code above, the analysis maintains a mapping (expressencmap) from the sets of expressions in the application to the strongest encryption scheme. Initially, each column and variable in the application belongs to its own set and is mapped to the strongest encryption scheme (in this case, randomized encryption). The analysis also maintains a disjoint-set structure over expressions (Expression Set). The disjoint sets support three operations, namely MakeSet(v), Find(v), and Union(v1, v2). MakeSet(v) creates a new set containing only the element v. Find(v) returns a representative of the set containing the value v. Union(v1, v2) looks up the sets containing the values v1 and v2, respectively, and replaces them with the union of the two sets. The analysis visits each expression in each AST of the application.
If the expression can be deferred to the client (as indicated by the Isdeferred flag), no changes are made to the expressEncMAP. If an expression cannot be deferred, the analysis checks whether the expression can be evaluated when both of its sub-expressions are deterministically encrypted. If so, the analysis computes the union of the sets containing the two sub-expressions, and maps the resulting set to an encryption scheme that is no stronger than deterministic encryption. If not, the analysis checks whether the expression must be evaluated over plaintext data. If so, the analysis maps the set containing the expression to ClearText.
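The rules just described can be sketched with a small disjoint-set structure. This is an illustrative approximation only and not a reconstruction of the pseudo code referred to above: the class and function names, the `PLAINTEXT_OPS` set, and the simplification of applying constraints directly to operand names (rather than to arbitrary sub-expressions) are assumptions.

```python
from enum import IntEnum


class Scheme(IntEnum):
    """Encryption levels ordered from weakest to strongest."""
    CLEARTEXT = 0
    DETERMINISTIC = 1
    RANDOMIZED = 2


class ExpressionSets:
    """Disjoint sets of data elements, each mapped to its strongest allowed scheme."""

    def __init__(self):
        self.parent = {}
        self.scheme = {}  # set representative -> Scheme

    def make_set(self, v):
        if v not in self.parent:
            self.parent[v] = v
            self.scheme[v] = Scheme.RANDOMIZED  # start at the strongest scheme

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def union(self, v1, v2):
        r1, r2 = self.find(v1), self.find(v2)
        if r1 != r2:
            self.parent[r2] = r1
            # The merged set keeps the weaker of the two schemes.
            self.scheme[r1] = min(self.scheme[r1], self.scheme.pop(r2))
        return r1

    def weaken(self, v, limit):
        r = self.find(v)
        self.scheme[r] = min(self.scheme[r], limit)

    def recommended(self, v):
        return self.scheme[self.find(v)]


# Operators assumed, for this sketch, to require plaintext operands.
PLAINTEXT_OPS = {"+", "<", "LIKE"}


def visit(sets, op, operands, is_deferred):
    """Apply the rules to one visited binary expression over two named operands."""
    for v in operands:
        sets.make_set(v)
    if is_deferred:
        return                                   # mapping left unchanged
    if op == "=":                                # equality check
        rep = sets.union(operands[0], operands[1])
        sets.weaken(rep, Scheme.DETERMINISTIC)
    elif op in PLAINTEXT_OPS:                    # must be evaluated over plaintext
        for v in operands:
            sets.weaken(v, Scheme.CLEARTEXT)


sets = ExpressionSets()
visit(sets, "=", ["col_a", "col_b"], is_deferred=False)
visit(sets, "+", ["col_c", "col_d"], is_deferred=True)
visit(sets, "<", ["col_e", "col_f"], is_deferred=False)
```

Because schemes only ever move toward weaker levels via `min`, the order in which expressions are visited does not change the final recommendation in this sketch.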
Further explanation of the manner in which encryption recommendation analysis logic 304 operates will now be provided with respect to an example database table and example queries. Suppose that a user has a table named "Employee" with the following columns and associated data types: "FirstName CHAR(100)", "LastName CHAR(100)", "SSN CHAR(10)", "Base_Salary INT", and "Annual_Bonus INT".
Suppose further that, for simplicity, the user's workload executes the following two sets of queries:
Query set 1:
SELECT FirstName + LastName FROM Employee WHERE SSN = @SSN
Query set 2:
DECLARE @base_salary int
DECLARE @annual_bonus int
SELECT @base_salary = Base_Salary FROM Employee
SELECT @annual_bonus = Annual_Bonus FROM Employee
SELECT @base_salary + @annual_bonus
where @base_salary and @annual_bonus are local variables.
Suppose now that there are three possible encryption schemes that can be applied to data: randomized encryption, deterministic encryption, and additive encryption. Each of these encryption schemes limits the types of operations that can be performed on data encrypted using that scheme. If the encryption scheme is randomized encryption, no operation can be performed on the encrypted data other than storing and retrieving it. If the encryption scheme is deterministic encryption, only the equality operation can be performed on the encrypted data, in addition to storing and retrieving it. If the encryption scheme is additive encryption, only integer addition can be performed on the encrypted data, in addition to storing and retrieving it. The data may also be represented in plaintext, which supports all operations but does not encrypt the data.
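The relationship between schemes and supported operations described above can be captured in a small lookup table. The following Python sketch is illustrative only: the names SUPPORTED_OPS, PREFERENCE and strongest_scheme, as well as the operation labels, are assumptions. The relative order of deterministic and additive encryption in PREFERENCE is also an assumption; it does not affect the result, because no non-empty set of operations is supported by both.

```python
# Operations each scheme supports beyond storing and retrieving data.
# Plaintext supports every operation, so it is handled as the fallback.
SUPPORTED_OPS = {
    "randomized": frozenset(),
    "deterministic": frozenset({"equality"}),
    "additive": frozenset({"integer_addition"}),
}

# Encrypted schemes are tried in this order before falling back to plaintext.
PREFERENCE = ["randomized", "deterministic", "additive"]


def strongest_scheme(required_ops):
    """Return the first scheme in PREFERENCE that supports all required operations."""
    required = frozenset(required_ops)
    for scheme in PREFERENCE:
        if required <= SUPPORTED_OPS[scheme]:
            return scheme
    return "plaintext"  # no encrypted scheme supports the combination


# Usage
store_only = strongest_scheme([])
lookup = strongest_scheme(["equality"])
total = strongest_scheme(["integer_addition"])
both = strongest_scheme(["equality", "integer_addition"])
concat = strongest_scheme(["string_concatenation"])
```

A data element that is only stored and retrieved keeps randomized encryption, while one that needs both equality and integer addition falls back to plaintext.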
In this case, the inputs to the encryption recommendation analysis logic 304 will include at least database schema information (e.g., tables, columns, and their data types) and the user workload (e.g., all queries and stored procedures that will run in the database).
The encryption recommendation analysis logic 304 will analyze the query sets and identify constraints and dependencies between data elements. For query set 1, the constraints are as follows:
FirstName must support string concatenation
LastName must support string concatenation
SSN must support equality
For query set 1, the dependencies are as follows:
FirstName and LastName must have the same encryption scheme and key
SSN and @SSN must have the same encryption scheme and key
For query set 2, the constraints are as follows:
@base_salary and @annual_bonus must support integer addition
For query set 2, the dependencies are as follows:
@base_salary must have the same encryption scheme and key as the column Base_Salary
@annual_bonus must have the same encryption scheme and key as the column Annual_Bonus
@base_salary and @annual_bonus must have the same encryption scheme.
Once the encryption recommendation analysis logic 304 has collected all of the above information from the user's workload, it operates to identify a solution that meets all of the above requirements. This may be achieved using an analysis such as that described above with reference to FIGS. 7 and 8.
In accordance with such analysis, the encryption recommendation analysis logic 304 adds all expressions that have dependencies on each other to the same set. The encryption recommendation analysis logic 304 maps each set to an encryption scheme that satisfies the requirements collected so far. The encryption recommendation analysis logic 304 starts with no sets and creates a set whenever a new expression is found. The encryption recommendation analysis logic 304 initially maps each new set to the strongest possible encryption scheme.
Following the example set forth above, the encryption recommendation analysis logic 304 begins processing constraints and dependencies. For query set 1, the encryption recommendation analysis logic 304 creates a set for FirstName and maps it to the strongest encryption scheme (i.e., randomized encryption). The encryption recommendation analysis logic 304 then applies the constraint that FirstName must support string concatenation and, because none of the available encryption schemes supports string concatenation, downgrades the recommended encryption scheme for the set to plaintext.
The encryption recommendation analysis logic 304 also creates a set for LastName and maps it to the strongest encryption scheme (i.e., randomized encryption). The encryption recommendation analysis logic 304 then applies the constraint that LastName must support string concatenation and, because none of the available encryption schemes supports string concatenation, downgrades the recommended encryption scheme for the set to plaintext.
The encryption recommendation analysis logic 304 also creates a set for SSN and maps it to the strongest encryption scheme (i.e., randomized encryption). The encryption recommendation analysis logic 304 then applies the constraint that SSN must support equality and, based on that constraint, downgrades the recommended encryption scheme for the set to deterministic encryption. The encryption recommendation analysis logic 304 also adds @SSN to the set containing SSN.
For query set 2, the encryption recommendation analysis logic 304 creates a set for Base_Salary and maps it to the strongest encryption scheme (i.e., randomized encryption). The encryption recommendation analysis logic 304 also adds @base_salary to the set containing Base_Salary.
The encryption recommendation analysis logic 304 also creates a set for Annual_Bonus and maps it to the strongest encryption scheme (i.e., randomized encryption). The encryption recommendation analysis logic 304 also adds @annual_bonus to the set containing Annual_Bonus.
Because there is an addition, the encryption recommendation analysis logic 304 merges the sets containing @annual_bonus and @base_salary and downgrades the encryption scheme of the resulting set to additive encryption in order to support the addition.
Additional queries may impose additional constraints. For example, if another query requires equality on Base_Salary, then the set containing Base_Salary (which at this point is {Base_Salary, @base_salary, Annual_Bonus, @annual_bonus}) must also support equality. However, since no single encryption scheme can support both equality and addition, the encryption recommendation analysis logic 304 may need to downgrade the set to plaintext.
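The walkthrough above can be replayed with the following Python sketch. Instead of downgrading a scheme directly, this variant records the operations each set must support and derives the scheme from them, which makes the equality-plus-addition conflict explicit. The class Groups, its method names and the operation labels are hypothetical and are not taken from the pseudo code figures.

```python
def scheme_for(ops):
    """Strongest scheme supporting the given operations (three-scheme example)."""
    if not ops:
        return "randomized"
    if ops == {"equality"}:
        return "deterministic"
    if ops == {"integer_addition"}:
        return "additive"
    return "plaintext"


class Groups:
    """Sets of data elements that must share one encryption scheme."""

    def __init__(self):
        self.rep = {}       # element name -> representative of its set
        self.members = {}   # representative -> element names in the set
        self.ops = {}       # representative -> operations the set must support

    def add(self, name):
        if name not in self.rep:
            self.rep[name] = name
            self.members[name] = {name}
            self.ops[name] = set()

    def require(self, name, op):
        """Record a constraint: `name` must support operation `op`."""
        self.add(name)
        self.ops[self.rep[name]].add(op)

    def same_scheme(self, a, b):
        """Record a dependency: `a` and `b` must share a scheme, so merge sets."""
        self.add(a)
        self.add(b)
        ra, rb = self.rep[a], self.rep[b]
        if ra == rb:
            return
        for member in self.members[rb]:
            self.rep[member] = ra
        self.members[ra] |= self.members.pop(rb)
        self.ops[ra] |= self.ops.pop(rb)

    def recommendation(self, name):
        return scheme_for(self.ops[self.rep[name]])


g = Groups()
# Query set 1
g.require("FirstName", "string_concatenation")
g.require("LastName", "string_concatenation")
g.same_scheme("FirstName", "LastName")
g.require("SSN", "equality")
g.same_scheme("SSN", "@SSN")
# Query set 2
g.same_scheme("Base_Salary", "@base_salary")
g.same_scheme("Annual_Bonus", "@annual_bonus")
g.require("@base_salary", "integer_addition")
g.require("@annual_bonus", "integer_addition")
g.same_scheme("@base_salary", "@annual_bonus")

before = {c: g.recommendation(c) for c in ("FirstName", "SSN", "Base_Salary")}
salary_set = set(g.members[g.rep["Base_Salary"]])

# Hypothetical extra query that compares Base_Salary for equality.
g.require("Base_Salary", "equality")
after = g.recommendation("Annual_Bonus")
```

Here `before` holds plaintext for FirstName, deterministic for SSN and additive for Base_Salary, and the extra equality requirement moves the whole salary set, including Annual_Bonus, to plaintext.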
In an embodiment, the output of the encryption recommendation analysis includes an identification of each of a plurality of data elements (e.g., columns) referenced by application code 310 and an indication of whether each data element may be encrypted. If the data element can be encrypted, the output can also include a recommended encryption scheme for the data element. The recommended encryption scheme may include the highest level of encryption that may be applied to the data elements without changing the semantics of the program. In some instances, an indication that two or more data elements must be encrypted using the same encryption algorithm and the same key may also be provided.
C. Explanation recommendation analysis
In addition to making encryption recommendations, it may be considered important to inform the user why code analysis tool 300 made certain encryption recommendations for certain data elements. For example, if code analysis tool 300 recommends that a column not be encrypted, it may be useful to also provide the reason why the column cannot be encrypted. This information may help application developers and publishers identify portions of code that need to be restructured so that data elements can be encrypted using a stronger form of encryption. To this end, the explanation recommendation analysis logic 306 extends the analysis performed by the encryption recommendation analysis logic 304 to generate a set of explanations for each encryption recommendation.
Of course, the encryption recommendations may be output without providing any associated explanation. For this reason, the explanation recommendation analysis logic 306 may be considered optional. For example, in some embodiments, the explanation recommendation analysis logic 306 may be disabled or simply not included within the code analysis tool 300.
One manner in which the explanation recommendation analysis may be performed will now be described with reference to flowchart 900 of FIG. 9. The method of flowchart 900 is described herein by way of illustration only. Other methods, including variations of the method of flowchart 900, may be used to generate explanations of the encryption recommendations made for data elements within application code 310. For illustrative purposes, the method of flowchart 900 will now be described with continued reference to code analysis tool 300 of FIG. 3; however, the method is not limited to this embodiment.
The analysis builds on the logic for performing the encryption recommendation analysis. In particular, as shown at step 902, while the encryption recommendation analysis logic 304 traverses the ASTs representing the application code 310 from bottom to top as part of generating encryption recommendations, the explanation recommendation analysis logic 306 tracks, for each expression in a given AST, its dependencies on other expressions along with the operations that produce those dependencies. Indications of where such operations occur in the application code may also be tracked. For example, for the predicate "a = b", the explanation recommendation analysis logic 306 may track the following two dependencies:
(a, b, Equality, "a=b", "Procedure X Line Y Column Z")
(b, a, Equality, "a=b", "Procedure X Line Y Column Z").
As shown at step 904, the explanation recommendation analysis logic 306 represents the dependencies as a dependency graph, where each vertex represents an expression in the AST representation of the application code and each edge represents a dependency between two expressions.
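The dependency tuples and the resulting graph can be represented quite directly. The following Python sketch assumes a five-field record mirroring the example tuples above; the names Dependency, track and build_graph, and the adjacency-list layout, are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Dependency(NamedTuple):
    source: str      # expression the dependency starts from
    target: str      # expression it depends on
    operation: str   # operation that produced the dependency
    text: str        # source text of the operation
    location: str    # where the operation occurs in the application code


def track(a: str, b: str, operation: str, text: str, location: str) -> List[Dependency]:
    """Record one dependency in each direction for a binary operation."""
    return [
        Dependency(a, b, operation, text, location),
        Dependency(b, a, operation, text, location),
    ]


def build_graph(dependencies: List[Dependency]) -> Dict[str, List[Dependency]]:
    """Adjacency list: each vertex maps to the dependencies leaving it."""
    graph: Dict[str, List[Dependency]] = {}
    for dep in dependencies:
        graph.setdefault(dep.source, []).append(dep)
        graph.setdefault(dep.target, [])   # make sure the target is a vertex too
    return graph


# Usage: the predicate "a = b" from the example above.
deps = track("a", "b", "Equality", "a=b", "Procedure X Line Y Column Z")
graph = build_graph(deps)
```

Keeping the operation text and location on each edge is what later allows an explanation to point the user at the exact statement that caused a downgrade.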
Next, at step 906, for each data element of interest (e.g., for each database column), the explanation recommendation analysis logic 306 identifies a subgraph of the dependency graph that (a) contains the vertex for the data element of interest, and (b) whose operations collectively justify the encryption recommendation for the data element of interest. For example, a subgraph that may explain why column X cannot be encrypted may be as follows:
Edge 1: column X is used in an equality operation with column Y
Edge 2: column X is used in an addition with column Z
In one embodiment, the explanation recommendation analysis logic 306 computes all possible subgraphs for each column in the database. Step 906 may be performed by enumerating all possible subgraphs for the column and identifying the subgraphs that explain the recommendation for the column. In general, the number of subgraphs is exponential in the number of edges of the graph (all possible combinations of edges). Therefore, brute-force enumeration may be costly. To address this issue, in one embodiment, the explanation recommendation analysis logic 306 systematically enumerates subgraphs from smaller to larger, stopping once a valid explanation is found. One particular implementation of the algorithm for performing step 906 is formally described below.
[Algorithm listing (Figures BDA0001389546940000281, BDA0001389546940000291 and BDA0001389546940000301) not reproduced.]
The algorithm terminates because it never enters an infinite loop: it adds only edges that are not already part of the existing graph.
Furthermore, the above algorithm finds solutions that contain only the required edges. In other words, once an explanation has been found (e.g., addition and equality operations are found on the same data element, and no encryption scheme supports both types of operations), the algorithm does not continue to add more edges, because: (1) the algorithm starts with graphs of size 0 and grows to graphs of size i+1 only after all graphs of size i have been processed; (2) once a solution is found, the algorithm stops expanding that graph; and (3) if a graph is a supergraph of a solution, the algorithm immediately excludes it. Regarding point (3), because graphs only grow and never shrink, the algorithm is guaranteed to have found any subgraph that is a solution before it processes a supergraph of that subgraph.
Additionally, the algorithm described above does not return duplicates, because it checks whether it has already processed a given graph.
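The smallest-first enumeration described above may be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the algorithm of the omitted figures: the function name find_explanations, the (u, v, constraint) edge layout and the conflict predicate are hypothetical, and the sketch returns all explaining subgraphs of the smallest size and then stops, so supersets of a solution are never generated.

```python
def find_explanations(start, edges, explains):
    """Enumerate connected edge sets around `start` from smaller to larger.

    edges:    list of (u, v, constraint) tuples; constraint may be None.
    explains: predicate over a list of edges; True if they justify the
              recommendation for `start`.
    Returns the edge-index sets of the smallest size that satisfy `explains`.
    """
    level = {frozenset()}            # all subgraphs of the current size
    seen = {frozenset()}             # subgraphs already generated (no duplicates)
    while level:
        solutions = [g for g in level
                     if explains([edges[i] for i in sorted(g)])]
        if solutions:
            return solutions         # stop growing once an explanation exists
        next_level = set()
        for g in level:
            vertices = {start} | {x for i in g for x in edges[i][:2]}
            for i, (u, v, _) in enumerate(edges):
                # Only add new edges that touch the current subgraph.
                if i in g or (u not in vertices and v not in vertices):
                    continue
                bigger = g | {i}
                if bigger not in seen:
                    seen.add(bigger)
                    next_level.add(bigger)
        level = next_level           # empty when no edge can be added: terminates
    return []


def conflict(sub_edges):
    """Assumed predicate: equality and addition together force plaintext."""
    constraints = {c for _, _, c in sub_edges if c is not None}
    return {"equality", "addition"} <= constraints


# Usage: column X from the example above, plus an unrelated edge Y-W.
edges = [
    ("X", "Y", "equality"),   # index 0: X is used in an equality with Y
    ("X", "Z", "addition"),   # index 1: X is used in an addition with Z
    ("Y", "W", None),         # index 2: plain assignment, no constraint
]
explanations = find_explanations("X", edges, conflict)
```

For column X, the sketch examines the empty graph, then the single-edge graphs, and stops at size two with the subgraph made of edges 0 and 1; the unrelated edge is never part of the result.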
Further explanation of the analysis performed by the explanation recommendation analysis logic 306 will now be provided with reference to the example table "Employee", the example query set 1, and the example query set 2 introduced in the previous section.
As noted above, the explanation recommendation analysis creates a dependency graph using the constraints and dependencies collected during the encryption recommendation analysis. The expressions form the vertices of the dependency graph, and the dependencies form its edges. Further, edges may have attributes that define constraints. Referring to the example table "Employee", the example query set 1, and the example query set 2, the dependency graph will have the following vertices:
{FirstName,LastName,SSN,@SSN,Base_Salary,@base_salary,Annual_Bonus,@annual_bonus}
Furthermore, the dependency graph may have the following edges:
(1) FirstName is connected to LastName with the attribute "plaintext" (since there is no encryption scheme that supports string concatenation)
(2) SSN is connected to @SSN with the attribute "deterministic"
(3) Base_Salary is connected to @base_salary with no attribute, since this is a simple assignment
(4) Annual_Bonus is connected to @annual_bonus with no attribute, since this is a simple assignment
(5) @base_salary is connected to @annual_bonus with the attribute "additive"
Let us now assume that there is also a query that performs an equality comparison between Annual_Bonus and Base_Salary. This may require both columns to be represented in plaintext, because no encryption scheme can support both equality and addition (the latter being required for query set 2). This may add one edge:
(6) Base_Salary is connected to Annual_Bonus with the attribute "deterministic".
The explanation recommendation analysis logic 306 operates to provide information to the user that helps explain why Base_Salary and Annual_Bonus must be represented in plaintext. By doing so, the explanation recommendation analysis logic 306 enables the user to change his application code (e.g., his queries) to address this limitation.
Let us take Base_Salary as an example. The algorithm described above starts with a graph that contains only that vertex and no edges, and then appends edges that touch at least one vertex in the current graph. In this case, the eligible edges are 3 and 6. Thus, two new graphs are generated. The graph with edge 3 does not yet have any constraints, while the graph with edge 6 has deterministic as its constraint.
This process continues recursively. In this case, since the graph is small, things are simple and the algorithm will find the explanation quickly. For the first of the two graphs generated, the eligible edges are 5 and 6. For the second of the two graphs generated, the eligible edges are 3 and 4. Once the algorithm finds a graph containing edges whose constraints together result in plaintext, the algorithm has solved the explanation problem. In this case, the interesting subgraph contains edges (3, 4, 5 and 6). Its edges carry both an addition constraint, through the intermediate variables, and an equality constraint. Since the algorithm constructs graphs by incrementally adding edges, it is guaranteed that a solution with fewer edges is found first.
Continuing the previous example, the explanation provided to the user is essentially the subgraph identified by the algorithm described above:
Base_Salary requires deterministic encryption because it is involved in an equality operation with Annual_Bonus
Base_Salary must have the same encryption as @base_salary
@base_salary requires additive encryption because it is involved in an addition operation
Taken together, these requirements result in a recommendation of plaintext, since plaintext is the only option that can satisfy all of them. To encrypt Base_Salary, the user must change his application code (in this case, his queries) in order to remove some of these constraints or dependencies.
The manner and form in which such an explanation is provided to the user may vary from implementation to implementation. For example, in one embodiment, the output may include each data element of interest, the encryption scheme recommended for that data element (or plaintext, if encryption is not recommended), and the operations that led to the encryption scheme recommendation. Still further, the output may include an identification of the location or occurrence of each such operation within the application code. Other forms of presentation may also be used.
D. Alternative embodiments
In an alternative embodiment, the input to code analysis tool 300 also includes an identification of the data elements referenced by application code 310 (e.g., columns) that a user of code analysis tool 300 (e.g., an application developer or publisher) considers sensitive and/or wants to have encrypted. The foregoing algorithms will perform in much the same manner as outlined above, except that at the end of the analysis the code analysis tool 300 will output encryption recommendations and associated explanations 330 for only those data elements specified by the user. The encryption recommendations and associated explanations 330 may also include encryption recommendations and explanations for other data elements in the same encryption set as a user-specified data element, since they have a dependency on the user-specified data element. No other data elements will be reported; that is, the encryption recommendations and associated explanations 330 will not include information about data elements other than the user-specified data elements and the data elements having dependencies therewith.
Such an implementation advantageously enables users to focus the analysis only on data elements that are of interest to them. The provided output may enable the user to determine whether each of the user-specified data elements may be encrypted, the highest level of encryption that may be applied thereto, and an explanation for each recommendation. Such explanations may help the user determine how best to modify the application code to increase the encryption level that may be applied to one or more of the user-specified data elements.
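The reporting rule of this alternative embodiment can be illustrated with a short Python sketch. The function name filter_report and the dictionary and set layouts are assumptions made for illustration; the recommendations shown are those of the Employee example.

```python
def filter_report(recommendations, encryption_sets, user_specified):
    """Keep only user-specified elements and elements sharing a set with them.

    recommendations: dict mapping data element name -> recommended scheme
    encryption_sets: list of sets of element names that share a scheme
    user_specified:  set of element names the user marked as sensitive
    """
    keep = set(user_specified)
    for group in encryption_sets:
        if group & user_specified:     # the set contains a sensitive element
            keep |= group
    return {name: scheme for name, scheme in recommendations.items()
            if name in keep}


# Usage with the Employee example: the user marks only Base_Salary.
recommendations = {
    "FirstName": "plaintext", "LastName": "plaintext",
    "SSN": "deterministic", "@SSN": "deterministic",
    "Base_Salary": "additive", "@base_salary": "additive",
    "Annual_Bonus": "additive", "@annual_bonus": "additive",
}
encryption_sets = [
    {"FirstName", "LastName"},
    {"SSN", "@SSN"},
    {"Base_Salary", "@base_salary", "Annual_Bonus", "@annual_bonus"},
]
report = filter_report(recommendations, encryption_sets, {"Base_Salary"})
```

Annual_Bonus appears in the report even though the user did not list it, because it belongs to the same encryption set as Base_Salary.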
Example computer system implementation
FIG. 10 depicts an example processor-based computer system 1000 that may be used to implement various embodiments described herein. For example, system 1000 may be used to implement any of end-user terminals 104_1-104_N, middle tier application server 106, database server 108, end user terminals 204_1-204_N, middle tier application server 206, database server 208, and code analysis tool 300, as described above with reference to FIGS. 1-3. System 1000 may also be used to implement any or all of the steps of any or all of the flowcharts depicted in FIGS. 4-9. The description of system 1000 provided herein is provided for purposes of illustration and is not intended to be limiting. As will be appreciated by one skilled in the relevant art, embodiments may be implemented in other types of computer systems.
As shown in FIG. 10, the system 1000 includes a processing unit 1002, a system memory 1004, and a bus 1006 that couples various system components including the system memory 1004 to the processing unit 1002. The processing unit 1002 may include one or more microprocessors or microprocessor cores. Bus 1006 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. The system memory 1004 includes Read Only Memory (ROM) 1008 and Random Access Memory (RAM) 1010. A basic input/output system 1012 (BIOS) is stored in ROM 1008.
The system 1000 also has one or more of the following drives: a hard disk drive 1014 for reading from and writing to a hard disk, a magnetic disk drive 1016 for reading from or writing to a removable magnetic disk 1018, and an optical disk drive 1020 for reading from or writing to a removable optical disk 1022 such as a CD ROM, DVD ROM, BLU-RAY™ disk or other optical media. The hard disk drive 1014, magnetic disk drive 1016 and optical disk drive 1020 are connected to the bus 1006 by a hard disk drive interface 1024, a magnetic disk drive interface 1026 and an optical drive interface 1028, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for the computer. Although a hard disk, a removable magnetic disk and a removable optical disk are described, other types of computer-readable storage devices and storage structures can be used to store data, such as flash memory cards, digital video disks, Random Access Memories (RAMs), Read Only Memories (ROMs), and the like.
Several program modules may be stored on the hard disk, magnetic disk, optical disk, ROM, or RAM. These program modules include an operating system 1030, one or more application programs 1032, other program modules 1034 and program data 1036. According to various embodiments, the program modules may include computer program logic executable by the processing unit 1002 to perform any or all of the functions and features of end-user terminals 104_1-104_N, middle tier application server 106, database server 108, end user terminals 204_1-204_N, middle tier application server 206, database server 208, and code analysis tool 300, as described above with reference to FIGS. 1-3. The program modules may also include computer program logic that, when executed by the processing unit 1002, performs any of the steps or operations shown or described with reference to the flowcharts of FIGS. 4-9.
A user may enter commands and information into the system 1000 through input devices such as a keyboard 1038 and pointing device 1040 (e.g., a mouse). Other input devices (not shown) may include a microphone, joystick, game controller, scanner, or the like. In one embodiment, a touch screen is provided in conjunction with the display 1044 to allow a user to provide input by applying a touch (e.g., by a finger or stylus) to one or more points on the touch screen. These and other input devices are often connected to the processing unit 1002 through a serial port interface 1042 that is coupled to bus 1006, but may be connected by other interfaces, such as a parallel port, game port, or a Universal Serial Bus (USB). Such an interface may be a wired interface or a wireless interface.
A display 1044 is connected to bus 1006 via an interface, such as a video adapter 1046. In addition to the display 1044, the system 1000 may include other peripheral output devices (not shown), such as speakers and printers.
System 1000 is connected to a network 1048 (e.g., a local area network or wide area network such as the internet) through a network interface or adapter 1050, a modem 1052, or other suitable means for establishing communications over the network. The modem 1052, which may be internal or external, is connected to the bus 1006 via the serial port interface 1042.
As used herein, the terms "computer program medium," "computer-readable medium," and "computer-readable storage medium" are used to generally refer to storage devices or storage structures, such as the hard disk associated with hard disk drive 1014, removable magnetic disk 1018, removable optical disk 1022; and other memory devices or storage structures such as flash memory cards, digital video disks, Random Access Memories (RAMs), Read Only Memories (ROMs), and the like. Such computer-readable storage media are distinct from and non-overlapping with (and do not include) communication media. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wireless media such as acoustic, RF, infrared and other wireless media. Embodiments are also directed to such communication media.
As noted above, computer programs and modules (including application programs 1032 and other program modules 1034) may be stored on the hard disk, magnetic disk, optical disk, ROM, or RAM. Such computer programs may also be received via network interface 1050, serial port interface 1042, or any other interface type. Such computer programs, when executed or loaded by an application, enable system 1000 to implement features of embodiments of the present invention discussed herein. Thus, such computer programs represent controllers of the system 1000.
Embodiments are also directed to computer program products comprising software stored on any computer usable medium. Such software, when executed in one or more data processing devices, causes the data processing devices to operate as described herein. Embodiments of the present invention employ any computer-usable or computer-readable medium, known now or in the future. Examples of computer-readable media include, but are not limited to, memory devices and storage structures such as RAM, hard drives, floppy disks, CD ROMs, DVD ROMs, zip disks, magnetic tape, magnetic storage devices, optical storage devices, MEMs, nanotechnology-based storage devices, and the like.
In alternative implementations, the system 1000 may be implemented as hardware logic/circuitry or firmware. According to further embodiments, one or more of these components may be implemented in a system on chip (SoC). The SoC may include an integrated circuit chip that includes one or more of the following to perform its functions: a processor (e.g., a microcontroller, microprocessor, Digital Signal Processor (DSP), etc.), memory, one or more communication interfaces, and/or other circuitry and/or embedded firmware.
Additional exemplary embodiments
An apparatus is described herein that includes one or more processors and one or more memory devices connected to the one or more processors. The one or more memory devices store computer program logic that is executable by the one or more processors. The computer program logic includes encryption recommendation analysis logic operable, when executed by the one or more processors, to receive the application code and the database schema information, to analyze the application code and the database schema information to identify a recommended encryption scheme for each of the one or more data elements referenced by the application code, and to output the identification of each of the one or more data elements and the recommended encryption scheme associated therewith.
In one embodiment of the apparatus described above, the encryption recommendation analysis logic is operable to identify, from a predefined set of encryption schemes, a strongest encryption scheme that can be applied to each of the one or more data elements without altering semantics of the application code.
In another embodiment of the above apparatus, the encryption recommendation analysis logic is operable to select the recommended encryption scheme for each of the one or more data elements from a set of encryption schemes, the set of encryption schemes including one or more of: a randomized encryption scheme; a deterministic encryption scheme; and plaintext.
In a further embodiment of the apparatus above, the application code includes one or more of: one or more queries; and one or more stored procedures.
In yet another embodiment of the apparatus above, the encryption recommendation analysis logic is operable to: assign each data element in the application code to its own set and map each set to the strongest encryption scheme from the set of predefined encryption schemes; traverse each of the one or more abstract syntax trees in the abstract syntax tree representation of the application code to visit each of the one or more expressions included therein; and, for each expression visited, determine the strongest encryption scheme from the predefined set of encryption schemes that can be applied to the data operated on by the expression, and modify the sets and the mapping accordingly.
In another embodiment of the apparatus above, the computer program logic further comprises deferred evaluation analysis logic that, when executed by the one or more processors, is operable to identify expressions within the application code that may be deferred by the database server to the client. In accordance with such embodiments, the encryption recommendation analysis logic is operable to identify a recommended encryption scheme for each of the one or more data elements by taking into account the expressions within the application code that may be deferred by the database server to the client.
Further, in accordance with such embodiments, the deferred evaluation analysis logic may be operable to: convert the application code into one or more abstract syntax trees, each abstract syntax tree comprising one or more expressions; associate a deferred evaluation flag with each expression in each of the one or more abstract syntax trees and initialize each deferred evaluation flag to a true value; and iteratively perform a top-down pass on each of the one or more abstract syntax trees to find variables that must be evaluated early, setting the deferred evaluation flag for each such variable to a false value, until no new variables are found that must be evaluated early.
Further, according to such embodiments, the deferred evaluation analysis logic may be operable to perform the top-down pass on each of the one or more abstract syntax trees to discover variables that must be evaluated early by: setting the deferred evaluation flag to a false value for all expressions that cannot be evaluated in a deferred manner; propagating the false value of the deferred evaluation flag of a parent expression to each child expression whose deferred evaluation flag has a true value; and updating the set of variables that must be evaluated early. Setting the deferred evaluation flag to a false value for all expressions that cannot be evaluated in a deferred manner may include setting the deferred evaluation flag to a false value for one or more of the following expression types: expressions with side effects; expressions that are considered too costly to push to the client; control flow statements; and expressions assigned to variables that must be evaluated early.
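The iterative top-down pass described above may be sketched in Python as follows. The node layout (Node, with kinds such as "var", "column", "assign" and "if") is hypothetical, the "too costly" criterion is omitted, and the rule that a variable becomes an early-evaluated variable when it is referenced by a non-deferred expression is an assumption made for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    kind: str                          # "var", "column", "assign", "if", "op"
    name: Optional[str] = None         # variable name, or assignment target
    children: List["Node"] = field(default_factory=list)
    has_side_effects: bool = False
    deferred: bool = True              # deferred evaluation flag, initially true


def cannot_defer(node: Node, early_vars: Set[str]) -> bool:
    return (node.has_side_effects
            or node.kind == "if"       # control flow statement
            or (node.kind == "assign" and node.name in early_vars))


def top_down(node: Node, parent_deferred: bool, early_vars: Set[str]) -> None:
    # A false flag on the parent propagates to the child.
    if not parent_deferred or cannot_defer(node, early_vars):
        node.deferred = False
    # Assumption: a variable read by a non-deferred expression is needed early.
    if node.kind == "var" and not node.deferred:
        early_vars.add(node.name)
    for child in node.children:
        top_down(child, node.deferred, early_vars)


def analyze(asts: List[Node]) -> Set[str]:
    """Repeat top-down passes until no new early variable is discovered."""
    early_vars: Set[str] = set()
    while True:
        before = len(early_vars)
        for ast in asts:
            top_down(ast, True, early_vars)
        if len(early_vars) == before:
            return early_vars


# Usage: @x is assigned first and later used in a branch condition;
# @y is assigned but never needed by the server.
salary = Node("column", "Base_Salary")
assign_x = Node("assign", "@x", [salary])
branch = Node("if", children=[Node("op", children=[Node("var", "@x")])])
assign_y = Node("assign", "@y", [Node("column", "Annual_Bonus")])
early = analyze([assign_x, branch, assign_y])
```

The first pass discovers @x through the branch condition; only the second pass marks the earlier assignment to @x, and the column it reads, as not deferrable, which is why the passes are repeated until the set of early variables stops growing.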
In yet another embodiment of the aforementioned apparatus, the computer program logic further comprises explanation recommendation analysis logic operable, when executed by the one or more processors, to generate an explanation of the reason for identifying each recommended encryption scheme for each of the one or more data elements.
According to such embodiments, the explanation may include an identification of one or more operations in the application code that resulted in the identification of the recommended encryption scheme for each of the one or more data elements. The explanation may further include an identification of the location within the application code where each of the one or more operations is found.
Further, in accordance with such embodiments, the explanation recommendation analysis logic may be operable to: while a bottom-up traversal of each of one or more abstract syntax trees that represent the application code is performed to visit each of the one or more expressions therein, track, for each expression visited, dependencies with other expressions and the one or more operations that give rise to the dependencies; represent the dependencies as a dependency graph having one or more vertices and one or more edges, wherein each vertex in the dependency graph represents an expression in the abstract syntax trees and each edge in the dependency graph represents a dependency between two expressions; and, for each of the one or more data elements, identify one or more subgraphs of the dependency graph that contain the data element and whose operations justify the identification of the recommended encryption scheme for the data element. The explanation recommendation analysis logic may be operable to identify the one or more subgraphs by systematically enumerating the subgraphs of the dependency graph from smaller subgraphs to larger subgraphs and stopping once one or more subgraphs whose operations justify the identification of the recommended encryption scheme for the data element have been identified.
Also described herein is a computer-implemented method for analyzing application code to recommend encryption of the data elements that it references without affecting program semantics. The method comprises the following steps: receiving application code and database schema information; analyzing the application code and the database schema information to identify a recommended encryption scheme for each of one or more data elements referenced by the application code; and outputting the identification of each of the one or more data elements and the recommended encryption scheme associated therewith in a form viewable by a user.
In one embodiment of the above method, analyzing the application code and the database schema information to identify a recommended encryption scheme for each of the one or more data elements referenced by the application code comprises: analyzing the application code and the database schema information to identify, from a predefined set of encryption schemes, a strongest encryption scheme that can be applied to each of the one or more data elements without altering the semantics of the application code.
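As a rough illustration of choosing the strongest scheme that leaves program semantics intact, the following Python sketch picks, for each data element, the weakest limit imposed by any operation applied to it. The `Scheme` ordering, the `STRONGEST_ALLOWED` table, and the column names are assumptions made for this example; which operations a given scheme actually supports depends on the database system.

```python
from enum import IntEnum

class Scheme(IntEnum):
    """Predefined schemes, ordered weakest to strongest (illustrative)."""
    PLAINTEXT = 0
    DETERMINISTIC = 1
    RANDOMIZED = 2

# Hypothetical table: the strongest scheme each operation kind tolerates.
# Equality-style operations are assumed to work on deterministic ciphertext;
# ordering and arithmetic are assumed to need plaintext.
STRONGEST_ALLOWED = {
    "select": Scheme.RANDOMIZED,
    "equals": Scheme.DETERMINISTIC,
    "join": Scheme.DETERMINISTIC,
    "group_by": Scheme.DETERMINISTIC,
    "less_than": Scheme.PLAINTEXT,
    "add": Scheme.PLAINTEXT,
}

def recommend(operations):
    """Return the strongest scheme compatible with every listed operation."""
    scheme = Scheme.RANDOMIZED  # start from the strongest scheme
    for op in operations:
        scheme = min(scheme, STRONGEST_ALLOWED[op])
    return scheme

# Operations the application code performs on each column (made-up example).
usage = {
    "Patients.Notes": ["select"],
    "Patients.SSN": ["select", "equals"],
    "Patients.Age": ["select", "less_than"],
}
recommendations = {column: recommend(ops) for column, ops in usage.items()}
```

A column that is only retrieved keeps the randomized scheme, a column compared for equality drops to deterministic, and a column used in a range comparison falls back to plaintext.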
In another embodiment, the aforementioned method further comprises: identifying an expression within the application code that can be deferred to a client by a database server, wherein analyzing the application code and the database schema information to identify a recommended encryption scheme for each of the one or more data elements comprises: considering the expression within the application code that can be deferred to the client by the database server.
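One way to picture the identification of deferrable expressions is the flag-based procedure recited in the claims: every expression starts with a deferred evaluation flag set to true, and repeated top-down passes clear the flag until no new variables that must be evaluated early are found. The Python sketch below is a simplified illustration under stated assumptions: the `Expr` node type, its fields, and the rule that a variable read inside a non-deferrable expression must itself be evaluated early are hypothetical, and an `if` node carries only its condition as a child.

```python
from dataclasses import dataclass, field

@dataclass
class Expr:
    kind: str                       # "assign", "if", "binop", "var", "const"
    children: list = field(default_factory=list)
    name: str = ""                  # variable name for "var" and "assign"
    side_effect: bool = False       # expression has side effects
    costly: bool = False            # too costly to push to the client
    deferred: bool = True           # deferred evaluation flag, starts true

def cannot_defer(node, early_vars):
    """Expression types whose evaluation is assumed not to be deferrable."""
    return (node.side_effect or node.costly
            or node.kind == "if"    # control flow statement
            or (node.kind == "assign" and node.name in early_vars))

def top_down_pass(node, early_vars, parent_deferred=True):
    """Clear flags top-down; return variables read by non-deferred nodes."""
    if not parent_deferred or cannot_defer(node, early_vars):
        node.deferred = False       # a false flag propagates to children
    found = {node.name} if node.kind == "var" and not node.deferred else set()
    for child in node.children:
        found |= top_down_pass(child, early_vars, node.deferred)
    return found

def analyze(roots):
    """Repeat the pass until no new early-evaluation variables appear."""
    early_vars = set()
    while True:
        found = set()
        for root in roots:
            found |= top_down_pass(root, early_vars)
        if found <= early_vars:
            return early_vars
        early_vars |= found

# x = a + b;  y = c * 2;  if (x > 0) ...
assign_x = Expr("assign", [Expr("binop", [Expr("var", name="a"),
                                          Expr("var", name="b")])], name="x")
assign_y = Expr("assign", [Expr("binop", [Expr("var", name="c"),
                                          Expr("const")])], name="y")
branch = Expr("if", [Expr("binop", [Expr("var", name="x"), Expr("const")])])
early = analyze([assign_x, assign_y, branch])
```

Here the branch forces `x` to be evaluated early, which in turn forces `a` and `b`, while the assignment to `y` stays deferrable. Under the assumption that deferred expressions are evaluated by the client, they would not need to restrict the encryption scheme recommended for the data they touch.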
In yet another embodiment, the aforementioned method further comprises: generating an interpretation regarding a reason for identifying a recommended encryption scheme for each of the one or more data elements, and outputting each interpretation in a form viewable by a user. Generating the interpretation may include generating an identification of one or more operations in the application code that result in the identification of the recommended encryption scheme for each of the one or more data elements.
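A user-viewable form of such output could be as simple as a text report that lists each data element, its recommended scheme, and the operations behind the recommendation. The sketch below is only one possible layout; the `Operation` record, the `render` function, and the sample names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operation:
    kind: str      # operation that limited the scheme
    location: str  # where the operation appears in the application code

def render(recommendations, reasons):
    """Format recommendations and their reasons as plain text."""
    lines = []
    for element, scheme in sorted(recommendations.items()):
        lines.append(f"{element}: {scheme}")
        for op in reasons.get(element, []):
            lines.append(f"  because of {op.kind} at {op.location}")
    return "\n".join(lines)

recs = {"Patients.SSN": "deterministic", "Patients.Age": "plaintext"}
reasons = {
    "Patients.Age": [Operation("less-than comparison", "GetSeniors, line 4")],
}
report = render(recs, reasons)
```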
A computer program product is also described herein. The computer program product includes a computer-readable memory having computer program logic recorded thereon that, when executed by at least one processor, causes the at least one processor to perform a method comprising: receiving application code; analyzing at least the application code to select a recommended encryption scheme for each of one or more data elements referenced by the application code and generating an interpretation of a reason for selecting the recommended encryption scheme for each of the one or more data elements, each interpretation including an identification of one or more operations in the application code; and outputting the identification and recommended encryption scheme for each of the one or more data elements and the interpretation associated therewith.
V. Conclusion
While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various changes in form and detail can be made therein without departing from the spirit and scope of the invention. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims (20)

1. An electronic device, comprising:
one or more processors; and
one or more memory devices connected to the one or more processors, the one or more memory devices storing computer program logic executable by the one or more processors, the computer program logic comprising:
encryption recommendation analysis logic operable, when executed by the one or more processors, to receive application code and database schema information associated with a database, the database schema information including information about a logical organization or structure of the database from which dependencies between one or more data elements stored in the database that are referenced by the application code can be determined; to analyze the application code and the database schema information to identify a recommended encryption scheme for each of the one or more data elements by:
assigning each data element in the application code to its own set and mapping each set to a strongest encryption scheme of a set of predefined encryption schemes, the assigning based at least in part on the determined dependencies between the one or more data elements,
traversing each of the one or more abstract syntax trees in the abstract syntax tree representation of the application code to access each of the one or more expressions included therein, and
for each expression accessed, determining the strongest encryption scheme from the set of predefined encryption schemes that can be applied to the data operated on by that expression, and modifying the set and mapping accordingly; and
to output an identification of each of the one or more data elements and the recommended encryption scheme associated therewith.
2. The electronic device of claim 1, wherein the encryption recommendation analysis logic is operable to identify a strongest encryption scheme from the set of predefined encryption schemes that can be applied to each of the one or more data elements without altering semantics of the application code.
3. The electronic device of claim 1, wherein the encryption recommendation analysis logic is operable to select the recommended encryption scheme for each of the one or more data elements from the predefined set of encryption schemes, the set of encryption schemes including one or more of:
a randomized encryption scheme;
a deterministic encryption scheme; and
plaintext.
4. The electronic device of claim 1, wherein the application code comprises one or more of:
one or more queries; and
one or more stored procedures.
5. The electronic device of claim 1, wherein the computer program logic further comprises:
deferred evaluation analysis logic operable, when executed by the one or more processors, to identify expressions within the application code that can be deferred to a client by a database server;
wherein the encryption recommendation analysis logic is operable to identify the recommended encryption scheme for each of the one or more data elements by considering the expression within the application code that can be deferred to the client by the database server.
6. The electronic device of claim 5, wherein the deferred evaluation analysis logic is operable to:
convert the application code into the one or more abstract syntax trees;
associate a deferred evaluation flag with each expression of each of the one or more abstract syntax trees and initialize each deferred evaluation flag to a true value; and
iteratively perform a top-down pass on each of the one or more abstract syntax trees to find variables that must be evaluated early, and set the deferred evaluation flag for each such variable to a false value, until no new variables that must be evaluated early are found, wherein the variables that must be evaluated early include variables that cannot be evaluated via deferred evaluation.
7. The electronic device of claim 6, wherein the deferred evaluation analysis logic is operable to perform a top-down pass on each of the one or more abstract syntax trees to discover variables that must be evaluated early by:
setting the deferred evaluation flag to the false value for all expressions for which evaluation cannot be deferred;
propagating the value of the deferred evaluation flag of a parent expression whose deferred evaluation flag has the false value to a child expression whose deferred evaluation flag has the true value; and
updating the set of variables that must be evaluated early.
8. The electronic device of claim 7, wherein setting the deferred evaluation flag to the false value for all expressions whose evaluation cannot be deferred comprises: setting the deferred evaluation flag to the false value for one or more of the following expression types:
expressions with side effects;
expressions that are considered too costly to push to the client;
a control flow statement; and
expressions assigned to variables that must be evaluated early.
9. The electronic device of claim 1, wherein the computer program logic further comprises:
interpretation recommendation analysis logic operable, when executed by the one or more processors, to generate an interpretation of a reason for each recommended encryption scheme identified for each of the one or more data elements.
10. The electronic device of claim 9, wherein the interpretation includes an identification of one or more operations in the application code that result in the identification of the recommended encryption scheme for each of the one or more data elements.
11. The electronic device of claim 10, wherein the interpretation further includes an identification of a location within the application code at which each of the one or more operations was found.
12. The electronic device of claim 9, wherein the interpretation recommendation analysis logic is operable to:
while performing a bottom-up traversal that accesses each of one or more expressions in each of the one or more abstract syntax trees representing the application code, track, for each accessed expression, dependencies on other expressions and one or more operations that result in the dependencies;
represent the dependencies as a dependency graph having one or more vertices and one or more edges, wherein each vertex in the dependency graph represents an expression in the abstract syntax tree and each edge in the dependency graph represents a dependency between two expressions; and
for each of the one or more data elements, identify one or more subgraphs of the dependency graph that contain the data element and whose operations justify the identification of the recommended encryption scheme for the data element.
13. The electronic device of claim 12, wherein the interpretation recommendation analysis logic is operable to identify the one or more subgraphs of the dependency graph that contain the data element and whose operations justify the identification of the recommended encryption scheme for the data element by:
enumerating subgraphs of the dependency graph systematically from smaller to larger subgraphs and stopping once the one or more subgraphs whose operations justify the identification of the recommended encryption scheme for the data element are identified.
14. A computer-implemented method for analyzing application code to recommend encryption of data elements referenced by the application code without affecting program semantics, the method comprising:
receiving application code and database schema information associated with a database, the database schema information including information about a logical organization or structure of the database, and from which dependencies between one or more data elements stored in the database that are referenced by the application code may be determined; analyzing the application code and the database schema information to identify a recommended encryption scheme for each of one or more data elements by:
assigning each data element in the application code to its own set and mapping each set to a strongest encryption scheme of a set of predefined encryption schemes, the assigning based at least in part on the determined dependency between the one or more data elements,
traversing each of the one or more abstract syntax trees in the abstract syntax tree representation of the application code to access each of the one or more expressions included therein, and
for each expression accessed, determining the strongest encryption scheme from the set of predefined encryption schemes that can be applied to the data operated on by that expression, and modifying the set and mapping accordingly; and
outputting, in a user-viewable form, an identification of each of the one or more data elements and the recommended encryption scheme associated therewith.
15. The method of claim 14, wherein analyzing the application code and the database schema information to identify the recommended encryption scheme for each of the one or more data elements comprises:
analyzing the application code and the database schema information to identify, from the set of predefined encryption schemes, a strongest encryption scheme that can be applied to each of the one or more data elements without altering the semantics of the application code.
16. The method of claim 14, further comprising:
identifying an expression within the application code that can be deferred to a client by a database server;
wherein analyzing the application code and the database schema information to identify a recommended encryption scheme for each of the one or more data elements comprises: considering the expression within the application code that can be deferred to the client by the database server.
17. The method of claim 14, further comprising:
generating an interpretation regarding a reason for each recommended encryption scheme identified for each of the one or more data elements; and
outputting each interpretation in a form viewable by the user.
18. The method of claim 17, wherein generating the interpretation comprises:
generating an identification of one or more operations in the application code that result in the identification of the recommended encryption scheme for each of the one or more data elements.
19. A computer-readable storage medium, on which a computer program is stored, which, when executed by at least one processor, causes the at least one processor to perform a method comprising:
receiving application code and database schema information associated with a database, the database schema information including information about a logical organization or structure of the database, and from which dependencies between one or more data elements stored in the database that are referenced by the application code may be determined;
analyzing the application code and the database schema information to:
select a recommended encryption scheme for each of the one or more data elements by:
assigning each data element in the application code to its own set and mapping each set to a strongest encryption scheme of a set of predefined encryption schemes, the assigning based at least in part on the determined dependency between the one or more data elements,
traversing each of the one or more abstract syntax trees in the abstract syntax tree representation of the application code to access each of the one or more expressions included therein, and
for each expression accessed, determining the strongest encryption scheme from the set of predefined encryption schemes that can be applied to the data operated on by that expression, and modifying the set and mapping accordingly, and
generate a user-visible explanation as to why the recommended encryption scheme was selected for each of the one or more data elements; and
outputting an identification of each of the one or more data elements and the recommended encryption scheme associated therewith along with the explanation.
20. The computer-readable storage medium of claim 19, wherein analyzing the application code and the database schema information comprises:
analyzing the application code and the database schema information to select a strongest encryption scheme from the set of predefined encryption schemes that can be applied to each of the one or more data elements without altering semantics of the application code.
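The set-and-mapping analysis recited in claims 1, 14 and 19 can be pictured with a small union-find structure: each data element starts in its own set mapped to the strongest scheme, schema dependencies merge sets, and each visited expression lowers the mapping of the sets it touches. This Python sketch is illustrative only; the `SchemeSets` class, the tuple-based tree encoding, the `LIMIT` table, and the treatment of a foreign key as a schema dependency are assumptions, not the claimed implementation.

```python
# Schemes ordered weakest to strongest; the ordering is an assumption.
PLAINTEXT, DETERMINISTIC, RANDOMIZED = 0, 1, 2

# Hypothetical limits: strongest scheme each operator tolerates.
LIMIT = {"eq": DETERMINISTIC, "lt": PLAINTEXT}

class SchemeSets:
    """Disjoint sets of data elements, each mapped to one scheme."""

    def __init__(self, elements, schema_links=()):
        self.parent = {e: e for e in elements}         # each in its own set
        self.scheme = {e: RANDOMIZED for e in elements}
        for a, b in schema_links:                      # e.g. foreign keys
            self.constrain([a, b], RANDOMIZED)

    def find(self, element):
        while self.parent[element] != element:
            self.parent[element] = self.parent[self.parent[element]]
            element = self.parent[element]
        return element

    def constrain(self, elements, limit):
        """Merge the sets of `elements` and cap the result at `limit`."""
        roots = [self.find(e) for e in elements]
        cap = min([limit] + [self.scheme[r] for r in roots])
        for root in roots[1:]:
            self.parent[root] = roots[0]
        self.scheme[roots[0]] = cap

    def recommendation(self):
        return {e: self.scheme[self.find(e)] for e in self.parent}

def visit(node, sets):
    """Post-order visit; returns the data elements flowing out of `node`."""
    kind = node[0]
    if kind == "col":
        return [node[1]]
    if kind == "const":
        return []
    elements = [e for child in node[1:] for e in visit(child, sets)]
    if kind in LIMIT:
        if elements:
            sets.constrain(elements, LIMIT[kind])
        return []  # a comparison yields a boolean, not column data
    return elements

columns = ["Patients.SSN", "Visits.SSN", "Patients.Age", "Patients.Notes"]
sets = SchemeSets(columns, schema_links=[("Patients.SSN", "Visits.SSN")])
# WHERE Patients.SSN = '123' AND Patients.Age < 65
tree = ("and",
        ("eq", ("col", "Patients.SSN"), ("const", "123")),
        ("lt", ("col", "Patients.Age"), ("const", 65)))
visit(tree, sets)
result = sets.recommendation()
```

Because `Visits.SSN` shares a set with `Patients.SSN` through the assumed foreign key, the equality test lowers both columns to the deterministic scheme, the range test lowers `Patients.Age` to plaintext, and `Patients.Notes` keeps the randomized scheme.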
CN201680012395.4A 2015-02-27 2016-02-25 Code analysis tool for recommending data encryption without affecting program semantics Active CN107409040B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US14/633,596 2015-02-27
US14/633,596 US9860063B2 (en) 2015-02-27 2015-02-27 Code analysis tool for recommending encryption of data without affecting program semantics
PCT/US2016/019433 WO2016138188A1 (en) 2015-02-27 2016-02-25 Code analysis tool for recommending encryption of data without affecting program semantics

Publications (2)

Publication Number Publication Date
CN107409040A CN107409040A (en) 2017-11-28
CN107409040B CN107409040B (en) 2020-09-18

Family

ID=55527653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201680012395.4A Active CN107409040B (en) 2015-02-27 2016-02-25 Code analysis tool for recommending data encryption without affecting program semantics

Country Status (3)

Country Link
US (1) US9860063B2 (en)
CN (1) CN107409040B (en)
WO (1) WO2016138188A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10248668B2 (en) * 2016-07-18 2019-04-02 International Business Machines Corporation Mapping database structure to software
US10885157B2 (en) 2017-04-03 2021-01-05 International Business Machines Corporation Determining a database signature
CN107257499B (en) * 2017-07-21 2018-09-18 安徽大学 Method for secret protection and video recommendation method in a kind of video recommendation system
CN108154040A (en) * 2017-12-25 2018-06-12 杭州闪捷信息科技有限公司 Database table encipher-decipher method, device and realization device based on Job
US10846083B2 (en) * 2018-12-12 2020-11-24 Sap Se Semantic-aware and self-corrective re-architecting system
CN112230781B (en) * 2019-07-15 2023-07-25 腾讯科技(深圳)有限公司 Character recommendation method, device and storage medium
US11194838B2 (en) * 2019-10-23 2021-12-07 International Business Machines Corporation Generating a data partitioning strategy for secure and efficient query processing
CN113434535B (en) * 2021-08-25 2022-03-08 阿里云计算有限公司 Data processing method, communication system, device, product and storage medium
US20230112179A1 (en) * 2021-10-08 2023-04-13 Ab Initio Technology Llc Automated modification of computer programs

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8082144B1 (en) * 2006-05-22 2011-12-20 Intuit Inc. Tax calculation explanation generator
CN104217169A (en) * 2013-06-05 2014-12-17 腾讯科技(深圳)有限公司 Encryption recommendation method and device and terminal

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7237125B2 (en) 2000-08-28 2007-06-26 Contentguard Holdings, Inc. Method and apparatus for automatically deploying security components in a content distribution system
US7418600B2 (en) 2003-03-13 2008-08-26 International Business Machines Corporation Secure database access through partial encryption
US20060210071A1 (en) 2005-03-16 2006-09-21 Chandran Gayathiri R Encryption of security-sensitive data
US8200972B2 (en) 2005-03-16 2012-06-12 International Business Machines Corporation Encryption of security-sensitive data by re-using a connection
US20070005594A1 (en) * 2005-06-30 2007-01-04 Binyamin Pinkas Secure keyword search system and method
EP1934713A4 (en) 2005-09-26 2009-04-22 Safenet Inc System and method for protecting sensitive data
US8261240B2 (en) 2008-01-15 2012-09-04 Microsoft Corporation Debugging lazily evaluated program components
US8199911B1 (en) 2008-03-31 2012-06-12 Symantec Operating Corporation Secure encryption algorithm for data deduplication on untrusted storage
US20090327943A1 (en) * 2008-06-26 2009-12-31 Microsoft Corporation Identifying application program threats through structural analysis
US20120096281A1 (en) 2008-12-31 2012-04-19 Eszenyi Mathew S Selective storage encryption
US10496824B2 (en) 2011-06-24 2019-12-03 Microsoft Licensing Technology, LLC Trusted language runtime on a mobile platform
US9087212B2 (en) 2012-01-25 2015-07-21 Massachusetts Institute Of Technology Methods and apparatus for securing a database
US8819770B2 (en) 2012-10-04 2014-08-26 Microsoft Corporation Data mapping using trust services
US9111071B2 (en) * 2012-11-05 2015-08-18 Sap Se Expression rewriting for secure computation optimization
GB2509709A (en) 2013-01-09 2014-07-16 Ibm Transparent encryption/decryption gateway for cloud storage services
US9747456B2 (en) 2013-03-15 2017-08-29 Microsoft Technology Licensing, Llc Secure query processing over encrypted data
US10162858B2 (en) * 2013-07-31 2018-12-25 Sap Se Local versus remote optimization in encrypted query processing

Also Published As

Publication number Publication date
US20160254911A1 (en) 2016-09-01
WO2016138188A1 (en) 2016-09-01
CN107409040A (en) 2017-11-28
US9860063B2 (en) 2018-01-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant