WO2018103830A1 - A method and system for searchable encrypted cloud storage of media data - Google Patents


Info

Publication number
WO2018103830A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
index
data elements
data element
encrypted
Prior art date
Application number
PCT/EP2016/079947
Other languages
French (fr)
Inventor
Andreas Schaad
Bjoern GROHMANN
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/EP2016/079947
Publication of WO2018103830A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9014 Indexing; Data structures therefor; Storage structures hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees

Definitions

  • the present invention, in some embodiments thereof, relates to a method for encrypting data and, more specifically but not exclusively, to a method for calculating a searchable encrypted index of a database.
  • cloud providers may access the personal data and perform searches and analysis on the private data. For example, image recognition techniques may be used to analyze photos and/or videos to determine the identity of individuals, the location where an image was taken, and other information that may be derived using image analysis technology.
  • As users of cloud storage become aware of the access that cloud providers have to data the users consider private, they may decide to avoid using cloud storage for certain data or altogether. This loss of customer trust may negatively impact the business model of the cloud storage provider and provide a business opportunity to a cloud storage provider that can guarantee the privacy of user data.
  • Another solution is to store encrypted user data along with encrypted plaintext tags associated with the user data.
  • a user may initiate a search using one or more plaintext tags that are associated with the user data, and the cloud storage provider or an application on a user device may provide encryption and decryption of the plaintext tags. While this solution does not provide the cloud storage provider with access to unencrypted user data, it is vulnerable to data leakage through analysis of the queries, for example the size of the queries, repetition of key phrases, and other techniques for analyzing encrypted communications.
  • a memory adapted to store a plurality of data elements each associated with a unique identifier and an index dataset entry in an index dataset
  • a processor adapted to calculate a plurality of the index dataset entries by executing code comprising instructions to: calculate a binary data tree having a plurality of nodes, including a root node and a plurality of leaf nodes, each of the leaves being associated with one of the plurality of data elements; and, for each one of the plurality of data elements, combine the label values of the nodes found along a path that starts at the respective associated leaf and ends at the node connected directly to the root node of the binary data tree to create a data element vector, encrypt each label value of the data element vector with a key, calculate a probabilistic data structure vector from the data element vector, and add the probabilistic data structure vector as an index entry value to the index dataset.
  • the processor is further configured to sum each encrypted label of the data element vector with the corresponding unique identifier prior to calculating the probabilistic data structure.
  • the probabilistic data structure is a member of a group consisting of Bloom filter, Count-Min Sketch, Quotient filter, and any other probabilistic data structure that may be used as an index value.
  • the metadata associated with each data element is encrypted using the key, the metadata comprising global positioning system (GPS) information, time information, a file name, an alphanumeric string, the index value, and any other type of information associated with the data element.
  • the metadata is associated with one or more of the index dataset entries.
  • each of the plurality of data elements is chosen from the interval [0, 2^n), where n is a positive integer.
  • the plurality of data elements are encrypted.
  • the data element is associated with a computer file that is one of a list of computer file types consisting of document files, image files, video files, multimedia files, graphics files, streaming media files, and any other type of computer file.
  • the plurality of data elements and associated index values are stored in a computing device chosen from a list of computer devices consisting of a server on a local area network (LAN), a server connected to the internet, a server located on a cloud based storage network, and any other type of computer memory storage device, and wherein the computer device is located remotely from the system, and communicates with the system using a computer network.
  • the keyed encryption is chosen from a group of block cipher schemes comprising Rijndael, Twofish, Serpent, and any other encryption scheme that provides indistinguishability under chosen-plaintext attack (IND-CPA).
  • access to the encryption key is restricted to authorized users.
  • the metadata is automatically generated, the automatic creation comprising image recognition techniques to generate descriptive words associated with image data, recording information from a GPS device, descriptions of locations associated with the GPS data, an image calculated from the data element, and any other technique for automatically generating metadata associated with a computer file.
  • the data element may be located and retrieved from a plurality of data elements by presenting the index value associated with the data element.
  • a query of a range of data elements from the plurality of data elements comprises presenting a range of the index values associated with the range of data elements.
  • each query of a range of data elements comprises the same number "x" of data elements, where "x" is a positive integer: random data elements are added to the range until it comprises x data elements, and a query whose range exceeds x data elements is divided into a plurality of queries, each comprising x members made up of members of the range together with random data elements.
  • the binary tree node label values are calculated according to standard binary tree label methods.
  • each one of the plurality of leaves is associated with one of the plurality of data elements, and for each one of the plurality of data elements: combining label values of nodes from the plurality of nodes which are found along a path starting from the respective associated leaf and ending in a node connected directly to the root node of the binary data tree to create a data element vector, encrypting each label value of the data element vector with a key, calculating a probabilistic data structure from the data element vector, and adding the probabilistic data structure as an index value to the index dataset, wherein the index dataset is structured to facilitate search for any of the plurality of data elements.
  • the method further comprises summing each binary tree label value with the unique identifier to create the data element vector.
  • FIG. 1 is a flowchart of an exemplary process for calculating a searchable encrypted index of a database, according to some embodiments of the current invention
  • FIG. 2 is a schematic illustration of exemplary system for calculating a searchable encrypted index of a database, according to some embodiments of the present invention
  • FIG. 3 is an exemplary diagram of a binary tree with labels, as is known in the art
  • FIG. 4 is a schematic messaging diagram of an exemplary process for storing an encrypted data element on a cloud service, and retrieving the data element, according to some embodiments of the present invention
  • FIG. 5A is a schematic illustration of a user interface to an application on a user device for calculating a searchable encrypted index of a database, according to some embodiments of the current invention.
  • FIG. 5B is a schematic diagram of the messaging as shown in FIG. 4, according to some embodiments of the present invention.
  • FIG. 6 is a schematic illustration of a user device for storing and searching for a data element, according to some embodiments of the present invention.
  • Personal files stored on a cloud storage may be encrypted to prevent the cloud storage provider from accessing personal information.
  • encryption may make searching for computer data files difficult since it is generally not possible to search encrypted data directly.
  • for visual media, where search is often performed visually, encrypted data cannot be viewed and therefore cannot be searched visually.
  • for files and other data associated with plaintext tags, it is not possible to search for the tags without first decrypting the associated data files.
  • even encrypted data is vulnerable to a variety of statistical and cryptologic techniques for analyzing data flows which may enable partial or full decryption.
  • the present invention in some embodiments thereof, comprises a process and system for encrypting a plaintext tag which may be used as an encrypted index entry associated with an encrypted data element to facilitate searching for encrypted data files.
  • the plaintext tag encryption comprises generating a probabilistic data structure using a combination of keyed encryption, a set of hash functions, a numerical transformation based on binary data tree leaf labels, and optionally a unique identifier associated with the data element.
  • a probabilistic data structure comprises a method of storing in a data structure a numerical transformation of at least one numerical data item.
  • the numerical transformation comprises at least a hashing function.
  • the method of testing whether a certain numerical data item is stored within the data structure is to perform the numerical transformation on the numerical data item, and search for the result within the data structure.
  • a plurality of stored transformed data items may be stored in the data structure, and may partially or fully overlap each other in the data structure. Due to the possible overlapping of transformed data items, when testing whether a specific numerical data item is present in the data structure, there is a nonzero probability of a false positive result.
  • the numerical data item is referred to herein as an unencrypted dataset index entry
  • the transformed numerical item is referred to herein as an encrypted dataset index entry
  • the data structure is referred to herein as a dataset index.
  • the encrypted tag combines strong encryption and protection from data leakage through analysis of encrypted queries and responses with a convenient and fast interface for the end user.
  • the current invention in some embodiments thereof, may be employed in an application running on a computing device, for example a Smartphone, a tablet, and the like, for an end user to store and retrieve data elements on a cloud storage service.
  • the data element may be a computer file, for example an image file, a video file, and the like.
  • the user associates the data element in the application with a plaintext tag.
  • the plaintext tag may be "Black Forest".
  • the application may encrypt the data element with any encryption known in the art, for example keyed encryption, and encrypt the plaintext tag using the current invention as described below in FIG. 1.
  • the application may transmit the encrypted data element and encrypted plaintext tag to be stored on a cloud storage provider. However, the cloud storage provider does not have access to the unencrypted file or the associated plaintext tag. Additionally, analysis of the query communication is unlikely to leak data.
  • the current invention in some embodiments thereof, provides a number of advantages over the existing art.
  • the use of a probabilistic data structure, as opposed to a deterministic data structure, reduces the opportunity to decrypt plaintext tags through statistical analysis of queries and responses.
  • a probabilistic data structure uses fewer memory resources than storing the same quantity of data as individually encrypted items.
  • the combination of keyed encryption, a numerical transformation using a set of binary tree labels, and a set of hash functions to generate the probabilistic data structure provides encryption that, unlike conventional keyed encryption alone, remains searchable.
  • the present invention may be a system, a method, and/or a computer program product.
  • the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
  • the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
  • the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
  • the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • Internet Service Provider for example, AT&T, MCI, Sprint, EarthLink, MSN, GTE, etc.
  • electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the figures.
  • two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
  • FIG. 1 is a flowchart of process 100 for calculating a searchable encrypted index of a database, according to some embodiments of the current invention.
  • Process 100 calculates an encrypted probabilistic data structure which is used as an encrypted dataset index entry associated with a data element.
  • the data element may be a computer file, for example an image file, a video file, a multimedia file, a text file, an executable program file, and the like.
  • Process 100 comprises a combination of encryption and encoding methodologies to calculate the probabilistic data structure.
  • Each entry in the dataset index may be associated with a data element, and may be used to search a database for the associated data element.
  • the probabilistic data structure is calculated by inputting a key-encrypted data element vector to a set of cryptographic hash functions.
  • the data element vector is calculated from a set of labels of nodes on a binary data tree that is associated with the data element.
  • a data element may be associated with one or more unencrypted dataset index entries, wherein an unencrypted dataset index entry may be an integer or any data that can be associated with an integer, such as a plaintext tag.
  • the one or more unencrypted dataset index entries can in turn be associated with a label in the binary data tree.
  • FIG. 2 is a schematic illustration of an exemplary system 200 for calculating a searchable encrypted index of a database, according to some embodiments of the current invention.
  • System 200 may be a computing device, for example a server, a Smartphone, a laptop, and/or any other computing device.
  • System 200 includes an input/output (I/O) interface 202 for receiving user queries and outputting query results, a processor(s) 204, and a storage 208.
  • I/O 202 may include one or more input interfaces, for example a keyboard, a soft keyboard, a voice to text system, and/or any other data input interface.
  • I/O 202 may comprise one or more output interfaces, for example a screen, a touch screen, a video display, and/or any other visual display device.
  • Processor(s) 204 may comprise one or more processors, multi-core processors, and/or any other type of central processing unit (CPU).
  • Storage 208 may include one or more non-transitory persistent storage devices, for example, a hard drive, a Flash array and the like.
  • Storage 208 further comprises a database, for example database 220, and/or a dataset table, for
  • database 220 comprises three columns: a column for data elements, a column for dataset index entries, and a column for unique identifiers associated with the data element.
  • Database 220 may comprise any equivalent data structure associating data elements, index entries, and unique identifiers, for instance a linked list, one or more matrices, and/or the like.
  • a data element, as described above, may be a computer file, for example an image file, a video file, a multimedia file, a text file, an executable program file, and the like.
  • the index entries may be an array, a vector, and/or any other data structure that is adapted to contain a probabilistic data structure.
  • the unique identifier may be an integer, a binary number, and/or any other numerical representation of a real number.
  • Each row of the database comprises entries that are each associated with the same data element.
  • the dataset index may be stored remotely from the database, for example on another instance of a cloud storage server 240 connected via network 230.
  • system 200 is connected to a network 230 via I/O 202.
  • I/O 202 may be a network interface card (NIC), a wireless router, and/or any other type of network interface adapted to communicate with network 230.
  • Network 230 may be any type of data network, for example, a local area network (LAN), an Ethernet LAN, a fiber optic LAN, a digital subscriber line (DSL), a wireless LAN, a broadband connection, an Internet connection using an Internet Service Provider (ISP) and/or any other type of computer network.
  • Network 230 may employ any type of data networking protocols, including transmission control protocol and internet protocol (TCP/IP), user datagram protocol (UDP), and the like.
  • a cloud storage server 240 may be connected to system 200 via network 230.
  • Cloud storage server 240 may be any type of computer platform, and/or a network of computer platforms, adapted to host a database and perform database operations, and communicate with database clients via a network, for example network 230.
  • database 220 is stored on cloud storage server 240.
  • user device 260 may be connected to System 200, for example via network 230.
  • the user device may be a smartphone, a computer, and/or any other computing platform.
  • Process 100 may be executed by processor 204 executing code from one or more software modules in storage 208, for example Tree Label Calculator 210, Key Encryptor 211, Hash Function Calculator 212, and/or Probabilistic Structure Generator 213.
  • a software module refers to a plurality of program instructions stored in a non-transitory medium such as the storage 208 and executed by a processor such as the processor(s) 204.
  • process 100 begins by receiving, for example from a user device 260, a new data element, a unique identifier and unencrypted dataset index entry, which are stored, for example in storage 208 by code instructions in I/O 202 executed on processor 204.
  • the unencrypted dataset index entry may be an integer, and by extension any form of data that may be associated with an integer, for example a plaintext tag that is associated with an integer by a lookup table.
  • a plurality of unencrypted dataset index entries may be received for a single data element. For example, when a user associates multiple plaintext tags with a single data element, then multiple dataset index entries may be received.
  • the plurality of unencrypted dataset index entries are encrypted and encoded as described below, and stored in a single probabilistic data structure, as is known in the art, which is then used as the encrypted dataset index entry.
  • a set of binary tree labels associated with the unencrypted dataset index entry is calculated, for example by executing code instructions in Tree Label Calculator 210 on processor 204.
  • the unencrypted dataset index entry may be an integer "r" in the range [0, 2^n), where "n" is a positive integer.
  • each node of the binary tree is labeled in the standard "0-1" labeling, as is known in the art. This labeling is accomplished by starting from the root and traveling down the tree, labeling each left branch "0" and each right branch "1".
  • the label of each node is the concatenation of the branch labels along the path from the root down to that node, that is, the label of its parent node followed by the label of the branch leading to it.
  • Leaf "r" is identified as the leaf whose binary label, written with "n" bits, equals the integer "r".
  • a data element vector is calculated for "r", comprising the labels of the nodes from leaf "r" to the node directly connected to the root, inclusive.
  • FIG. 3 is an exemplary diagram of a binary tree with standard "0-1" labels.
  • leaf "r" is the leaf labeled "100", which is the binary equivalent of the number 4; its data element vector therefore comprises the labels "100", "10", and "1".
  • each member of the data element vector is key encrypted, for example by executing code instructions of Key Encryptor 211 on processor 204, generating an encrypted data element vector.
  • each element of the encrypted data element vector is concatenated with the associated unique identifier.
  • each member of the encrypted data element vector is an input to a set of cryptographic hash functions to generate a probabilistic data structure, for example by executing code from Hash Function Calculator 212 on processor 204.
  • the probabilistic data structure may be a Bloom Filter, Count-Min Sketch, Quotient filter, and/or any other probabilistic data structure.
  • process 100 is an embodiment of the present invention wherein the probabilistic data structure is a Bloom filter, as is known in the art.
  • the probabilistic data structure may be enhanced with indistinguishability under chosen-plaintext attack (IND-CPA), for example by adding a nonce α to every Bloom filter and evaluating the hash value of a data element "z" as hash(α || z).
  • a Bloom Filter is calculated in the following manner.
  • all "m" bits of a bit array "b" are initially set to zero.
  • a set of "k" cryptographic hash functions calculates a set of values by taking each element of the encrypted data element vector as input to each of the "k" hash functions, wherein "m" and "k" are integers and "k" is smaller than "m".
  • Each calculated value is used as an index "x" into the bit array, and the bit "b[x]" is set to "1".
  • the size of the bit array "b" is chosen to be larger than the total number of bits set to "1" in "b".
  • the calculation of values continues until every element of the encrypted data element vector has been input to every hash function, for example by executing code from Hash Function Calculator 212 on processor 204.
  • the resulting bit array is the probabilistic data structure.
  • the probabilistic data structure is inserted as an index entry value into the index dataset, for example by code from Probabilistic Structure Generator 213 executed on processor 204.
  • system 200 comprises code instructions that when executed on processor 204 may retrieve a data element based on a plaintext search. For example, a user may enter a search for a data element that is associated with an encrypted dataset index entry, for example by a lookup table.
  • the associated encrypted dataset index entry may be transferred to a database, for example database 220, which is adapted to respond to client requests, and the associated encrypted data element is transmitted to system 200, for example via network 230.
  • the data element is then decrypted and displayed to the user.
  • the search for a range of data elements may be performed as described above by a user entering a range of unencrypted dataset index entries.
  • a search for a range of data elements will always comprise the same number of data elements.
  • the database comprises key-encrypted metadata associated with the data element, for example a global positioning system (GPS) coordinate, a time stamp, a file name, an alphanumeric string, the index entry, a plaintext tag, and/or any other data generated automatically by user device 260 or entered manually by a user to an application.
  • the metadata may be associated with an index dataset entry, for example an item of metadata may be used to generate an encrypted dataset index entry.
  • system 200 comprises code instructions executed on processor 204 to automatically generate metadata, for example image recognition techniques to generate plaintext descriptive of an image file, recording information from a GPS device, for example a GPS device connected to system 200 by network 230, a parsing program to detect key words in a text file, a translation program to translate all or part of a text file to another language, an optical character recognition (OCR) program to automatically generate plaintext from an image file, and/or any other technique to automatically generate metadata from a computer file.
  • the data element is encrypted with a keyed encryption.
  • the keyed encryption is chosen from a group of block cipher schemes comprising Rijndael, Twofish, Serpent, and any other encryption scheme that provides IND-CPA.
  • the key for the keyed encryption is stored in a secured computer memory location, for example storage 208, and access to the key is protected by a password, user identifier, and/or any other access control mechanism.
  • FIG. 4 is a schematic messaging diagram of an exemplary process for storing an encrypted data element on a cloud service, and retrieving the data element, according to some embodiments of the current invention.
  • the encrypted dataset index entry is computed for a computer file generated by a user device, for example user device 260, according to some embodiments of the current invention.
  • the encrypted dataset index entry and the encrypted computer file are stored on a cloud storage, for example cloud storage 240.
  • FIG. 5A is a schematic illustration of a user interface to an application that implements process 100 on a user device, for example user device 260, according to some embodiments of the current invention.
  • a user may input a plaintext tag to the application UI to initiate a search of encrypted files on a cloud service provider, for example cloud storage 240.
  • FIG. 5B is a schematic diagram of the messaging shown in FIG. 4 within user device 260, according to some embodiments of the current invention.
  • FIG. 6 is a schematic illustration of a user device for encrypting, storing and searching for a data element, according to some embodiments of the current invention.
  • the user device for example user device 260
  • the camera which creates an image file.
  • the image file is encrypted and the encrypted dataset index entry is generated as described above.
  • a user initiates a search for a plaintext tag that is associated with the image file, as described above.
  • the encrypted dataset index entry associated with the plaintext tag is located and sent as a query to database 220 in cloud storage 240.
  • the encrypted image file is received from cloud storage 240 and decrypted with the encryption key stored on user device 260.
  • composition or method may include additional ingredients and/or steps, but only if the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.
  • a compound or “at least one compound” may include a plurality of compounds, including mixtures thereof.
  • range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

According to an aspect of some embodiments of the present invention there is provided a system for calculating a searchable encrypted index of a database comprising: a memory adapted to store a plurality of data elements, each associated with a unique identifier and an index dataset entry in an index dataset; a processor adapted to execute a code; and a code comprising instructions to calculate a binary data tree where each leaf is associated with one of the data elements and, for each data element, to combine the label values of the nodes found along a path starting from the respective leaf and ending in a node connected directly to the root node to create a data vector, encrypt each label of the data vector with a key, calculate from the data vector a probabilistic data structure, and add the probabilistic data structure as an index entry value to the index dataset.

Description

Title: A METHOD AND SYSTEM FOR SEARCHABLE ENCRYPTED CLOUD STORAGE OF MEDIA DATA
BACKGROUND
The present invention, in some embodiments thereof, relates to a method for encrypting data and, more specifically, but not exclusively, to a method for calculating a searchable encrypted index of a database, according to some embodiments of the current invention.
Many people enjoy the convenience of storing their personal data in a computing cloud. However, while many cloud services are available, for example Google Drive, Facebook, and the like, the cloud providers may access the personal data and perform searches and analysis on it. For example, image recognition techniques may be used to analyze photos and/or videos to determine the identity of individuals, the location of the image, and other information that may be derived with image analysis technology.
As users of cloud storage become aware of the access that the cloud providers have to data the user considered to be private, users may decide to avoid using cloud storage for certain data or altogether. This loss of customer trust may negatively impact the business model of the cloud storage provider, and provide a business opportunity to a cloud storage provider that can guarantee the privacy of user data.
One existing solution to the loss of privacy is to encrypt the personal data; however, it is generally not possible to perform a search on encrypted data. Every time a user initiates a search of encrypted data, all the data to be searched must be downloaded and decrypted. This solution reduces the ease of access for users, which may be even more problematic for the cloud storage provider than the loss of trust.
Another solution is to store encrypted user data along with encrypted plaintext tags associated with the user data. A user may initiate a search using one or more plaintext tags that are associated with the user data, and the cloud storage provider or an application on a user device may provide encryption and decryption of the plaintext tags. While this solution does not provide the cloud storage provider with access to unencrypted user data, it is vulnerable to data leakage through analysis of the queries, for example the size of the queries, repetition of key phrases, and other techniques for analyzing encrypted communications.
SUMMARY
According to a first aspect of some embodiments of the present invention there is provided a system for calculating a searchable encrypted index of a database comprising:
a memory adapted to store a plurality of data elements each associated with a unique identifier and an index dataset entry in an index dataset, a processor adapted to calculate a plurality of the index dataset entries by executing a code comprising: instructions to calculate a binary data tree having a plurality of nodes including a root node and a plurality of leaf nodes, each one of the plurality of leaves being associated with one of the plurality of data elements, and, for each one of the plurality of data elements, combining label values of nodes from the plurality of nodes which are found along a path starting from the respective associated leaf and ending in a node connected directly to the root node of the binary data tree to create a data element vector, encrypting each label value of the data element vector with a key, calculating from the data element vector a probabilistic data structure vector, and adding the probabilistic data structure vector as an index entry value to the index dataset.
In an implementation form of the first aspect of the invention, the processor is further configured to sum each encrypted label of the data element vector with the corresponding unique identifier prior to calculating the probabilistic data structure.
According to an implementation form of the first aspect as such or to any of the foregoing implementation forms of the invention, the probabilistic data structure is a member of a group consisting of Bloom filter, Count-Min Sketch, Quotient filter, and any other probabilistic data structure that may be used as an index value.
According to an implementation form of the first aspect as such or to any of the foregoing implementation forms of the invention, the metadata associated with each data element is encrypted using the key, the metadata comprising global positioning system (GPS) information, time information, a file name, an alphanumeric string, the index value, and any other type of information associated with the data element. According to an implementation form of the first aspect as such or to any of the foregoing implementation forms of the invention, the metadata is associated with one or more of the index dataset entries.
According to an implementation form of the first aspect as such or to any of the foregoing implementation forms of the invention, each of the plurality of data elements is chosen from the interval [0, 2^n), where n is a positive integer.
According to an implementation form of the first aspect as such or to any of the foregoing implementation forms of the invention, the plurality of data elements are encrypted.
According to an implementation form of the first aspect as such or to any of the foregoing implementation forms of the invention, the data element is associated with a computer file that is one of a list of computer file types consisting of document files, image files, video files, multimedia files, graphics files, streaming media files, and any other type of computer file.
According to an implementation form of the first aspect as such or to any of the foregoing implementation forms of the invention, the plurality of data elements and associated index values are stored in a computing device chosen from a list of computer devices consisting of a server on a local area network (LAN), a server connected to the internet, a server located on a cloud based storage network, and any other type of computer memory storage device, and wherein the computer device is located remotely from the system, and communicates with the system using a computer network.
According to an implementation form of the first aspect as such or to any of the foregoing implementation forms of the invention, the keyed encryption is chosen from a group of block cipher schemes comprising Rijndael, Twofish, Serpent, and any other encryption scheme that provides indistinguishability under chosen-plaintext attack (IND-CPA).
According to an implementation form of the first aspect as such or to any of the foregoing implementation forms of the invention, access to the encryption key is restricted to authorized users.
According to an implementation form of the first aspect, the metadata is automatically generated, the automatic creation comprising image recognition techniques to generate descriptive words associated with image data, recording information from a GPS device, descriptions of locations associated with the GPS data, an image calculated from the data element, and any other technique for automatically generating metadata associated with a computer file.
According to an implementation form of the first aspect as such or to any of the foregoing implementation forms of the invention, the data element may be located and retrieved from a plurality of data elements by presenting the index value associated with the data element.
According to an implementation form of the first aspect as such or to any of the foregoing implementation forms of the invention, a query of a range of data elements from the plurality of data elements comprises presenting a range of the index values associated with the range of data elements.
According to an implementation form of the first aspect as such or to any of the foregoing implementation forms of the invention, each query of a range of data elements comprises the same number "x" of data elements, where x is a positive integer: random data elements are added to a range with fewer than x data elements until it comprises x data elements, and a query whose range exceeds x data elements is divided into a plurality of queries, each comprising x members made up of members of the range together with any random data elements needed to reach x.
According to an implementation form of the first aspect as such or to any of the foregoing implementation forms of the invention, the binary tree node label values are calculated according to standard binary tree labeling methods.
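A minimal Python sketch of this fixed-size query rule follows. It assumes that the values to be queried are integers in [0, 2^n), that dummy values are drawn uniformly from the part of that interval outside the range, and that the client remembers which members are real so that results for dummies can be discarded; the function name pad_and_split is hypothetical.

```python
import secrets


def pad_and_split(range_values, x, n, rng=None):
    """Split a range query into queries of exactly x members (hypothetical helper).

    range_values: integers to query, each in [0, 2**n).
    Returns a list of (members, real) pairs: members is a shuffled list of
    x distinct integers, real is the set of members that belong to the range.
    """
    if x <= 0:
        raise ValueError("x must be a positive integer")
    if rng is None:
        rng = secrets.SystemRandom()            # cryptographically strong by default
    values = list(dict.fromkeys(range_values))  # drop duplicates, keep order
    domain = 2 ** n
    if any(not 0 <= v < domain for v in values):
        raise ValueError("values must lie in [0, 2**n)")
    if domain - len(values) < (-len(values)) % x:
        raise ValueError("not enough values outside the range to pad with")
    taken = set(values)                         # dummies must never equal a real member
    queries = []
    for start in range(0, len(values), x):
        real = values[start:start + x]
        members = list(real)
        while len(members) < x:                 # only the last chunk needs padding
            dummy = rng.randrange(domain)
            if dummy not in taken:
                taken.add(dummy)
                members.append(dummy)
        rng.shuffle(members)                    # hide which positions are dummies
        queries.append((members, set(real)))
    return queries


queries = pad_and_split(range(10, 17), x=3, n=5)  # 7 values -> 3 queries of 3 members
```

Every query therefore has the same size, so an observer of the query stream cannot read the width of the requested range from the query length alone.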
According to a second aspect of some embodiments of the present invention there is provided a method for calculating a searchable encrypted index of a database comprising:
receiving a plurality of data elements each associated with a unique identification and an index dataset entry, calculating a binary data tree having a plurality of nodes including a root node and a plurality of leaf nodes, each one of the plurality of leaves being associated with one of the plurality of data elements, and, for each one of the plurality of data elements: combining label values of nodes from the plurality of nodes which are found along a path starting from the respective associated leaf and ending in a node connected directly to the root node of the binary data tree to create a data element vector, encrypting each label value of the data element vector with a key, calculating from the data element vector a probabilistic data structure, and adding the probabilistic data structure as an index value to the index dataset, wherein the index dataset is structured to facilitate search of any of the plurality of data elements.
According to an implementation form of the second aspect of the invention, the method further comprises summing each binary tree label value with the unique identifier to create the data element vector.
Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
Some embodiments of the invention are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the invention may be practiced.
In the drawings:
FIG. 1 is a flowchart of an exemplary process for calculating a searchable encrypted index of a database, according to some embodiments of the current invention;
FIG. 2 is a schematic illustration of exemplary system for calculating a searchable encrypted index of a database, according to some embodiments of the present invention;
FIG. 3 is an exemplary diagram of a binary tree with labels, as is known in the art;
FIG. 4 is a schematic messaging diagram of an exemplary process for storing an encrypted data element on a cloud service, and retrieving the data element, according to some embodiments of the present invention;
FIG. 5A is a schematic illustration of a user interface to an application on a user device for calculating a searchable encrypted index of a database, according to some embodiments of the current invention;
FIG. 5B is a schematic diagram of the messaging as shown in FIG. 4, according to some embodiments of the present invention; and
FIG. 6 is a schematic illustration of a user device for storing and searching for a data element, according to some embodiments of the present invention.
DETAILED DESCRIPTION
The present invention, in some embodiments thereof, relates to a method for encrypting data and, more specifically, but not exclusively, to a method for calculating a searchable encrypted index of a database, according to some embodiments of the current invention.
Personal files stored on a cloud storage may be encrypted to prevent the cloud storage provider from accessing personal information. However, encryption may make searching for computer data files difficult since it is generally not possible to search encrypted data directly. In the case of visual media, where the search process is often performed visually, encrypted data is not available for viewing and therefore not searchable visually. In the case of files and other data associated with plaintext tags, it is not possible to search for the tags without first decrypting the associated data files. In addition, even encrypted data is vulnerable to a variety of statistical and cryptologic techniques for analyzing data flows which may enable partial or full decryption.
The present invention, in some embodiments thereof, comprises a process and system for encrypting a plaintext tag which may be used as an encrypted index entry associated with an encrypted data element to facilitate searching for encrypted data files. The plaintext tag encryption comprises generating a probabilistic data structure using a combination of keyed encryption, a set of hash functions, a numerical transformation based on binary data tree leaf labels, and optionally a unique identifier associated with the data element.
A probabilistic data structure, as is known in the art, comprises a method of storing in a data structure a numerical transformation of at least one numerical data item. The numerical transformation comprises at least a hashing function. The method of testing whether a certain numerical data item is stored within the data structure is to perform the numerical transformation on the numerical data item, and search for the result within the data structure. A plurality of stored transformed data items may be stored in the data structure, and may partially or fully overlap each other in the data structure. Due to the possible overlapping of transformed data items, when testing whether a specific numerical data item is present in the data structure, there is a nonzero probability of a false positive result. In the present invention, the numerical data item is referred to herein as an unencrypted dataset index entry, the transformed numerical item is referred to herein as an encrypted dataset index entry, and the data structure is referred to herein as a dataset index.
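For a Bloom filter, the probabilistic data structure used by way of example in process 100, the false-positive probability can be estimated with the standard approximation, assuming the hash outputs are independent and uniformly distributed. Here m is the number of bits, k the number of hash functions, and N the number of items inserted; in process 100 a data element with t tags inserts N = n·t tree labels.

```latex
\Pr[\text{a given bit is still } 0] = \left(1 - \tfrac{1}{m}\right)^{kN} \approx e^{-kN/m},
\qquad
p_{\mathrm{fp}} \approx \left(1 - e^{-kN/m}\right)^{k},
\qquad
k_{\mathrm{opt}} = \tfrac{m}{N}\,\ln 2 .
```

For example, with n = 3 and t = 1 (so N = 3), m = 256 and k = 4, the exponent is kN/m = 0.046875 and the estimate is about 4.4 × 10^-6 per membership test.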
The encrypted tag combines strong encryption and protection from data leakage through analysis of encrypted queries and responses with a convenient and fast interface for the end user.
For example, the current invention, in some embodiments thereof, may be employed in an application running on a computing device, for example a Smartphone, a tablet, and the like, for an end user to store and retrieve data elements on a cloud storage service. The data element may be a computer file, for example an image file, a video file, and the like. The user associates the data element in the application with a plaintext tag. For example, if the file is a picture from a holiday in Germany, the plaintext tag may be "Black Forest". The application may encrypt the data element with any encryption known in the art, for example keyed encryption, and encrypt the plaintext tag using the current invention as described below with reference to FIG. 1. The application may then transmit the encrypted data element and the encrypted plaintext tag to a cloud storage provider for storage. The cloud storage provider does not have access to the unencrypted file or to the associated plaintext tag. Additionally, analysis of the query communication is unlikely to leak data.
The current invention, in some embodiments thereof, provides a number of advantages over the existing art. The use of a probabilistic data structure, as opposed to a deterministic data structure, reduces the opportunity for decrypting plaintext tags through statistical analysis of queries and responses. In addition, a probabilistic data structure uses less computer memory than storing the same quantity of data individually encrypted. The combination of keyed encryption, numerical transformation using a set of binary tree labels, and a set of hash functions to generate the probabilistic data structure provides searchable encryption, which conventional keyed encryption alone does not.
Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or the Examples. The invention is capable of other embodiments or of being practiced or carried out in various ways.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention. Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
Reference is now made to FIG. 1, a flowchart of process 100 for calculating a searchable encrypted index of a database, according to some embodiments of the current invention.
Process 100 calculates an encrypted probabilistic data structure which is used as an encrypted dataset index entry associated with a data element. The data element may be a computer file, for example an image file, a video file, a multimedia file, a text file, an executable program file, and the like.
Process 100 combines encryption and encoding methodologies to calculate the probabilistic data structure. Each entry in the dataset index may be associated with a data element, and may be used to search a database for the associated data element.
The probabilistic data structure is calculated by inputting a key-encrypted data element vector to a set of cryptographic hash functions. The data element vector is calculated from a set of labels of nodes on a binary data tree that is associated with the data element. As an example, a data element may be associated with one or more unencrypted dataset index entries, wherein an unencrypted dataset index entry may be an integer or any data that can be associated with an integer, such as a plaintext tag. Each unencrypted dataset index entry can in turn be associated with a label in the binary data tree. A detailed description is given below with reference to FIG. 1.
Reference is now made to FIG. 2, a schematic illustration of exemplary system 200 for calculating a searchable encrypted index of a database, according to some embodiments of the current invention.
System 200 may be a computing device, for example a server, a Smartphone, a laptop, and/or any other computing device. System 200 includes an input/output (I/O) interface 202 for receiving user queries and outputting query results, a processor(s) 204, and a storage 208. I/O 202 may include one or more input interfaces, for example a keyboard, a soft keyboard, a voice to text system, and/or any other data input interface. I/O 202 may comprise one or more output interfaces, for example a screen, a touch screen, a video display, and/or any other visual display device. Processor(s) 204 may comprise one or more processors, multi-core processors, and/or any other type of central processing unit (CPU). Storage 208 may include one or more non-transitory persistent storage devices, for example a hard drive, a Flash array, and the like. Storage 208 further comprises a database, for example database 220, and/or a dataset table, for example dataset index 221.
Optionally, database 220 comprises three columns: a column for data elements, a column for dataset index entries, and a column for unique identifiers associated with the data elements. Database 220 may comprise any equivalent data structure associating data elements, index entries, and unique identifiers, for instance a linked list, one or more matrices, and/or the like. A data element, as described above, may be a computer file, for example an image file, a video file, a multimedia file, a text file, an executable program file, and the like. The index entries may be an array, a vector, and/or any other data structure that is adapted to contain a probabilistic data structure. The unique identifier may be an integer, a binary number, and/or any other numerical representation of a real number. Each row of the database comprises entries that are all associated with the same data element. Optionally, the dataset index may be stored remotely from the database, for example on another instance of a cloud storage server 240 connected via network 230.
Optionally, system 200 is connected to a network 230 via I/O 202. For example, I/O 202 may include a network interface card (NIC), a wireless router, and/or any other type of network interface adapted to communicate with network 230.
Network 230 may be any type of data network, for example, a local area network (LAN), an Ethernet LAN, a fiber optic LAN, a digital subscriber line (DSL), a wireless LAN, a broadband connection, an Internet connection using an Internet Service Provider (ISP) and/or any other type of computer network. Network 230 may employ any type of data networking protocols, including transmission control protocol and internet protocol (TCP/IP), user datagram protocol (UDP), and the like.
A cloud storage server 240 may be connected to system 200 via network 230. Cloud storage server 240 may be any type of computer platform, and/or a network of computer platforms, adapted to host a database, perform database operations, and communicate with database clients via a network, for example network 230. Optionally, database 220 is stored on cloud storage server 240.
Optionally, user device 260 may be connected to system 200, for example via network 230. The user device may be a smartphone, a computer, and/or any other computing platform.
Process 100 may be executed by processor 204 executing code from one or more software modules in storage 208, for example Tree Label Calculator 210, Key Encryptor 211, Hash Function Calculator 212, and/or Probabilistic Structure Generator 213. A software module refers to a plurality of program instructions stored in a non-transitory medium, such as storage 208, and executed by a processor, such as processor(s) 204.
Reference is now made again to FIG. 1. As shown in 101, process 100 begins by receiving, for example from user device 260, a new data element, a unique identifier, and an unencrypted dataset index entry, which are stored, for example, in storage 208 by code instructions of I/O 202 executed on processor 204.
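The three-column layout can be pictured with the following Python sketch. It is an in-memory stand-in only: the class names Row and Database are hypothetical, rows are keyed by unique identifier as an assumption, and the sketch does not model persistence or network access to cloud storage server 240.

```python
from dataclasses import dataclass
from typing import Dict, Iterator


@dataclass
class Row:
    data_element: bytes   # e.g. the encrypted image or video file
    index_entry: bytes    # probabilistic data structure, e.g. a Bloom filter bit array
    unique_id: int        # unique identifier of the data element


class Database:
    """Hypothetical in-memory stand-in for database 220."""

    def __init__(self) -> None:
        self._rows: Dict[int, Row] = {}

    def insert(self, row: Row) -> None:
        # Each unique identifier may appear in at most one row.
        if row.unique_id in self._rows:
            raise KeyError(f"duplicate unique identifier {row.unique_id}")
        self._rows[row.unique_id] = row

    def get(self, unique_id: int) -> Row:
        return self._rows[unique_id]

    def __iter__(self) -> Iterator[Row]:
        # A search scans the index_entry column of every row.
        return iter(self._rows.values())
```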
Optionally, the unencrypted dataset index entry may be an integer, and by extension any form of data that may be associated with an integer, for example a plaintext tag that is associated with an integer by a lookup table. Optionally, a plurality of unencrypted dataset index entries may be received for a single data element. For example, when a user associates multiple plaintext tags with a single data element, then multiple dataset index entries may be received. The plurality of unencrypted dataset index entries are encrypted and encoded as described below, and stored in a single probabilistic data structure, as is known in the art, which is then used as the encrypted dataset index entry.
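One way to associate plaintext tags with integers is a lookup table, sketched below in Python. The sequential assignment of leaves and the whitespace and case normalization are assumptions for illustration, not part of the described method, and the class name TagTable is hypothetical. Because the table maps tags directly to tree leaves, this sketch assumes it is kept private on the user device together with the key.

```python
class TagTable:
    """Hypothetical lookup table mapping plaintext tags to integers in [0, 2**n)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self._ids = {}                               # normalized tag -> leaf number

    def to_int(self, tag: str) -> int:
        key = " ".join(tag.split()).casefold()       # assumed: ignore case and extra spaces
        if key not in self._ids:
            if len(self._ids) >= 2 ** self.n:
                raise OverflowError("all 2**n leaves are already assigned")
            self._ids[key] = len(self._ids)          # next unused leaf
        return self._ids[key]

    def entries_for(self, tags) -> list:
        """Unencrypted dataset index entries for one data element with several tags."""
        return sorted({self.to_int(t) for t in tags})


table = TagTable(n=3)
print(table.to_int("Black Forest"))   # prints 0, the first tag seen
```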
As shown in 102, a set of binary tree labels associated with the unencrypted dataset index entry is calculated, for example by executing code instructions in Tree Label Calculator 210 on processor 204. Optionally, the unencrypted dataset index entry may be an integer "r" in the range [0, 2^n), where "n" is a positive integer.
Optionally, each node of the binary tree is labeled with the standard "0-1" labeling, as is known in the art. The labeling starts from the root and travels down the tree, labeling each left branch "0" and each right branch "1". The label of each node is the concatenation of the branch labels on the path from the root down to that node. Leaf "r" is identified as the leaf whose binary label equals the integer "r". A data element vector is calculated for "r", comprising the labels of the nodes from leaf "r" up to the node directly connected to the root, inclusive.
Reference is now made to FIG. 3, an exemplary diagram of a binary tree with standard "0-1" labels. For example, when "r" = 4, leaf "r" is the leaf labeled "100", which is the binary representation of the number 4. In this example, "n" = 3, so there are 2^3 = 8 leaves in the tree. As shown in 301, the data element vector for "r" = 4 comprises a set with 3 members: 100, 010, and 001, i.e., the path labels 100, 10, and 1 written as 3-bit numbers.
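The path computation of step 102 can be sketched in Python as follows. The function name data_element_vector is hypothetical. With pad=True it writes each ancestor label as an n-bit number, which reproduces the FIG. 3 example for r = 4.

```python
def data_element_vector(r: int, n: int, pad: bool = True) -> list:
    """Return the labels of the nodes from leaf r up to the child of the root.

    Leaves of the depth-n binary tree carry the n-bit "0-1" labels of the
    integers in [0, 2**n); an ancestor at depth d is labeled with the first
    d bits of the leaf label.
    """
    if n <= 0 or not 0 <= r < 2 ** n:
        raise ValueError("need n > 0 and 0 <= r < 2**n")
    leaf = format(r, f"0{n}b")                # n-bit label of leaf r
    vector = []
    for depth in range(n, 0, -1):             # leaf first, child of the root last
        label = leaf[:depth]                  # branch labels from the root to this node
        vector.append(label.zfill(n) if pad else label)
    return vector


print(data_element_vector(4, 3))              # ['100', '010', '001']
```

Note that the padded form can make labels at different depths coincide: for n = 3, the ancestor label "10" padded to "010" equals the leaf label of r = 2. Keeping the unpadded prefixes, as with pad=False, avoids that collision, which may matter if node labels are later used as search terms.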
Reference is now made again to FIG. 1. As shown in 103, each member of the data element vector is key encrypted, for example by executing code instructions of Key Encryptor 211 on processor 204, generating an encrypted data element vector. Optionally, each element of the encrypted data element vector is concatenated with the associated unique identifier.
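Step 103 can be sketched in Python as below. Because the Python standard library has no block cipher, HMAC-SHA256 is substituted here for Rijndael or the other keyed schemes named in this document; in this sketch the index labels only need to be recomputed and compared, never decrypted, so a keyed pseudorandom function plays the same role. The sketch follows the concatenation variant described in step 103; the 8-byte big-endian encoding of the unique identifier and the function name encrypt_vector are assumptions.

```python
import hashlib
import hmac
import secrets


def encrypt_vector(labels, key: bytes, unique_id=None) -> list:
    """Apply a keyed transform to every label of a data element vector.

    HMAC-SHA256 stands in for the keyed block cipher (assumption). If
    unique_id is given, its 8-byte big-endian encoding is concatenated
    to each encrypted label.
    """
    encrypted = []
    for label in labels:
        token = hmac.new(key, label.encode("utf-8"), hashlib.sha256).digest()
        if unique_id is not None:
            token += unique_id.to_bytes(8, "big")
        encrypted.append(token)
    return encrypted


key = secrets.token_bytes(32)     # secret key, kept on the user device
vector = encrypt_vector(["100", "010", "001"], key, unique_id=7)
```

The transform is deterministic for a given key, which is what allows the same label to be re-derived later as a search term.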
As shown in 104, each member of the encrypted data element vector is an input to a set of cryptographic hash functions to generate a probabilistic data structure, for example by executing code from Hash Function Calculator 212 on processor 204. Optionally, the probabilistic data structure may be a Bloom Filter, Count-Min Sketch, Quotient filter, and/or any other probabilistic data structure. By way of example, process 100 is an embodiment of the present invention wherein the probabilistic data structure is generated by Bloom Filter, as is known in the art.
Optionally, the probabilistic data structure, for example a Bloom filter, may be enhanced to provide indistinguishability under chosen-plaintext attack (IND-CPA). For example, a nonce α may be added to every Bloom filter, and the hash value of a data element "z" is then evaluated as hash(α || z).
A Bloom filter, as known in the art, is calculated as follows. A bit array "b" of "m" bits is initially set to zero. Each element of the encrypted data element vector is fed as input to each of a set of "k" cryptographic hash functions, wherein "m" and "k" are integers and "k" is smaller than "m". Each calculated value, reduced modulo "m", is used as an index "x" into the bit array, and the bit "b[x]" is set to "1". The size "m" of the bit array "b" is chosen to be larger than the total number of bits set to "1" in "b".
As shown in 105, the calculation of values continues until every element of the encrypted data element vector has been input to every hash function, for example by executing code from Hash Function Calculator 212 on processor 204. The resulting bit array is the probabilistic data structure.
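Steps 104 and 105 can be sketched as a small Bloom filter. The sketch rests on several assumptions. The "k" hash functions are derived from SHA-256 by prefixing the function index. The nonce α from the IND-CPA option is a random 16-byte value stored alongside the filter. The class name `BloomFilter` and the parameter values are hypothetical.

```python
import hashlib
import os
from typing import Iterator, Optional


class BloomFilter:
    """Bit array b of m bits, k hash functions, and a per-filter nonce alpha."""

    def __init__(self, m: int, k: int, nonce: Optional[bytes] = None) -> None:
        if not 0 < k < m:
            raise ValueError("need 0 < k < m")
        self.m, self.k = m, k
        self.bits = [0] * m  # b is initially all zeros
        # alpha: kept with the filter so that queries can recompute positions
        self.nonce = os.urandom(16) if nonce is None else nonce

    def _positions(self, z: bytes) -> Iterator[int]:
        # i-th hash function: SHA-256(alpha || i || z) reduced modulo m
        for i in range(self.k):
            digest = hashlib.sha256(self.nonce + i.to_bytes(4, "big") + z).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, z: bytes) -> None:
        for x in self._positions(z):
            self.bits[x] = 1  # set b[x] to 1

    def might_contain(self, z: bytes) -> bool:
        # False means z was never added; True may be a false positive.
        return all(self.bits[x] for x in self._positions(z))


bf = BloomFilter(m=256, k=4)
for token in [b"token-100", b"token-10", b"token-1"]:  # encrypted vector members
    bf.add(token)
print(bf.might_contain(b"token-10"))  # True
```

Each filter has its own nonce, so two data elements with identical labels generally set different bit positions. Increasing "m" relative to the number of inserted tokens lowers the false-positive rate.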
As shown in 106, the probabilistic data structure is inserted as an index entry value into the index dataset, for example by code from Probabilistic Structure Generator 213 executed on processor 204.
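The following end-to-end sketch inserts Bloom-filter index entries into an index dataset and runs an exact-match lookup. It assumes an in-memory dict as the index dataset. It also substitutes HMAC-SHA256 for the label encryption. The parameters `N`, `M`, `K` and all function names are hypothetical. In this arrangement the server holds the unique identifiers in plaintext and appends each entry's identifier to the query token before testing that entry's filter.

```python
import hashlib
import hmac
import os
from typing import List

N, M, K = 8, 512, 4  # hypothetical: values in [0, 2**N), m-bit filters, k hashes


def path_labels(r: int) -> List[str]:
    leaf = format(r, f"0{N}b")
    return [leaf[:d] for d in range(N, 0, -1)]  # leaf up to the root's child


def label_token(key: bytes, label: str) -> bytes:
    # Keyed PRF stand-in for the label encryption (assumption).
    return hmac.new(key, label.encode("utf-8"), hashlib.sha256).digest()


def positions(nonce: bytes, z: bytes) -> List[int]:
    # k bit positions: SHA-256(nonce || i || z) mod m
    return [
        int.from_bytes(hashlib.sha256(nonce + bytes([i]) + z).digest(), "big") % M
        for i in range(K)
    ]


def build_entry(key: bytes, uid: str, r: int) -> dict:
    """Index entry value: a Bloom filter over the encrypted path labels of r."""
    nonce = os.urandom(16)
    bits = [0] * M
    for label in path_labels(r):
        z = label_token(key, label) + uid.encode("utf-8")  # token || unique id
        for x in positions(nonce, z):
            bits[x] = 1
    return {"nonce": nonce, "bits": bits}


def matches(entry: dict, uid: str, query: bytes) -> bool:
    z = query + uid.encode("utf-8")
    return all(entry["bits"][x] for x in positions(entry["nonce"], z))


key = os.urandom(32)
# Index dataset: unique identifier -> index entry value.
index = {uid: build_entry(key, uid, r) for uid, r in [("img-1", 4), ("img-2", 200)]}

# Exact-match query for r = 4: only the token of its leaf label is sent.
query = label_token(key, format(4, f"0{N}b"))
hits = [uid for uid, entry in index.items() if matches(entry, uid, query)]
print(hits)  # contains 'img-1'; 'img-2' would appear only on a false positive
```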
Optionally, system 200 comprises code instructions that, when executed on processor 204, retrieve a data element based on a plaintext search. For example, a user may search for a data element that is associated with an encrypted dataset index entry, for example through a lookup table. The associated encrypted dataset index entry may be transferred to a database adapted to respond to client requests, for example database 220. The associated encrypted data element is then transmitted to system 200, for example via network 230. The data element is then decrypted and displayed to the user.
Optionally, the search for a range of data elements may be performed as described above by a user entering a range of unencrypted dataset index entries.
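The text does not say how a range of plaintext index entries becomes a query against the Bloom-filter index. One common approach for tree-labeled indexes, offered here only as an assumption, is to cover the range with a canonical set of subtree labels. A value lies in the range exactly when one of its path labels appears in that set. The client would then encrypt each cover label and test it against the filters, as in an exact-match lookup. The function names are hypothetical.

```python
from typing import List


def path_labels(r: int, n: int) -> List[str]:
    leaf = format(r, f"0{n}b")
    return [leaf[:d] for d in range(n, 0, -1)]


def range_cover(lo: int, hi: int, n: int) -> List[str]:
    """Canonical node labels whose subtrees together hold exactly the leaves lo..hi.

    Subtrees are capped at the children of the root, because the data element
    vector never contains the root itself.
    """
    if n <= 0 or not 0 <= lo <= hi < 2 ** n:
        raise ValueError("need n > 0 and 0 <= lo <= hi < 2**n")
    cover = []
    while lo <= hi:
        size = 1
        # Double the block while it stays aligned at lo and inside [lo, hi].
        while size < 2 ** (n - 1) and lo % (2 * size) == 0 and lo + 2 * size - 1 <= hi:
            size *= 2
        depth = n - (size.bit_length() - 1)  # a block of 2**j leaves sits at depth n - j
        cover.append(format(lo, f"0{n}b")[:depth])
        lo += size
    return cover


def in_range(r: int, lo: int, hi: int, n: int) -> bool:
    # r lies in [lo, hi] exactly when one of its path labels is in the cover.
    return not set(path_labels(r, n)).isdisjoint(range_cover(lo, hi, n))


print(range_cover(2, 5, 3))  # ['01', '10']
```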
Optionally, every search for a range of data elements comprises the same number of data elements. Because a range search always uses a fixed number of elements, the size of a search cannot leak information about the range being searched. Optionally, the database comprises key-encrypted metadata associated with the data element. Examples include a Global Positioning System (GPS) coordinate, a time stamp, a file name, an alphanumeric string, the index entry, a plaintext tag, and/or any other data generated automatically by user device 260 or entered manually by a user into an application. The metadata may be associated with an index dataset entry. For example, an item of metadata may be used to generate an encrypted dataset index entry.
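The fixed-size range search can be sketched following claim 15. Real entries are split into queries of exactly "x" members, and the last, shorter query is padded with random data elements. The sketch assumes the range has already been expanded into individual index values in [0, 2^n). It uses a cryptographically secure random source by default. The function name `fixed_size_queries` is hypothetical.

```python
import random
from typing import List, Optional


def fixed_size_queries(values: List[int], x: int, n: int,
                       rng: Optional[random.Random] = None) -> List[List[int]]:
    """Split values into queries of exactly x entries, padding with random dummies.

    Dummy values are drawn from [0, 2**n); the caller must discard any results
    that belong only to dummies.
    """
    if not 0 < x <= 2 ** n:
        raise ValueError("need 0 < x <= 2**n")
    rng = rng if rng is not None else random.SystemRandom()
    queries = []
    for start in range(0, len(values), x):
        query = list(values[start:start + x])
        while len(query) < x:  # pad the last, short query
            dummy = rng.randrange(2 ** n)
            if dummy not in query:
                query.append(dummy)
        rng.shuffle(query)  # hide which entries are padding
        queries.append(query)
    return queries


batches = fixed_size_queries(list(range(10, 15)), x=4, n=8, rng=random.Random(7))
print([len(q) for q in batches])  # [4, 4]
```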
Optionally, system 200 comprises code instructions executed on processor 204 to generate metadata automatically. Examples include image recognition techniques that generate plaintext describing an image file, and recording information from a GPS device, for example a GPS device connected to system 200 by network 230. Other examples are a parsing program that detects key words in a text file, a translation program that translates all or part of a text file into another language, an optical character recognition (OCR) program that generates plaintext from an image file, and/or any other technique for automatically generating metadata from a computer file.
Optionally, the data element is encrypted with a keyed encryption.
Optionally, the keyed encryption is chosen from a group of block cypher schemes comprising Rijndael, Twofish, Serpent, and any other encryption scheme that provides IND-CPA.
Optionally, the key for the keyed encryption is stored in a secured computer memory location, for example storage 208, and access to the key is protected by a password, user identifier, and/or any other access control mechanism.
Reference is now made to FIG. 4, a schematic messaging diagram of an exemplary process for storing an encrypted data element on a cloud service, and retrieving the data element, according to some embodiments of the current invention.
As shown in 401, the encrypted dataset index entry is computed for a computer file generated by a user device, for example user device 260, according to some embodiments of the current invention. As shown in 402, the encrypted dataset index entry and the encrypted computer file are stored on a cloud storage, for example cloud storage 240.
Reference is now made to FIG. 5A, a schematic illustration of a user interface to an application that implements process 100 on a user device, for example user device 260, according to some embodiments of the current invention. As shown in 501, a user may input a plaintext tag to the application UI to initiate a search of encrypted files on a cloud service provider, for example cloud storage 240.
Reference is now made to FIG. 5B, a schematic diagram of the messaging as shown in FIG. 4 within user device 260, according to some embodiments of the current invention.
Reference is now made to FIG. 6, a schematic illustration of a user device for encrypting, storing and searching for a data element, according to some embodiments of the current invention. As shown in 601, the user device, for example user device 260, is equipped with a camera which creates an image file. The image file is encrypted and the encrypted dataset index entry is generated as described above. As shown in 602, a user initiates a search for a plaintext tag that is associated with the image file, as described above. As shown in 603, the encrypted dataset index entry associated with the plaintext tag is located and sent as a query to database 220 in cloud storage 240. As shown in 604, the encrypted image file is received from cloud storage 240 and decrypted with the encryption key stored on user device 260.
The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
It is expected that during the life of a patent maturing from this application many relevant encryption and hashing technologies will be developed and the scope of the terms encryption and hashing is intended to include all such new technologies a priori.
As used herein the term "about" refers to ± 10 %.
The terms "comprises", "comprising", "includes", "including", "having" and their conjugates mean "including but not limited to". This term encompasses the terms "consisting of" and "consisting essentially of".
The phrase "consisting essentially of" means that the composition or method may include additional ingredients and/or steps, but only if the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.
As used herein, the singular form "a", "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a compound" or "at least one compound" may include a plurality of compounds, including mixtures thereof.
The word "exemplary" is used herein to mean "serving as an example, instance or illustration". Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the incorporation of features from other embodiments.
The word "optionally" is used herein to mean "is provided in some embodiments and not provided in other embodiments". Any particular embodiment of the invention may include a plurality of "optional" features unless such features conflict.
Throughout this application, various embodiments of this invention may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases "ranging/ranges between" a first indicated number and a second indicated number and "ranging/ranges from" a first indicated number "to" a second indicated number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.
All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention. To the extent that section headings are used, they should not be construed as necessarily limiting.

Claims

WHAT IS CLAIMED IS:
1. A system for calculating a searchable encrypted index of a database comprising: a memory adapted to store a plurality of data elements each associated with a unique identifier and an index dataset entry in an index dataset;
a processor adapted to calculate a plurality of said index dataset entries by executing a code comprising:
instructions to calculate a binary data tree having a plurality of nodes including a root node and a plurality of leaf nodes, each one of said plurality of leaf nodes being associated with one of said plurality of data elements;
for each one of said plurality of data elements:
combining label values of nodes from said plurality of nodes which are found along a path starting from a respective said associated leaf and ending in a node connected directly to said root node of said binary data tree to create a data element vector;
encrypting each label value of said data element vector with a key; calculating from said data element vector a probabilistic data structure vector; and
adding said probabilistic data structure vector as an index entry value to said index dataset.
2. The system of claim 1, wherein the processor is further configured to sum each said encrypted label of said data element vector with a corresponding said unique identifier prior to calculating said probabilistic data structure.
3. The system of any of the preceding claims, wherein said probabilistic data structure is a member of a group consisting of Bloom filter, Count-Min Sketch, Quotient filter, and any other probabilistic data structure that may be used as an index value.
4. The system of any of the above claims, wherein metadata associated with each said data element is encrypted using said key, said metadata comprising Global Positioning System (GPS) information, time information, a file name, an alphanumeric string, said index value, and any other type of information associated with said data element.
5. The system of claim 4, wherein said metadata is associated with one or more said index dataset entries.
6. The system of any of claims 1 to 5, wherein each of said plurality of data elements is chosen from the interval [0, 2^n) where n is a positive integer.
7. The system of any of the preceding claims, wherein said plurality of data elements are encrypted.
8. The system of any of the preceding claims, wherein said data element is associated with a computer file that is one of a list of computer file types consisting of document files, image files, video files, multimedia files, graphics files, streaming media files, and any other type of computer file.
9. The system of any of the preceding claims, wherein said plurality of data elements and associated index values are stored in a computing device chosen from a list of computer devices consisting of a server on a local area network (LAN), a server connected to the internet, a server located on a cloud based storage network, and any other type of computer memory storage device, and wherein said computer device is located remotely from said system, and communicates with said system using a computer network.
10. The system of any of the preceding claims, wherein said keyed encryption is chosen from a group of block cypher schemes comprising Rijndael, Twofish, Serpent, and any other encryption scheme that provides indistinguishability under chosen-plaintext attack (IND-CPA).
11. The system of any of the preceding claims, wherein access to said encryption key is restricted to authorized users.
12. The system of claim 4, wherein said metadata is automatically generated, said automatic creation comprising image recognition techniques to generate descriptive words associated with image data, recording information from a GPS device, descriptions of locations associated with said GPS data, an image calculated from the data element, and any other technique for automatically generating metadata associated with a computer file.
13. The system of any of the preceding claims, wherein said data element may be located and retrieved from a plurality of data elements by presenting said index value associated with said data element.
14. The system of any of the preceding claims, wherein a query of a range of data elements from said plurality of data elements comprises presenting a range of said index values associated with said range of data elements.
15. The system of any of the preceding claims, wherein each query of a range of data elements comprises the same positive integer "x" number of data elements, wherein random data elements are added to said range until said range comprises said x data elements, and said queries with a range exceeding said x data elements are divided into a plurality of said queries wherein each said query comprises x members comprising a sum of said random data elements and members of said range.
16. The system of any of the preceding claims, wherein said binary tree node label values are calculated according to standard binary tree label methods.
17. A method for calculating a searchable encrypted index of a database comprising: receiving a plurality of data elements each associated with a unique identifier and an index dataset entry;
calculating a binary data tree having a plurality of nodes including a root node and a plurality of leaf nodes, each one of said plurality of leaf nodes being associated with one of said plurality of data elements;
for each one of said plurality of data elements: combining label values of nodes from said plurality of nodes which are found along a path starting from a respective said associated leaf and ending in a node connected directly to said root node of said binary data tree to create a data element vector;
encrypting each label value of said data element vector with a key; calculating from said data element vector a probabilistic data structure; and
adding said probabilistic data structure as an index value to said index dataset;
wherein said index dataset is structured for facilitating search of any of said plurality of data elements.
18. The method of claim 17, further comprising summing each said binary tree label value with said unique identifier to create said data element vector.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2016/079947 WO2018103830A1 (en) 2016-12-06 2016-12-06 A method and system for searchable encrypted cloud storage of media data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2016/079947 WO2018103830A1 (en) 2016-12-06 2016-12-06 A method and system for searchable encrypted cloud storage of media data

Publications (1)

Publication Number Publication Date
WO2018103830A1 (en) 2018-06-14

Family

ID=57570046

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2016/079947 WO2018103830A1 (en) 2016-12-06 2016-12-06 A method and system for searchable encrypted cloud storage of media data

Country Status (1)

Country Link
WO (1) WO2018103830A1 (en)



Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140229489A1 (en) * 2011-05-24 2014-08-14 Acunu Limited Data Storage System

Non-Patent Citations (4)

Title
EU-JIN GOH: "Secure Indexes", INTERNATIONAL ASSOCIATION FOR CRYPTOLOGIC RESEARCH, vol. 20040316:200136, 16 March 2004 (2004-03-16), pages 1 - 18, XP061000695 *
STEVEN M BELLOVIN ET AL: "Privacy-Enhanced Searches Using Encrypted Bloom Filters", INTERNATIONAL ASSOCIATION FOR CRYPTOLOGIC RESEARCH, vol. 20040201:185825, 1 February 2004 (2004-02-01), pages 1 - 12, XP061001131 *
XIONG SISI ET AL: "kBF: A Bloom Filter for key-value storage with an application on approximate state machines", IEEE INFOCOM 2014 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, IEEE, 27 April 2014 (2014-04-27), pages 1150 - 1158, XP032613595, DOI: 10.1109/INFOCOM.2014.6848046 *
YAO HANBING ET AL: "An Approach for Searching on Encrypted Data Based on Bloom Filter", DISTRIBUTED COMPUTING AND APPLICATIONS TO BUSINESS, ENGINEERING&SCIENCE (DCABES), 2012 11TH INTERNATIONAL SYMPOSIUM ON, IEEE, 19 October 2012 (2012-10-19), pages 301 - 304, XP032283550, ISBN: 978-1-4673-2630-8, DOI: 10.1109/DCABES.2012.40 *

Cited By (10)

Publication number Priority date Publication date Assignee Title
CN110113635A (en) * 2019-04-25 2019-08-09 广州智伴人工智能科技有限公司 A kind of method and system of automatic broadcasting PUSH message
CN110113635B (en) * 2019-04-25 2021-05-25 广州智伴人工智能科技有限公司 Method and system for automatically playing push message
CN110347680A (en) * 2019-06-21 2019-10-18 北京航空航天大学 A kind of space-time data indexing means towards high in the clouds environment
CN110347680B (en) * 2019-06-21 2021-11-12 北京航空航天大学 Space-time data indexing method for cloud environment
CN110955901A (en) * 2019-10-12 2020-04-03 烽火通信科技股份有限公司 Storage method and server for virtual machine image file of cloud computing platform
CN111970176A (en) * 2020-10-21 2020-11-20 中国人民解放军国防科技大学 Data summarization method and equipment for IPv4 and IPv6 dual-stack networks
CN111970176B (en) * 2020-10-21 2021-01-15 中国人民解放军国防科技大学 Data summarization method and equipment for IPv4 and IPv6 dual-stack networks
CN115174568A (en) * 2022-06-23 2022-10-11 南京信息工程大学 Attribute-based ciphertext retrieval method
CN117539884A (en) * 2024-01-10 2024-02-09 湖南科研云信息科技有限公司 Method and related device for associating and storing research and development process data of enterprise project
CN117539884B (en) * 2024-01-10 2024-04-02 湖南科研云信息科技有限公司 Method and related device for associating and storing research and development process data of enterprise project


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16812696

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16812696

Country of ref document: EP

Kind code of ref document: A1