EP3583518A1 - Method for searching information in an encrypted corpus stored on a server (Procédé de recherche d'informations dans un corpus chiffré stocké sur un serveur) - Google Patents
- Publication number
- EP3583518A1 (application EP18706792.1A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- encrypted
- server
- document
- encryption
- client equipment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6227—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/319—Inverted lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/008—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/11—Patent retrieval
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2107—File encryption
Definitions
- The field of the invention is information search in a database in a form that preserves the confidentiality of both the data and the queries.
- The application relates in particular to systems for processing personal data, notably health data.
- Databases are an integral part of many applications, such as financial applications and eHealth applications. Databases can be very sensitive, containing valuable data from a company or individuals. Theft of sensitive data is a growing concern for individuals, businesses and governments.
- Databases can be collections of raw files or can be managed by a database management system (DBMS) such as Oracle Database, MySQL, Microsoft SQL Server, and so on.
- A database can be deployed on a server within a company, on a virtual server in a cloud, or on a DBMS service in a cloud. Data theft is a concern for every type of deployment.
- A database system can also be deployed by a company on a virtual server running on a cloud such as Amazon Elastic Compute Cloud (Amazon EC2).
- The virtual server that underlies the database is physically under the control of the cloud provider, and the company installs a DBMS on this virtual server to manage its databases.
- Data theft can also occur in this case if the cloud infrastructure is compromised by attackers or infected with malware or viruses; the company's database administrators could also violate the confidentiality and integrity of the databases.
- Cloud providers are not all trustworthy; they can steal database data from the virtual servers they provide.
- Homomorphic encryption methods have notably been developed for search engine applications: the user sends an encrypted request to the search engine, which is not aware of the content of the request received. The search engine applies a conventional search operation for corresponding documents and returns the response to the user in encrypted form. Thus, the search engine never knows the cleartext content of the query.
- Another application relates to biometrics using a fingerprint database of persons authorized to perform an action, for example entering a protected building. These fingerprints are naturally encrypted because they are non-revocable personal data.
- The search tree is encrypted with a first private encryption key.
- The server receives a request from a client, the request comprising a set of keywords, in which each request term is encrypted with the first private encryption key.
- The search is performed by evaluating the query at each node of the tree to determine whether one or more matches exist. The answer is based on the keyword matches for each document and on one or more nodes encrypted with the first private encryption key.
- European patent EP2865127 describes homomorphic encryption for database querying. Numeric values are encrypted using keys and random numbers to produce ciphertext.
- The ciphertext is homomorphic and consists of two or more sub-ciphertexts. Queries using addition, averaging and multiplication operations can be performed without decrypting the numerical values concerned by the query. Each sub-ciphertext is stored in a single record and in separate attributes.
- That invention relates to methods for encryption and decryption, creating an appropriate table, querying such a database and updating such a database.
- Solutions of the prior art have a major disadvantage resulting from the computing power needed to run the homomorphic encryption processing on the server at each indexing of a new document and at each new request. For this reason, they are applicable only to very small corpora, for example a business directory or a small set of textual documents.
- In addition, the solutions of the prior art are limited to searching for documents on the basis of a binary criterion, the presence or absence in the document of a term of the request, and do not make it possible to effectively rank the documents matching the request by relevance.
- The method according to the invention proposes an effective solution for searching information in a large encrypted corpus.
- According to a first aspect, the invention relates to a method for searching information in an encrypted corpus stored on a server, from a digital query computed on a client equipment and containing a sequence of terms, comprising the following steps:
- said first table TF_i comprising, for each indexed term w_j of the document, the number of occurrences of the term w_j in document i
- said second table Adf_i consisting of the index of the terms w_j in the document
- An additional step executed on the client equipment aggregating said identifiers of the data contained in said encrypted response and in the df_A index recorded on the client equipment
- The method comprises a step of reconstructing, on the client equipment, the index df_A from the encrypted information {Adf_i} recorded for each document i in the dedicated space of the server assigned to the user A.
- The calculations performed on the server are implemented in a parallel and/or distributed manner.
- The server (2) consists of a cloud platform.
- The invention also relates to a method for preparing a searchable database containing a sequence of terms, characterized in that it comprises the following steps:
- a) on the client equipment, when a new document i is introduced, calculating, for each document i belonging to the corpus, a first table TF_i and a second table Adf_i,
- said first table TF_i comprising, for each indexed term w_j of the document, the number of occurrences of the term w_j in document i,
- said second table Adf_i recording the presence or absence of each term w_j in document i;
- b) encrypting document i and said table Adf_i, and encrypting said table TF_i by a homomorphic encryption method.
- The invention also relates to a method of searching information in an encrypted corpus stored on a server, from a digital query computed on a client equipment and containing a sequence of terms, characterized in that it comprises the following steps:
- FIG. 1 represents a schematic view of a computer system according to the invention
- FIG. 2 represents a schematic view of the data flows between the various computing resources.
- Hardware architecture
- Figure 1 shows a schematic view of the hardware architecture of the invention.
- It comprises a client computer equipment (1) connected to a server (2) by a computer network, for example the Internet.
- The server (2) is associated with a memory (3) for storing a database.
- The server (2) comprises a processor for performing digital processing.
- In a particular example, the server (2) and the memories (3) consist of a set of distributed resources, for example of the "cloud" type.
- Functional architecture
- Figure 2 shows an example of a functional architecture.
- The client equipment (1) performs the initial processing of a document i consisting of a digital file (9) stored in a working memory.
- Each term of the document first undergoes preprocessing by known means: stemming, a stop list (removal of common words) and any other usual linguistic processing.
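The preprocessing described above could look roughly like the following minimal sketch. The stop list, the suffix list and the function names `naive_stem` and `preprocess` are hypothetical and chosen only for illustration; the source does not name a specific stemmer or stop list, and a real system would rely on an established linguistic library.

```python
import re
from typing import List

# Hypothetical stop list; a real system would use a full language-specific list.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "on", "is", "to"}

# Hypothetical suffix list for a deliberately naive stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> List[str]:
    """Lowercase, tokenize, drop stop words, then stem the remaining terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


terms = preprocess("The patients visited the clinics")
```

The same function would be applied to the document at indexing time and to the query at search time, so that both sides produce comparable terms.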
- A first task consists in encrypting the document i with a known cryptographic method, for example AES symmetric encryption, and recording an encrypted version (10) of this document on the client equipment, and optionally on the server (2) or a third-party storage service.
- The corpus of encrypted documents thus defined forms the document base (32).
- A second task, executed in parallel or sequentially, consists in calculating an index of the occurrences of the terms present in the file (9) and in recording a table TF_i (14) of the occurrences, in the form of a list of the terms w_j present in the document i, each term w_j of this list being associated with the number tf_i,j of occurrences of the term w_j in the document i.
- The table TF_i (14) is therefore of type {[w_j ; tf_i,j]}_j for a document i.
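As an illustration, the occurrence table for one document can be built from its preprocessed terms as in the sketch below. The function name `build_tf_table` and the dictionary representation are assumptions made for this example; the source only specifies that each term w_j is associated with its count tf_i,j.

```python
from collections import Counter
from typing import Dict, List


def build_tf_table(terms: List[str]) -> Dict[str, int]:
    """Return TF_i: each distinct term w_j mapped to its count tf_i,j."""
    return dict(Counter(terms))


# Hypothetical preprocessed terms of a document i.
doc_terms = ["patient", "visit", "clinic", "patient"]
tf_i = build_tf_table(doc_terms)
```

Here `tf_i` associates "patient" with 2 and each of the other two terms with 1.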
- A third task consists in calculating a table Adf_i (15) recording, for each term w_j, the presence or absence of the term in the document.
- This table Adf_i (15) is therefore of type
- The table TF_i (14) is then encrypted by a homomorphic encryption method, for example the method described in Zhou, H., & Wornell, G. (2014, February). Efficient homomorphic encryption on integer vectors and its applications. In Information Theory and Applications Workshop (ITA), 2014 (pp. 1-9). IEEE.
- The result of this encryption of the table TF_i (14) is a set of encrypted data (11).
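To make the idea of computing on encrypted counts concrete, here is a toy sketch using the Paillier cryptosystem, which is additively homomorphic. This is a stand-in chosen for brevity, not the integer-vector scheme of Zhou and Wornell cited above, and the primes are far too small for any real use. All names (`encrypt`, `decrypt`, `add_encrypted`) are hypothetical.

```python
import math
import secrets

# Toy primes, far too small for real use; chosen only to keep the sketch readable.
P, Q = 1789, 1861
N = P * Q
N_SQ = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAMBDA, -1, N)  # modular inverse (Python 3.8+)


def encrypt(m: int) -> int:
    """Paillier encryption of an integer 0 <= m < N, with generator N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Paillier decryption using the private values LAMBDA and MU."""
    x = pow(c, LAMBDA, N_SQ)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts yields an encryption of the sum of plaintexts."""
    return (c1 * c2) % N_SQ


# Encrypt a small, hypothetical occurrence table term by term.
tf_i = {"patient": 2, "visit": 1, "clinic": 1}
enc_tf_i = {term: encrypt(count) for term, count in tf_i.items()}

# A party holding only ciphertexts can still add two counts.
enc_sum = add_encrypted(enc_tf_i["patient"], enc_tf_i["visit"])
total = decrypt(enc_sum)
```

Only the holder of `LAMBDA` and `MU` (the client, in the architecture described here) can read `total`; the party performing `add_encrypted` sees ciphertexts only.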
- Each set of encrypted data (11) is transmitted by the client equipment (1) to the server (2).
- The grouping of the encrypted data sets (11) constitutes an encrypted base (30) of all the tables {TF_i}_i.
- The table Adf_i (15) is encrypted according to a known method, for example AES, and transmitted to the server (2), which records an encrypted version (12).
- The set of encrypted files (12) stored on the server constitutes a base (31).
- Each encrypted file (12) recorded on the server (2) makes it possible to reconstruct a table df_A (13) by decryption, using the inverse of the algorithm used for the encryption above.
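A minimal sketch of this reconstruction is given below, under two assumptions that the source does not state explicitly: the per-document presence tables have already been decrypted on the client, and the table df_A holds, for each term, the number of documents in which it is present. The name `build_df_index` and the dictionary layout are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def build_df_index(adf_tables: Iterable[Dict[str, bool]]) -> Dict[str, int]:
    """Count, for each term, the number of documents in which it is present."""
    df: Counter = Counter()
    for adf_i in adf_tables:
        for term, present in adf_i.items():
            if present:
                df[term] += 1
    return dict(df)


# Hypothetical decrypted presence tables for three documents.
adf_tables = [
    {"patient": True, "clinic": True},
    {"patient": True, "virus": True},
    {"clinic": False, "virus": True},
]
df_a = build_df_index(adf_tables)
```

Because the aggregation runs only on the client, the server never sees which terms occur in which documents in cleartext.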
- This table df_A (13) is calculated only on the client equipment (1), from:
- This data preparation step results in the recording on the server of data that are not directly queryable and that do not reveal significant information about the content or the documents, notably in case of an attack on the server or a malicious action by a privileged user.
- Querying
- The query is made by issuing, from the client equipment (1), a textual request (20) formed by a combination of words.
- This request (20) is preprocessed by known means: stemming, a stop list (removal of common words) and any other usual linguistic processing.
- The request (20) is encrypted with the same homomorphic encryption method as the one used for encrypting the table TF_i (14), to obtain an encrypted request (21).
- The encrypted request (21) is transmitted to the server (2), which records it to form a request (40).
- The server (2) calculates an encrypted response (41).
- This processing consists in calculating, in the encrypted domain, the number of occurrences of each term q k of the request (40) for each document i known.
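For readers who want to see what quantity is being computed, the sketch below is a plaintext analogue of this step: it works on unencrypted tables, whereas the server described here would perform the equivalent operation on ciphertexts without ever seeing these values. The function name `count_query_terms` and the data layout are assumptions made for illustration.

```python
from typing import Dict, List


def count_query_terms(
    query_terms: List[str], tf_tables: Dict[str, Dict[str, int]]
) -> Dict[str, Dict[str, int]]:
    """For each document id, return the count of every query term (0 if absent)."""
    return {
        doc_id: {q: tf_i.get(q, 0) for q in query_terms}
        for doc_id, tf_i in tf_tables.items()
    }


# Hypothetical occurrence tables for two documents.
tf_tables = {
    "doc1": {"patient": 2, "clinic": 1},
    "doc2": {"patient": 1, "virus": 3},
}
response = count_query_terms(["patient", "virus"], tf_tables)
```

The resulting structure, one count per query term and per document, is what the client would obtain after decrypting the server's response.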
- The client (1) is then able to decrypt the response (50) to obtain a decrypted response (51).
- The client (1) can combine the response (51) and the table df_A (13) to calculate a TF-IDF (Term Frequency-Inverse Document Frequency) score (52) according to a known method.
- This score (52) constitutes a classification key of the documents i in order of relevance with respect to the request (20).
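A possible client-side ranking is sketched below. It assumes one common TF-IDF variant, score = sum over query terms of tf * log(N / df), where N is the number of documents; the source only says "a known method", so the exact weighting is an assumption, as are the name `rank_documents` and the sample data.

```python
import math
from typing import Dict, List, Tuple


def rank_documents(
    counts: Dict[str, Dict[str, int]], df_a: Dict[str, int], n_docs: int
) -> List[Tuple[str, float]]:
    """Score each document with tf * log(n_docs / df) and sort by relevance."""
    scores: Dict[str, float] = {}
    for doc_id, term_counts in counts.items():
        score = 0.0
        for term, tf in term_counts.items():
            df = df_a.get(term, 0)
            if tf > 0 and df > 0:
                score += tf * math.log(n_docs / df)
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical decrypted response and document-frequency index.
counts = {
    "doc1": {"patient": 2, "virus": 0},
    "doc2": {"patient": 1, "virus": 3},
}
df_a = {"patient": 2, "virus": 1}
ranking = rank_documents(counts, df_a, n_docs=4)
```

With these sample values, "doc2" ranks first because it contains the rarer term "virus" several times.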
- The client equipment (1) presents the results in the manner of a search engine and allows the user to find the corresponding record.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Computer Hardware Design (AREA)
- General Health & Medical Sciences (AREA)
- Bioethics (AREA)
- Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Storage Device Security (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR1751241A FR3062936B1 (fr) | 2017-02-15 | 2017-02-15 | Method for searching information in an encrypted corpus stored on a server |
PCT/FR2018/050276 WO2018150119A1 (fr) | 2017-02-15 | 2018-02-05 | Method for searching information in an encrypted corpus stored on a server |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3583518A1 (fr) | 2019-12-25 |
Family
ID=59974493
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP18706792.1A Withdrawn EP3583518A1 (fr) | Method for searching information in an encrypted corpus stored on a server | 2017-02-15 | 2018-02-05 |
Country Status (5)
Country | Link |
---|---|
US (1) | US11308233B2 (fr) |
EP (1) | EP3583518A1 (fr) |
CA (1) | CA3050353A1 (fr) |
FR (1) | FR3062936B1 (fr) |
WO (1) | WO2018150119A1 (fr) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12099997B1 (en) | 2020-01-31 | 2024-09-24 | Steven Mark Hoffberg | Tokenized fungible liabilities |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007120360A2 (fr) * | 2005-12-29 | 2007-10-25 | Blue Jungle | Système de gestion d'informations |
US8544058B2 (en) * | 2005-12-29 | 2013-09-24 | Nextlabs, Inc. | Techniques of transforming policies to enforce control in an information management system |
US7877781B2 (en) * | 2005-12-29 | 2011-01-25 | Nextlabs, Inc. | Enforcing universal access control in an information management system |
US7716240B2 (en) * | 2005-12-29 | 2010-05-11 | Nextlabs, Inc. | Techniques and system to deploy policies intelligently |
US8966250B2 (en) * | 2008-09-08 | 2015-02-24 | Salesforce.Com, Inc. | Appliance, system, method and corresponding software components for encrypting and processing data |
US20100146299A1 (en) * | 2008-10-29 | 2010-06-10 | Ashwin Swaminathan | System and method for confidentiality-preserving rank-ordered search |
US8904171B2 (en) * | 2011-12-30 | 2014-12-02 | Ricoh Co., Ltd. | Secure search and retrieval |
EP2865127A4 (fr) | 2012-06-22 | 2016-03-09 | Commw Scient Ind Res Org | Cryptage homomorphe pour interrogation de base de données |
EP2709028A1 (fr) * | 2012-09-14 | 2014-03-19 | Ecole Polytechnique Fédérale de Lausanne (EPFL) | Technologies renforçant la protection de la vie privée pour tests médicaux à l'aide de données génomiques |
US9536047B2 (en) * | 2012-09-14 | 2017-01-03 | Ecole Polytechnique Federale De Lausanne (Epfl) | Privacy-enhancing technologies for medical tests using genomic data |
WO2015017787A2 (fr) * | 2013-08-01 | 2015-02-05 | Visa International Service Association | Systèmes, procédés et appareils pour opérations de bases de données homomorphiques |
US9501661B2 (en) * | 2014-06-10 | 2016-11-22 | Salesforce.Com, Inc. | Systems and methods for implementing an encrypted search index |
US10037433B2 (en) * | 2015-04-03 | 2018-07-31 | Ntt Docomo Inc. | Secure text retrieval |
US20170293913A1 (en) * | 2016-04-12 | 2017-10-12 | The Governing Council Of The University Of Toronto | System and methods for validating and performing operations on homomorphically encrypted data |
US10783270B2 (en) * | 2018-08-30 | 2020-09-22 | Netskope, Inc. | Methods and systems for securing and retrieving sensitive data using indexable databases |
-
2017
- 2017-02-15 FR FR1751241A patent/FR3062936B1/fr active Active
-
2018
- 2018-02-05 EP EP18706792.1A patent/EP3583518A1/fr not_active Withdrawn
- 2018-02-05 US US16/483,684 patent/US11308233B2/en active Active
- 2018-02-05 WO PCT/FR2018/050276 patent/WO2018150119A1/fr unknown
- 2018-02-05 CA CA3050353A patent/CA3050353A1/fr active Pending
Also Published As
Publication number | Publication date |
---|---|
FR3062936B1 (fr) | 2021-01-01 |
WO2018150119A1 (fr) | 2018-08-23 |
US11308233B2 (en) | 2022-04-19 |
FR3062936A1 (fr) | 2018-08-17 |
US20200019723A1 (en) | 2020-01-16 |
CA3050353A1 (fr) | 2018-08-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10498706B2 (en) | Searchable encryption enabling encrypted search based on document type | |
US9825925B2 (en) | Method and apparatus for securing sensitive data in a cloud storage system | |
CN106326360B (zh) | 一种云环境中密文数据的模糊多关键词检索方法 | |
Zhang et al. | Pop: Privacy-preserving outsourced photo sharing and searching for mobile devices | |
US10404669B2 (en) | Wildcard search in encrypted text | |
US20140108435A1 (en) | Secure private database querying system with content hiding bloom fiters | |
US20130159694A1 (en) | Document processing method and system | |
CA2778847C (fr) | Identification par controle de donnees biometriques d'utilisateur | |
Cheng et al. | Person re-identification over encrypted outsourced surveillance videos | |
US20230306131A1 (en) | Systems and methods for tracking propagation of sensitive data | |
US20210184840A1 (en) | Encrypted Search with a Public Key | |
Cui et al. | Harnessing encrypted data in cloud for secure and efficient image sharing from mobile devices | |
Heen et al. | On the privacy impacts of publicly leaked password databases | |
EP3583518A1 (fr) | Procédé de recherche d'informations dans un corpus chiffre stocke sur un serveur | |
EP3461055B1 (fr) | Système et procédé pour assurer l'annotation externalisée sécurisée d'ensembles de données | |
Alamri et al. | Secure sharing of health data over cloud | |
Poon et al. | Privacy-aware search and computation over encrypted data stores | |
Boldyreva et al. | Masking fuzzy-searchable public databases | |
Surrah | Multi Keyword Retrieval On Secured Cloud | |
Malhotra et al. | An efficacy analysis of data encryption architecture for cloud platform | |
Sah et al. | Preserving Data Privacy with Record Retrieval using Visual Cryptography and Encryption Techniques | |
Agarwal et al. | Privacy Preserving content-based image retrieval using Cloud Computing | |
Rajendran et al. | An Efficient Ranked Multi-Keyword Search for Multiple Data Owners Over Encrypted Cloud Data: Survey | |
Zhang et al. | Outsource photo sharing and searching for mobile devices with privacy protection | |
Thomas et al. | Image De-Duplication by using Tin Eye Match Service Engine in Cloud Computing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: UNKNOWN |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
| 17P | Request for examination filed | Effective date: 20190809 |
| AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| AX | Request for extension of the european patent | Extension state: BA ME |
| DAV | Request for validation of the european patent (deleted) | |
| DAX | Request for extension of the european patent (deleted) | |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |
| 17Q | First examination report despatched | Effective date: 20211125 |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| 18D | Application deemed to be withdrawn | Effective date: 20231130 |