WO2015136801A1

WO2015136801A1 - Information management system

Info

Publication number: WO2015136801A1
Application number: PCT/JP2014/082682
Authority: WO
Inventors: 佐藤　恵一
Original assignee: 株式会社日立ソリューションズ
Priority date: 2014-03-13
Filing date: 2014-12-10
Publication date: 2015-09-17
Also published as: JP2015187828A; JP6250497B2

Abstract

This information management system comprises a first database and a second database provided in one or multiple data centers. The first database uses a searchable encryption technique to store data encrypted in a format that cannot be decrypted inside of the aforementioned data center, and the second database stores data associated with the data stored in the first database.

Description

Information management system

Incorporation by reference

This application claims priority to Japanese Patent Application No. 2014-049869, filed on March 13, 2014, and Japanese Patent Application No. 2014-155671, filed on July 31, 2014, the contents of which are incorporated into the present application by reference.

The present invention relates to a database, and more particularly to a technique for improving information management security by applying searchable encryption.

In recent years, cloud computing has spread rapidly as a way to consolidate services and information at low cost. However, security and privacy are not always given sufficient consideration in cloud computing.

Searchable encryption is known as a typical security technology in cloud computing. It enables data stored in a storage device to be searched while the data remains encrypted. With searchable encryption, the database itself is encrypted and search processing can be executed without decrypting the data in the data center, which eliminates the security vulnerabilities of the conventional technology. For this reason, it is attracting attention as a technology that gives strict consideration to security and privacy.

JP 2012-123614 A is background art in this technical field. It describes a searchable cryptographic processing system in which a DB server that holds deposited data, a registration client that deposits data in the DB server, and a search client that causes the DB server to search the data cooperate with one another via a network. The registration client deposits encrypted data in the server using a probabilistic encryption method that employs a mask based on a hash value and a homomorphic function. The search client encrypts the search query using probabilistic encryption with a mask based on a homomorphic function, causes the DB server to search without releasing the mask, and, so that the appearance frequency of the data matching the search does not leak to the DB server, causes the DB server to also output data that does not match the search.

However, the technique described in JP 2012-123614 A requires a long processing time and a large amount of computer resources, because search processing and analysis processing executed while the data remains encrypted impose a heavy load. In addition, executing a complex analysis on encrypted data is difficult, or requires many computer resources. For this reason, securing the necessary computer resources in a conventional data center is not easy, which increases the calculation cost and hinders the practical application of searchable encryption technology.

In order to solve the above-described problems, the present invention separates information that can identify an individual from information that cannot identify an individual by itself, and stores each in a form that takes security and privacy into consideration.

A typical example of the invention disclosed in the present application is as follows. That is, an information management system in which a first database and a second database are provided in one or a plurality of data centers, wherein the first database stores data encrypted, using a searchable encryption technique, in a format that cannot be decrypted in the data center, and the second database stores data associated with the data stored in the first database.

According to the representative embodiment of the present invention, it is possible to solve security and privacy problems. In addition, computer resources can be reduced. Problems, configurations, and effects other than those described above will become apparent from the description of the following embodiments.

FIG. 1 is a diagram showing the configuration of the information management system according to the embodiment of the present invention.

FIG. 2 is a diagram showing a configuration example of the database 1 according to the embodiment of the present invention.

FIG. 3 is a diagram showing a configuration example of the database 2 according to the embodiment of the present invention.

FIG. 4 is a diagram showing acquisition of data from the databases according to the embodiment of the present invention.

FIG. 5 is a diagram showing acquisition of data from the databases according to the embodiment of the present invention.

FIG. 6 is a diagram showing sharing of data by terminals according to the embodiment of the present invention.

FIG. 7 is a diagram showing an update of data stored in the database according to the embodiment of the present invention.

FIG. 8 is a diagram showing an update of the encrypted information stored in the database according to the embodiment of the present invention.

FIG. 9 is a diagram showing a form in which sensitive information is stored in the database according to the embodiment of the present invention.

Hereinafter, one mode for carrying out the present invention will be specifically described with reference to the drawings.

FIG. 1 is a diagram showing a configuration of an information management system according to an embodiment of the present invention.

The information management system according to this embodiment includes a plurality of databases 1 and 2 and a terminal (PC) 3.

The databases 1 and 2 each comprise a nonvolatile storage device (or a volatile storage device) that stores data, and a database management system (DBMS) that controls input/output and updating of the data stored in the storage device. The database management system is executed on a computer resource having a processor and a memory.

The database 1 stores data encrypted by applying a searchable encryption technique in a storage device, and outputs the data without being decrypted in the data center 10. The database 2 is a database in which normal plain text data is stored. The database 2 may be a database to which a normal encryption technique is applied (that is, encrypted data is stored in a storage device, but data decrypted in the data center 10 is output).

As shown in FIG. 2, the database 1 configured in this way stores data whose contents should be concealed, for example, data (name, address, etc.) that can identify a specific individual. Further, as shown in FIG. 3, the database 2 stores data associated with the data stored in the database 1, for example, data (such as a health check result) that cannot identify an individual by itself.

FIG. 1 shows only one each of the databases 1 and 2, but there may be a plurality of each. In this case, as will be described later, a plurality of pieces of association information are recorded in the database 1 in order to associate the databases with one another.

The databases 1 and 2 are accommodated in one data center 10. The data center 10 may be a group of computers accommodated at a single site, or a group of computers distributed over a plurality of sites and connected by a network. More specifically, it is a cloud service that provides part of the resources of computers and storage devices. The data center 10 may also be installed in facilities managed by the user (for example, a company) (so-called on-premises).

Furthermore, the databases 1 and 2 may be distributed across a plurality of data centers 10. That is, the databases 1 and 2 may be managed by different business operators.

If the data stored in the databases 1 and 2 are associated with each other, personal information can be managed and grasped centrally. However, with the database structure of the present embodiment, the data stored in the database 1 is encrypted and is output from the database 1 still encrypted. Therefore, even if both databases are accommodated in one data center 10, personal information is not managed centrally there.

Although not shown, the data center 10 may be provided with a server for controlling the databases 1 and 2 and a server for providing an application to the terminal. The server may provide a service using data stored in the databases 1 and 2.

The terminal 3 is a computer having a processor, a memory, a network interface, and a user interface (keyboard, mouse, display device, etc.). In addition to a personal computer (PC), the computer may be a tablet terminal, a smartphone, or the like. Although only one terminal 3 is shown in FIG. 1, a plurality of terminals 3 are usually provided as shown in FIG.

The terminal 3 may run a dedicated application program for accessing the databases 1 and 2, or a web browser. The dedicated application program or the web browser manages the keys with which the terminal 3 accesses the databases 1 and 2, and transmits requests to a server in the data center 10. As will be described later, a plurality of keys for the terminal 3 to access the database 1 are prepared according to the type of access permitted (search only, or search and decryption). The access key may be common to an organization or may differ for each user. How keys are managed can be decided by the information manager and the users.

The data center 10 and the terminal 3 are connected by a network 4.

When newly registering data from the terminal 3, a personal ID for storing data in the database 1 and a data ID for storing data in the database 2 are issued. This ID may be issued by the terminal 3 or the data center 10 (DBMS).

The terminal 3 separates the input data into data to be stored in the database 1 and data to be stored in the database 2 according to a preset separation rule (that is, a database definition table). The terminal 3 transmits write requests including the separated data and the issued IDs to the databases 1 and 2.

The databases 1 and 2 store the data in accordance with the write requests from the terminal 3.
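
The registration flow described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: the field names, the `SEPARATION_RULE` table, and the use of random UUIDs as IDs are hypothetical, in-memory dictionaries stand in for the databases 1 and 2, and the searchable encryption of the database 1 record is omitted.

```python
import uuid

# Hypothetical separation rule (database definition table): which input
# fields go to database 1 (identifying) and which go to database 2.
SEPARATION_RULE = {
    "db1": ["name", "date_of_birth", "address"],
    "db2": ["height_cm", "weight_kg", "blood_pressure"],
}


def register(record, db1, db2):
    """Split one input record and write the parts to db1 and db2."""
    personal_id = uuid.uuid4().hex  # ID for the database 1 record
    data_id = uuid.uuid4().hex      # ID for the database 2 record

    part1 = {k: record[k] for k in SEPARATION_RULE["db1"] if k in record}
    part2 = {k: record[k] for k in SEPARATION_RULE["db2"] if k in record}

    # Database 1 keeps the data ID as association information.
    # (In the embodiment this record would be searchable-encrypted.)
    db1[personal_id] = {**part1, "data_id": data_id}
    db2[data_id] = part2
    return personal_id, data_id


db1, db2 = {}, {}
pid, did = register(
    {"name": "Taro Yamada", "address": "Tokyo", "height_cm": 170, "weight_kg": 65},
    db1,
    db2,
)
```

After the call, the identifying fields exist only in `db1`, and the only link between the two stores is the `data_id` value held in the `db1` record.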

FIG. 2 is a diagram illustrating a configuration example of the database 1 of the present embodiment.

The database 1 stores personal information that uniquely identifies an individual, such as a personal ID 101, a name 102, a date of birth 103, an address 104, an e-mail address 105, and a telephone number 106. The database 1 encrypts these data by applying a searchable encryption technique, stores them in a storage device, and outputs them without decrypting them in the data center 10. When the information management system of this embodiment is used in a company, personnel information may be stored in the database 1.

Database 1 stores data ID 107 for associating with data stored in database 2. As the data ID 107, a data ID corresponding to the data ID 201 of the database 2 (for example, a matching data ID) is used. When the database 1 associates the personal ID 101 with the data ID 107, the database 1 and the database 2 can be associated with each other.

When a plurality of databases 1 and 2 are associated, a plurality of data IDs 107 for associating the databases and the data are recorded in the database 1.

The database 1 may store a password and an access key for accessing the database 1. In addition, the security of the encrypted data stored in the database 1 can be further improved by changing and updating the encrypted data every time the database 1 is accessed. More specifically, different encrypted data is generated at the time of data registration, and the encrypted data is rewritten on each access. Details of this will be described later with reference to FIG.

FIG. 3 is a diagram illustrating a configuration example of the database 2 according to the present embodiment.

The database 2 shown in FIG. 3 is a database for storing personal health information, and stores a data ID 1231 for uniquely identifying data, a data registration date 1232, health check results (height 1233, weight 1234, BMI 1235, blood pressure 1236, blood glucose level 1237), and the like. Thus, the database 2 does not store personal information that can identify an individual. The database 2 may store a password and an access key for accessing the database 2.

Conventionally, the information that associates the database 1 with the database 2 would typically be managed together with the database 2. In the present embodiment, however, the association information is encrypted and stored in the database 1, so that even if the database 1 and the database 2 are managed centrally, personal information is not managed centrally.

Management of the information that associates data stored in the database 1 with data stored in the database 2 is important when separating personal information. How the association information is managed determines whether the data is under linkable anonymization or unlinkable anonymization. For this reason, ethical guidelines in the clinical research field, for example, require strict management of association information. More specifically, if the association information cannot be accessed or has been discarded, the individual corresponding to data stored in the database 2 cannot be identified, and the data is under unlinkable anonymization. On the other hand, if the association information can be accessed, the individual corresponding to data stored in the database 2 can be identified, and the data is under linkable anonymization. For this reason, under linkable anonymization, strict management of the association information becomes a heavy burden.
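
The distinction between linkable (connectable) and unlinkable (non-connectable) anonymization can be illustrated with a very small sketch. The dictionary `association` and the IDs below are hypothetical stand-ins for the association information; nothing here is taken from the embodiment beyond the idea that identification depends on whether that information is still available.

```python
def identify(data_id, association):
    """Return the personal ID linked to a database 2 record, or None."""
    return association.get(data_id)


# Association information: data ID (database 2) -> personal ID (database 1).
association = {"d-100": "p-001"}

# While the association information is accessible, the record is linkable.
linkable_result = identify("d-100", association)

# Once the association information is discarded, the same record can no
# longer be traced back to an individual (unlinkable).
association.clear()
unlinkable_result = identify("d-100", association)
```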

In this embodiment, the association information is protected by searchable encryption, that is, encrypted in a format that cannot be decrypted in the data center 10, and stored in the database 1. For this reason, the association information can be regarded as effectively stored outside the data center 10. Since no decryptable association information is held in the data center 10, personal information can be managed safely.

FIGS. 4 and 5 are diagrams showing data acquisition from the databases according to the present embodiment.

Data may be acquired from the database 2 after being acquired from the database 1, or from the database 1 after being acquired from the database 2. FIG. 4 shows a method for acquiring data first from the database 1, and FIG. 5 shows a method for acquiring data first from the database 2. The application executed by the terminal 3 selects which database the data is acquired from first, and controls the transmission of requests to each database.

When acquiring data first from the database 1, as shown in FIG. 4, the terminal 3 transmits a request to the database 1 and acquires data stored in the database 1 by search processing. The data acquired from the database 1 includes a data ID 107 for specifying the data stored in the database 2. The terminal 3 transmits a request to the database 2 using the acquired data ID, and acquires data stored in the database 2 by a search process. Using the acquired data ID 107, the terminal 3 combines the data acquired from the database 1 and the data acquired from the database 2 and displays them on the display screen.

On the other hand, when data is first acquired from the database 2, as shown in FIG. 5, the terminal 3 transmits a request to the database 2, and acquires data stored in the database 2 by a search process. The terminal 3 uses the data ID 107 acquired from the database 2 to transmit a request to the database 1 and acquires data stored in the database 1 by a search process. Using the acquired data ID 107, the terminal 3 combines the data acquired from the database 1 and the data acquired from the database 2 and displays them on the display screen.
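
The two acquisition orders can be sketched as follows. This is an illustration only: `DB1`, `DB2`, the field names, and the sample values are hypothetical, and plain dictionaries stand in for the databases, so the searchable encryption of the database 1 rows (which in the embodiment are decrypted only at the terminal) is not modeled.

```python
# In-memory stand-ins for the two databases.
DB1 = {
    "p-001": {"name": "Taro Yamada", "data_id": "d-100"},
    "p-002": {"name": "Hanako Sato", "data_id": "d-200"},
}
DB2 = {
    "d-100": {"bmi": 22.5, "blood_pressure": 118},
    "d-200": {"bmi": 27.1, "blood_pressure": 141},
}


def db1_first(name):
    """FIG. 4 order: search database 1, then fetch from database 2."""
    results = []
    for row in DB1.values():
        if row["name"] == name:
            health = DB2[row["data_id"]]  # second request, keyed by data ID
            results.append({**row, **health})
    return results


def db2_first(min_bmi):
    """FIG. 5 order: search database 2, then look up database 1."""
    results = []
    for data_id, health in DB2.items():
        if health["bmi"] >= min_bmi:
            for row in DB1.values():  # search database 1 by data ID
                if row["data_id"] == data_id:
                    results.append({**row, **health})
    return results
```

In both orders the join happens at the terminal, using the data ID as the only link between the two result sets.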

The data acquired from the database 1 is further narrowed down using the data stored in the database 2, and the data acquired from the database 2 is further narrowed down using the data stored in the database 1. The narrowing down using the database may be repeated a plurality of times.

When the database 1 and the database 2 are arranged in different data centers, the data from the database 1 and the data from the database 2 are transferred through different routes on the network 4, so that security and privacy can be further improved. Even if the database 1 and the database 2 are arranged in one data center 10 as in the present embodiment, the data stored in the database 1 and the data stored in the database 2 cannot be linked within the data center 10, so the same security and privacy as when the two databases are arranged in different data centers can be ensured.

FIG. 6 is a diagram illustrating data sharing by a plurality of terminals 3 according to the present embodiment.

The terminal 31 has a key for accessing the database 1, and accesses the database 1 to acquire data stored in the database 1 (601). Then, the database 2 is accessed using the data acquired from the database 1 (602), and the data stored in the database 2 is acquired (603). Thereafter, the terminal 31 combines the personal information acquired from the database 1 and the information acquired from the database 2, and displays the information acquired from the database 2 on the display device in a form that allows an individual to be specified.

On the other hand, since the terminal 32 does not have a key for accessing the database 1, it cannot access the database 1 and cannot acquire the data stored there (611). However, it can obtain the data stored in the database 2 by accessing the database 2 (612). Since the terminal 32 cannot combine personal information from the database 1 with the information acquired from the database 2, it displays the information acquired from the database 2 on the display device in a form in which an individual cannot be specified (for example, a form suitable for statistical processing).

Thus, since the terminal 31 has a key for accessing the database 1, it can acquire data including personal information. On the other hand, since the terminal 32 does not have a key for accessing the database 1, it can acquire only data that cannot identify an individual. Personal information can be concealed by changing the authority of the terminal depending on the application.
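
The difference between the two terminals can be sketched as follows. This is a simplified illustration under assumptions: `DB1_KEY`, the data layout, and the plain key comparison are hypothetical. In the embodiment the restriction comes from the encryption itself (a terminal without the key cannot use the database 1 data), not from a string check.

```python
from statistics import mean

DB1_KEY = "secret-key-for-db1"  # hypothetical access key

DB1 = {"d-100": "Taro Yamada", "d-200": "Hanako Sato"}  # data ID -> name
DB2 = {"d-100": {"weight_kg": 65}, "d-200": {"weight_kg": 52}}


def read_db1(key):
    """Return database 1 contents only to a holder of the access key."""
    if key != DB1_KEY:
        raise PermissionError("no access to database 1")
    return DB1


def view(key=None):
    """Build what a terminal can display, depending on its key."""
    try:
        names = read_db1(key)
    except PermissionError:
        # Like terminal 32: only non-identifying, statistical output.
        return {"mean_weight_kg": mean(r["weight_kg"] for r in DB2.values())}
    # Like terminal 31: combine personal information with database 2 data.
    return {names[d]: row for d, row in DB2.items()}
```

Calling `view(DB1_KEY)` yields per-person records, while `view()` yields only an aggregate.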

Note that the keys of the terminals 31 and 32 may be set for each terminal or may be set corresponding to the user (login ID) of the terminal.

The keys for accessing the database 1 may be divided into a plurality of types of keys having different authority levels, for example, an access key that can be searched and decrypted and an access key that can only be searched. Next, the access to the database when the terminal 3 has an access key that can only be searched will be described with reference to FIG.

FIG. 7 is a diagram showing an update of data stored in the database 2 using the search result of the database 1 according to the embodiment of the present invention.

The terminal 3 transmits a request to the database 1 and obtains a search result from the database 1 (701). Since the terminal 3 has an access key that permits only searching of the database 1, it can acquire only information indicating whether or not data matching the search condition is stored in the database 1 (702). When data matching the search condition is stored in the database 1, the database 1 transmits a key for accessing the database 2 to the terminal 3. The key for accessing the database 2 includes information that can specify the one or more persons found by the search.

The terminal 3 can access the database 2 using the access key acquired from the database 1, and acquire the data, stored in the database 2, of the one or more persons specified by the search (703).

Since the terminal 3 has only an access key that permits searching the database 1, the terminal 3 can access the data matching the search condition without associating the data stored in the database 1 with the data stored in the database 2, that is, without identifying personal information.

More specifically, when the database 1 is searched using personal information as the search condition and data matching the search condition is stored in the database 1, the password stored in the database 2 (that is, the password for accessing the database 2) is provided to the terminal 3.
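
The search-only access described above can be sketched with keyed search tokens. This is an assumption-laden illustration, not the scheme of the embodiment: deterministic HMAC tokens are used here only as a simple stand-in for searchable encryption (they reveal when two queries are equal, which the probabilistic scheme cited as background art is designed to avoid), and `SEARCH_KEY`, `Database1`, and the sample values are hypothetical.

```python
import hashlib
import hmac

SEARCH_KEY = b"search-only-key"  # hypothetical key held by the terminal


def token(key, field, value):
    """Keyed search token; the server never sees the plaintext value."""
    return hmac.new(key, f"{field}={value}".encode(), hashlib.sha256).hexdigest()


class Database1:
    """Stores only tokens and, per token, access keys for database 2."""

    def __init__(self):
        self._index = {}

    def register(self, tok, db2_access_key):
        self._index.setdefault(tok, []).append(db2_access_key)

    def search(self, tok):
        # Returns whether a match exists and, if so, database 2 access keys.
        keys = self._index.get(tok, [])
        return bool(keys), list(keys)


db1 = Database1()
db1.register(token(SEARCH_KEY, "name", "Taro Yamada"), "d-100")
db2 = {"d-100": {"blood_glucose": 95}}

# The terminal learns only that a match exists, plus a key into database 2.
matched, access_keys = db1.search(token(SEARCH_KEY, "name", "Taro Yamada"))
rows = [db2[k] for k in access_keys] if matched else []
```

The terminal ends up with the matching database 2 rows without ever receiving the plaintext personal information held in database 1.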

FIG. 8 is a diagram showing the update of the encryption information stored in the database 1 according to the embodiment of the present invention.

In the form shown in FIG. 8, the encrypted data stored in the database 1 is updated every time the database 1 is accessed (for example, when the database 1 is searched and data is acquired from it). Specifically, every time the database 1 is accessed, the seed used for generating the ciphertext is changed (for example, a new hash value is generated by a pseudo random number generator). In the example shown in FIG. 8, the encrypted data to be updated is an ID and password for accessing the database 1, but other parts of the database 1, or the entire database 1, may be updated.

Further, not only the encrypted data stored in the database 1 but also the key for accessing the database 1 may be changed every time the database 1 is accessed. Then, the encrypted data stored in the database 1 is updated with a change in a key (for example, a secret key) for accessing the database 1. Thereby, the key management risk on the terminal 3 side can be reduced.

As described above, in the form shown in FIG. 8, every time the database 1 is accessed, different encrypted data is generated from the same data to be encrypted, so the information in the database 1 is dynamically updated. For this reason, even if the data stored in the database 1 is leaked, the leaked data cannot be easily analyzed, and the risk at the time of information leak can be reduced.
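
The idea of producing different encrypted data from the same plaintext on every access can be sketched as follows. This is a toy illustration under assumptions, not the encryption used in the embodiment and not suitable for production: the keystream is built from HMAC-SHA256 in counter mode, a fresh random nonce plays the role of the changing seed, and the refresh is performed by a key holder because the source does not specify where re-encryption takes place. All names are hypothetical.

```python
import hashlib
import hmac
import os


def _keystream(key, nonce, length):
    """Toy keystream: HMAC-SHA256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(key, plaintext):
    nonce = os.urandom(16)  # fresh random seed for every encryption
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def decrypt(key, blob):
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def access_and_refresh(store, record_id, key):
    """Read a record, then replace its stored ciphertext with a fresh one."""
    plaintext = decrypt(key, store[record_id])
    store[record_id] = encrypt(key, plaintext)
    return plaintext


key = os.urandom(32)
store = {"p-001": encrypt(key, b"password-for-db1")}
before = store["p-001"]
value = access_and_refresh(store, "p-001", key)
after = store["p-001"]
```

The stored bytes differ before and after the access, yet both decrypt to the same value, so a copy leaked at one point in time does not match what is stored later.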

FIG. 9 is a diagram showing a form in which sensitive information is stored in the database 1 according to the embodiment of the present invention.

In the above-described aspect, the database 1 stores data (name, address, etc.) that can identify a specific individual as data whose contents should be kept secret. In the aspect shown in FIG. 9, the database 1 stores sensitive information in addition to data whose contents should be kept secret. In this embodiment, the sensitive information is information such as a credit card number or genetic information that requires special attention for handling even if an individual is not specified, and has a great influence at the time of leakage.

The terminal 3 transmits a request to the database 1 and obtains a search result from the database 1 (701). Since the terminal 3 has an access key that allows the database 1 to be searched, only the information indicating whether or not the data matching the search condition is stored in the database 1 can be acquired (702). When data matching the search condition is stored in the database 1, the database 1 transmits other sensitive information stored in the database 1 to the terminal 3 (903).

More specifically, the database 1 is searched using personal information as the search key, and the terminal 3 receives the encrypted information only when a match is found. The terminal 3 then obtains the sensitive information by decrypting it at the terminal 3.

For example, if the terminal 3 is a web server that provides an EC site (electronic commerce site), at the product selection stage it accesses the database 2 to check the contents of the shopping cart, and at the stage where the payment method is finalized it accesses the database 1 to acquire payment information (credit card information), which can then be transmitted from the terminal 3 to the user.

As described above, in the form shown in FIG. 9, the data center 10 can reduce information management risk by encrypting and managing sensitive information in the database 1.

As described above, the information management system of the present embodiment separates data for which security and privacy should be considered (for example, information relating to a living individual that can identify a specific individual by a name, date of birth, or other description) from other data that cannot identify an individual by itself. The data that can identify an individual is stored protected by searchable encryption, and the other data is stored in plaintext or with normal encryption, kept separate from the personally identifiable information. For this reason, data that can identify a specific individual cannot be decrypted in the data center, and only information that cannot identify an individual by itself can be decrypted in the data center, so that data can be shared and used.

Also, by protecting only the minimum necessary information with searchable encryption, the amount of searchable-encrypted data is reduced, which in turn reduces processing time and computer resources. The size of these reductions depends on the amount of searchable-encrypted data. In practice, the amount of information that cannot identify an individual by itself is much larger than the amount of information that can identify a specific individual, so a large reduction can be expected. In addition, data stored in plaintext or with normal encryption (information that cannot identify an individual by itself) can be analyzed with few computer resources, and can be handled in the same way as information that has been anonymized in multiple stages, such as by conversion and secondary anonymization.

Further, when the databases 1 and 2 are accommodated in one data center 10, data that can identify an individual stored in the database 1 and data that cannot identify an individual stored in the database 2 could otherwise be easily combined. However, according to the present embodiment, security and privacy problems can be solved even if data is shared in such a case.

It should be noted that the information stored in the database 1 and the information stored in the database 2 can be arbitrarily set in consideration of the degree to be concealed and the influence on performance. Further, the database does not need to be separated as hardware or software, and may be in a format in which the schema and data string are separated in the same hardware or software.

In a conventional cloud, all data is transferred over the same network route. By contrast, arranging the database 1 and the database 2 in different data centers separates the data transfer routes, and combining encryption on the transfer route (SSL, VPN, etc.) with searchable encryption provides multiple layers of encryption together with separated information paths. In this way, security and privacy can be taken into consideration.

In addition, when the data stored in the database 2 is provided to a third party, it may be provided as it is. Further, by applying k-anonymization technology or the like, information with improved security and privacy, which takes the risk of re-identification into consideration, can be provided smoothly to a third party.
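
As an illustration of the k-anonymization mentioned above, the sketch below measures k for a set of records before they are handed to a third party: k is the size of the smallest group of records that share the same values for the chosen quasi-identifiers. The column names, the banded values, and the choice of quasi-identifiers are assumptions made for the example; the source does not specify how k-anonymization would be applied.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


rows = [
    {"age_band": "30-39", "bmi_band": "normal", "blood_pressure": 118},
    {"age_band": "30-39", "bmi_band": "normal", "blood_pressure": 125},
    {"age_band": "40-49", "bmi_band": "high", "blood_pressure": 141},
]

# The third row is unique on (age_band, bmi_band), so the table is only 1-anonymous.
k = k_anonymity(rows, ["age_band", "bmi_band"])
```

A provider could, for example, coarsen the bands or suppress rows until the measured k reaches the level it requires.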

The present invention is not limited to the above-described embodiments, and includes various modifications and equivalent configurations within the scope of the appended claims. For example, the above-described embodiments have been described in detail for easy understanding of the present invention, and the present invention is not necessarily limited to those having all the configurations described. A part of the configuration of one embodiment may be replaced with the configuration of another embodiment, and the configuration of another embodiment may be added to the configuration of a given embodiment. In addition, for a part of the configuration of each embodiment, another configuration may be added, deleted, or substituted.

In addition, each of the above-described configurations, functions, processing units, processing means, and the like may be realized in hardware, for example by designing a part or all of them as an integrated circuit, or may be realized in software by a processor interpreting and executing a program that realizes each function.

Information such as programs, tables, and files that realize each function can be stored in a storage device such as a memory, a hard disk, and an SSD (Solid State Drive), or a recording medium such as an IC card, an SD card, and a DVD.

Also, the control lines and information lines shown are those considered necessary for the explanation, and do not necessarily represent all the control lines and information lines required for implementation. In practice, almost all the components may be considered to be connected to one another.

Claims

1. An information management system in which a first database and a second database are provided in one or a plurality of data centers, wherein
the first database stores data encrypted, using a searchable encryption technique, in a format that cannot be decrypted in the data center, and
the second database stores data associated with the data stored in the first database.

2. The information management system according to claim 1, wherein
the first database and the second database are provided in one data center.

3. The information management system according to claim 1, wherein
the data stored in the first database is data that can identify an individual, and
the data stored in the second database is data that cannot identify an individual by itself.

4. The information management system according to claim 1, further comprising
a terminal capable of accessing the first database and the second database, wherein
the terminal
has a key with which the data stored in the first database can be searched but cannot be obtained,
obtains, by sending a request to the first database, information as to whether or not search target data is stored in the first database, and
is permitted to access the second database when information indicating that the search target data is stored is obtained from the first database.

5. The information management system according to claim 1, wherein
the first database stores information for associating the data stored in the first database with the data stored in the second database.

6. The information management system according to claim 1, wherein
information for associating the data stored in the first database with the data stored in the second database is encrypted in a format that cannot be decrypted in the data center, and is stored in the first database or the second database.

7. The information management system according to claim 1, wherein
the encrypted data stored in the first database is updated, at the timing of access to the first database, to different encrypted data that can be decrypted into the same data.