WO2015155896A1 - Support vector machine learning system and support vector machine learning method - Google Patents
Support vector machine learning system and support vector machine learning method
- Publication number
- WO2015155896A1 (application PCT/JP2014/060533)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- learning
- support vector
- vector machine
- machine learning
- label
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06F21/6254—Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/008—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
Definitions
- The present invention relates to a support vector machine learning system and a support vector machine learning method.
- In Non-Patent Document 1, when performing support vector machine learning, an analysis requester conceals the feature vectors by linearly transforming them with a single random matrix before providing them to an analysis performer, and learning is performed using reduced SVM.
- In Non-Patent Document 1, however, whether each label is positive or negative is given in plaintext, so the analysis performer can grasp what classification is being realized.
- Furthermore, since a linear transformation is used to conceal the feature vectors, if as many pairs of a transformed feature vector and its original as the dimension of the feature vector space can be associated, the linear transformation can be recovered, and the feature vectors before transformation can be identified from those after transformation.
- The present invention has been made in view of this background, and its object is to provide a support vector machine learning system and a support vector machine learning method capable of securely concealing the labels of the teacher signals when performing support vector machine learning.
- A main aspect of the present invention for solving the above problems is a system for performing support vector machine learning, comprising a learning data management device and a learning device.
- The learning data management device includes: a learning data storage unit that stores a set of learning data, including labels and feature vectors, to be subjected to the support vector machine learning; an encryption processing unit that encrypts the labels of the learning data with an additive homomorphic encryption scheme; and a learning data transmitting unit that transmits encrypted learning data, including the encrypted labels and the feature vectors, to the learning device.
- The learning device includes: a learning data receiving unit that receives the encrypted learning data; and an update processing unit that performs update processing by a gradient method on the encrypted learning data using an additive homomorphic addition algorithm.
- In SVM learning, the data learning analysis system of this embodiment (a) encrypts the data used for learning (the learning data) and (b) adds dummy data to the set of learning data, thereby reliably concealing the labels and increasing security.
- The additive homomorphic encryption scheme used in this embodiment is an encryption algorithm that, among encryption schemes having homomorphism (a public key encryption scheme is assumed in this embodiment), has additivity.
- An additive homomorphic encryption scheme has, in addition to the asymmetry between the encryption key and the decryption key of an ordinary public key cryptosystem, additivity between ciphertexts. That is, for two ciphertexts, a ciphertext whose plaintext is the arithmetic sum of the plaintexts of those two ciphertexts (hereinafter referred to as addition or simply sum; the operator used for this arithmetic sum is also written "+") can be computed using only public information (without using the secret key or the plaintexts).
- Therefore, writing E(m) for the ciphertext of a plaintext m, E(m_1) + E(m_2) = E(m_1 + m_2) holds. In the following description as well, E(m) denotes a ciphertext of plaintext m.
- Additive homomorphic encryption secret key/public key generation algorithm: the secret key/public key generation algorithm defined by the additive homomorphic encryption scheme described above. It takes a security parameter and a key seed as input and outputs a secret key/public key pair of a specific bit length.
- Additive homomorphic encryption encryption algorithm: the encryption algorithm defined by the additive homomorphic encryption scheme described above. It takes a plaintext and the public key as input and outputs a ciphertext.
- Additive homomorphic encryption decryption algorithm: the decryption algorithm defined by the additive homomorphic encryption scheme described above. It takes a ciphertext and the secret key as input and outputs the plaintext corresponding to the ciphertext.
- Additive homomorphic encryption addition algorithm: the algorithm that realizes the addition operation between ciphertexts defined by the additive homomorphic encryption scheme described above. It takes a plurality of ciphertexts as input and outputs a ciphertext corresponding to the sum of their plaintexts. For example, given the ciphertext E(100) corresponding to 100 and the ciphertext E(200) corresponding to 200 as input, it outputs the ciphertext E(300) corresponding to 300 (= 100 + 200).
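These four algorithms can be illustrated with a toy Paillier cryptosystem, a standard additive homomorphic public-key scheme (the embodiment later names Paillier only as one example; the prime sizes and function names below are ours for illustration, and real deployments use primes of cryptographic length):

```python
import math
import random

def keygen(p, q):
    """Toy Paillier key generation from two primes (illustrative sizes only)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)          # Carmichael function of n
    mu = pow(lam, -1, n)                  # modular inverse of lam mod n
    return n, (lam, mu)                   # public key n, secret key (lam, mu)

def encrypt(n, m):
    """Encrypt plaintext m under public key n (with generator g = n + 1)."""
    n2 = n * n
    while True:                           # pick a blinding factor coprime to n
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m % n, n2) * pow(r, n, n2) % n2

def decrypt(n, sk, c):
    """Recover the plaintext of ciphertext c with the secret key."""
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

def add(n, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds their plaintexts."""
    return c1 * c2 % (n * n)

n, sk = keygen(1009, 1013)                # tiny primes, for demonstration only
c = add(n, encrypt(n, 100), encrypt(n, 200))
print(decrypt(n, sk, c))                  # 300
```

The final lines reproduce the example from the text: adding E(100) and E(200) yields a ciphertext that decrypts to 300, without the adding party ever seeing 100 or 200.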
- SVM (support vector machine) is a discrimination technique using supervised learning. Given a learning data set D = {(x_i, y_i) | x_i ∈ R^m, y_i ∈ {−1, 1}, i = 1, 2, …, n} to be learned, SVM computes, among the hyperplanes or hypersurfaces in R^m separating the x_i vectors with y_i = 1 from those with y_i = −1, the one with the largest margin. Here, the margin of a hyperplane or hypersurface is its distance to the nearest x_i vector among those with y_i = 1 or y_i = −1.
- In this embodiment, each x_i vector is called a feature vector. A feature vector x_i with y_i = 1 is called a positive label feature vector, and one with y_i = −1 a negative label feature vector.
- y_i is the class into which the pattern discriminator classifies the data (see FIG. 1), and is called a label.
- In this embodiment, the description uses a learning data set that can be separated by a hyperplane or hypersurface as shown in FIG. 3 (the hard margin problem), but the present invention is not limited to this; the same method can also be applied when separation is impossible (the soft margin problem).
- Likewise, the description below uses an example separable by a hyperplane, but the present invention is not limited to this and is also applicable to examples separable by a nonlinear hypersurface using an existing kernel method.
- The gradient method is an algorithm that searches for a solution in an optimization problem based on information about the gradient of a function. For the SVM problem, the optimal solution (a_1, a_2, …, a_n) maximizing the objective function L is obtained by the gradient method.
- The i-th component of the gradient vector of the function L is L'_i = 1 − y_i Σ_j a_j y_j (x_i · x_j). Accordingly, the gradient method recursively updates the coefficients (a_1, a_2, …, a_n) with update rate η as a_i ← a_i + η L'_i (update equation (4)), and an optimal solution or an approximation of it can be obtained by repeating this update.
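In plaintext terms (before any encryption), one such gradient-ascent step on the dual coefficients might look like the following sketch; the linear kernel and the exact form of update equation (4) are our reading of the text, not code from the patent:

```python
import numpy as np

def svm_dual_gradient_step(a, X, y, eta=0.01):
    """One gradient-ascent step on the SVM dual objective L (linear kernel):
    a_i <- a_i + eta * (1 - y_i * sum_j a_j * y_j * (x_i . x_j))."""
    K = X @ X.T                        # Gram matrix of inner products
    grad = 1.0 - y * (K @ (a * y))     # i-th component L'_i of the gradient
    return a + eta * grad

# Two separable points; a single step from zero coefficients, as in the
# first embodiment's one-shot update.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
a = np.zeros(2)
print(svm_dual_gradient_step(a, X, y))   # [0.01 0.01]
```

Starting from a = 0, the gradient is simply 1 for every component, so one step moves each coefficient by η.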
- (A) Encryption of learning data: in this embodiment, the labels y_i of the learning data are encrypted before being given to the analysis execution apparatus 200 that executes the SVM learning. As a result, the content of each label y_i (whether it is +1 or −1) is concealed from the analysis execution apparatus 200. Since the contents of the labels y_i are concealed, it becomes difficult for the analysis execution apparatus 200 to assign a meaningful interpretation to the learning data.
- the additive homomorphic encryption method is used for the encryption algorithm.
- Data encrypted with the additive homomorphic encryption scheme can be added while still encrypted (without decryption), and the result of decrypting the added ciphertext corresponds to the result of adding the corresponding plaintexts.
- The above update equation (4) can be rewritten as update equation (5).
- the analysis execution apparatus 200 performs SVM learning using the above equation (6) as an update equation.
- In this way, SVM learning can be performed on the label y_i while it remains the ciphertext E(y_i), without giving the plaintext to the analysis execution apparatus 200.
- Note that performing the recursive update two or more times with update equation (6) would require multiplication of the ciphertexts E(y), which an additive homomorphic scheme does not provide. Therefore, in this embodiment, the update process is performed only once.
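Concretely, an additive homomorphic scheme gives the analysis performer exactly two operations: adding two ciphertexts, and multiplying a ciphertext by a plaintext integer (repeated addition, realized in Paillier as exponentiation). A toy sketch (Paillier as an example scheme; parameters and names are ours):

```python
import math
import random

def keygen(p, q):
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n))      # public key n, secret key (lam, mu)

def encrypt(n, m):
    n2 = n * n
    while True:                           # blinding factor coprime to n
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m % n, n2) * pow(r, n, n2) % n2

def decrypt(n, sk, c):
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

def add_ct(n, c1, c2):
    """E(m1) 'plus' E(m2): ciphertext multiplication adds the plaintexts."""
    return c1 * c2 % (n * n)

def scalar_mul(n, c, k):
    """Plaintext k times E(m): k-fold homomorphic addition, i.e. c**k."""
    return pow(c, k, n * n)

n, sk = keygen(1009, 1013)                # tiny demo primes
print(decrypt(n, sk, scalar_mul(n, encrypt(n, 3), 5)))   # 15
# There is no operation taking E(m1) and E(m2) to E(m1 * m2); that gap is
# why the first embodiment stops after a single gradient update.
```

Multiplying a ciphertext by a known plaintext coefficient is exactly what the one-shot update needs, while a second update would need the unavailable ciphertext-by-ciphertext product.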
- (B) Addition of dummy data In this embodiment, dummy data is added to a set of learning data. As a result, it becomes difficult for the analysis execution apparatus 200 to which the learning data set is given to guess the significance of the learning data using, for example, the bias of the distribution of the learning data.
- the dummy data added to the learning data set is given a label y i of 0 which is neither +1 nor -1.
- the term related to the label y i of the dummy data becomes 0 on the right side of the update formula (5), and there is no influence on the update formula (5).
- The same holds in update formula (6), which uses the additive homomorphic encryption scheme.
- Since the labels are encrypted on the analysis performer side, the analysis performer cannot determine whether or not given learning data is dummy data. Furthermore, by adding dummy data so that the set of learning data approaches a uniform distribution, it becomes even harder to infer the meaning of the learning data.
- FIG. 2 is a schematic diagram of a data learning analysis system according to an embodiment of the present invention.
- the data learning analysis system of the present embodiment includes an analysis requesting device 100 and an analysis execution device 200.
- the analysis requesting apparatus 100 is a computer that manages learning data.
- the analysis execution device 200 is a computer that performs processing related to SVM learning.
- the analysis request device 100 and the analysis execution device 200 are designed to be able to send and receive information to and from each other via the network 300.
- the network 300 is, for example, the Internet or a LAN (Local Area Network), and is constructed by, for example, Ethernet (registered trademark), an optical fiber, a wireless communication path, a public telephone line network, a dedicated telephone line network, or the like.
- the analysis requesting apparatus 100 transmits a set of learning data to the analysis execution apparatus 200 via the network 300.
- The analysis execution apparatus 200 performs SVM learning on the learning data received from the analysis requesting apparatus 100 and transmits the result of the SVM learning (hereinafter, learning result) to the analysis requesting apparatus 100 via the network 300.
- the analysis requesting apparatus 100 generates a pattern classifier using the learning result.
- FIG. 3 is a hardware schematic diagram of the analysis requesting apparatus 100.
- The analysis requesting apparatus 100 includes a CPU 101, an auxiliary storage device 102, a memory 103, a display device 105, an input/output interface 106, and a communication device 107, connected by an internal signal line 104.
- the auxiliary storage device 102 stores program codes. The program code is loaded into the memory 103 and executed by the CPU 101.
- analysis execution apparatus 200 has the same hardware configuration as that shown in FIG.
- FIG. 4 is a software schematic diagram of the analysis requesting apparatus 100.
- The analysis requesting apparatus 100 includes a learning data storage unit 121, a dummy data storage unit 122, a dummy data addition processing unit 123, an encryption processing unit 124, a learning data transmission unit 125, a learning result receiving unit 126, a decryption processing unit 127, and a pattern discriminator generation unit 128.
- The learning data storage unit 121 and the dummy data storage unit 122 are realized as part of the storage area provided by the auxiliary storage device 102 and the memory 103 of the analysis requesting apparatus 100. The dummy data addition processing unit 123, the encryption processing unit 124, the learning data transmission unit 125, the learning result receiving unit 126, the decryption processing unit 127, and the pattern discriminator generation unit 128 are realized by the CPU 101 of the analysis requesting apparatus 100 loading the program code stored in the auxiliary storage device 102 into the memory 103 and executing it.
- The learning data storage unit 121 stores the learning data set D = {(x_i, y_i) | x_i ∈ R^m, y_i ∈ {−1, 1}, i = 1, 2, …, n}.
- the dummy data addition processing unit 123 adds dummy data to the learning data set D.
- the dummy data is data including a label y of “0”.
- the dummy data addition processing unit 123 adds dummy data so that the feature vectors included in the learning data collection D have a uniform distribution in the feature space.
- The dummy data addition processing unit 123 may receive feature vectors forming a uniform distribution as input from the user. Alternatively, for example, it may divide the feature space into sections and, until a chi-square test or the like judges the distribution to be uniform, select sections containing few feature vectors and generate feature vectors falling into one or more of the selected sections.
- The dummy data addition processing unit 123 may also rearrange the learning data (labeled feature vectors) in random order (that is, randomly permute the subscripts i).
- The dummy data addition processing unit 123 stores information identifying the dummy data (for example, the subscripts i of the dummy data) in the dummy data storage unit 122.
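The grid-based variant of this idea can be sketched as follows (the function name, the fill-to-the-fullest-cell rule, and the use of NumPy are our illustrative choices; the embodiment's chi-square-test loop would replace the simple fill rule):

```python
import numpy as np

def add_dummy_data(X, y, bins=2, rng=None):
    """Pad sparse grid cells of the feature space with label-0 dummy
    vectors so the pooled feature vectors look closer to uniform, then
    shuffle so dummies are not identifiable by position."""
    rng = np.random.default_rng(0) if rng is None else rng
    lo, hi = X.min(axis=0), X.max(axis=0)
    counts, edges = np.histogramdd(X, bins=bins, range=list(zip(lo, hi)))
    target = counts.max()                  # fill every cell up to the fullest one
    dummies = []
    for idx in np.ndindex(counts.shape):
        need = int(target - counts[idx])
        cell_lo = [edges[d][idx[d]] for d in range(X.shape[1])]
        cell_hi = [edges[d][idx[d] + 1] for d in range(X.shape[1])]
        dummies.append(rng.uniform(cell_lo, cell_hi, size=(need, X.shape[1])))
    Xd = np.vstack([X] + dummies)
    yd = np.concatenate([y, np.zeros(len(Xd) - len(X), dtype=int)])
    perm = rng.permutation(len(Xd))        # random reordering hides the dummies
    return Xd[perm], yd[perm]

X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0]])
y = np.array([1, 1, -1])
Xd, yd = add_dummy_data(X, y)
print(len(Xd), int((yd == 0).sum()))       # 8 5
```

In the demo, the fullest of the four grid cells holds two real points, so five label-0 dummies are generated to bring every cell up to that count, and the permutation step mirrors the random rearrangement described above.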
- The encryption processing unit 124 encrypts the label y of each item of learning data using the additive homomorphic encryption encryption algorithm to generate the ciphertext E(y), and generates concealment learning data E(D) in which each label y is replaced by its ciphertext E(y):
- E(D) = {(x_i, E(y_i)) | x_i ∈ R^m, y_i ∈ {−1, 1, 0}, i = 1, 2, …, N}
- the learning data transmission unit 125 transmits the concealment learning data to the analysis execution apparatus 200.
- the learning result receiving unit 126 receives the processing result of SVM learning transmitted from the analysis execution apparatus 200.
- Note that the real coefficients (a_1, a_2, …, a_N) ∈ R^N themselves are not received from the analysis execution apparatus 200; instead, the ciphertexts of the coefficients multiplied by their labels, {E(a_i y_i) | i = 1, 2, …, N} (hereinafter, the concealment learning result), are received as the processing result.
- The decryption processing unit 127 decrypts the concealment learning result to obtain (a_1 y_1, a_2 y_2, …, a_N y_N).
- The decryption processing unit 127 identifies the dummy data in the decrypted learning result from the information stored in the dummy data storage unit 122 and removes it to extract the coefficients (a_1, a_2, …, a_n).
- Note that the decryption processing unit 127 may orthogonally project the vector (a_1, a_2, …, a_n) onto the orthogonal complement of (y_1, y_2, …, y_n) and use the projected vector as the learning result.
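This projection has a closed form: subtracting from the coefficient vector its component along y yields the nearest vector orthogonal to y, which restores the usual SVM dual constraint Σ_i a_i y_i = 0. A sketch (function name ours):

```python
import numpy as np

def project_onto_complement(a, y):
    """Orthogonally project a onto the orthogonal complement of y:
    a' = a - ((a . y) / (y . y)) * y, so that a' . y = 0."""
    y = y.astype(float)
    return a - (a @ y) / (y @ y) * y

a = np.array([0.5, 0.3, 0.4])
y = np.array([1.0, -1.0, 1.0])
a2 = project_onto_complement(a, y)
print(abs(float(a2 @ y)) < 1e-9)   # True
```

Here a · y = 0.6 before projection; after removing the 0.2·y component, the projected coefficients (0.3, 0.5, 0.2) satisfy the constraint.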
- The pattern discriminator generation unit 128 generates a pattern discriminator using the coefficients (a_1, a_2, …, a_n). Note that the pattern discriminator generation method is the same as that used in general SVM learning, so its description is omitted here.
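A minimal plaintext sketch of that standard step (the linear kernel and the omitted bias term are simplifications of ours; general SVM practice also derives a bias from the support vectors):

```python
import numpy as np

def make_discriminator(X, ay):
    """Build the linear decision function from the products a_i * y_i
    recovered after decryption: w = sum_i (a_i y_i) x_i, f(x) = sign(w . x)."""
    w = X.T @ ay
    return lambda x: int(np.sign(w @ np.asarray(x)))

X = np.array([[1.0, 0.0], [-1.0, 0.0]])
ay = np.array([1.0, -1.0])             # a_i * y_i for each training vector
f = make_discriminator(X, ay)
print(f([2.0, 0.5]), f([-1.5, 0.3]))   # 1 -1
```

Note that the discriminator only needs the products a_i y_i, which is exactly what the analysis requesting apparatus recovers by decrypting E(a_i y_i).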
- FIG. 5 is a software schematic diagram of the analysis execution apparatus 200.
- the analysis execution device 200 includes a learning data receiving unit 221, a coefficient generating unit 222, an update processing unit 223, and a learning result transmitting unit 224.
- The coefficient generation unit 222, the update processing unit 223, and the learning result transmission unit 224 are realized by the CPU 101 of the analysis execution device 200 loading the program code stored in the auxiliary storage device 102 into the memory 103 and executing it.
- the learning data receiving unit 221 receives the concealed learning data set transmitted from the analysis requesting apparatus 100.
- the coefficient generator 222 generates coefficients (a 1 , a 2 ,..., A N ) of the objective function L.
- The coefficient generation unit 222 generates N random numbers as the coefficients. Alternatively, predetermined initial values may be set (for example, all a_i may be set to 0).
- the update processing unit 223 performs update processing according to the update formula (6).
- The update processing unit 223 uses the addition processing of the additive homomorphic encryption scheme to compute the "+" operations in update formula (6).
- Since an additive homomorphic encryption scheme with no multiplicative homomorphism, such as the Paillier cryptosystem, is assumed, the update processing unit 223 directly generates, as the concealment learning result, the set of ciphertexts E(a_i y_i) calculated by giving the randomly set coefficients and the concealment learning data set to update formula (6).
- the learning result transmitting unit 224 transmits the concealment learning result to the analysis requesting apparatus 100.
- FIG. 6 is a diagram showing a flow of processing executed in the data learning analysis system of this embodiment.
- the encryption processing unit 124 generates a secret key / public key used in the following using a secret key / public key generation algorithm based on an additive homomorphic encryption method (S100).
- Next, a learning data set D = {(x_i, y_i) | x_i ∈ R^m, y_i ∈ {−1, 1, 0}, i = 1, 2, …, N} with dummy data added is generated (S150).
- the dummy data addition processing unit 123 may rearrange the learning data at random.
- FIG. 7 illustrates a feature space in which a set of dummy feature vectors with label 0 has been added to a set of positive/negative feature vectors; the positive label feature vectors, negative label feature vectors, and dummy feature vectors are each plotted with a distinct marker.
- As shown, the dummy data addition processing unit 123 adds dummy data so that the feature vectors come close to a uniform distribution.
- Next, the encryption processing unit 124 encrypts each plaintext label y_i using the additive homomorphic encryption encryption algorithm with the public key generated in (S100) to generate the ciphertext E(y_i), and thereby generates the concealment learning data E(D) = {(x_i, E(y_i)) | x_i ∈ R^m, y_i ∈ {−1, 1, 0}, i = 1, 2, …, N} (S200).
- the learning data transmission unit 125 transmits the concealment learning data (D100) to the analysis execution device 200.
- The analysis execution apparatus 200 that has received the concealment learning data (D100) performs the learning process shown in FIG. 8 (S300).
- The learning result transmission unit 224 returns the concealment learning result {E(a_i y_i) | i = 1, 2, …, N} (D200) to the analysis requesting apparatus 100.
- The learning result receiving unit 126 receives the concealment learning result (D200) transmitted from the analysis execution apparatus 200, and the decryption processing unit 127 decrypts it using the secret key generated in (S100) to obtain the learning result (a_1 y_1, a_2 y_2, …, a_N y_N) (S400).
- Next, the decryption processing unit 127 removes the results corresponding to the dummy data from (a_1 y_1, a_2 y_2, …, a_N y_N) to obtain the final coefficient sequence (a_1, a_2, …, a_n). This completes the post-processing (S500).
- Note that the decryption processing unit 127 may orthogonally project the vector (a_1, a_2, …, a_n) onto the orthogonal complement of (y_1, y_2, …, y_n) and use the projected vector as the coefficient sequence (a_1, a_2, …, a_n).
- the pattern discriminator generating unit 128 generates a pattern discriminator using the coefficient sequence (a 1 , a 2 ,..., An ) (S600).
- FIG. 8 is a diagram showing a processing flow of the learning process in (S300) of FIG.
- the update processing unit 223 calculates the update formula (6) for the initial coefficients (a 1 , a 2 ,..., A N ) and the concealment learning data (D100) (S303).
- The learning result transmission unit 224 transmits the concealment learning processing result {E(a_i y_i) | i = 1, 2, …, N} (D200) calculated by update formula (6) to the analysis requesting apparatus 100 (S304).
- With the above configuration, SVM learning by the gradient method can be performed while the labels remain encrypted (without decryption). Therefore, the labels attached to the feature vectors as teacher signals can be kept secret from the analysis execution apparatus 200.
- In the present embodiment, the labels are encrypted rather than linearly transformed. With a linear transformation, all feature vectors are transformed with the same matrix, so if, for example, as many pairs of a concealed feature vector and its original feature vector as the dimension of the feature space leak, the transformation matrix can be determined and thereby the original feature vectors can be identified.
- In contrast, an additive homomorphic cryptosystem is resistant to chosen plaintext/ciphertext attacks, and it remains difficult to estimate the labels even if a number of feature vectors exceeding the dimension of the feature vector space leaks. Therefore, the labels can be securely concealed from the analysis execution apparatus 200, and improved security can be expected.
- Moreover, since the labels are encrypted after dummy data has been added to the learning data set, it is difficult to estimate the labels from any uneven distribution of the feature vectors. Therefore, security can be improved.
- Furthermore, since dummy data is added so that the feature vectors approach a uniform distribution, it is difficult to infer information about the original feature vectors from the set of feature vectors in the concealment learning data. Therefore, the labels can be securely concealed from the analysis execution apparatus 200, and security can be further improved.
- the label of the dummy data is set to “0”, it is possible to eliminate the influence due to the addition of the dummy data in the gradient method update process.
- Since the labels of the dummy data are also encrypted, it cannot be determined from the encrypted data which items have no influence. Therefore, the learning data can be securely concealed from the analysis execution apparatus 200.
- In the learning process (S300) of the first embodiment, the analysis execution apparatus 200 updates the initial coefficients by the gradient method only once (S303).
- However, the solution obtained in this way, as shown in FIG. 7, is not always the optimal solution. The hypersurface obtained from the concealment learning result (D200) after only one update may therefore not coincide with the margin-maximizing hypersurface obtained from the optimal solution; it depends on the values of the random coefficients (a_1, a_2, …, a_N) selected as the initial coefficients.
- Therefore, in the second embodiment, k sets of initial values (a_1, a_2, …, a_N) are prepared, the update process is performed for each, and the sum of the update results E(a_i y_i) is taken, easing the dependence on the initial values.
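In plaintext terms (ignoring the encryption layer), the second embodiment's scheme might be sketched as follows; the single-step update and the summation mirror our reading of update formula (7), with hypothetical function names:

```python
import numpy as np

def multi_start_sum(X, y, k=5, eta=0.01, seed=0):
    """Run one gradient step from k random initial coefficient vectors
    and sum the resulting a_i * y_i values (homomorphically, the server
    would instead add the ciphertexts E(a_i y_i))."""
    rng = np.random.default_rng(seed)
    K = X @ X.T
    total = np.zeros(len(y))
    for _ in range(k):
        a = rng.random(len(y))                     # random initial coefficients
        a = a + eta * (1.0 - y * (K @ (a * y)))    # one update step, as in (4)
        total += a * y
    return total

X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
print(multi_start_sum(X, y).shape)   # (2,)
```

Averaging over several random starts softens the dependence on any single initialization, which is the point the text makes; only the summation, an operation the additive homomorphic scheme supports, is needed on the encrypted side.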
- the only difference from the first embodiment is the learning process (S300), and the other processing flow is the same as that of the first embodiment, so only the learning process (S300) will be described here.
- FIG. 11 is a processing flow of the learning process (S300) in the second embodiment.
- First, the concealment learning result E(a_i y_i) is initialized to 0 (S603).
- The update processing unit 223 gives the initial coefficients (a_1, a_2, …, a_N), the concealment learning data (D100), and the concealment learning result {E(a_i y_i) | i = 1, 2, …, N} to update formula (7), and updates the concealment learning result E(a_i y_i) (S604).
- The learning result transmission unit 224 transmits the concealment learning result {E(a_i y_i) | i = 1, 2, …, N} calculated by update formula (7) to the analysis requesting apparatus 100 (S606).
- FIG. 12 is a diagram for explaining the update process in the learning process (S300) in the second embodiment.
- In the first embodiment, the concealment learning processing result (D200) is calculated from the update processing of a single set of initial coefficients, whereas in the second embodiment the concealment learning processing result (D200) is calculated by summing over a plurality of sets of initial coefficients. Therefore, compared with performing the update process only once as in the first embodiment (see FIG. 9), a solution closer to the optimal solution can be obtained.
- Also in the second embodiment, the analysis execution device 200 can be configured not to decrypt the concealment learning data. Therefore, the learning result can be brought closer to the optimal solution while the learning data is kept secret from the analysis execution apparatus 200.
- In the above embodiments, the analysis requesting apparatus 100 and the analysis execution apparatus 200 are each assumed to be a single computer, but the present invention is not limited thereto; for example, at least one of the analysis requesting apparatus 100 and the analysis execution apparatus 200 may be configured from a plurality of computers.
- The update equations (5) to (7) may also be calculated using a general kernel function K(x_i, x_j) in place of the values involving inner products.
- In the above embodiments, the update rate η is set to 0.01; however, this value is not mandatory, and a value calculated by an existing gradient-method step size determination algorithm may be used instead.
- In the above embodiments, the coefficient generation unit 222 of the analysis execution apparatus 200 determines the number k of initial coefficient sets to prepare, but the value of k may instead be specified by the analysis requesting apparatus 100. In this case, the learning data transmission unit 125 may receive an input of the value of k from the user and transmit it to the analysis execution device 200 together with the concealment learning data.
- 100 Analysis requesting device
- 101 CPU
- 102 Auxiliary storage device (storage device)
- 103 Memory
- 104 Internal signal line
- 105 Display device
- 106 Input/output interface
- 107 Communication device
- 200 Analysis execution device
- 300 Network
Description
First, the cryptographic and data-analysis terminology used in this embodiment is defined. In this embodiment, one additive homomorphic encryption scheme to be used is fixed throughout.
When the above learning data set D = {(x_i, y_i) | x_i ∈ R^m, y_i ∈ {−1, 1}, i = 1, 2, …, n} is given, the algorithm that finds the margin-maximizing hyperplane in R^m is called the SVM learning algorithm, and the problem of finding that hyperplane is called the SVM problem. More specifically, this problem reduces to searching for the real coefficients (a_1, a_2, …, a_n) ∈ R^n that maximize the objective function L(a_1, a_2, …, a_n), where

L(a_1, a_2, …, a_n) = Σ_i a_i − (1/2) Σ_i Σ_j a_i a_j y_i y_j (x_i · x_j),

subject to the constraint conditions a_i ≥ 0 (i = 1, 2, …, n) and Σ_i a_i y_i = 0.
== First Embodiment ==
== Second Embodiment ==

Next, the second embodiment will be described.
Claims (9)
- A system for performing support vector machine learning,
comprising a learning data management device and a learning device, wherein
the learning data management device comprises:
a learning data storage unit that stores a set of learning data, to be subjected to the support vector machine learning, each item including a label and a feature vector;
an encryption processing unit that encrypts the label of the learning data using an additively homomorphic encryption scheme; and
a learning data transmitting unit that transmits encrypted learning data, including the encrypted label and the feature vector, to the learning device; and
the learning device comprises:
a learning data receiving unit that receives the encrypted learning data; and
an update processing unit that performs update processing by a gradient method on the encrypted learning data using an additively homomorphic addition algorithm. - The support vector machine learning system according to claim 1, wherein
the learning data management device further comprises a dummy data addition processing unit that adds dummy data to the set of learning data, and
the value of the label included in the dummy data is 0. - The support vector machine learning system according to any one of claims 1 to 4, wherein
the update processing unit performs the update processing using each of a plurality of coefficient sets that are targets of the update processing. - The support vector machine learning system according to claim 5, wherein
the update processing unit sums the processing results of the update processing for each of the plurality of coefficient sets and uses the total as the processing result. - A system for performing support vector machine learning, comprising:
a learning data storage unit that stores a set of learning data, to be subjected to the support vector machine learning, each item including a feature vector and a label encrypted using an additively homomorphic encryption scheme; and
an update processing unit that performs update processing by a gradient method on the encrypted learning data using an additively homomorphic addition algorithm. - A method for performing support vector machine learning, wherein
a learning data management device that stores a set of learning data, to be subjected to the support vector machine learning, each item including a label and a feature vector, executes:
a step of encrypting the label of the learning data using an additively homomorphic encryption scheme; and
a step of transmitting encrypted learning data, including the encrypted label and the feature vector, to a learning device; and
the learning device executes:
a step of receiving the encrypted learning data; and
a step of performing update processing by a gradient method on the encrypted learning data using an additively homomorphic addition algorithm. - The support vector machine learning method according to claim 1, wherein
the learning data management device further executes a step of adding dummy data to the set of learning data, and
the value of the label included in the dummy data is 0.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2016512563A JPWO2015155896A1 (ja) | 2014-04-11 | 2014-04-11 | サポートベクトルマシン学習システムおよびサポートベクトルマシン学習方法 |
PCT/JP2014/060533 WO2015155896A1 (ja) | 2014-04-11 | 2014-04-11 | サポートベクトルマシン学習システムおよびサポートベクトルマシン学習方法 |
US15/303,092 US20170039487A1 (en) | 2014-04-11 | 2014-04-11 | Support vector machine learning system and support vector machine learning method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2014/060533 WO2015155896A1 (ja) | 2014-04-11 | 2014-04-11 | サポートベクトルマシン学習システムおよびサポートベクトルマシン学習方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015155896A1 true WO2015155896A1 (ja) | 2015-10-15 |
Family
ID=54287495
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2014/060533 WO2015155896A1 (ja) | 2014-04-11 | 2014-04-11 | サポートベクトルマシン学習システムおよびサポートベクトルマシン学習方法 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20170039487A1 (ja) |
JP (1) | JPWO2015155896A1 (ja) |
WO (1) | WO2015155896A1 (ja) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107657104A (zh) * | 2017-09-20 | 2018-02-02 | 浙江浙能台州第二发电有限责任公司 | 基于在线支持向量机的锅炉燃烧系统动态建模方法 |
JP2018054765A (ja) * | 2016-09-27 | 2018-04-05 | 日本電気株式会社 | データ処理装置、データ処理方法、およびプログラム |
JP2020115257A (ja) * | 2019-01-17 | 2020-07-30 | 富士通株式会社 | 学習方法、学習プログラムおよび学習装置 |
WO2022138959A1 (ja) * | 2020-12-25 | 2022-06-30 | 国立研究開発法人情報通信研究機構 | 協調学習システム及び協調学習方法 |
US11522671B2 (en) | 2017-11-27 | 2022-12-06 | Mitsubishi Electric Corporation | Homomorphic inference device, homomorphic inference method, computer readable medium, and privacy-preserving information processing system |
WO2024034077A1 (ja) * | 2022-08-10 | 2024-02-15 | 日本電気株式会社 | 学習システム、学習方法、およびコンピュータ可読媒体 |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10083061B2 (en) * | 2016-11-02 | 2018-09-25 | Sap Se | Cloud embedded process tenant system for big data processing |
US10491373B2 (en) | 2017-06-12 | 2019-11-26 | Microsoft Technology Licensing, Llc | Homomorphic data analysis |
US11113624B2 (en) * | 2017-07-12 | 2021-09-07 | Sap Se | Distributed machine learning on heterogeneous data platforms |
JP6691087B2 (ja) * | 2017-10-04 | 2020-04-28 | ファナック株式会社 | 熱変位補正システム |
CN108418833B (zh) * | 2018-03-23 | 2022-01-07 | 中科创达软件股份有限公司 | 一种软件的管理方法、云端服务器和终端 |
US20190332814A1 (en) * | 2018-04-27 | 2019-10-31 | Nxp B.V. | High-throughput privacy-friendly hardware assisted machine learning on edge nodes |
JP7079483B2 (ja) * | 2018-06-18 | 2022-06-02 | 国立研究開発法人産業技術総合研究所 | 情報処理方法、システム及びプログラム |
US11087223B2 (en) | 2018-07-11 | 2021-08-10 | International Business Machines Corporation | Learning and inferring insights from encrypted data |
CN109598385A (zh) * | 2018-12-07 | 2019-04-09 | 深圳前海微众银行股份有限公司 | 反洗钱联合学习方法、装置、设备、系统及存储介质 |
WO2020155173A1 (en) * | 2019-02-03 | 2020-08-06 | Platon Co., Limited | Data processing method, device and system for machine learning model |
JP7287474B2 (ja) * | 2019-08-23 | 2023-06-06 | 日本電信電話株式会社 | デバイス識別装置、デバイス識別方法およびデバイス識別プログラム |
CN112818369B (zh) * | 2021-02-10 | 2024-03-29 | 中国银联股份有限公司 | 一种联合建模方法及装置 |
US20220284892A1 (en) * | 2021-03-05 | 2022-09-08 | Lenovo (Singapore) Pte. Ltd. | Anonymization of text transcripts corresponding to user commands |
-
2014
- 2014-04-11 JP JP2016512563A patent/JPWO2015155896A1/ja not_active Ceased
- 2014-04-11 US US15/303,092 patent/US20170039487A1/en not_active Abandoned
- 2014-04-11 WO PCT/JP2014/060533 patent/WO2015155896A1/ja active Application Filing
Non-Patent Citations (3)
Title |
---|
JUSTIN ZHAN ET AL.: "How To Construct Support Vector Machines Without Breaching Privacy", STUDIA INFORMATICA, vol. 1, no. 7, 2006, pages 233 - 244, XP055229535 * |
KENG-PEI LIN ET AL.: "Privacy-Preserving Outsourcing Support Vector Machines with Random Transformation", PROCEEDINGS OF THE 16TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 25 July 2010 (2010-07-25), pages 363 - 372, XP055229532 * |
YASUHIRO FUJII ET AL.: "Advanced Security Technologies for Cloud Computing and Utilization of Big Data", HITACHI HYORON, vol. 94, no. 10, 1 October 2012 (2012-10-01), pages 49 - 53, XP055229536 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018054765A (ja) * | 2016-09-27 | 2018-04-05 | 日本電気株式会社 | データ処理装置、データ処理方法、およびプログラム |
CN107657104A (zh) * | 2017-09-20 | 2018-02-02 | 浙江浙能台州第二发电有限责任公司 | 基于在线支持向量机的锅炉燃烧系统动态建模方法 |
US11522671B2 (en) | 2017-11-27 | 2022-12-06 | Mitsubishi Electric Corporation | Homomorphic inference device, homomorphic inference method, computer readable medium, and privacy-preserving information processing system |
JP2020115257A (ja) * | 2019-01-17 | 2020-07-30 | 富士通株式会社 | 学習方法、学習プログラムおよび学習装置 |
JP7279368B2 (ja) | 2019-01-17 | 2023-05-23 | 富士通株式会社 | 学習方法、学習プログラムおよび学習装置 |
WO2022138959A1 (ja) * | 2020-12-25 | 2022-06-30 | 国立研究開発法人情報通信研究機構 | 協調学習システム及び協調学習方法 |
WO2024034077A1 (ja) * | 2022-08-10 | 2024-02-15 | 日本電気株式会社 | 学習システム、学習方法、およびコンピュータ可読媒体 |
Also Published As
Publication number | Publication date |
---|---|
US20170039487A1 (en) | 2017-02-09 |
JPWO2015155896A1 (ja) | 2017-04-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2015155896A1 (ja) | サポートベクトルマシン学習システムおよびサポートベクトルマシン学習方法 | |
CN110572253B (zh) | 一种联邦学习训练数据隐私性增强方法及系统 | |
JP6180177B2 (ja) | プライバシーを保護することができる暗号化データの問い合わせ方法及びシステム | |
JP5657128B2 (ja) | 秘匿計算システム、秘匿計算方法、および秘匿計算プログラム | |
US20150149763A1 (en) | Server-Aided Private Set Intersection (PSI) with Data Transfer | |
US10313119B2 (en) | Data management device, system, re-encryption device, data sharing device, and storage medium | |
JP2016512611A (ja) | プライバシー保護リッジ回帰 | |
CN113162752B (zh) | 基于混合同态加密的数据处理方法和装置 | |
US20170310479A1 (en) | Key Replacement Direction Control System and Key Replacement Direction Control Method | |
US20090138698A1 (en) | Method of searching encrypted data using inner product operation and terminal and server therefor | |
JP2018142013A (ja) | 関連付けられた秘密鍵部分を用いた高速公開鍵暗号化のためのシステムおよび方法 | |
CN113434878B (zh) | 基于联邦学习的建模及应用方法、装置、设备及存储介质 | |
Shu et al. | Secure task recommendation in crowdsourcing | |
CN107204997A (zh) | 管理云存储数据的方法和装置 | |
JP2012128398A (ja) | プライバシを保護したまま暗号化された要素の順序を選択するための方法およびシステム | |
Njorbuenwu et al. | A survey on the impacts of quantum computers on information security | |
KR101697868B1 (ko) | 공유 또는 검색을 위한 데이터 암호화 방법 및 이를 수행하는 장치 | |
US10594473B2 (en) | Terminal device, database server, and calculation system | |
WO2014030706A1 (ja) | 暗号化データベースシステム、クライアント装置およびサーバ、暗号化データ加算方法およびプログラム | |
Raja et al. | Opposition based joint grey wolf-whale optimization algorithm based attribute based encryption in secure wireless communication | |
Bandaru et al. | Block chain enabled auditing with optimal multi‐key homomorphic encryption technique for public cloud computing environment | |
CN112380404B (zh) | 数据过滤方法、装置及系统 | |
CN116502732B (zh) | 基于可信执行环境的联邦学习方法以及系统 | |
CN108599941A (zh) | 随机非对称扩充字节通信数据加密方法 | |
CN112906052A (zh) | 联邦学习中多用户梯度置换的聚合方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 14888570 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2016512563 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 15303092 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 14888570 Country of ref document: EP Kind code of ref document: A1 |