JP2017126112A

JP2017126112A - Server, distributed server system, and information processing method

Info

Publication number: JP2017126112A
Application number: JP2016003417A
Authority: JP
Inventors: Yuji Kasuya; Yusuke Sugimoto; Takuya Mizuhara; Kazuhiro Oba
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2016-01-12
Filing date: 2016-01-12
Publication date: 2017-07-20

Abstract

PROBLEM TO BE SOLVED: To protect personal information while collecting information to be used in statistics and machine learning.
SOLUTION: A server includes: an existing data storage unit 21 which stores data on an existing user; a data acquisition unit 22 which acquires data on a user different from the existing user; a difference calculation unit 23 which calculates difference data between the data on the existing user stored in the existing data storage unit 21 and the data on the other user acquired by the data acquisition unit 22; a difference data storage unit 24 which stores the difference data calculated by the difference calculation unit 23; and a statistics unit 25 which compiles statistics on users using the existing-user data and the difference data.
SELECTED DRAWING: Figure 3

Description

The present invention relates to a server, a distributed server system, and an information processing method.

Systems that ask customers for consent before using their personal information for statistics and machine learning are already known. Even when consent is sought, however, many users remain reluctant to provide personal information.

Patent Document 1, for example, discloses using privacy-preserving collaborative filtering so that personal information is not passed directly to the party using it, either by encrypting the rating a user gives to each item or by adding a random number to each rating so that the original rating is difficult to recover.

However, data specialized for one specific technique, as in the prior art of Patent Document 1, does not allow data for general statistics and machine learning to be collected.

The present invention has been made in view of the above circumstances, and its object is to protect personal information while collecting information that can be used in statistics and machine learning.

To solve the above problem, a server of the present invention comprises: existing data storage means for storing data on existing users; data acquisition means for acquiring data on a user different from the existing users; difference calculation means for calculating difference data between the data on an existing user stored in the existing data storage means and the data on the other user acquired by the data acquisition means; difference data storage means for storing the difference data calculated by the difference calculation means; and statistics means for compiling statistics on user data using the data on the existing users and the difference data.

According to the present invention, personal information can be protected while information that can be used in statistics and machine learning is collected.

FIG. 1 is a schematic configuration diagram of a system including the server in an embodiment of the present invention.
FIG. 2 is a hardware configuration diagram of the server in the embodiment of the present invention.
FIG. 3 is a functional block diagram of the server in the embodiment of the present invention.
FIG. 4 is a conceptual diagram showing a processing example in the embodiment of the present invention.
FIG. 5 is a flowchart showing the processing procedure in the embodiment of the present invention.
FIG. 6 is a functional block diagram of a distributed server system in the embodiment of the present invention.
FIG. 7 is a functional block diagram of a distributed server system in the embodiment of the present invention.
FIG. 8 is a functional block diagram of a distributed server system in the embodiment of the present invention.
FIG. 9 is a functional block diagram of a distributed server system in the embodiment of the present invention.

A server and a distributed server system according to an embodiment of the present invention are described below with reference to the drawings. In the figures, identical or corresponding parts are given the same reference numerals, and duplicate descriptions are simplified or omitted as appropriate. The embodiment described below is the best mode of the present invention and does not limit the scope of the claims.

In this embodiment, "server" refers to a program that provides some service to clients in a so-called client-server system. Needless to say, "server" also covers an information processing apparatus that operates according to that program.

<Schematic configuration of a system including the server in this embodiment>
The schematic configuration of a system including the server in this embodiment is described with reference to FIG. 1. The system 1 in this embodiment consists of a server 10 and a client 20 connected via a network 30. FIG. 1 shows one server 10 and one client 20, but a plurality of servers and a plurality of clients may be connected via the network.

The functions of the server 10 are carried out by an information processing apparatus such as a personal computer (hereinafter "PC"), or by storage or a hard disk having an information processing function. The server is not limited to a PC or the like, provided the apparatus has the processing capability to implement the server functions.

The client 20 corresponds to an information processing apparatus such as a PC, or a portable information terminal such as a smartphone or tablet, used by a user of the server 10.

The network 30 may be a LAN (Local Area Network) or a WAN (Wide Area Network), and may use either a wireless or a wired communication network.

<Hardware configuration of the server in this embodiment>
The hardware configuration of the server 10 in this embodiment is described with reference to FIG. 2. As its hardware configuration, the server 10 includes a CPU 11, a RAM 12, a ROM 13, an NW I/F 14, an HDD 15, an input unit 16, and an output unit 17. These are one example of a configuration with which the server 10 executes the functions (processes) described later, and other hardware is not excluded.

The CPU 11 is the main control unit that implements the processes of the server 10 described later. The CPU 11 implements each function of the server 10 by executing the processing programs, which are stored in the ROM 13 and define each process, after they are loaded into the RAM 12.

The RAM 12 is a storage unit that functions as the work memory of the CPU 11, as described above. The ROM 13 is a storage unit that stores the processing programs defining each process, as described above, together with various parameters required to control the server 10.

The NW I/F 14 is a network interface through which the server 10 and the client 20 connect to each other via the network 30 shown in FIG. 1. The HDD 15 is a large-capacity storage unit that stores, for example, data acquired from the client 20.

The input unit 16 is an input device such as a keyboard or mouse. It may also be a device that accepts the user's touch operations, such as a touch panel superimposed on the display unit described later. The input unit 16 may further include a camera that acquires images by capturing video and a microphone that accepts audio input.

The output unit 17 is a display unit such as a display. The output unit 17 may also include a speaker that outputs sound.

<Functional blocks of the server in the embodiment of the present invention>
The functional blocks of the server 10 in the embodiment of the present invention are described with reference to FIG. 3. As functional blocks, the server 10 in this embodiment includes an existing data storage unit 21, a data acquisition unit 22, a difference calculation unit 23, a difference data storage unit 24, a statistics unit 25, and a display unit 26.

The existing data storage unit 21 is existing data storage means that stores data on existing users, and its function is implemented by the CPU 11, the ROM 13, and the HDD 15 shown in FIG. 2. As an example of "data on a user", this embodiment uses personal information provided by patients mainly in medical settings, such as the user's name, age, height, weight, blood pressure, and medical history (hereinafter "patient data"). Data on a user is not limited to this example. "Existing" means already stored in a storage device such as the HDD 15.

The data acquisition unit 22 is data acquisition means that acquires data on a user different from the existing users, and its function is implemented by the CPU 11, the RAM 12, and the NW I/F 14 shown in FIG. 2. The data acquired on the other user is also patient data as described above, but it is, for example, data that is not stored in the existing data storage unit 21 and is acquired from a user different from the existing users. Patient data and the like need not be acquired over the network. For example, it may be acquired from the user through the input unit 16 shown in FIG. 2, or by reading data stored on an external storage medium.

The difference calculation unit 23 is difference calculation means that calculates difference data between the data on an existing user stored in the existing data storage unit 21 and the data on the other user acquired by the data acquisition unit 22. Its function is implemented by the CPU 11 and the RAM 12 shown in FIG. 2.

In this embodiment, the difference calculation unit 23 calculates difference data between the average value of the data on a plurality of existing users stored in the existing data storage unit 21 and the acquired data on the other user. For example, the difference may be taken against the average of existing users in the same age group as the other user, or against the average of existing users of the same height or the same weight as the other user.
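The averaging step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the field names, the grouping by age decade, and all numeric values are assumptions chosen for the example.

```python
from statistics import mean

# Hypothetical item names; the patent only lists them as patient data items.
NUMERIC_ITEMS = ("height", "weight", "age", "blood_pressure")


def cohort_average(existing_users, new_user):
    """Average each numeric item over existing users in the same age decade."""
    decade = new_user["age"] // 10
    cohort = [u for u in existing_users if u["age"] // 10 == decade]
    if not cohort:
        raise ValueError("no existing user in the same age group")
    return {item: mean(u[item] for u in cohort) for item in NUMERIC_ITEMS}


def difference_from_average(existing_users, new_user):
    """Difference data: the new user's value minus the cohort average, per item."""
    average = cohort_average(existing_users, new_user)
    return {item: new_user[item] - average[item] for item in NUMERIC_ITEMS}


# Illustrative records only (not taken from the patent).
existing = [
    {"height": 176, "weight": 66, "age": 44, "blood_pressure": 126},
    {"height": 174, "weight": 64, "age": 46, "blood_pressure": 130},
    {"height": 180, "weight": 75, "age": 38, "blood_pressure": 130},
]
new_user = {"height": 178, "weight": 70, "age": 41, "blood_pressure": 135}

diff = difference_from_average(existing, new_user)
print(diff)
```

Only the two records in their 40s enter the average here, so the user in their 30s does not influence the stored difference.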

The difference data storage unit 24 is difference data storage means that stores the difference data calculated by the difference calculation unit 23, and its function is implemented by the CPU 11, the ROM 13, and the HDD 15 shown in FIG. 2. The difference data storage unit 24 may store only difference data whose difference from the data on the existing users is larger than a predetermined threshold.

When the difference data storage unit 24 does not store the difference data for every item, it need only select the necessary difference data as follows. For example, it may select the top few items ranked by one of the following: [1] the absolute value of the difference; [2] the value of [1] divided by the standard deviation of the group against which the difference was taken; or [3] the value of [2] multiplied by a value expressing importance, such as a contribution rate derived from the learning results so far. Dividing by the standard deviation of the group, as in [2], lets items be treated equally whether their original numeric range is wide or narrow. Multiplying by the contribution rate, as in [3], lets the items that matter for machine learning be selected and collected.
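A short sketch may help compare the three ranking criteria. The function name `select_items`, its `criterion` parameter, and the standard deviations and contribution rates below are assumptions made for illustration; the patent does not specify an interface or concrete values.

```python
def select_items(diff, cohort_std, contribution, criterion=1, top_n=2):
    """Keep the top_n items of diff ranked by criterion [1], [2] or [3]."""
    scores = {}
    for item, d in diff.items():
        score = abs(d)                      # [1] absolute difference
        if criterion >= 2:
            score /= cohort_std[item]       # [2] scale by the group's std. dev.
        if criterion >= 3:
            score *= contribution[item]     # [3] weight by contribution rate
        scores[item] = score
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {item: diff[item] for item in ranked[:top_n]}


# Illustrative values only.
diff = {"height": -2, "weight": -10, "age": 4, "blood_pressure": 2}
cohort_std = {"height": 5.0, "weight": 8.0, "age": 2.0, "blood_pressure": 10.0}
contribution = {"height": 0.1, "weight": 0.4, "age": 0.2, "blood_pressure": 0.3}

by_abs = select_items(diff, cohort_std, contribution, criterion=1)
by_std = select_items(diff, cohort_std, contribution, criterion=2)
by_weighted = select_items(diff, cohort_std, contribution, criterion=3)
print(by_abs, by_std, by_weighted)
```

With these numbers, criterion [2] ranks age above weight because the age values in the group vary little, while criterion [3] moves weight back to the top because of its larger assumed contribution rate.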

The difference data storage unit 24 may also store differences only for, say, the N items with the largest differences. To compare the sizes of the differences, the Mahalanobis distance or a distance derived from the definition of a norm may be used, for example. Instead of fixing N in advance, the unit may keep only the items whose difference is at least a certain value. In pattern recognition techniques that define a distance, this keeps only the items that increase the distance.

Furthermore, the difference data storage unit 24 may judge closeness using a few percent of the standard deviation as a threshold and select difference data according to how many items have similar values. The difference data storage unit 24 may also store items with a high contribution rate in statistics and machine learning, or the contribution rate multiplied by the difference adjusted for the standard deviation. In pattern recognition techniques that define a distance, this makes it possible to collect data only from users with unusual data.

Furthermore, instead of differences in specific items, the difference data storage unit 24 may store differences in the components obtained by principal component analysis of the items. In that case, it may keep only the N components with the highest contribution rates. Random noise may also be added. Because the data is converted into a form that differs greatly from the original before it is acquired, providing it is more readily accepted, and even items of lower importance in the principal component analysis can be acquired.

The statistics unit 25 is statistics means that compiles statistics on user data using the data on existing users and the difference data, and its function is implemented by the CPU 11, the RAM 12, the ROM 13, and the HDD 15 shown in FIG. 2. When compiling statistics on user data, the statistics unit 25 uses only the necessary part of the difference data stored in the difference data storage unit 24. This reduces the processing load on the server 10.

The "statistics" processing by the statistics unit 25 includes, first, calculation of an average using the data on existing users and the difference data. In this embodiment, the calculation uses the value obtained by multiplying the item value of an existing record by that record's reference count and then adding the values of the difference data.
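The average calculation can be illustrated with a small sketch. It assumes that the average is taken over existing users and difference-only users together, and that every difference record stores the item in question; neither point is stated in the text. The dictionary layout and the key names `ref_count` and `ref_id` are hypothetical.

```python
def mean_without_restoring(existing, differences, item):
    """Mean of one item over all users without rebuilding any individual value.

    Each difference-only user's value equals (reference value + difference),
    so their sum is sum(ref_count * reference value) + sum(differences).
    """
    total = sum(row[item] for row in existing.values())
    total += sum(row["ref_count"] * row[item] for row in existing.values())
    total += sum(d[item] for d in differences)
    return total / (len(existing) + len(differences))


# Illustrative data: two difference records both reference x2.
existing = {
    "x1": {"height": 180, "ref_count": 0},
    "x2": {"height": 174, "ref_count": 2},
}
differences = [
    {"ref_id": "x2", "height": 5},
    {"ref_id": "x2", "height": 10},
]

average_height = mean_without_restoring(existing, differences, "height")
print(average_height)
```

The function never needs `ref_id`: the stored reference count is enough, which is why no per-user value has to be reconstructed for the mean.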

The "statistics" processing by the statistics unit 25 also includes variance calculation, for which the original data is restored from the difference data only for the items to be calculated. More specifically, the original data is restored by adding the difference to the average value. In "cross tabulation" processing by the statistics unit 25, the original data is restored from the difference data for the several items to be cross-tabulated. In "feature calculation" processing by the statistics unit 25, the restored original data is discarded after each feature is calculated, one feature at a time. These processes allow statistical processing and learning with only minimal restoration of the original data.
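As a rough sketch of the variance case, the code below restores a single item, computes the population variance, and lets the restored values go out of scope. It assumes the reference value that each difference was taken against is available under a hypothetical `ref_id` key, and that existing users are included in the variance.

```python
from statistics import pvariance


def variance_of_item(existing, differences, item):
    """Population variance of one item, restoring only that item."""
    restored = [
        existing[d["ref_id"]][item] + d[item]   # reference value + difference
        for d in differences
        if d.get(item) is not None              # skip differences not stored
    ]
    values = [row[item] for row in existing.values()] + restored
    result = pvariance(values)
    # Drop the local references so the restored values do not outlive the call.
    del restored, values
    return result


# Illustrative data only.
existing = {"x1": {"height": 180}, "x2": {"height": 174}}
differences = [
    {"ref_id": "x2", "height": 5},
    {"ref_id": "x2", "height": 10},
]

height_variance = variance_of_item(existing, differences, "height")
print(height_variance)
```

Other items of the same users (weight, age, and so on) are never touched, which mirrors the idea of restoring only what the requested statistic needs.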

The display unit 26 is display means that displays a permission confirmation screen for obtaining the user's permission to store the difference data when the difference data calculated by the difference calculation unit 23 is to be stored in the difference data storage unit 24. One usage, for example, is to ask a user who is leaving a membership organization for consent to store all of the difference data, and to display this screen if the user refuses. Needless to say, the display unit 26 can also display the results of processing by the server 10.

<Processing example in this embodiment>
A processing example in this embodiment is described with reference to FIG. 4. Here, the existing data group 101 is shown with the following example records: "ID: 1, name: Ichiro Tanaka, height: 180, weight: 75, age: 38, blood pressure: 130, disease: none", "ID: 2, name: Jiro Suzuki, height: 174, weight: 65, age: 55, blood pressure: 145, disease: diabetes", and "ID: 1001, name: tall, 40s, height: 175, weight: 65, age: 45, blood pressure: 128, disease: none".

The acquired data 201 on another user is shown as "name: Saburo Sato, height: 173, weight: 55, age: 49, blood pressure: 130, disease: none".

The difference data 301 between the data 201 on the other user and the existing record with ID 1001 in the existing data group 101 is shown as "name: ×××, height: -2, weight: -10, age: +4, blood pressure: +2, disease: none".

The permission confirmation screen 401 displays the confirmation message: "Compared with the average data for tall people in their 40s, your weight is -10 and your age is +4. Would you be willing to provide this difference as data?" Because only the difference is stored as the user's information, and the user's permission is requested before it is stored, personal information can be protected while information usable for statistics and machine learning is collected.
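The numbers in FIG. 4 can be checked with a few lines of code. The record values are those given in the description; the dictionary keys, the function name `confirmation_message`, and the exact English wording of the message are assumptions for illustration.

```python
ITEMS = ("height", "weight", "age", "blood_pressure")

# Existing record ID 1001 and acquired data 201, as described for FIG. 4.
reference = {"name": "tall, 40s", "height": 175, "weight": 65,
             "age": 45, "blood_pressure": 128}
acquired = {"name": "Saburo Sato", "height": 173, "weight": 55,
            "age": 49, "blood_pressure": 130}

# Difference data 301: the name is deliberately left out of the difference.
difference = {item: acquired[item] - reference[item] for item in ITEMS}


def confirmation_message(difference, selected, cohort_label):
    """Build the text of a permission confirmation screen for selected items."""
    parts = [f"{item} is {difference[item]:+d}" for item in selected]
    return (
        f"Compared with the {cohort_label} average data, your "
        + " and ".join(parts)
        + ". Would you be willing to provide this difference as data?"
    )


message = confirmation_message(difference, ["weight", "age"], "tall, 40s")
print(difference)
print(message)
```

Only weight and age appear in the message, matching screen 401, where the two largest differences are the ones the user is asked to provide.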

Table 1 shows an example of the existing database stored in the existing data storage unit 21 and corresponds to the data in the existing data group 101 shown in FIG. 4. That is, Table 1 shows the data of Ichiro Tanaka, whose ID is x1, and the data of Jiro Suzuki, whose ID is x2. In addition, the reference count of each record is stored. Storing the reference count in the database allows the average to be calculated without restoring the original data.

Table 2 shows an example of the difference database stored in the difference data storage unit 24. This difference database contains "ID: y1, reference ID: x2, height: +5, weight: +10, age: +4, blood pressure: -, disease: none" and "ID: y2, reference ID: x2, height: +10, weight: -10, age: -, blood pressure: +10, disease: none". Here, for the data on two other users, the differences from the data of Jiro Suzuki in the existing database are shown.
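One way to picture the difference database of Table 2 is as a list of records in which unstored items are simply absent. The sketch below is an assumption about representation, not the patent's schema: the class name, field names, and the use of `None` for the "-" entries are all hypothetical. It also shows how the reference count kept in the existing database could be derived from the reference IDs.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class DifferenceRecord:
    """One row of the difference database; None marks an item shown as '-'."""
    id: str
    ref_id: str
    height: Optional[int] = None
    weight: Optional[int] = None
    age: Optional[int] = None
    blood_pressure: Optional[int] = None
    disease: Optional[str] = None


# The two rows described for Table 2.
difference_db = [
    DifferenceRecord("y1", "x2", height=5, weight=10, age=4, disease="none"),
    DifferenceRecord("y2", "x2", height=10, weight=-10,
                     blood_pressure=10, disease="none"),
]


def reference_counts(records):
    """Count how many difference records point at each existing record."""
    return Counter(record.ref_id for record in records)


counts = reference_counts(difference_db)
print(counts)
```

Here both rows reference x2, so x2 would carry a reference count of 2 and x1 a count of 0.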

<Processing procedure in this embodiment>
The processing procedure in this embodiment is described with reference to FIG. 5. First, the procedure for calculating and storing difference data is described with reference to FIG. 5(a). As a premise, the existing data storage unit 21 is assumed to already store, as existing data, data such as personal information acquired from users so far.

First, the data acquisition unit 22 acquires data on another user (step S1). Next, the difference calculation unit 23 searches the data on existing users stored in the existing data storage unit 21 for data that approximates the data on the other user (step S2).

The difference calculation unit 23 calculates difference data between the data on the existing user found by the search and the data on the other user (step S3). The difference data storage unit 24 selects only the necessary differences from the calculated difference data (step S4).

Once only the necessary differences have been selected, the display unit 26 displays the permission confirmation screen shown in FIG. 4 and asks the user for permission to store the difference data (step S5). If the user grants permission, the difference data storage unit 24 stores the difference data (step S6).
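Steps S1 to S6 can be sketched as a single function. Several details are assumptions made for illustration: the unscaled squared Euclidean distance used for the search in step S2, the simple absolute-value threshold in step S4, and the `ask_permission` callback that stands in for the confirmation screen. The records reuse the values from FIG. 4.

```python
def nearest_existing(existing, new_user, items):
    """S2: ID of the existing record closest to the new user's data."""
    def squared_distance(row):
        return sum((row[i] - new_user[i]) ** 2 for i in items)
    return min(existing, key=lambda rid: squared_distance(existing[rid]))


def register_difference(existing, new_user, items, ask_permission, store,
                        threshold=0):
    """S2-S6 for data already acquired in S1; returns the stored record or None."""
    ref_id = nearest_existing(existing, new_user, items)              # S2
    diff = {i: new_user[i] - existing[ref_id][i] for i in items}      # S3
    selected = {i: d for i, d in diff.items() if abs(d) > threshold}  # S4
    if not ask_permission(selected):                                  # S5
        return None
    record = {"ref_id": ref_id, **selected}
    store.append(record)                                              # S6
    return record


ITEMS = ("height", "weight", "age", "blood_pressure")
existing = {
    "1": {"height": 180, "weight": 75, "age": 38, "blood_pressure": 130},
    "2": {"height": 174, "weight": 65, "age": 55, "blood_pressure": 145},
    "1001": {"height": 175, "weight": 65, "age": 45, "blood_pressure": 128},
}
new_user = {"height": 173, "weight": 55, "age": 49, "blood_pressure": 130}

granted_store, refused_store = [], []
stored = register_difference(existing, new_user, ITEMS,
                             lambda selected: True, granted_store, threshold=3)
refused = register_difference(existing, new_user, ITEMS,
                              lambda selected: False, refused_store, threshold=3)
print(stored, refused)
```

When permission is refused, nothing is appended to the store, so neither the original data nor the difference is retained.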

Next, the procedure for statistical processing in this embodiment is described with reference to FIG. 5(b). First, the statistics unit 25 receives, from a user operating the server 10, a request for statistical processing based on the existing data and the difference data (step S11).

The statistics unit 25 restores only the minimum necessary data to its original form from the existing data and the difference data (step S12) and performs the statistical processing using the restored original data (step S13). After the statistical processing is complete, the statistics unit 25 discards the restored original data (step S14).

<Functional blocks of the distributed server system in this embodiment>
The functional blocks of distributed server system A in this embodiment are described with reference to FIG. 6. For each functional block, content that duplicates the description given with reference to FIG. 3 is omitted.

FIG. 6 shows a configuration in which a server 500 and a server 600 are connected via a network. The server 500 has the existing data storage unit 21 and the difference data storage unit 24, and the server 600 has the data acquisition unit 22, the difference calculation unit 23, the statistics unit 25, and the display unit 26.

In distributed server system A, processing is distributed so that the server 500 mainly handles data storage and the server 600 mainly handles everything from data acquisition through calculation. Data is thus consolidated and stored in the server 500 while the actual processing is performed by the server 600, which enables centralized management of the data and reduces the processing load on the server 600 that performs the actual processing.

Next, the functional blocks of distributed server system B in this embodiment are described with reference to FIG. 7. For each functional block, content that duplicates the descriptions given with reference to FIG. 3 and FIG. 6 is omitted.

FIG. 7 shows a configuration in which the server 500, a server 700, and a server 800 are connected via a network. The server 700 has the data acquisition unit 22 and the difference calculation unit 23, and the server 800 has the statistics unit 25 and the display unit 26.

In distributed server system B, processing is distributed so that the server 700 mainly handles data acquisition and difference calculation and the server 800 mainly handles statistical processing and display processing. Because statistical processing and display processing are moved to another server, the server 700 can perform difference calculation smoothly without excess load.

Next, the functional blocks of distributed server system C in this embodiment are described with reference to FIG. 8. For each functional block, content that duplicates the description given with reference to FIG. 3 is omitted.

FIG. 8 shows a configuration in which a server 900 and a server 1000 are connected via a network. The server 900 has the existing data storage unit 21, and the server 1000 has the data acquisition unit 22, the difference calculation unit 23, the difference data storage unit 24, the statistics unit 25, and the display unit 26.

In distributed server system C, processing is distributed so that the server 900 mainly handles storage of the existing data and the server 1000 handles the other processing. Because the existing data is voluminous and consists of personal information, it is preferably managed on a single server under an authorized administrator, and distributed server system C is useful for this kind of usage.

次に、本実施形態における分散型サーバシステムＤの機能ブロックについて図９を参照して説明する。なお、各機能ブロックについて、図３及び図８を用いて説明した内容と重複する内容については記載を省略する。 Next, functional blocks of the distributed server system D in the present embodiment will be described with reference to FIG. In addition, about each functional block, description is abbreviate | omitted about the content which overlaps with the content demonstrated using FIG.3 and FIG.8.

In distributed server system D, processing is distributed so that server 1100 mainly handles data acquisition, server 1200 mainly handles difference calculation, and server 1300 mainly handles storage of the difference data. In addition, server 1400 mainly handles statistical processing, and server 1500 mainly handles display processing.

By having a separate server perform each process, as in distributed server system D, the processing load on each server can be reduced further.

The configurations of distributed server systems A to D described above are merely examples, and other configurations can of course be adopted. For example, the configuration can be varied according to the processing specifications of the servers themselves, the locations where the servers are installed, and the like.
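As an illustration only, the placements described for distributed server systems C and D can be written as a mapping from functional blocks to servers. The sketch below is not part of the disclosure: the names `UNITS`, `PLACEMENT`, `server_for`, and `units_on` are hypothetical, and it assumes that each process named for system D corresponds to the like-named functional block. The text above does not state which server in system D holds the existing data storage unit 21, so that entry is left out.

```python
from typing import Dict, List, Optional

# Reference numerals of the functional blocks, as given in the reference signs list.
UNITS: Dict[int, str] = {
    21: "existing data storage unit",
    22: "data acquisition unit",
    23: "difference calculation unit",
    24: "difference data storage unit",
    25: "statistics unit",
    26: "display unit",
}

# Hypothetical table: system name -> {functional block numeral: server numeral}.
PLACEMENT: Dict[str, Dict[int, int]] = {
    # System C: server 900 stores the existing data, server 1000 does the rest.
    "C": {21: 900, 22: 1000, 23: 1000, 24: 1000, 25: 1000, 26: 1000},
    # System D: one server per process. Unit 21 is omitted because the
    # description above does not say where it is placed.
    "D": {22: 1100, 23: 1200, 24: 1300, 25: 1400, 26: 1500},
}


def server_for(system: str, unit: int) -> Optional[int]:
    """Return the server hosting a functional block, or None if not stated."""
    return PLACEMENT[system].get(unit)


def units_on(system: str, server: int) -> List[int]:
    """Return the functional blocks placed on a given server, in numeric order."""
    return sorted(u for u, s in PLACEMENT[system].items() if s == server)


if __name__ == "__main__":
    for unit in units_on("C", 1000):
        print(unit, UNITS[unit])
```

Changing the configuration to suit server specifications or installation locations then amounts to editing the table rather than the functional blocks themselves.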

As described above, according to this embodiment, for data acquired from a user other than the existing users, only the difference from the data relating to an existing user is stored, and the original data is calculated from that difference data for use in statistical processing. The user's personal information and the like are therefore not stored directly, which contributes to the protection of personal information. Moreover, because the user's permission is also obtained for storing the difference data, the personal information protection function is further strengthened. In addition, the difference data itself is data that can be used for statistics and machine learning.
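The flow summarized above can be sketched in a few lines, under stated assumptions. The description does not specify a data format, so this sketch assumes each user's data is a set of numeric fields and that the difference is a per-field subtraction; the field names and the functions `difference` and `restore` are hypothetical and are not taken from the disclosure.

```python
from statistics import mean
from typing import Dict

# Assumed representation: one numeric value per field for each user.
Record = Dict[str, float]


def difference(existing: Record, new: Record) -> Record:
    """Per-field difference: the new user's value minus the existing user's value."""
    return {field: new[field] - existing[field] for field in existing}


def restore(existing: Record, diff: Record) -> Record:
    """Recover the new user's data from the existing data and the stored difference."""
    return {field: existing[field] + diff[field] for field in existing}


# Data already held for an existing user (illustrative values).
existing_user: Record = {"age": 30.0, "height_cm": 170.0}
# Data acquired from another user; in this sketch it is never stored as-is.
new_user: Record = {"age": 34.0, "height_cm": 165.0}

# Only the difference is stored.
stored_diff = difference(existing_user, new_user)

# At statistics time, the original values are recomputed from the difference.
restored = restore(existing_user, stored_diff)
mean_age = mean([existing_user["age"], restored["age"]])
```

Here the difference data storage would hold only `stored_diff`, and the statistics step combines the existing data with the values recomputed from it.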

Each of the embodiments described above is a preferred embodiment of the present invention, and various modifications can be made without departing from the gist of the invention. For example, each process in the server and the distributed server system of the embodiment described above can be executed by hardware, by software, or by a combination of the two.

When the processing is executed by software, a program in which the processing sequence is recorded can be installed into memory in a computer built into dedicated hardware and executed there. Alternatively, the program can be installed and executed on a general-purpose computer capable of executing various kinds of processing.

1 System
10, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500 Server
20 Client
30 Network
11 CPU
12 RAM
13 ROM
14 NW I/F
15 HDD
16 Input unit
17 Output unit
21 Existing data storage unit
22 Data acquisition unit
23 Difference calculation unit
24 Difference data storage unit
25 Statistics unit
26 Display unit

Japanese Unexamined Patent Application Publication No. 2014-016872 (JP 2014-016872 A)

Claims

1. A server comprising:
existing data storage means for storing data relating to an existing user;
data acquisition means for acquiring data relating to another user different from the existing user;
difference calculation means for calculating difference data between the data relating to the existing user stored in the existing data storage means and the data relating to the other user acquired by the data acquisition means;
difference data storage means for storing the difference data calculated by the difference calculation means; and
statistical means for compiling statistics on data relating to users by using the data relating to the existing user and the difference data.

2. The server according to claim 1, wherein the statistical means uses only a necessary portion of the difference data stored in the difference data storage means when compiling statistics on data relating to users.

3. The server according to claim 1, wherein the difference data storage means stores difference data whose difference from the data relating to the existing user is larger than a predetermined threshold.

4. The server according to any one of claims 1 to 3, further comprising display means for displaying a permission confirmation screen for obtaining the user's permission to store the difference data when the difference data calculated by the difference calculation means is to be stored in the difference data storage means.

5. The server according to any one of claims 1 to 4, wherein the difference calculation means calculates difference data between an average value of data relating to a plurality of existing users stored in the existing data storage means and the data relating to the other user acquired by the data acquisition means.

6. A distributed server system in which functions are distributed and stored in two or more storages, the distributed server system comprising:
existing data storage means for storing data relating to an existing user;
data acquisition means for acquiring data relating to another user different from the existing user;
difference calculation means for calculating difference data between the data relating to the existing user stored in the existing data storage means and the data relating to the other user acquired by the data acquisition means;
difference data storage means for storing the difference data calculated by the difference calculation means; and
statistical means for compiling statistics on data relating to users by using the data relating to the existing user and the difference data.

7. An information processing method comprising:
storing, by a computer, data relating to an existing user in a storage unit;
acquiring, by the computer, data relating to another user different from the existing user;
calculating, by the computer, difference data between the data relating to the existing user stored in the storage unit and the acquired data relating to the other user;
storing, by the computer, the calculated difference data in the storage unit; and
compiling, by the computer, statistics on data relating to users by using the data relating to the existing user and the difference data.