JP6174527B2

JP6174527B2 - Moving means estimation apparatus, operation method thereof, and program

Info

Publication number: JP6174527B2
Application number: JP2014139441A
Authority: JP
Inventors: 良彦数原; 浩之戸田; 義昌小池
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2014-07-07
Filing date: 2014-07-07
Publication date: 2017-08-02
Anticipated expiration: 2034-07-07
Also published as: JP2016018308A

Description

The present invention relates to a moving means estimation apparatus, a method, and a program for estimating which moving means (means of transportation) a user has used.

There is a technique that estimates which moving means a user has used from GPS history information, that is, the user's positions accumulated by a device such as a GPS logger (for example, Non-Patent Document 1). To estimate the moving means, a preprocessing step first divides the GPS history information into sections (segments) in which different moving means are assumed to have been used and extracts those sections. Features are then extracted from the GPS history information for each extracted segment, and a classifier classifies the features to determine the moving means.

The classifier for determining the moving means can be generated within the framework of supervised machine learning (for example, Non-Patent Document 2) using ground-truth data prepared manually in advance (Non-Patent Document 1).

Non-Patent Document 1: Zheng, Y., Liu, L., Wang, L., Xie, X., “Learning transportation mode from raw gps data for geographic applications on the web”, Proc. The 17th international conference on World Wide Web, pp.247-256, 2008.
Non-Patent Document 2: Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S. and Singer, Y., “Online Passive-Aggressive Algorithm”, Journal of Machine Learning, Vol.7, pp.551-585, 2006.

However, because ground-truth data is costly to create, only a limited amount of it can be prepared. As a result, the prediction accuracy of the prediction model drops, and the accuracy of the moving means determination deteriorates.

The present invention has been made in view of this problem. Its object is to provide a moving means estimation apparatus, an operation method thereof, and a program that keep the prediction accuracy of the prediction model from degrading even when only a limited amount of ground-truth data is available.

The moving means estimation apparatus of the present invention estimates moving means based on GPS history information in which a user's position is recorded by GPS, and comprises a spatio-temporal frequency storage unit, a training case storage unit, an unlabeled case storage unit, and a prediction model generation function unit. The spatio-temporal frequency storage unit stores a user ID that identifies a user, a mesh ID that represents a fixed area on a map, a time slot within a day, and the frequency with which the user entered that area during that time slot. The training case storage unit stores a correct label that correctly represents the moving means of a case, together with the feature vector of that case obtained from the GPS history information. The unlabeled case storage unit stores the feature vectors of cases to which no correct label has been given.
The prediction model generation function unit generates a prediction model for predicting the moving means from all cases stored in the training case storage unit. When the number of cases stored in the unlabeled case storage unit is equal to or less than a threshold, it outputs the prediction model. When that number is greater than the threshold, it uses the data stored in the spatio-temporal frequency storage unit to obtain a selection score, which represents the degree to which the reliability of the prediction model would improve if a correct label were given to a case that has none; extracts from the unlabeled case storage unit the cases whose selection score exceeds a score threshold; presents the extracted cases externally and requests annotations; adds the extracted cases, with the input annotations as their correct labels, to the training case storage unit; and recalculates the prediction model using all cases stored in the training case storage unit. This processing is repeated until the number of cases stored in the unlabeled case storage unit is equal to or less than the threshold.

The operation method of the present invention is a method of operating a moving means estimation apparatus that estimates moving means from GPS history information in which a user's position is recorded by GPS. In this method, the prediction model generation function unit generates a prediction model for predicting the moving means from all cases stored in the training case storage unit of the apparatus. When the number of cases stored in the unlabeled case storage unit of the apparatus is equal to or less than a threshold, the unit outputs the prediction model. When that number is greater than the threshold, the unit uses the data stored in the spatio-temporal frequency storage unit of the apparatus to obtain a selection score, which represents the degree to which the reliability of the prediction model would improve if a correct label were given to a case that has none; extracts from the unlabeled case storage unit the cases whose selection score exceeds a score threshold; presents the extracted cases externally and requests annotations; adds the extracted cases, with the input annotations as their correct labels, to the training case storage unit; and recalculates the prediction model using all cases stored in the training case storage unit.
This processing is repeated until the number of cases stored in the unlabeled case storage unit is equal to or less than the threshold.

The program of the present invention is a program for causing a computer to function as each functional unit of the moving means estimation apparatus described above.

According to the moving means estimation apparatus, method, and program of the present invention, the prediction accuracy of the prediction model can be improved even when only a small number of cases have been given labels that correctly represent the moving means, and the accuracy of estimating the user's moving means can therefore be improved.

FIG. 1 is a diagram showing a functional configuration example of the moving means estimation apparatus 100 according to an embodiment of the present invention.
FIG. 2 is a diagram showing a functional configuration example when the moving means estimation apparatus 100 is implemented by a computer.
FIG. 3 is a diagram showing the operation flow in which the moving means estimation apparatus 100 generates a prediction model.
FIG. 4 is a diagram showing the operation flow of the spatio-temporal frequency calculation unit 10 of the moving means estimation apparatus 100.
FIG. 5 is a diagram showing an example of the information stored in the GPS history information storage unit 80 of the moving means estimation apparatus 100.
FIG. 6 is a diagram showing an example of the information stored in the correct annotation storage unit 85 of the moving means estimation apparatus 100.
FIG. 7 is a diagram showing an example of the information stored in the spatio-temporal frequency storage unit 20 of the moving means estimation apparatus 100.
FIG. 8 is a diagram showing the operation flow of the segment extraction function unit 70 of the moving means estimation apparatus 100.
FIG. 9 is a diagram showing an example of the information stored in the segment storage unit 65 of the moving means estimation apparatus 100.
FIG. 10 is a diagram showing the operation flow of the case generation function unit 60 of the moving means estimation apparatus 100.
FIG. 11 is a diagram showing an example of the information stored in the training case storage unit 40 of the moving means estimation apparatus 100.
FIG. 12 is a diagram showing an example of the information stored in the unlabeled case storage unit 30 of the moving means estimation apparatus 100.
FIG. 13 is a diagram showing a functional configuration example of the prediction model generation function unit 50 of the moving means estimation apparatus 100.
FIG. 14 is a diagram showing the operation flow of the prediction model generation function unit 50.
FIG. 15 is a diagram showing an example of the information stored in the prediction model storage unit 90 of the moving means estimation apparatus 100.

Embodiments of the present invention are described below with reference to the drawings. Identical components in multiple drawings are given the same reference numerals, and their description is not repeated.

FIG. 1 shows a functional configuration example of the moving means estimation apparatus 100 of this embodiment. The moving means estimation apparatus 100 estimates moving means based on GPS history information in which a user's position is recorded by GPS.

A moving means is the way a user travels, such as walking, car, train, or bus. If the moving means can be estimated correctly, a device carried by the user can provide an appropriate service. For example, the service provided can differ depending on whether the user is driving a car or traveling by train.

The moving means estimation apparatus 100 comprises a spatio-temporal frequency calculation unit 10, a spatio-temporal frequency storage unit 20, an unlabeled case storage unit 30, a training case storage unit 40, a prediction model generation function unit 50, a case generation function unit 60, a segment storage unit 65, a segment extraction function unit 70, a GPS history information storage unit 80, a correct annotation storage unit 85, and a prediction model storage unit 90. The moving means estimation apparatus 100 can be implemented, for example, by a computer.

FIG. 2 shows a configuration example in which the moving means estimation apparatus 100 is implemented by a computer. In FIG. 2, the spatio-temporal frequency storage unit 20, the unlabeled case storage unit 30, the training case storage unit 40, the segment storage unit 65, the GPS history information storage unit 80, the correct annotation storage unit 85, and the prediction model storage unit 90 are drawn as separate storage units so that the correspondence with FIG. 1 is easy to follow. All of these storage units may instead be consolidated in the memory 6.

The processing performed by the spatio-temporal frequency calculation unit 10, the prediction model generation function unit 50, and the case generation function unit 60 is realized by the CPU 5 executing a program. The input unit 7 is, for example, a keyboard; in this embodiment, annotations are entered through the input unit 7 by user operation. The display unit 8 displays the estimated moving means, for example together with map information.

FIG. 3 shows the overall operation flow in which the moving means estimation apparatus 100 generates a prediction model. The overall flow of generating the prediction model is described with reference to FIGS. 1 and 3.

The spatio-temporal frequency calculation unit 10 uses the log information stored in the GPS history information storage unit 80 and the information, stored in the correct annotation storage unit 85, on the moving means used in given time intervals to calculate the frequency with which a given user entered a given fixed area during a given time slot. It stores the user ID, the mesh ID representing the fixed area on the map, the time slot within a day, and the frequency in the spatio-temporal frequency storage unit 20 (step S10).

The case generation function unit 60 takes the log information stored in the GPS history information storage unit 80, the information on the moving means used in given time intervals stored in the correct annotation storage unit 85, and the segment information stored in the segment storage unit 65, where each segment represents a section traveled by a single moving means. From these it generates training cases for building the prediction model, each consisting of a correct label that correctly represents the moving means of the case and the feature values of the case, and stores them in the training case storage unit 40 (step S20). The case generation function unit 60 also generates the feature values of cases to which no correct label has been given and stores them in the unlabeled case storage unit 30 (step S30).

The prediction model generation function unit 50 generates a prediction model for predicting the moving means from all cases stored in the training case storage unit 40. When the number of cases stored in the unlabeled case storage unit 30 is equal to or less than a threshold, it outputs the prediction model. When that number is greater than the threshold, it uses the data stored in the spatio-temporal frequency storage unit 20 to obtain a selection score, which represents the degree to which the reliability of the prediction model would improve if a correct label were given to a case that has none; extracts from the unlabeled case storage unit 30 the cases whose selection score exceeds a score threshold; presents the extracted cases externally and requests annotations; adds the extracted cases, with the input annotations as their correct labels, to the training case storage unit 40; and recalculates the prediction model using all cases stored in the training case storage unit 40. The prediction model is generated by repeating this processing until the number of cases stored in the unlabeled case storage unit 30 is equal to or less than the threshold (step S40).
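The loop of step S40 can be sketched roughly as follows. This is a minimal illustration, not the apparatus's actual implementation: the function name and the callables `train`, `selection_score`, and `request_annotation` are hypothetical placeholders, and the early exit when no case exceeds the score threshold is an added assumption so that the loop always terminates.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]
Labeled = Tuple[str, Features]  # (correct label, feature vector)


def build_model_with_annotation(
    training: List[Labeled],
    unlabeled: List[Features],
    train: Callable[[List[Labeled]], object],          # hypothetical trainer
    selection_score: Callable[[object, Features], float],  # hypothetical scorer
    request_annotation: Callable[[Features], str],     # hypothetical annotator
    count_threshold: int,
    score_threshold: float,
) -> object:
    """Retrain until the unlabeled pool is at or below count_threshold."""
    model = train(training)
    while len(unlabeled) > count_threshold:
        keep: List[Features] = []
        picked: List[Features] = []
        for x in unlabeled:
            if selection_score(model, x) > score_threshold:
                picked.append(x)
            else:
                keep.append(x)
        if not picked:
            # Assumption: stop when nothing qualifies, to avoid looping forever.
            break
        for x in picked:
            # The returned annotation becomes the correct label of the case.
            training.append((request_annotation(x), x))
        unlabeled[:] = keep  # annotated cases leave the unlabeled pool
        model = train(training)
    return model
```

Passing the scorer and annotator as callables keeps the sketch independent of how the selection score is actually derived from the spatio-temporal frequency data.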

The operation of each functional unit of the moving means estimation apparatus 100 is described in detail below with reference to the drawings.

[Spatio-temporal frequency calculation unit]
FIG. 4 shows the operation flow of the spatio-temporal frequency calculation unit 10. The spatio-temporal frequency calculation unit 10 acquires an unprocessed user ID from the GPS history information storage unit 80, in which the users' positions are recorded by GPS, and sets it as u (step S11).

FIG. 5 shows an example of the GPS history information stored in the GPS history information storage unit 80. Each row in FIG. 5 represents one record of GPS history information. A record contains a log ID, a user ID, and the acquisition time, latitude, and longitude obtained by the GPS logger. In the example of FIG. 5, GPS history information is stored every 3 seconds, as shown by the 3-second interval between consecutive log IDs.

Each record is identified by its log ID, which is unique across all users. If user ID = 1 is unprocessed, user ID = 1 is set as u.

Next, the spatio-temporal frequency calculation unit 10 acquires the records for user ID = u from the correct annotation storage unit 85 (step S12). For each user, the correct annotation storage unit 85 stores information that correctly indicates which moving means (correct label) was used in a given time interval.

FIG. 6 shows an example of the information stored in the correct annotation storage unit 85. Each row in FIG. 6 represents one record of correct annotation information. A record contains a user ID, a time interval specified by a start time and an end time, and the moving means (correct label) used during that interval. The correct annotation storage unit 85 need not hold annotations covering all of the GPS history information stored in the GPS history information storage unit 80. It is also assumed that no time interval is given more than one distinct correct label.

Using the latitude and longitude in the GPS history information, the spatio-temporal frequency calculation unit 10 obtains the same user's positioning points that fall within the time intervals of the correct annotation information, obtains the set of mesh IDs corresponding to that set of positioning points, and calculates frequency information for each time slot (step S13). The frequency information represents how often the user visited the area of a mesh ID during that time slot. The frequency need not be incremented by 1 on every visit; other counting methods may be used, such as adding at most 1 for the set of positioning points of any one day.

A mesh ID is the ID of a mesh representing a fixed geographic area; for example, the standard area mesh (JIS X 0410 area mesh code) can be used. The time slot can be, for example, one of four equal divisions of the 24-hour day. Other units can also be used, such as one-hour units or units such as morning, daytime, and evening.
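As an illustration of these two keys, the sketch below maps a positioning point to a mesh ID and a timestamp to a time slot. It assumes the mesh ID is the third-level (roughly 1 km) standard area mesh code written with a hyphen after the first four digits, and that the day is split into equal slots starting at midnight; the function names `mesh_code_3rd` and `time_slot` are hypothetical.

```python
from datetime import datetime


def mesh_code_3rd(lat: float, lon: float) -> str:
    """Third-level area mesh code for a point in decimal degrees.

    Assumed layout: 1st level = 40' of latitude x 1 degree of longitude,
    2nd level = 5' x 7.5', 3rd level = 30" x 45".
    """
    p, lat_rem = divmod(lat * 60.0, 40.0)            # latitude in minutes
    q, lat_rem = divmod(lat_rem, 5.0)
    r = int(lat_rem // 0.5)                          # 30 seconds = 0.5 minute
    u, lon_rem = divmod((lon - 100.0) * 60.0, 60.0)  # longitude in minutes
    v, lon_rem = divmod(lon_rem, 7.5)
    w = int(lon_rem // 0.75)                         # 45 seconds = 0.75 minute
    return f"{int(p):02d}{int(u):02d}-{int(q)}{int(v)}{r}{w}"


def time_slot(t: datetime, slots_per_day: int = 4) -> int:
    """Index of the equal-length slot of the day that contains t (0-based)."""
    return t.hour * slots_per_day // 24
```

With four slots per day, slot 1 covers 6:00 to 12:00.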

The spatio-temporal frequency calculation unit 10 stores the calculated frequency information in the spatio-temporal frequency storage unit 20 (step S14). Steps S11 to S14 are repeated while unprocessed users remain (YES in step S15), and the processing ends once no unprocessed user remains (NO in step S15).

FIG. 7 shows an example of the spatio-temporal frequency information stored in the spatio-temporal frequency storage unit 20. It records how often the user with user ID = 1 visited the fixed area represented by each mesh ID in each of the four time slots of the day. The example in FIG. 7 shows that the user with user ID = 1 visited the area represented by mesh ID 5438-2343 three times in the 6:00 to 12:00 time slot.
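The counting in steps S13 and S14 might look like the following sketch. The tuple layouts, the function name, and the `mesh_of` and `slot_of` callables are illustrative assumptions; the `once_per_day` flag corresponds to the alternative counting method in which at most 1 is added per day.

```python
from collections import Counter
from datetime import datetime
from typing import Callable, Iterable, Set, Tuple

Point = Tuple[int, datetime, float, float]   # (user ID, time, lat, lon)
Interval = Tuple[int, datetime, datetime]    # (user ID, start, end)


def spatiotemporal_frequency(
    points: Iterable[Point],
    intervals: Iterable[Interval],
    mesh_of: Callable[[float, float], str],   # hypothetical mesh lookup
    slot_of: Callable[[datetime], int],       # hypothetical time-slot lookup
    once_per_day: bool = True,
) -> Counter:
    """Count visits per (user ID, mesh ID, time slot) inside annotated intervals."""
    intervals = list(intervals)
    freq: Counter = Counter()
    seen: Set[tuple] = set()
    for user, t, lat, lon in points:
        # Only positioning points covered by an annotation of the same user count.
        if not any(u == user and s <= t <= e for u, s, e in intervals):
            continue
        key = (user, mesh_of(lat, lon), slot_of(t))
        if once_per_day:
            day_key = key + (t.date(),)
            if day_key in seen:
                continue  # this day already contributed to this key
            seen.add(day_key)
        freq[key] += 1
    return freq
```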

[Segment extraction function unit]
FIG. 8 shows the operation flow of the segment extraction function unit 70. From the GPS history information storage unit 80, the segment extraction function unit 70 extracts segments, a segment being the minimum unit of a continuous trajectory traveled by a single moving means.

A segment consists of multiple consecutive records (rows in FIG. 5) stored in the GPS history information storage unit 80, and multiple segments together make up a session, the unit of a continuous GPS trajectory. One session consists, for example, of three segments: one traveled by bus, one on foot, and one by motorcycle.

The segment extraction function unit 70 acquires an unprocessed user ID from the GPS history information storage unit 80 and sets it as u (step S71). With, for example, user ID = 1 as u, it then acquires the corresponding set of records from the GPS history information storage unit 80 and divides that set into sessions (step S72). In this session division, the point between two consecutive records is treated as a division point when their times differ by the threshold θs seconds or more. The threshold θs is assumed to be set in advance. Other session division methods can also be used, for example the method described in Non-Patent Document 1.
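The time-gap rule of step S72 can be illustrated with a short sketch. The record layout `(log ID, acquisition time)` and the function name are assumptions, and the records are assumed to belong to a single user and to be sorted by time.

```python
from datetime import datetime
from typing import List, Sequence, Tuple

Record = Tuple[int, datetime]  # (log ID, acquisition time)


def split_sessions(records: Sequence[Record], theta_s: float) -> List[List[Record]]:
    """Split time-ordered records wherever the gap is theta_s seconds or more."""
    sessions: List[List[Record]] = []
    for rec in records:
        if sessions:
            gap = (rec[1] - sessions[-1][-1][1]).total_seconds()
            if gap < theta_s:
                sessions[-1].append(rec)  # same session continues
                continue
        sessions.append([rec])  # first record, or gap >= theta_s: new session
    return sessions
```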

Next, the segment extraction function unit 70 divides each session into segments (step S73). The known method described in Non-Patent Document 1 can be used for this segment division.

The segment extraction function unit 70 stores the segments obtained by dividing the sessions in the segment storage unit 65 (step S74). Steps S71 to S74 are repeated while unprocessed users remain (YES in step S75), and the processing ends once no unprocessed user remains (NO in step S75).

FIG. 9 shows an example of the segment information stored in the segment storage unit 65. Segment information consists of a segment ID, a session ID, a user ID, a start log ID, and an end log ID. The start log ID and end log ID correspond to the log IDs stored in the GPS history information storage unit 80.

In this example, session 1 consists of two segments: segment ID = 1, which runs from start log ID = 1 to end log ID = 10, and segment ID = 2, which runs from start log ID = 11 to end log ID = 204.

[Case generation function unit]
FIG. 10 shows the operation flow of the case generation function unit 60. Taking the GPS history information, the correct annotation information, and the segment information as input, the case generation function unit 60 generates, for cases whose moving means is known, the correct label that correctly represents the moving means together with the feature vector of the case, and, for cases with no correct label, the feature vector alone.

The case generation function unit 60 selects an unprocessed item of segment information from the segment storage unit 65 and sets it as s (step S61). Next, it reads from the GPS history information storage unit 80 the records that fall between the start log ID and the end log ID of the selected segment information s. It then extracts the features of segment information s from the records read (step S62).

The features include, for example, the average speed between the start log ID and the end log ID of segment information s. For feature extraction, the known method described in Non-Patent Document 1 can be used, for example. The extracted features are expressed as an M-dimensional vector (x_1, x_2, ..., x_M).
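As one concrete feature, the average speed of a segment could be computed as in the sketch below. The source does not say how distance is measured; the haversine great-circle distance on a sphere of radius 6,371 km is used here as an assumption, and the record layout `(time in seconds, latitude, longitude)` and function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Fix = Tuple[float, float, float]  # (time in seconds, latitude, longitude)

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def average_speed(fixes: Sequence[Fix]) -> float:
    """Path length divided by elapsed time (m/s) over time-ordered fixes."""
    if len(fixes) < 2:
        return 0.0
    dist = sum(
        haversine_m(a[1], a[2], b[1], b[2]) for a, b in zip(fixes, fixes[1:])
    )
    elapsed = fixes[-1][0] - fixes[0][0]
    return dist / elapsed if elapsed > 0 else 0.0
```

Further components of the feature vector would be appended in the same way, one value per dimension.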

Next, the case generation function unit 60 reads the label (moving means) corresponding to segment information s from the correct annotation storage unit 85 (step S63). The details are as follows.

The time range of segment information s is determined by the time that its start log ID has in the GPS history information storage unit 80 (start time) and the time that its end log ID has there (end time). Among the correct annotation records (rows in FIG. 6) that have the same user ID as segment information s, the record for which the coverage ratio between this segment time range and the record's own time range, from its start time to its end time, is largest is selected, and the moving means of that record is adopted as the correct label.

Whether the moving means (correct label) of the correct annotation information is given to the features obtained from segment information s is decided by the coverage ratio. When the maximum coverage ratio is equal to or greater than the threshold θc, the features of segment information s are regarded as correctly representing a case of that moving means, and the correct label is given to them. The labeled features are stored in the training case storage unit 40 as a training case (step S65).

Segment information s whose coverage ratio is below the threshold θc is regarded as having no correct label, and its user ID, session ID, features, mesh ID, and time slot are stored in the unlabeled case storage unit 30 as an unlabeled case (step S66). The mesh ID used here can be the one corresponding to the start position, end position, or midpoint of the segment. Likewise, the time slot can be the one corresponding to the start position, end position, or midpoint.
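The labeling decision of steps S63 to S66 can be sketched as follows. The text does not give a formula for the coverage ratio, so this sketch assumes it is the overlap between the segment and the annotation interval divided by the segment's duration; the function name and tuple layout are likewise hypothetical, and the annotations are assumed to be already filtered to the segment's user.

```python
from typing import Iterable, Optional, Tuple

Annotation = Tuple[float, float, str]  # (start time, end time, moving means)


def assign_label(
    seg_start: float,
    seg_end: float,
    annotations: Iterable[Annotation],
    theta_c: float,
) -> Optional[str]:
    """Return the best-covering label, or None if coverage is below theta_c."""
    seg_len = seg_end - seg_start
    best_label: Optional[str] = None
    best_cov = 0.0
    for start, end, label in annotations:
        overlap = max(0.0, min(seg_end, end) - max(seg_start, start))
        # Assumed definition: fraction of the segment covered by the annotation.
        cov = overlap / seg_len if seg_len > 0 else 0.0
        if cov > best_cov:
            best_label, best_cov = label, cov
    # Below the threshold the segment is treated as an unlabeled case.
    return best_label if best_cov >= theta_c else None
```

A return value of `None` corresponds to storing the segment as an unlabeled case; any other value corresponds to storing it as a training case with that label.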

The case generation function unit 60 repeats steps S61 to S66 while unprocessed segment information s remains (YES in step S67), and ends the processing once none remains (NO in step S67).

FIG. 11 shows an example of the training cases generated by the case generation function unit 60: a feature vector is stored for each moving means (correct label). FIG. 12 shows an example of the unlabeled cases generated by the case generation function unit 60: for each user ID, a feature vector, a mesh ID, and a time zone are stored per session ID.
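As a rough picture of the two record types, the sketch below models them as Python dataclasses. The field names, the string type for mesh IDs, and the use of an hour of day for the time zone are assumptions for illustration; FIGS. 11 and 12 are not reproduced here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainingCase:
    """A labeled case as held in the training case storage unit 40."""
    label: str                   # moving means (correct label)
    features: Tuple[float, ...]  # feature vector of the segment


@dataclass(frozen=True)
class UnlabeledCase:
    """An unlabeled case as held in the unlabeled case storage unit 30."""
    user_id: str
    session_id: str
    features: Tuple[float, ...]
    mesh_id: str    # mesh at the segment's start, end, or intermediate position
    time_zone: int  # assumption: hour of day (0-23) stands in for the time zone


def promote(case: UnlabeledCase, label: str) -> TrainingCase:
    """Turn an unlabeled case into a training case once its label is known."""
    return TrainingCase(label=label, features=case.features)


example = UnlabeledCase("u1", "s42", (3.2, 0.8), "m5339", 8)
labeled = promote(example, "walk")
```

The unlabeled record keeps the mesh ID and time zone because the geographic score needs them later; the training record only needs the label and the feature vector.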

[Prediction model generation function unit]
FIG. 13 shows a more specific functional configuration example of the prediction model generation function unit 50. The prediction model generation function unit 50 includes a prediction model generation unit 52, a prediction model output unit 53, a selection score calculation unit 54, an annotation request unit 55, an unlabeled case storage update unit 56, and a prediction model recalculation unit 57.

FIG. 14 shows the operation flow of the prediction model generation function unit 50, whose operation is described below with reference to FIGS. 13 and 14. The prediction model generation unit 52 reads all the training cases stored in the training case storage unit 40 as a training case set T (step S51) and generates a prediction model C from T (step S52). The prediction model C is generated by multi-class classification using all the cases stored in the training case storage unit 40.

Multi-class classification is a classification method that predicts one class out of several. The prediction model is built using, for example, the known method described in Non-Patent Document 2. Any other supervised machine learning method capable of multi-class classification can be used instead. This embodiment is described using a linear classification model, which predicts from the inner product of the feature vector and a weight vector of the same dimension, but the algorithms usable in this embodiment are not limited to linear classification models. When a nonlinear model is applied, the data structure of the storage unit that stores the prediction model uses a schema suited to the algorithm.

When the number of elements (cases) in the unlabeled case set U stored in the unlabeled case storage unit 30 is equal to or less than a threshold θ, the prediction model output unit 53 outputs the prediction model C to the outside (YES in step S53), stores it in the prediction model storage unit 90 (step S58), and ends the operation. When the number of cases is greater than θ, it outputs the prediction model C to the selection score calculation unit 54 (NO in step S53). The threshold θ is set in advance.

The selection score calculation unit 54 uses the prediction model C to predict the moving means for each case stored in the unlabeled case storage unit 30 and obtains a certainty factor F representing how certain that prediction is. It also obtains a geographic score G, representing the degree of spatio-temporal rarity, from the mesh ID and time zone stored in the spatio-temporal frequency storage unit 20. From F and G it calculates a selection score S representing how much the performance of the prediction model would improve (step S54). For a linear classification model, F can be, for example, the maximum value of the inner product of the feature vector and the weight vector. Other known methods, such as the one described in Non-Patent Document 2, can also be used. When the prediction model in use cannot produce such a certainty factor, F is taken to be 1.
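For the linear case, prediction and the certainty factor F can be sketched together. The code below assumes one weight vector per moving means, held in a dictionary; this layout and the function names are illustrative and are not taken from the source.

```python
from typing import Dict, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def predict_with_certainty(
    weights: Dict[str, Sequence[float]], x: Sequence[float]
) -> Tuple[str, float]:
    """Return the predicted moving means and the certainty factor F.

    Assumption: each class has its own weight vector, the prediction is
    the class with the largest inner product, and F is that largest value.
    """
    scores = {label: dot(w, x) for label, w in weights.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical two-feature model with two moving means.
weights = {"walk": [1.0, 0.0], "train": [0.0, 1.0]}
mode, F = predict_with_certainty(weights, [0.25, 0.75])
```

A raw inner product is not a calibrated probability, so in practice F would only be compared between cases scored by the same model.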

The geographic score G is calculated from the mesh ID and time zone of the unlabeled case. For a case in the unlabeled case set U with user ID u, mesh ID m, and time zone t, G is calculated by equation (1).

Here, c(u,m,t) is the frequency of user u, mesh ID m, and time zone t in the spatio-temporal frequency storage unit 20. Similarly, c(u,m) is the frequency of user u and mesh ID m, c(u,t) is the frequency of user u and time zone t, c(m,t) is the frequency of mesh ID m and time zone t, c(u) is the frequency of user u, c(m) is the frequency of mesh ID m, and c(t) is the frequency of time zone t. λ₁, …, λ₇ are the weighting coefficients of the respective frequencies and are set in advance so that λ₁ + … + λ₇ = 1.
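The seven frequency terms can be illustrated with a small sketch. It assumes the spatio-temporal frequency store is a table keyed by (user ID, mesh ID, time zone) and that the lower-order frequencies are sums over that table; both are assumptions. Equation (1) itself is not reproduced in this text, so the code stops at a λ-weighted combination of the terms (with λ₁ to λ₇ paired to the terms in the order listed above, also an assumption) and does not claim how G is derived from it.

```python
from typing import Dict, List, Sequence, Tuple

Key = Tuple[str, str, int]  # (user ID, mesh ID, time zone)


def frequency_terms(table: Dict[Key, int], u: str, m: str, t: int) -> List[int]:
    """Return [c(u,m,t), c(u,m), c(u,t), c(m,t), c(u), c(m), c(t)]."""
    def total(use_u: bool, use_m: bool, use_t: bool) -> int:
        # Sum the counts of all rows that match the selected key parts.
        return sum(
            count for (ku, km, kt), count in table.items()
            if (not use_u or ku == u)
            and (not use_m or km == m)
            and (not use_t or kt == t)
        )
    return [
        total(True, True, True),    # c(u,m,t)
        total(True, True, False),   # c(u,m)
        total(True, False, True),   # c(u,t)
        total(False, True, True),   # c(m,t)
        total(True, False, False),  # c(u)
        total(False, True, False),  # c(m)
        total(False, False, True),  # c(t)
    ]


def weighted_frequency(terms: Sequence[float], lambdas: Sequence[float]) -> float:
    """Combine the seven terms with weights that sum to 1 (hypothetical helper)."""
    if len(lambdas) != 7 or abs(sum(lambdas) - 1.0) > 1e-9:
        raise ValueError("expected seven weights that sum to 1")
    return sum(lam * c for lam, c in zip(lambdas, terms))


table = {("u1", "m1", 8): 3, ("u1", "m1", 9): 1,
         ("u1", "m2", 8): 2, ("u2", "m1", 8): 4}
terms = frequency_terms(table, "u1", "m1", 8)
combined = weighted_frequency(terms, [1 / 7] * 7)
```

Falling back on coarser counts such as c(u) or c(t) keeps the combination informative even when the exact (u, m, t) triple has never been observed.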

Using the certainty factor F and the geographic score G, the selection score S can be calculated by, for example, equation (2).

Here, α is a weight parameter set in advance. A larger selection score S indicates that the current prediction model C has more difficulty predicting the moving means accurately.

Cases can be selected by, for example, keeping only those whose selection score S is equal to or greater than a threshold θs, which is set in advance. Alternatively, the threshold θs may be applied to the minimum value of the selection score S over the unlabeled cases in U that share the same user ID. The selected cases form an additional case set V, which is removed from the unlabeled case set U (equation (3)).
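The threshold-based selection amounts to splitting U into the additional case set V and the remainder. A minimal sketch, assuming cases are identified by session ID and that their selection scores have already been computed (the function name is hypothetical):

```python
from typing import Dict, List, Tuple


def split_by_score(
    scores: Dict[str, float], theta_s: float
) -> Tuple[List[str], List[str]]:
    """Split unlabeled cases (keyed by session ID) into V and the updated U."""
    additional = [sid for sid, s in scores.items() if s >= theta_s]  # V
    remaining = [sid for sid, s in scores.items() if s < theta_s]    # U without V
    return additional, remaining


V, U = split_by_score({"s1": 0.9, "s2": 0.2, "s3": 0.5}, theta_s=0.5)
```

Only the simple per-case threshold is shown; the per-user variant mentioned above is not sketched because the text leaves its details open.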

The updated unlabeled case set U is output to the unlabeled case storage unit 30 (equation (4)).

The annotation request unit 55 extracts each case stored in the unlabeled case storage unit 30 whose selection score S is equal to or greater than the score threshold θs, presents the extracted cases to the outside, and requests an annotation (step S55). Requesting an annotation means, for example, presenting a case to the user and asking the user to enter the moving means used in that case (the annotation, that is, the correct label). The selected additional case set V is presented to the user in this way. One way to present V is to visualize the cases corresponding to the segments on a map. The annotation entered by the user is adopted as the correct label.

The unlabeled case storage update unit 56 deletes each extracted case, now labeled with the input annotation, from the unlabeled case storage unit 30 and adds it to the training case storage unit 40 (step S56). The cases to be added are those selected for additional learning from the case set U in the unlabeled case storage unit 30 using the selection score S.

The prediction model recalculation unit 57 recalculates the prediction model C for predicting the moving means from all the cases stored in the training case storage unit 40 (the training case set T) and outputs it to the prediction model output unit 53 (step S57). That is, it generates the prediction model C from the newly updated training case set T, using the same algorithm as in step S52.

The prediction model output unit 53 repeats steps S53 to S57 until the number of cases stored in the unlabeled case storage unit 30 becomes equal to or less than the threshold θ. The prediction model C is therefore output only once the number of cases in the unlabeled case storage unit 30 has fallen to the predetermined number or below. In other words, the prediction model C that is output has been generated from more than a predetermined number of training cases.
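The overall flow of steps S51 to S57 can be sketched as a loop. The training routine, the scoring function, and the annotation prompt are passed in as callables because the text leaves them open; all names are hypothetical. The early exit when no case reaches θs is an added safeguard that the source does not describe.

```python
from typing import Callable, List, Sequence, Tuple

Features = Tuple[float, ...]
Labeled = Tuple[Features, str]


def build_model(
    train: Sequence[Labeled],
    unlabeled: Sequence[Features],
    fit: Callable[[List[Labeled]], object],
    score: Callable[[object, Features], float],
    ask_label: Callable[[Features], str],
    theta: int,
    theta_s: float,
):
    """Repeat steps S53 to S57 until at most `theta` unlabeled cases remain."""
    train, unlabeled = list(train), list(unlabeled)
    model = fit(train)                                      # S51, S52
    while len(unlabeled) > theta:                           # S53
        scored = [(x, score(model, x)) for x in unlabeled]  # S54
        picked = [x for x, s in scored if s >= theta_s]
        if not picked:
            # Assumption: the text does not say what happens when no case
            # reaches theta_s; stopping here avoids an endless loop.
            break
        train += [(x, ask_label(x)) for x in picked]        # S55, S56
        unlabeled = [x for x, s in scored if s < theta_s]
        model = fit(train)                                  # S57
    return model, train, unlabeled


# Toy stand-ins: the "model" is just the size of the training set.
model, train, rest = build_model(
    train=[((0.0,), "walk")],
    unlabeled=[(0.9,), (0.1,), (0.8,), (0.2,)],
    fit=len,
    score=lambda m, x: x[0],
    ask_label=lambda x: "train",
    theta=2,
    theta_s=0.5,
)
```

With these toy inputs one pass moves the two high-scoring cases into the training set, after which only two unlabeled cases remain and the loop ends.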

As described above, with the moving means estimation apparatus 100 of this embodiment, when only a small number of correct labels are available, cases can be selected from the unlabeled case set with a view to improving the performance of the prediction model. Requesting annotations for the selected unlabeled cases and adding the resulting correct labels then improves the prediction model, and as a result the accuracy of estimating the user's moving means improves.

The operation of this embodiment that produces these effects can be summarized as follows. For cases not included in the current training cases, the prediction model generation function unit 50 of the moving means estimation apparatus 100 calculates two quantities: the certainty factor F, which represents how certain the current prediction model is, and the geographic score, which uses the spatio-temporal frequency storage unit 20 to express whether, in spatio-temporal terms, the same case has already been used to build the prediction model. Combining these two viewpoints yields the selection score, which represents how much the performance of the prediction model for estimating the moving means would improve.

The annotation request unit 55 of the prediction model generation function unit 50 presents to the user the cases most likely to improve the performance of the prediction model and requests annotations. The unlabeled case storage update unit 56 then adds the cases, with the input annotations as correct labels, to the training case storage unit 40. This addition of training cases is repeated until the number of unlabeled cases drops to the predetermined number or below. The prediction model recalculation unit 57 recalculates the prediction model from the training case set extended in this way, which improves the prediction accuracy of the model for estimating the moving means.

Although this embodiment has been described with an example in which annotations are requested from the user, the apparatus may instead obtain correct labels automatically by referring to an action log, if the correct labels are stored there. In this way, the present invention can be modified within the scope of its gist.

When the processing units of the above apparatus are implemented by a computer, the processing of the functions that each apparatus should have is described by a program. Executing this program on the computer implements the processing units of each apparatus on the computer.

The program is distributed by, for example, selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

Each means may be configured by executing a predetermined program on a computer, or at least part of the processing may be implemented in hardware.

10: spatio-temporal frequency calculation unit
20: spatio-temporal frequency storage unit
30: unlabeled case storage unit
40: training case storage unit
50: prediction model generation function unit
52: prediction model generation unit
53: prediction model output unit
54: selection score calculation unit
55: annotation request unit
56: unlabeled case storage update unit
57: prediction model recalculation unit
60: case generation function unit
65: segment storage unit
70: segment extraction function unit
80: GPS history information storage unit
85: correct annotation storage unit
100: moving means estimation apparatus

Claims

A moving means estimation apparatus that estimates a moving means based on GPS history information in which a user's position is recorded by GPS, the apparatus comprising:
a spatio-temporal frequency storage unit that stores a user ID identifying a user, a mesh ID representing a certain area on a map, a time zone within a day, and a frequency at which the user has entered the certain area during the time zone;
a training case storage unit that stores a correct label correctly representing a case of the moving means, together with a feature vector of the case obtained from the GPS history information;
an unlabeled case storage unit that stores feature vectors of cases to which the correct label has not been given; and
a prediction model generation function unit that generates a prediction model for predicting the moving means from all the cases stored in the training case storage unit, outputs the prediction model when the number of cases stored in the unlabeled case storage unit is equal to or less than a threshold, and, when the number of cases stored in the unlabeled case storage unit is greater than the threshold, repeats, until the number of cases stored in the unlabeled case storage unit becomes equal to or less than the threshold, a process of: obtaining, using data stored in the spatio-temporal frequency storage unit, a selection score representing the degree to which the reliability of the prediction model improves when the correct label is given to a case not given the correct label; extracting from the unlabeled case storage unit the cases whose selection score is larger than a score threshold; presenting the extracted cases to the outside and requesting an annotation; adding the extracted cases, with the input annotation as the correct label, to the training case storage unit; and recalculating the prediction model for predicting the moving means using all the cases stored in the training case storage unit.

The moving means estimation apparatus according to claim 1,
wherein the prediction model generation function unit comprises:
a prediction model generation unit that generates a prediction model by performing multi-class classification using all the cases stored in the training case storage unit;
a prediction model output unit that outputs the prediction model to the outside when the number of cases stored in the unlabeled case storage unit is equal to or less than a threshold, and outputs the prediction model to a selection score calculation unit when the number of cases is greater than the threshold;
the selection score calculation unit, which predicts the moving means for each case stored in the unlabeled case storage unit using the prediction model, obtains a certainty factor representing the degree of certainty of the prediction, obtains a geographic score representing the degree of spatio-temporal rarity from the mesh ID and the time zone stored in the spatio-temporal frequency storage unit, and calculates, from the certainty factor and the geographic score, a selection score representing the degree to which the performance of the prediction model improves;
an annotation request unit that extracts each case stored in the unlabeled case storage unit whose selection score is equal to or higher than a score threshold, presents the extracted case to the outside, and requests an annotation;
an unlabeled case storage update unit that deletes the extracted case, with the input annotation as its correct label, from the unlabeled case storage unit and adds the extracted case to the training case storage unit; and
a prediction model recalculation unit that recalculates the prediction model for predicting the moving means from all the cases stored in the training case storage unit and outputs the prediction model to the prediction model generation unit.

The moving means estimation apparatus according to claim 2,
wherein the geographic score is calculated by the following formula:

where G is the geographic score, c(u,m,t) is the frequency of user u, mesh ID m, and time zone t in the spatio-temporal frequency storage unit, c(u,m) is the frequency of user u and mesh ID m, c(u,t) is the frequency of user u and time zone t, c(m,t) is the frequency of mesh ID m and time zone t, c(u) is the frequency of user u, c(m) is the frequency of mesh ID m, c(t) is the frequency of time zone t, and λ₁, …, λ₇ are weighting factors satisfying λ₁ + … + λ₇ = 1, and
the selection score is calculated by the following formula:

where S is the selection score, F is the certainty factor, and α is a weight parameter.

A method of operating a moving means estimation apparatus that estimates a moving means from GPS history information in which a user's position is recorded by GPS,
wherein a prediction model generation function unit of the moving means estimation apparatus generates a prediction model for predicting the moving means from all the cases stored in a training case storage unit of the moving means estimation apparatus, outputs the prediction model when the number of cases stored in an unlabeled case storage unit of the moving means estimation apparatus is equal to or less than a threshold, and, when the number of cases stored in the unlabeled case storage unit is greater than the threshold, repeats, until the number of cases stored in the unlabeled case storage unit becomes equal to or less than the threshold, a process of: obtaining, using data stored in a spatio-temporal frequency storage unit of the moving means estimation apparatus, a selection score representing the degree to which the reliability of the prediction model improves when a correct label, which correctly represents a case of the moving means, is given to a case not given the correct label; extracting from the unlabeled case storage unit the cases whose selection score is larger than a score threshold; presenting the extracted cases to the outside and requesting an annotation; adding the extracted cases, with the input annotation as the correct label, to the training case storage unit; and recalculating the prediction model for predicting the moving means using all the cases stored in the training case storage unit.

A program for causing a computer to function as the moving means estimation apparatus according to any one of claims 1 to 3.