JP6082085B1

JP6082085B1 - Genre estimation server, genre estimation method, and genre estimation program

Info

Publication number: JP6082085B1
Application number: JP2015220142A
Authority: JP
Inventors: 由佳東島; 征男小川; 平野　智也; 智也平野
Original assignee: Nippon Telegraph and Telephone West Corp
Current assignee: Nippon Telegraph and Telephone West Corp
Priority date: 2015-11-10
Filing date: 2015-11-10
Publication date: 2017-02-15
Anticipated expiration: 2035-11-10
Also published as: JP2017091194A

Abstract

[Problem] To provide a genre estimation server, a genre estimation method, and a genre estimation program that can efficiently assign genres to unknown spots not prepared in a correspondence database. [Solution] A genre estimation server 10 estimates a genre corresponding to a user's stay spot information. It includes a spot information acquisition unit 13 that acquires the user's stay spot information, and a genre classification unit 14 that classifies the genre by combining features of the suffix or prefix of the spot name included in the spot information acquired by the spot information acquisition unit 13 with a description on the Internet that describes the spot of that name. [Selected drawing] FIG. 3

Description

The present invention relates to a genre estimation server, a genre estimation method, and a genre estimation program for estimating a genre corresponding to a user's stay spot information.

In recent years, there has been growing interest in analyzing user location information and using it for infrastructure development and marketing. For example, a user's location (latitude and longitude) is acquired with a GPS sensor built into a smartphone or similar device and converted into an address to identify the spot where the user is staying. By acquiring this stay spot information periodically, the user's daily behavior can be estimated easily.

On the other hand, acquiring detailed stay spot information risks infringing on user privacy. Therefore, as shown in FIG. 21, it has been proposed to analyze behavior while protecting user privacy by abstracting stay spots into “genres” such as “station”, “shopping street”, and “shrine” (Patent Documents 1 and 2). To abstract stay spots into genres, a correspondence database between spots and genres is generally used. With such a correspondence database, as shown in FIG. 22, a stay spot such as “XX Country Club” can be abstracted into a genre such as “golf course”.
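In its simplest form, a spot-genre correspondence database behaves like a lookup table. The sketch below is illustrative only: the table contents and the name `abstract_spot` are hypothetical, and a spot missing from the table is returned as `None` to mark it as unknown.

```python
from typing import Optional

# Hypothetical spot-genre correspondence database (spot name -> genre).
CORRESPONDENCE_DB: dict[str, str] = {
    "XX Country Club": "golf course",
    "YY Station": "station",
    "ZZ Shrine": "shrine",
}


def abstract_spot(spot_name: str) -> Optional[str]:
    """Replace a concrete stay spot with its genre, or None if the spot is unknown."""
    return CORRESPONDENCE_DB.get(spot_name)
```

Any spot absent from the table (for example, a newly opened shop) falls through to `None`; these are the unknown spots this document sets out to handle.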

Patent Document 1: JP 2012-164227 A
Patent Document 2: JP 2011-221665 A

Ideally, the spot-genre correspondence database would cover every existing spot. However, because preparing the correspondence database is laborious, the spots that can be abstracted into genres are biased. Associating spots with genres is essentially manual work, and the larger the volume, the more laborious it becomes. Therefore, as shown in FIG. 23(a), a spot that has not been prepared in the correspondence database cannot be assigned a genre in advance and becomes an unknown spot U1.

To reduce the work of preparing the correspondence database, techniques for collecting spot-genre correspondence information from the Internet have also been studied (Patent Document 2). For example, as shown in FIG. 23(b), information equivalent to correspondence information (facility name “xx gymnasium”, category “gymnasium”) can sometimes be extracted from the Internet. However, spots for which no such information exists on the Internet cannot be assigned a genre in advance and become unknown spots U2 and U3. In other words, techniques that collect spot-genre correspondence information from the Internet are limited in how many spots they can cover.

In view of the conventional techniques described above, an object of the present invention is to provide a genre estimation server, a genre estimation method, and a genre estimation program that can efficiently assign genres to unknown spots not prepared in a correspondence database.

To achieve the above object, the invention according to a first aspect is a genre estimation server that estimates a genre corresponding to a user's stay spot information, comprising: a spot information acquisition unit that acquires the user's stay spot information; and a genre classification unit that classifies a genre by combining features of a suffix or prefix of a spot name included in the spot information acquired by the spot information acquisition unit with a description on the Internet that describes the spot of the spot name. The genre classification unit performs a spot name determination process that determines a genre from the suffix or prefix features of the spot name, and, when the spot name determination process matches two or more genres, performs a description determination process that narrows the candidates down to only those matched genres and determines the genre by analyzing the description on the Internet.
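The two-stage logic of the first aspect (name-based determination, then a description-based decision restricted to the genres the name matched) can be sketched as follows. Everything here is an assumption for illustration: the dictionary entries, the feature-word sets, the overlap-count scoring, and the choice to consider all genres when the name matches none.

```python
from typing import Optional

# Hypothetical learning dictionary: (kind, affix, genre).
NAME_DICT: list[tuple[str, str, str]] = [
    ("suffix", "Lake", "lake/pond"),
    ("suffix", "Center", "sports facility"),
    ("suffix", "Center", "shopping facility"),
]

# Hypothetical feature words per genre for the description stage.
FEATURE_WORDS: dict[str, set[str]] = {
    "lake/pond": {"water", "boat", "shore"},
    "sports facility": {"gym", "pool", "training"},
    "shopping facility": {"shop", "sale", "store"},
}


def name_candidates(spot_name: str) -> list[str]:
    """Spot name determination: every distinct genre whose prefix or suffix matches."""
    genres: list[str] = []
    for kind, affix, genre in NAME_DICT:
        hit = spot_name.endswith(affix) if kind == "suffix" else spot_name.startswith(affix)
        if hit and genre not in genres:
            genres.append(genre)
    return genres


def description_genre(words: list[str], candidates: list[str]) -> Optional[str]:
    """Description determination restricted to the candidate genres (overlap count)."""
    scores = {g: sum(w in FEATURE_WORDS[g] for w in words) for g in candidates}
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None


def estimate_genre(spot_name: str, description_words: list[str]) -> Optional[str]:
    candidates = name_candidates(spot_name)
    if len(candidates) == 1:
        return candidates[0]              # unambiguous from the name alone
    if not candidates:
        candidates = list(FEATURE_WORDS)  # no name hint: all genres (assumption)
    return description_genre(description_words, candidates)
```

Narrowing matters: a “Center” whose description mentions boats returns `None` here rather than “lake/pond”, because only the genres matched by the name remain in play.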

The invention according to a second aspect is the invention according to the first aspect, wherein the spot information acquisition unit counts frequent words, that is, words that appear frequently in the description on the Internet, prior to the spot name determination process, and the genre classification unit, after performing the spot name determination process, performs the description determination process using the frequent words counted by the spot information acquisition unit.

The invention according to a third aspect is a genre estimation method in which a genre estimation server estimates a genre corresponding to a user's stay spot information, wherein a computer executes: a spot information acquisition step of acquiring the user's stay spot information; and a genre classification step of classifying a genre by combining features of a suffix or prefix of a spot name included in the spot information acquired in the spot information acquisition step with a description on the Internet that describes the spot of the spot name. In the genre classification step, a spot name determination process that determines a genre from the suffix or prefix features of the spot name is performed, and, when the spot name determination process matches two or more genres, a description determination process is performed that narrows the candidates down to only those matched genres and determines the genre by analyzing the description on the Internet.

The invention according to a fourth aspect is a genre estimation program that causes a computer to function as each processing unit of the invention according to the first or second aspect.

According to the present invention, it is possible to provide a genre estimation server, a genre estimation method, and a genre estimation program that can efficiently assign genres to unknown spots not prepared in the correspondence database.

FIG. 1 is a configuration diagram of the genre estimation system in the present embodiment.
FIG. 2 is a diagram explaining an outline of the genre estimation method in the present embodiment.
FIG. 3 is a functional block diagram of the genre estimation server in the present embodiment.
FIG. 4 is a configuration diagram of the user latitude/longitude information in the present embodiment.
FIG. 5 is a configuration diagram of the spot information table in the present embodiment.
FIG. 6 is a configuration diagram of the spot information cache in the present embodiment.
FIG. 7 is a configuration diagram of the learning dictionary for spot name classification in the present embodiment.
FIG. 8 is a configuration diagram of the learning database for description classification in the present embodiment.
FIG. 9 is a configuration diagram of the genre-labeled location information database in the present embodiment.
FIG. 10 is a sequence diagram of genre estimation in the present embodiment.
FIG. 11 is a flowchart showing the operation of the spot information acquisition unit in the present embodiment.
FIG. 12 is a flowchart showing details of the spot information acquisition process in the present embodiment.
FIG. 13 is a flowchart showing the operation of the genre classification unit in the present embodiment.
FIG. 14 is a flowchart showing details of the spot name determination process in the present embodiment.
FIG. 15 is a flowchart showing details of the description determination process in the present embodiment.
FIG. 16 is a flowchart showing details of the similarity calculation process in the present embodiment.
FIG. 17 is a diagram explaining a specific example of the spot name determination process in the present embodiment.
FIG. 18 is a diagram explaining a specific example of the description determination process in the present embodiment.
FIG. 19 is a diagram explaining a method of automatically creating the learning database for description classification in the present embodiment.
FIG. 20 is a diagram comparing the prior art documents with the present embodiment.
FIG. 21 is a diagram explaining the background art.
FIG. 22 is a diagram explaining the background art.
FIG. 23 is a diagram explaining the problem of the background art.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. The following embodiments exemplify a genre estimation system that embodies the technical idea of the present invention; the configuration of the apparatus, the configuration of the data, and so on are not limited to the following embodiments.

(Genre estimation system)
FIG. 1 is a configuration diagram of the genre estimation system according to the embodiment of the present invention. This genre estimation system estimates a genre corresponding to a user's stay spot information and is used, for example, when stay spot information is collected periodically for user behavior analysis. Specifically, as shown in FIG. 1, a plurality of terminals 1A, 1B, ... (hereinafter collectively referred to as “terminal 1”) are connected to a genre estimation server 10 via the Internet 3. The terminal 1 is a smartphone or similar device having a GPS sensor. The genre estimation server 10 is a device that estimates the genre corresponding to the user's stay spot information. A log collection server 2 that performs preprocessing such as stay point extraction may be interposed between the terminal 1 and the genre estimation server 10.

The genre estimation server 10 is connected to external systems 4A and 4B via the Internet 3. The external system 4A provides an address conversion service that converts latitude and longitude into addresses. The external system 4B provides a Web search service (search engine). The functions that provide the address conversion service and the Web search service may be implemented inside the genre estimation server 10 instead of in the external systems 4A and 4B.

(Genre estimation method)
FIG. 2 illustrates an outline of the genre estimation method according to the embodiment of the present invention. The genre of a spot that does not exist in the spot-genre correspondence database (an unknown spot) is estimated automatically by combining suffix/prefix features with descriptions on the Internet, so that genres can be assigned to unknown spots.

FIG. 2(a) shows an estimation method based on suffix/prefix features. As shown in this figure, a dictionary of the suffix/prefix features of each genre is used to estimate the genre from the spot name. In this example, the suffix of the spot name is “湖” (lake), so the spot is estimated to belong to the “lake/pond” genre.
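The suffix/prefix lookup of FIG. 2(a) can be sketched with plain string matching. The dictionary entries below are hypothetical examples, and checking suffixes before prefixes and returning the first hit are simplifying assumptions; how ambiguous names are handled is a separate step.

```python
from typing import Optional

# Hypothetical suffix features per genre (suffix -> genre).
SUFFIX_TO_GENRE = {"湖": "lake/pond", "池": "lake/pond", "川": "river", "神社": "shrine"}
# Hypothetical prefix features per genre (prefix -> genre).
PREFIX_TO_GENRE = {"道の駅": "roadside station"}


def genre_from_affix(spot_name: str) -> Optional[str]:
    """Return the genre implied by a known suffix (checked first) or prefix, else None."""
    for suffix, genre in SUFFIX_TO_GENRE.items():
        if spot_name.endswith(suffix):
            return genre
    for prefix, genre in PREFIX_TO_GENRE.items():
        if spot_name.startswith(prefix):
            return genre
    return None
```

A name such as “XXちゃん” carries no affix feature and yields `None`, which is where the description-based method of FIG. 2(b) takes over.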

FIG. 2(b) shows an estimation method based on the similarity of descriptions on the Internet. As shown in this figure, for a spot name whose genre cannot be estimated from suffix/prefix features, the genre is estimated from information on the Web that contains the name of the spot. First, using the existing spot-genre correspondence database as search keys, Web descriptions of spots in each genre (“golf course”, “shrine”, “ramen”, ...) are collected, and a document classifier that judges the similarity of descriptions is built. Next, for the unknown spot “XX-chan”, a description of that place is obtained from information on the Web. Finally, the obtained description is passed through the document classifier to calculate similarities, and the genre with the highest similarity, “ramen”, is estimated as the genre of the unknown spot “XX-chan”.
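The patent does not say which document classifier is used, so the sketch below substitutes a simple bag-of-words cosine similarity as a stand-in. It assumes descriptions have already been tokenized (for Japanese text, a morphological analyzer would do this), and the training data is invented for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_classifier(training: dict[str, list[list[str]]]) -> dict[str, Counter]:
    """Merge the tokenized training descriptions of each genre into one count vector."""
    return {genre: sum((Counter(doc) for doc in docs), Counter()) for genre, docs in training.items()}


def classify(description: list[str], centroids: dict[str, Counter]) -> tuple[str, float]:
    """Return the genre whose vector is most similar to the description, with its score."""
    query = Counter(description)
    scores = {genre: cosine(query, vec) for genre, vec in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical training descriptions gathered via the correspondence database.
TRAINING = {
    "golf course": [["course", "green", "tee", "golf"], ["golf", "club", "green"]],
    "shrine": [["shrine", "torii", "worship"], ["shrine", "festival"]],
    "ramen": [["ramen", "noodle", "soup"], ["ramen", "pork", "noodle"]],
}
```

For example, `classify(["noodle", "soup", "menu", "ramen"], build_classifier(TRAINING))` selects “ramen”, since it is the only genre sharing words with that description. Any real deployment would likely use a stronger classifier; the structure (per-genre vectors, similarity, argmax) is the point here.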

As described above, the embodiment of the present invention combines an estimation technique based on suffix/prefix features with an estimation technique based on the similarity of descriptions on the Internet. Combining these two techniques lets each offset the other's weaknesses. When the genre is estimated from suffix/prefix features of spot names such as “XX River” or “XX Sea”, a name with no such feature, such as “XX-chan”, cannot be classified. Conversely, when “sentences describing the spot” are collected from the Internet using the spot name as a key and the genre is estimated with document classification, a spot name such as “XX River” yields so wide a variety of descriptive sentences that the features of the documents are hard to capture and classification becomes difficult. Because combining the two techniques cancels out these weaknesses, genres can be assigned efficiently.

To abstract location information into genre information, a database linking spots to genres normally has to be prepared in advance. The embodiment of the present invention therefore applies document classification to estimate the genre from “sentences describing the spot”. Document classification is used for tasks such as classifying news articles; applying it to genre estimation for location information makes it possible to estimate a spot's genre without creating a database that links spots to genres. As training data, however, a database linking sentences describing spots to genres is required (described later).

Furthermore, because the embodiment of the present invention collects “sentences describing the spot” from the Web, it can also handle unknown spots.

(Genre estimation server)
FIG. 3 is a functional block diagram of the genre estimation server 10 according to the embodiment of the present invention. As shown in this figure, the genre estimation server 10 includes a spot information database 11, a genre-labeled location information database 12, a spot information acquisition unit 13, a genre classification unit 14, and a learning database 15.

The spot information acquisition unit 13 is a functional unit that acquires the user's stay spot information and related data and stores them in the spot information database 11. The spot information acquisition unit 13 can communicate with the terminal 1 and the external systems 4A and 4B via the Internet 3.

The genre classification unit 14 is a functional unit that classifies the spot information stored in the spot information database 11 into genres and stores the result in the genre-labeled location information database 12. Specifically, it classifies the genre by combining the suffix or prefix features of the spot name included in the spot information acquired by the spot information acquisition unit 13 with the description on the Internet that describes the spot of that name. The learning database 15 is referred to during this classification. Hereinafter, the functional unit that determines the genre from the suffix or prefix features of the spot name is called the “spot name determination unit 14A”, and the functional unit that determines the genre by analyzing the description on the Internet is called the “description determination unit 14B”.

The spot information database 11 stores spot information. The genre-labeled location information database 12 stores genre-labeled location information. The learning database 15 stores training data. These databases are described in detail later. Although the genre-labeled location information database 12 is implemented on the genre estimation server 10 here, it may be implemented on another server.

(Databases)
FIG. 4 is a configuration diagram of user latitude/longitude information 3A. The user latitude/longitude information 3A is user location information acquired by the spot information acquisition unit 13 via the Internet 3. As shown in this figure, it associates a time, a user ID, a latitude, a longitude, and so on. Terminal-specific information (a UUID) can be used as the user ID. The user latitude/longitude information 3A may be raw GPS data or data already processed by the log collection server 2 or the like.
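One row of the user latitude/longitude information 3A might be modelled as an immutable record. The field names and the coordinate range checks are assumptions added for illustration; the patent only lists time, user ID, latitude, and longitude.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UserLatLon:
    """One row of user latitude/longitude information 3A (field names are assumptions)."""
    time: datetime
    user_id: str      # e.g. a terminal-specific UUID
    latitude: float   # degrees
    longitude: float  # degrees

    def __post_init__(self) -> None:
        # Reject obviously invalid coordinates before they reach the address service.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude out of range")


record = UserLatLon(datetime(2015, 11, 10, 9, 30), str(uuid.uuid4()), 34.6937, 135.5023)
```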

FIG. 5 is a configuration diagram of a spot information table 11A. The spot information table 11A stores spot information and is included in the spot information database 11. As shown in this figure, it associates an item number, a time, a user ID, a latitude, a longitude, a spot name, an address, a description, frequent words, and so on. The item number is a serial number used for convenience.
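As a concrete illustration, the spot information table 11A could be stored in SQLite, with the auto-incremented row ID serving as the item number. The table name, column names, and JSON encoding of frequent words are hypothetical choices, not details from the patent.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS spot_info (
    item_no        INTEGER PRIMARY KEY AUTOINCREMENT,  -- serial number for convenience
    time           TEXT NOT NULL,
    user_id        TEXT NOT NULL,
    latitude       REAL NOT NULL,
    longitude      REAL NOT NULL,
    spot_name      TEXT,
    address        TEXT,
    description    TEXT,
    frequent_words TEXT   -- JSON object of word counts (storage format assumed)
)
"""


def insert_spot(conn: sqlite3.Connection, time: str, user_id: str, lat: float, lon: float,
                spot_name: str, address: str, description: str,
                frequent_words: dict[str, int]) -> int:
    """Insert one row and return its item number."""
    cur = conn.execute(
        "INSERT INTO spot_info (time, user_id, latitude, longitude, spot_name, address,"
        " description, frequent_words) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (time, user_id, lat, lon, spot_name, address, description,
         json.dumps(frequent_words, ensure_ascii=False)),
    )
    return cur.lastrowid  # item number handed on to the genre classification unit
```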

FIG. 6 is a configuration diagram of a spot information cache 11B. The spot information cache 11B caches spot information and is included in the spot information database 11. As shown in this figure, it associates a latitude, a longitude, a spot name, an address, a description, frequent words, and so on. Frequent words are words that appear frequently in the description.
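Because the acquisition flow consults the spot information cache 11B twice, once by latitude/longitude and once by address/spot name, an in-memory sketch keeps two indexes over the same entries. Rounding coordinates to four decimal places (roughly 11 m of latitude) so that nearby GPS fixes share an entry is an assumption; it also means distinct spots that close together would collide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CachedSpot:
    latitude: float
    longitude: float
    spot_name: str
    address: str
    description: str
    frequent_words: dict[str, int] = field(default_factory=dict)


class SpotInfoCache:
    """In-memory sketch of spot information cache 11B with two lookup keys."""

    def __init__(self, precision: int = 4) -> None:
        self._precision = precision  # coordinate rounding: an assumption
        self._by_latlon: dict[tuple[float, float], CachedSpot] = {}
        self._by_name: dict[tuple[str, str], CachedSpot] = {}

    def _latlon_key(self, lat: float, lon: float) -> tuple[float, float]:
        return (round(lat, self._precision), round(lon, self._precision))

    def get_by_latlon(self, lat: float, lon: float) -> Optional[CachedSpot]:
        return self._by_latlon.get(self._latlon_key(lat, lon))

    def get_by_name(self, address: str, spot_name: str) -> Optional[CachedSpot]:
        return self._by_name.get((address, spot_name))

    def put(self, spot: CachedSpot) -> None:
        """Register the entry under both keys."""
        self._by_latlon[self._latlon_key(spot.latitude, spot.longitude)] = spot
        self._by_name[(spot.address, spot.spot_name)] = spot
```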

FIG. 7 is a configuration diagram of a learning dictionary 15A for spot name classification. The learning dictionary 15A is used when classifying spot names and is included in the learning database 15. As shown in this figure, it associates prefix/suffix patterns with genres. The learning dictionary 15A may be created manually or built from a commercially available database such as “日本語語彙体系” (Nihongo Goi Taikei).
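The patent does not specify how prefix/suffix patterns are written. One possible representation, assumed here purely for illustration, is a tab-separated file of shell-style wildcards, where `*湖` denotes a suffix pattern and `道の駅*` a prefix pattern, matched with the standard `fnmatch` module.

```python
import fnmatch

# Hypothetical tab-separated dictionary: "<pattern>\t<genre>".
DICTIONARY_TSV = """\
*湖\tlake/pond
*池\tlake/pond
*カントリークラブ\tgolf course
道の駅*\troadside station
"""


def load_dictionary(text: str) -> list[tuple[str, str]]:
    """Parse non-empty lines into (pattern, genre) pairs."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            pattern, genre = line.split("\t", 1)
            entries.append((pattern, genre))
    return entries


def matching_genres(spot_name: str, entries: list[tuple[str, str]]) -> list[str]:
    """All distinct genres whose pattern matches the whole spot name."""
    genres: list[str] = []
    for pattern, genre in entries:
        if fnmatch.fnmatchcase(spot_name, pattern) and genre not in genres:
            genres.append(genre)
    return genres
```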

FIG. 8 is a configuration diagram of a learning database 15B for description classification. The learning database 15B is used when classifying descriptions and is included in the learning database 15. As shown in this figure, it associates genre names with feature words. A feature word is a word characteristic of its genre. The learning database 15B may be created manually, or by manually collecting descriptions that match each genre, applying morphological analysis, and extracting feature words automatically. Morphological analysis is a natural language processing technique that splits a sentence into linguistically meaningful units and determines the part of speech of each unit; it is used to extract verbs, nouns, and so on.
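The patent leaves the automatic extraction criterion open. The sketch below uses a TF-IDF-style score across genres as a stand-in: a word that appears in every genre's descriptions (such as “access”) scores zero and is dropped. It assumes the descriptions have already been tokenized by a morphological analyzer, and the example corpus is invented.

```python
import math
from collections import Counter


def extract_feature_words(corpus: dict[str, list[list[str]]], k: int = 3) -> dict[str, list[str]]:
    """Pick up to k characteristic words per genre from tokenized descriptions.

    Score = (count in genre) * log(#genres / #genres containing the word),
    a TF-IDF-style heuristic used here as an assumption.
    """
    counts = {genre: Counter(w for doc in docs for w in doc) for genre, docs in corpus.items()}
    n_genres = len(counts)
    genre_freq = Counter(w for c in counts.values() for w in c)  # genres containing each word
    features: dict[str, list[str]] = {}
    for genre, c in counts.items():
        scored = {w: tf * math.log(n_genres / genre_freq[w]) for w, tf in c.items()}
        # Highest score first; ties broken alphabetically for a stable result.
        ranked = sorted((w for w, s in scored.items() if s > 0), key=lambda w: (-scored[w], w))
        features[genre] = ranked[:k]
    return features


CORPUS = {
    "golf course": [["golf", "green", "access"], ["golf", "tee"]],
    "shrine": [["shrine", "torii", "access"], ["shrine", "worship"]],
}
```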

FIG. 9 is a configuration diagram of the genre-labeled location information database 12. The genre-labeled location information database 12 stores genre-labeled location information and associates a time, a user ID, a genre, and so on.

(Sequence)
FIG. 10 is a sequence diagram of genre estimation according to the embodiment of the present invention. The configuration and operation of the genre estimation server 10 are described below with reference to FIG. 10.

First, the spot information acquisition unit 13 extracts the latitude and longitude from the user location information and sends them to the external system (address conversion service) 4A. The external system 4A converts the latitude and longitude into a spot name and address and returns them to the spot information acquisition unit 13.

Next, the spot information acquisition unit 13 sends the spot name and address received from the external system 4A to the external system (Web search service) 4B. The external system 4B collects Web information matching the received spot name and address and returns it to the spot information acquisition unit 13.

Next, the spot information acquisition unit 13 counts the words that appear frequently (frequent words) in the Web information received from the external system 4B. It then stores spot information consisting of the latitude and longitude, the spot name and address, the frequent words, and so on in the spot information database 11.
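The acquisition sequence above can be sketched with the two external services injected as plain callables, because the patent names no concrete API for either. The parameter names `reverse_geocode`, `web_search`, and `tokenize`, the `top_n` cut-off, and the `SpotInfo` record are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpotInfo:
    latitude: float
    longitude: float
    spot_name: str
    address: str
    description: str
    frequent_words: Counter


def acquire_spot_info(
    lat: float,
    lon: float,
    reverse_geocode: Callable[[float, float], tuple[str, str]],  # stands in for system 4A
    web_search: Callable[[str, str], str],                       # stands in for system 4B
    tokenize: Callable[[str], list[str]],                        # e.g. a morphological analyzer
    top_n: int = 10,
) -> SpotInfo:
    spot_name, address = reverse_geocode(lat, lon)  # latitude/longitude -> spot name, address
    description = web_search(spot_name, address)    # Web text matching the name and address
    counts = Counter(tokenize(description))         # frequent word counting
    return SpotInfo(lat, lon, spot_name, address, description,
                    Counter(dict(counts.most_common(top_n))))
```

Injecting the services keeps the sequence testable offline with stubs, and it matches the note that the services may equally run inside the genre estimation server 10.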

Next, the spot name determination unit 14A reads the spot information from the spot information table 11A and performs the spot name determination process. If the spot name determination process ends normally, genre-labeled location information is generated from the result and stored in the genre-labeled location information database 12. If it ends with an error, the spot name determination unit 14A notifies the description determination unit 14B. The description determination unit 14B performs the description determination process, generates genre-labeled location information from the result, and stores it in the genre-labeled location information database 12.

(Operation of the spot information acquisition unit)
FIG. 11 is a flowchart showing the operation of the spot information acquisition unit 13. On receiving user location information, the spot information acquisition unit 13 first performs the spot information acquisition process (S1 → S2) and then the frequent word counting process (S3 → S4). Specifically, it counts the frequent words in the description using a morphological analyzer such as MeCab (S3) and stores them in the spot information table 11A of the spot information database 11 (S4). Finally, it notifies the genre classification unit 14 (spot name determination unit 14A) of the item number in the spot information table 11A (S5).
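Step S3 can be sketched without calling MeCab itself by parsing text in the style of its default output (one `surface<TAB>features` line per token, with the part of speech as the first comma-separated feature, ending in `EOS`). The sample below is hand-written in that style rather than real analyzer output, and counting only nouns is an assumption.

```python
from collections import Counter


def parse_mecab_output(output: str) -> list[tuple[str, str]]:
    """Parse MeCab-style output into (surface, part_of_speech) pairs."""
    tokens = []
    for line in output.splitlines():
        if line == "EOS" or "\t" not in line:
            continue
        surface, features = line.split("\t", 1)
        tokens.append((surface, features.split(",")[0]))
    return tokens


def count_frequent_words(tokens: list[tuple[str, str]], pos_filter: tuple[str, ...] = ("名詞",),
                         top_n: int = 5) -> list[tuple[str, int]]:
    """Count words whose part of speech is in pos_filter (nouns by default)."""
    counts = Counter(surface for surface, pos in tokens if pos in pos_filter)
    return counts.most_common(top_n)


# Hand-written fragment in the style of MeCab's default output.
SAMPLE = (
    "ラーメン\t名詞,一般,*,*,*,*,ラーメン,ラーメン,ラーメン\n"
    "と\t助詞,並立助詞,*,*,*,*,と,ト,ト\n"
    "餃子\t名詞,一般,*,*,*,*,餃子,ギョウザ,ギョーザ\n"
    "の\t助詞,連体化,*,*,*,*,の,ノ,ノ\n"
    "ラーメン\t名詞,一般,*,*,*,*,ラーメン,ラーメン,ラーメン\n"
    "EOS\n"
)
```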

FIG. 12 is a flowchart showing details of the spot information acquisition process (S2 in FIG. 11). First, the spot information acquisition unit 13 determines whether the user location information is a latitude/longitude pair (S11). If it is, the unit determines whether that latitude/longitude is in the spot information cache 11B (S12). If it is not, the unit sends the latitude/longitude to the external system (address conversion service) 4A and obtains the address and spot name (S13). If the acquisition in S13 succeeds, the unit determines whether the address/spot name is in the spot information cache 11B (S14 → S15). If it is not, the unit sends the address/spot name, has the Web searched for a description, and receives the description (S16 → S17).

The process moves to S19 if the latitude/longitude is found in the spot information cache 11B in S12, if the address/spot name is found in the spot information cache 11B in S15, or if the reception in S17 succeeds. In S19, the user ID, time, latitude, longitude, address, spot name, and description are stored in the spot information table 11A and the spot information cache 11B of the spot information database 11. An error is returned (S20) if the user location information is not a latitude/longitude pair in S11, if the acquisition in S13 fails, or if the reception in S17 fails.
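The branching of S11 to S20 can be sketched as a single function. This is a hypothetical rendering: the two services are stub callables that return `None` on failure, the caches are plain dicts keyed by exact coordinates and by (address, spot name), and S20 is modelled as raising an exception. Writing the row to table 11A is left to the caller.

```python
from typing import Callable, Optional


class SpotAcquisitionError(Exception):
    """Corresponds to returning an error in S20."""


def acquire(
    position: object,
    cache_by_latlon: dict,
    cache_by_name: dict,
    to_address: Callable[[float, float], Optional[tuple[str, str]]],  # 4A stand-in
    search_description: Callable[[str, str], Optional[str]],          # 4B stand-in
) -> dict:
    # S11: is the user position a (latitude, longitude) pair?
    if not (isinstance(position, tuple) and len(position) == 2
            and all(isinstance(v, (int, float)) for v in position)):
        raise SpotAcquisitionError("position is not latitude/longitude")
    lat, lon = position
    cached = cache_by_latlon.get((lat, lon))             # S12
    if cached is None:
        result = to_address(lat, lon)                    # S13
        if result is None:                               # S14: acquisition failed
            raise SpotAcquisitionError("address conversion failed")
        address, spot_name = result
        cached = cache_by_name.get((address, spot_name))  # S15
        if cached is None:
            description = search_description(address, spot_name)  # S16 -> S17
            if description is None:
                raise SpotAcquisitionError("web search failed")
            cached = {"address": address, "spot_name": spot_name, "description": description}
    # S19: register in cache 11B and return the row destined for table 11A.
    cache_by_latlon[(lat, lon)] = cached
    cache_by_name[(cached["address"], cached["spot_name"])] = cached
    return {"latitude": lat, "longitude": lon, **cached}
```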

(Operation of genre classification unit)
FIG. 13 is a flowchart showing the operation of the genre classification unit 14. First, the genre classification unit 14 reads the record with the item number to be classified from the spot information table 11A and performs the spot name determination process (S21 → S22). If the spot name determination process completes normally, the genre-tagged location information is stored in the genre-tagged location information database 12 (S23). If the spot name determination process ends in error, the explanatory text determination process is performed instead (S24). If that process completes normally, the genre-tagged location information is likewise stored in database 12 (S23); if it also ends in error, location information tagged "genre unknown" is stored in database 12 (S25).
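The FIG. 13 cascade can be sketched as follows. This is a minimal sketch assuming that each determination process is a hypothetical callable returning a genre string on normal completion and `None` on error, that records are plain dictionaries with `spot_name` and `frequent_words` keys, and that the genre-tagged location information database 12 is an in-memory list.

```python
from typing import Callable, Dict, List, Optional

GENRE_UNKNOWN = "genre unknown"


def classify_spot(
    record: Dict,
    judge_spot_name: Callable[[str], Optional[str]],
    judge_description: Callable[[List[str]], Optional[str]],
    genre_db: List[Dict],
) -> Dict:
    # S22: try the cheap name-pattern check first.
    genre = judge_spot_name(record["spot_name"])
    if genre is None:
        # S24: fall back to the explanatory text only when S22 fails.
        genre = judge_description(record["frequent_words"])
    if genre is None:
        genre = GENRE_UNKNOWN                     # S25
    tagged = {**record, "genre": genre}
    genre_db.append(tagged)                       # S23 / S25
    return tagged
```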

FIG. 14 is a flowchart showing details of the spot name determination process (S22 in FIG. 13). First, the spot name determination unit 14A refers to the spot name classification learning dictionary 15A and identifies the genre(s) associated with the spot name (S31). If the spot name matches exactly one genre, genre-tagged location information carrying that genre is returned (S32). If it matches two or more genres, or no dictionary entry at all, an error is returned (S33).
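The spot name determination of FIG. 14 can be sketched as a prefix/suffix lookup. The dictionary below is entirely hypothetical: the genre labels ("lake/pond", "park", "apartment") and the patterns 湖, 池, 公園, and メゾン are illustrative, and representing each genre as a list of prefixes plus a list of suffixes is an assumption about the format of dictionary 15A. A return value of `None` stands in for the error of S33.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical dictionary: genre -> (prefix patterns, suffix patterns).
SPOT_NAME_DICT: Dict[str, Tuple[List[str], List[str]]] = {
    "lake/pond": ([], ["湖", "池"]),
    "park": ([], ["公園"]),
    "apartment": (["メゾン"], []),
}


def matching_genres(spot_name: str,
                    dictionary: Dict[str, Tuple[List[str], List[str]]] = SPOT_NAME_DICT
                    ) -> List[str]:
    # S31: collect every genre whose prefix or suffix pattern fits the name.
    return [genre for genre, (prefixes, suffixes) in dictionary.items()
            if any(spot_name.startswith(p) for p in prefixes)
            or any(spot_name.endswith(s) for s in suffixes)]


def judge_spot_name(spot_name: str,
                    dictionary: Dict[str, Tuple[List[str], List[str]]] = SPOT_NAME_DICT
                    ) -> Optional[str]:
    genres = matching_genres(spot_name, dictionary)
    # S32: exactly one match -> that genre; S33: zero or several -> error.
    return genres[0] if len(genres) == 1 else None
```

Keeping `matching_genres` separate from the single-match decision makes the full list of matched genres available when a name such as 「メゾン○○公園」 matches more than one genre.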

FIG. 15 is a flowchart showing details of the explanatory text determination process (S24 in FIG. 13). First, the explanatory text determination unit 14B computes the similarity to each genre using the frequent words in the explanatory text and the explanatory text classification learning database 15B (S41). If any genre has a similarity above a predetermined threshold, genre-tagged location information carrying the genre with the highest similarity is returned (S42 → S43). If no genre exceeds the threshold, an error is returned (S42 → S44). The threshold can be adjusted as appropriate.

When a spot name such as "Maison XX Park" matches both a characteristic prefix ("Maison") and a characteristic suffix ("Park"), the spot name determination process yields two genres (S31 in FIG. 14). In this case, the subsequent explanatory text determination process may restrict its candidates to just those two genres (S41 in FIG. 15).
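The threshold decision of FIG. 15, together with the optional narrowing of candidates, can be sketched as follows. This is a minimal sketch under stated assumptions: the learning database 15B is modeled as a mapping from genre to a set of feature words, the similarity measure is passed in as a callable, and the default threshold of 0.3 is an arbitrary placeholder rather than a value from the source.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set


def judge_description(
    frequent_words: List[str],
    learning_db: Dict[str, Set[str]],
    similarity: Callable[[List[str], Set[str]], float],
    threshold: float = 0.3,
    candidates: Optional[Iterable[str]] = None,
) -> Optional[str]:
    # Optional narrowing: score only the genres matched by the spot name check.
    if candidates is None:
        genres = list(learning_db)
    else:
        genres = [g for g in candidates if g in learning_db]
    # S41: similarity between the text's frequent words and each genre.
    scores = {g: similarity(frequent_words, learning_db[g]) for g in genres}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # S42-S44: accept the best genre only if it clears the threshold.
    return best if scores[best] > threshold else None
```

Passing `candidates` skips the similarity computation for every other genre, which is where the speed-up from narrowing comes from.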

FIG. 16 is a flowchart showing details of the similarity calculation process (S41 in FIG. 15). First, the explanatory text determination unit 14B vectorizes the genre estimation target record of the spot information table 11A over the feature words in the explanatory text classification learning database 15B: each feature word that appears among the record's frequent words is set to 1, and each feature word that does not appear is set to 0 (S51). Each record of the explanatory text classification learning database 15B is vectorized in the same way (S51). The cosine similarity to each record of database 15B is then calculated (S52).
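The vectorization and similarity steps (S51, S52) can be sketched as below. This is a minimal sketch assuming plain Python lists as vectors and English stand-ins for the feature words of the worked example in this section; it computes the cosine of the angle between two binary vectors, and the zero-vector guard is an added assumption not described in the source.

```python
import math
from typing import Iterable, List, Sequence


def binary_vector(words: Iterable[str], feature_words: Sequence[str]) -> List[int]:
    # S51: 1 if the feature word occurs among the words, else 0.
    present = set(words)
    return [1 if f in present else 0 for f in feature_words]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    # S52: dot product divided by the product of the Euclidean norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an empty vector is treated as dissimilar
    return dot / (norm_u * norm_v)
```

With the feature words (marine, creature, ancient, Asia, kotteri, tonkotsu, thin noodles), the text words "kotteri, tonkotsu, ramen" map to (0, 0, 0, 0, 1, 1, 0); words outside the feature list, such as "ramen" here, are simply ignored.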

(Specific example of similarity calculation processing)
A specific example of the similarity calculation follows. Assume that the feature words in the explanatory text classification learning database 15B are (marine, creature, ancient, Asia, kotteri [rich], tonkotsu [pork-bone broth], thin noodles). If the frequent words in the explanatory text include "kotteri" and "tonkotsu", the text is represented by the vector D = (0, 0, 0, 0, 1, 1, 0). The genre "museum", whose feature words are "marine, creature, ancient, Asia", is represented by the vector D1 = (1, 1, 1, 1, 0, 0, 0). The genre "ramen", whose feature words are "Asia, kotteri, thin noodles", is represented by the vector D2 = (0, 0, 0, 1, 1, 0, 1).

In this case, the similarity to the museum genre is calculated as follows:

$$\cos(D, D_1) = \frac{D \cdot D_1}{\lVert D \rVert \, \lVert D_1 \rVert} = \frac{0}{\sqrt{2} \cdot \sqrt{4}} = 0$$

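Similarly, the similarity to the ramen genre is calculated as follows:

$$\cos(D, D_2) = \frac{D \cdot D_2}{\lVert D \rVert \, \lVert D_2 \rVert} = \frac{1}{\sqrt{2} \cdot \sqrt{3}} = \frac{1}{\sqrt{6}} \approx 0.41$$

The explanatory text is therefore more similar to the ramen genre than to the museum genre.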

This similarity calculation is only one example, and various other methods can be used. For instance, the vector components may be weighted by the number of occurrences of each word. The similarity measure also need not be cosine-based; the open-source Jubatus framework, for example, could be used instead.
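The occurrence-weighted variant mentioned above can be sketched by replacing the 0/1 components with word counts. This is a minimal sketch of one possible weighting (raw counts via `collections.Counter`); the source does not specify the weighting scheme, and the cosine helper is repeated here only so the block runs on its own.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence


def count_vector(words: Iterable[str], feature_words: Sequence[str]) -> List[int]:
    # Each component is the number of times the feature word occurs.
    counts = Counter(words)
    return [counts[f] for f in feature_words]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

For a text containing "kotteri" twice and "tonkotsu" once, the vector becomes (0, 0, 0, 0, 2, 1, 0), and its cosine with the ramen vector D2 rises from 1/√6 ≈ 0.41 to 2/√15 ≈ 0.52, since the repeated word it shares with D2 now carries more weight.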

(Specific example of spot name determination processing)
FIG. 17 is a diagram illustrating a specific example of the spot name determination process. As described above, the spot name determination unit 14A determines the genre using a dictionary of genre-specific suffixes and prefixes. Specifically, the spot name classification learning dictionary 15A is built from each genre and its suffix/prefix patterns. The spot name to be classified is then compared with dictionary 15A; if a matching pattern exists in the dictionary, the corresponding genre is assigned as the genre of the spot. If no matching pattern exists, the genre is left undetermined and processing moves on to the next step. Because the determination relies on name patterns rather than on individual names, genres can be estimated even for spot names that have never been seen before.

For example, suppose spot name 1 is "Lake Biwa" (琵琶湖, whose name ends in 湖, "lake") and spot name 2 is "XX-chan". As shown in FIG. 17, the lake/pond genre is identified as the genre of "Lake Biwa", whereas the genre of "XX-chan" remains undetermined and processing moves on to the next step.

(Specific example of explanatory text determination processing)
FIG. 18 is a diagram illustrating a specific example of the explanatory text determination process. As described above, the explanatory text determination unit 14B classifies a spot by the similarity of its explanatory text. Specifically, it counts nouns, verbs, and adjectives using a morphological analyzer such as MeCab, feeds these words to a document classifier, and identifies the closest genre. This lets the system handle unknown spots whose names have no characteristic suffix or prefix, as well as spot names that are common to several genres.
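The word-counting step can be sketched independently of any particular analyzer. This minimal sketch assumes the morphological analyzer's output has already been converted into (surface form, part of speech) pairs, with part-of-speech labels 名詞 (noun), 動詞 (verb), and 形容詞 (adjective) in the style of the IPADIC dictionary commonly used with MeCab; the token list in the test is hand-made for illustration, not real analyzer output, and `top_n=7` is an arbitrary choice.

```python
from collections import Counter
from typing import Iterable, List, Tuple

CONTENT_POS = {"名詞", "動詞", "形容詞"}  # noun, verb, adjective


def count_content_words(tokens: Iterable[Tuple[str, str]]) -> Counter:
    # Keep only nouns, verbs, and adjectives; particles etc. are dropped.
    return Counter(surface for surface, pos in tokens if pos in CONTENT_POS)


def frequent_words(tokens: Iterable[Tuple[str, str]], top_n: int = 7) -> List[str]:
    # The most common content words become the text's "frequent words".
    return [word for word, _ in count_content_words(tokens).most_common(top_n)]
```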

For example, as shown in FIG. 18(a), suppose the explanatory text for spot name 2, "XX-chan", is "Rich tonkotsu ramen...". When explanatory texts are retrieved from the Internet using the spot name and address as keys, several texts may be found, as shown in FIG. 18(b). Even in that case, taking the address information into account makes it possible to identify the relevant text, "Rich tonkotsu ramen...". As shown in FIG. 18(c), the frequent words in this text are "kotteri (rich), tonkotsu, ramen, popularity, number, Hakata, noko (thick)". Therefore, as shown in FIG. 18(d), the ramen genre, which has the highest similarity, is adopted as the genre of "XX-chan".
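The source does not say exactly how the address information singles out the right text among several search results. One simple possibility, offered purely as an assumption, is to prefer the candidate that mentions the most keywords taken from the spot's address, as in this sketch; the function name and the keyword-matching rule are hypothetical.

```python
from typing import List, Optional


def pick_description(candidates: List[str], address_keywords: List[str]) -> Optional[str]:
    # Hypothetical rule: score each candidate by how many address keywords
    # (e.g. city or ward names) it contains, and keep the best-scoring one.
    best_text, best_score = None, 0
    for text in candidates:
        score = sum(1 for keyword in address_keywords if keyword in text)
        if score > best_score:
            best_text, best_score = text, score
    return best_text  # None when no candidate mentions the address
```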

(Automatic creation of the explanatory text classification learning database)
FIG. 19 is a diagram illustrating a method of automatically creating the explanatory text classification learning database 15B. First, explanatory texts belonging to the same genre are collected, and the frequent nouns, verbs, and adjectives in them are counted using a morphological analyzer such as MeCab. For example, when ramen-genre explanatory texts 100 are collected, words such as "thin noodles", "thick noodles", "soy sauce", "miso", "salt", "tonkotsu", and "butter" are counted as frequent words. From these, words that also appear in other genres such as the cafeteria and pub (izakaya) genres, for example "beer" and "delicious", are removed, leaving words 101 that tend to appear only in the ramen genre. These words 101 are used as the feature words of the ramen genre.
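The construction in FIG. 19 can be sketched as "frequent words of this genre minus frequent words of every other genre". This is a minimal sketch assuming each explanatory text has already been reduced to a list of content words, and interpreting "words that appear in other genres" as words that are among the other genres' frequent words; that interpretation and the `top_n=10` cutoff are assumptions.

```python
from collections import Counter
from typing import Dict, List, Set


def build_feature_words(texts_by_genre: Dict[str, List[List[str]]],
                        top_n: int = 10) -> Dict[str, Set[str]]:
    # Step 1: frequent words per genre, pooled over all texts of that genre.
    frequent: Dict[str, Set[str]] = {}
    for genre, docs in texts_by_genre.items():
        counts = Counter(word for doc in docs for word in doc)
        frequent[genre] = {word for word, _ in counts.most_common(top_n)}
    # Step 2: drop words that are also frequent in any other genre.
    features: Dict[str, Set[str]] = {}
    for genre, words in frequent.items():
        others = set().union(*(w for g, w in frequent.items() if g != genre))
        features[genre] = words - others
    return features
```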

(Comparison with prior art documents)
FIG. 20 is a diagram comparing the prior art documents with the embodiment of the present invention.

As shown in FIG. 20(a), Patent Document 1 uses a spot-genre correspondence database to abstract stay spots into genres. However, building such a correspondence database is laborious, and a genre can be assigned only to locations contained in the genre conversion database.

As shown in FIG. 20(b), Patent Document 2 collects spot-genre correspondence information from the Internet to reduce the work of preparing the correspondence database. Building the correspondence database is therefore easier than in Patent Document 1. However, no genre can be assigned to a spot for which no spot-genre correspondence information exists on the Internet.

As shown in FIG. 20(c), the embodiment of the present invention combines estimation based on suffix/prefix features with estimation based on the similarity of explanatory texts on the Internet. Building the spot-genre correspondence database is therefore easier than in Patent Document 2. Moreover, even when no spot-genre correspondence information exists on the Internet, the genre can be estimated as long as the spot name itself or related information is available.

As described above, the genre estimation server 10 according to the embodiment of the present invention estimates the genre corresponding to a user's stay spot information. It comprises a spot information acquisition unit 13 that acquires the user's stay spot information, and a genre classification unit 14 that classifies the genre by combining the suffix or prefix features of the spot name contained in the spot information acquired by the spot information acquisition unit 13 with explanatory text on the Internet describing the spot with that name. Because the two estimation techniques offset each other's weaknesses, genres can be assigned efficiently even to unknown spots not prepared in a correspondence database.

Specifically, the genre classification unit 14 performs the spot name determination process, which determines the genre from the suffix or prefix features of the spot name. If the spot name does not match exactly one genre, the unit then performs the explanatory text determination process, which analyzes explanatory text on the Internet to determine the genre. When the spot name determination process completes normally, the explanatory text determination process can be skipped, so the genre can be estimated both accurately and quickly.

When the spot name determination process matches two or more genres, the genre classification unit 14 may perform the explanatory text determination process with its candidates restricted to only those matched genres. Because the similarity then need not be computed against every genre, the genre can be estimated even more quickly.

Further, before the spot name determination process, the spot information acquisition unit 13 may count frequent words, i.e., words that appear frequently in the explanatory text on the Internet, and the genre classification unit 14 may then, after performing the spot name determination process, perform the explanatory text determination process using the frequent words counted by the spot information acquisition unit 13. This is efficient because the frequent words can be stored in the spot information database 11 together with information such as the latitude and longitude.

As described above, the embodiment of the present invention makes it possible to assign genres efficiently to unknown spots not prepared in a correspondence database. That is, even an unregistered spot can be given a genre automatically once the spot name itself or related information is known. This reduces the work of building a database that links spots to genres. In addition, because more locations can be converted into genres than before, the genres of users' stay spots can be obtained without bias. Applied to tourism, the technique can estimate tourists' movement patterns, which can serve as basic information for infrastructure planning by local governments or support flexible recommendations that follow those patterns.

The present invention can be realized not only as the genre estimation server 10 but also as a genre estimation method whose steps correspond to the characteristic functional units of the genre estimation server 10, or as a genre estimation program that causes a computer to execute those steps. Such a program can, of course, be distributed via a recording medium such as a CD-ROM or via a transmission medium such as the Internet.

DESCRIPTION OF SYMBOLS
1, 1A, 1B ... Terminal
2 ... Log collection server
3 ... Internet
3A ... User latitude/longitude information
4A, 4B ... External system
10 ... Genre estimation server
11 ... Spot information database
11A ... Spot information table
11B ... Spot information cache
12 ... Genre-tagged location information database
13 ... Spot information acquisition unit
14 ... Genre classification unit
14A ... Spot name determination unit
14B ... Explanatory text determination unit
15 ... Learning database
15A ... Spot name classification learning dictionary
15B ... Explanatory text classification learning database

Claims

A genre estimation server that estimates a genre corresponding to a user's stay spot information, comprising:
a spot information acquisition unit that acquires the user's stay spot information; and
a genre classification unit that classifies the genre by combining a suffix or prefix feature of a spot name included in the spot information acquired by the spot information acquisition unit with explanatory text on the Internet describing the spot having that spot name,
wherein the genre classification unit performs a spot name determination process that determines the genre from the suffix or prefix feature of the spot name and, when two or more genres are matched in the spot name determination process, narrows the candidates down to only the two or more matched genres and performs an explanatory text determination process that analyzes the explanatory text on the Internet to determine the genre.

The genre estimation server according to claim 1, wherein the spot information acquisition unit counts, prior to the spot name determination process, frequent words, i.e., words that appear frequently in the explanatory text on the Internet, and
the genre classification unit performs, after performing the spot name determination process, the explanatory text determination process using the frequent words counted by the spot information acquisition unit.

A genre estimation method in which a genre estimation server estimates a genre corresponding to a user's stay spot information, the method comprising a computer executing:
a spot information acquisition step of acquiring the user's stay spot information; and
a genre classification step of classifying the genre by combining a suffix or prefix feature of a spot name included in the spot information acquired in the spot information acquisition step with explanatory text on the Internet describing the spot having that spot name,
wherein, in the genre classification step, a spot name determination process that determines the genre from the suffix or prefix feature of the spot name is performed and, when two or more genres are matched in the spot name determination process, the candidates are narrowed down to only the two or more matched genres and an explanatory text determination process that analyzes the explanatory text on the Internet to determine the genre is performed.

A genre estimation program for causing a computer to function as each processing unit of the genre estimation server according to claim 1.