JP2015102933A

JP2015102933A - Place-name position estimation method, place-name position estimation apparatus, and place-name position estimation program

Info

Publication number: JP2015102933A
Application number: JP2013241668A
Authority: JP
Inventors: Ryota Imai; Hiroyuki Toda; Seiji Washisaki
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-11-22
Filing date: 2013-11-22
Publication date: 2015-06-04
Anticipated expiration: 2033-11-22
Also published as: JP6106069B2

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy of estimating the position of a place name. SOLUTION: A POI data storage section 12 stores multiple pieces of POI data, each associating the name of a POI with position information of the POI. A POI related-document extraction section 13 extracts, from a set of document data, the document data that includes the name of a POI. A document analysis section 14 extracts a place name related to the POI data from the extracted document data and calculates the number of times that the place name appears in the document data. A place-name position estimation section 15 treats the appearance frequency of the place name as the weight of the POI data with respect to that place name, and calculates the position of the place name using the weights of the POI data and the position information of the POIs.

Description

The present invention relates to a technique for estimating the position of a place name.

There is currently a demand for techniques that estimate the geographical position of a place name contained in document data. For example, given the document data "I went with a friend to a restaurant called AA in Chinatown.", such a technique displays the location on a map, or the latitude and longitude, of "Chinatown", which is the place name in that sentence.

One example of such a technique converts a place name into a real-world position using dictionary data that associates place names with their position information (Prior Art 1). In addition, Patent Document 1 discloses a technique that extracts a plurality of points related to a place name and computes a polygon enclosing those points as the place name (Prior Art 2).

Patent Document 1: JP 2009-37316 A

Non-Patent Document 1: Imai, "Extraction of Place Names from Web Documents Using POI Information", Information Processing Society of Japan, Proceedings of the 12th Information Science and Technology Forum, Vol. 2, FIT2013, D-019, pp. 127-128
Non-Patent Document 2: Okazaki and one other, "A Concise and Fast Similar String Search Algorithm for Inter-set Similarity", Language Processing Society of Japan, Natural Language Processing, Vol. 13, No. 2, April 2006, pp. 1-29

However, because Prior Art 1 converts a place name into a position using dictionary data, the conversion fails when the place name to be estimated is not registered in the dictionary data. Even when the place name is in the dictionary data, the registered position may differ from the position that people generally associate with that name.

For example, when the place name "Shinagawa" is converted into a position, address dictionary data may yield the center point of "Shinagawa-ku, Tokyo" as the position of "Shinagawa", whereas the "Shinagawa" that people generally have in mind is Shinagawa Station, which lies north of that center point.

Prior Art 2, on the other hand, uses points related to the place name and therefore yields a position close to the one generally associated with it. However, because it converts the place name into a polygonal area, it cannot identify which position within that area corresponds to the place name.

The present invention has been made in view of the above circumstances, and its object is to improve the accuracy of estimating the position of a place name.

The place name position estimation method according to claim 1 is characterized in that a computer performs: a storage step of storing, in storage means, a plurality of pieces of point data each associating the name of a predetermined point with position information of the point; an extraction step of reading the names of the points from the storage means and extracting, from a plurality of pieces of document data, the document data that includes the name of a point; an analysis step of extracting a place name related to the point from the extracted document data and calculating the frequency with which the place name appears in that document data; and an estimation step of treating the calculated appearance frequency of the place name as the importance of the point with respect to the place name, and calculating the position of the place name using the importance of the point and the position information of the point.

According to the present invention, a plurality of pieces of point data each associating the name of a predetermined point with position information of the point are stored; document data including the name of a point is extracted from a plurality of pieces of document data; a place name related to the point is extracted from the extracted document data; the frequency with which the place name appears in the document data is calculated; and, with the calculated appearance frequency treated as the importance of the point with respect to the place name, the position of the place name is calculated using the importance of the point and the position information of the point. The position of the place name can therefore be estimated reliably.

The place name position estimation method according to claim 2 is the method according to claim 1, characterized in that the analysis step outputs a plurality of pieces of combination data, each associating the name of a point, a place name, and the appearance frequency of the place name, and the estimation step integrates the plurality of pieces of combination data to generate an integrated graph that links corresponding point names and place names, obtains the names of the points corresponding to a place name using the integrated graph, and calculates the position of the place name using the importance and the position information of those points.

The place name position estimation method according to claim 3 is the method according to claim 1 or 2, characterized in that the estimation step calculates the position of the place name by converting the importance of a point into a distance from that point.

The place name position estimation apparatus according to claim 4 is characterized by comprising: storage means for storing a plurality of pieces of point data each associating the name of a predetermined point with position information of the point; extraction means for reading the names of the points from the storage means and extracting, from a plurality of pieces of document data, the document data that includes the name of a point; analysis means for extracting a place name related to the point from the extracted document data and calculating the frequency with which the place name appears in that document data; and estimation means for treating the calculated appearance frequency of the place name as the importance of the point with respect to the place name, and calculating the position of the place name using the importance of the point and the position information of the point.

The place name position estimation apparatus according to claim 5 is the apparatus according to claim 4, characterized in that the analysis means outputs a plurality of pieces of combination data, each associating the name of a point, a place name, and the appearance frequency of the place name, and the estimation means integrates the plurality of pieces of combination data to generate an integrated graph that links corresponding point names and place names, obtains the names of the points corresponding to a place name using the integrated graph, and calculates the position of the place name using the importance and the position information of those points.

The place name position estimation apparatus according to claim 6 is the apparatus according to claim 4 or 5, characterized in that the estimation means calculates the position of the place name by converting the importance of a point into a distance from that point.

The place name position estimation program according to claim 7 is characterized by causing a computer to execute the place name position estimation method according to any one of claims 1 to 3.

According to the present invention, the position of a place name can be estimated reliably.

FIG. 1 is a diagram showing the functional block configuration of the place name position estimation apparatus.
FIG. 2 is a diagram showing an example of document data.
FIG. 3 is a diagram showing an example of POI data.
FIG. 4 is a diagram showing the operation flow of the place name position estimation apparatus.
FIG. 5 is a diagram showing an example of the extraction results, such as text documents.
FIG. 6 is a diagram showing the functional block configuration of the document analysis unit.
FIG. 7 is a diagram showing the operation flow of the document analysis unit.
FIG. 8 is a reference diagram used in describing the operation flow of the document analysis unit.
FIG. 9 is a diagram showing an example of the output results, such as the number of appearances of place names.
FIG. 10 is a diagram showing the functional block configuration of the place name position estimation unit.
FIG. 11 is a diagram showing the operation flow of the place name position estimation unit.
FIG. 12 is a diagram showing an example of the integrated graph.
FIG. 13 is a diagram showing an example of the estimated position of a place name.

The present invention focuses on the fact that some place names have no address, and that for others the registered address differs from the address users perceive. It is characterized by estimating the position of a place name using information other than position information attached to the place name itself (such as the POI data described later).

An embodiment of the present invention is described below with reference to the drawings.

First, the terms used in this embodiment are defined.

A "text document" is a document written in a natural language such as Japanese and expressed as plain text. An example is the body text extracted from a blog article published on the Internet.

A "place name" is an expression, in a natural language such as Japanese, of a place that exists in the real world. Examples include "Tokyo", "Yokohama", "Dobuita Street", and "Tokyo Solamachi". The "position of a place name" is the place name expressed as the coordinates of a single point, generally a pair of latitude and longitude.

A "POI (Point of Interest)" is a point in the real world that carries some meaning. Examples include shops such as restaurants, sightseeing spots, and landmarks such as "Yokohama Marine Tower".

Next, the functions of the place name position estimation apparatus 1 according to this embodiment are described.

FIG. 1 is a diagram showing the functional block configuration of the place name position estimation apparatus 1. The place name position estimation apparatus 1 comprises a document data storage unit 11, a POI data storage unit 12, a POI-related document extraction unit 13, a document analysis unit 14, a place name position estimation unit 15, and a place name position data storage unit 16.

The document data storage unit 11 stores a plurality of text documents from which place names are extracted. These text documents are collected in advance and stored in the document data storage unit 11 beforehand. FIG. 2 shows an example of document data.

The POI data storage unit 12 stores a plurality of pieces of POI data (point data). Each piece of POI data holds at least the name of a POI in association with the position information of that POI. The position of a POI is expressed, for example, as a pair of latitude and longitude. The POI data is collected in advance and stored in the POI data storage unit 12 beforehand. FIG. 3 shows an example of POI data.
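As a rough illustration of the POI data described above, the record could be modeled as a name paired with a latitude and longitude. The class name `Poi`, the store `POI_STORE`, and the sample names and coordinates below are hypothetical and are not taken from FIG. 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Poi:
    """One POI record: a POI name associated with its position."""
    name: str
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


# Hypothetical records; the names and coordinates are illustrative only.
POI_STORE = {
    poi.name: poi
    for poi in [
        Poi("Restaurant AA", 35.4430, 139.6460),
        Poi("Yokohama Cafe", 35.4440, 139.6500),
    ]
}
```

Keying the store by POI name is one possible design choice; it makes the later lookup of a POI's position by name a single dictionary access.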

The POI-related document extraction unit 13 uses the text documents and the POI data to extract, from the plurality of text documents, those that contain the name of a POI.

The document analysis unit 14 uses morphological analysis and named entity extraction to extract place names related to the POI data from the text documents extracted by the POI-related document extraction unit 13, and further calculates the number of times each place name appears in the text documents (the appearance frequency of the place name).

The place name position estimation unit 15 treats the number of appearances of a place name calculated by the document analysis unit 14 as the weight (importance) of the POI data with respect to that place name, and calculates the position of the place name using the weight of each piece of POI data and the position information of each POI.

The place name position data storage unit 16 stores the position information of the place names calculated by the place name position estimation unit 15.

These are the functions of the place name position estimation apparatus 1. The document data storage unit 11, the POI data storage unit 12, the POI-related document extraction unit 13, and some of the functions of the document analysis unit 14 are based on the technique disclosed in Non-Patent Document 1 mentioned above. The difference from that technique is that the document analysis unit 14 calculates the number of appearances of each place name in addition to outputting the POI name, and the place name position estimation unit 15 uses that number of appearances to estimate the position of the place name.

Next, the overall operation of the place name position estimation apparatus 1 is described. FIG. 4 is a diagram showing the operation flow of the place name position estimation apparatus 1.

First, in step S101, the POI-related document extraction unit 13 reads the text documents from the document data storage unit 11 and the POI names from the POI data storage unit 12, and extracts the text documents that contain a POI name. In doing so, it may use the similar string search technique disclosed in Non-Patent Document 2 mentioned above to also extract text documents in which the POI name is written in a slightly different form because of notation variation.

It then outputs, in association with one another, each extracted text document, the POI name contained in that text document, and the appearance position of the POI name in the text document, such as its column and line numbers. FIG. 5 shows an example of the output.

For example, suppose there is a text document reading "昨日は天気がよかったので、横浜通りに行って横浜ｃａｆｅのケーキを食べました。" ("The weather was nice yesterday, so I went to Yokohama Street and had cake at Yokohama cafe.") and POI data named "横浜Ｃａｆｅ" (Yokohama Cafe). In that case this text document is extracted, and "characters 22 to 27", counted in the Japanese original, is output as the appearance position of the POI name.
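A minimal sketch of the matching in step S101 follows. The function name `find_poi_mentions` is hypothetical, and case-insensitive matching is only a simple stand-in for the similar string search of Non-Patent Document 2, which is not reproduced here. Half-width letters are used in the sample strings for simplicity.

```python
def find_poi_mentions(text, poi_names):
    """Return (poi_name, notation, start, end) for each POI name found in text.

    Positions are 1-based and inclusive. Matching ignores letter case only,
    and assumes that lowercasing does not change the length of the strings
    (true for the sample text used here).
    """
    folded = text.lower()
    hits = []
    for name in poi_names:
        key = name.lower()
        pos = folded.find(key)
        while pos != -1:
            # The notation is the string as actually written in the document.
            notation = text[pos:pos + len(key)]
            hits.append((name, notation, pos + 1, pos + len(key)))
            pos = folded.find(key, pos + 1)
    return hits


text = "昨日は天気がよかったので、横浜通りに行って横浜cafeのケーキを食べました。"
print(find_poi_mentions(text, ["横浜Cafe"]))
# [('横浜Cafe', '横浜cafe', 22, 27)]
```

The returned tuple carries both the registered POI name and the notation found in the document, since the later analysis needs the string as it was actually written.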

Next, in step S102, the document analysis unit 14 uses the text document, the POI name, and the appearance position of the POI name extracted in step S101 to extract place names related to the POI data from the text document, and further calculates the number of appearances of each place name. This process is described in detail later.

Finally, in step S103, the place name position estimation unit 15 uses the position information of each POI read from the POI data storage unit 12 to calculate the position of each place name from the set of tuples output in step S102, each consisting of a POI name, a place name, and its number of appearances. It then stores the resulting place names and their position information in the place name position data storage unit 16. This process is also described in detail later.

This concludes the overall operation of the place name position estimation apparatus 1.

Next, the operation of the document analysis unit 14 in step S102 is described. As shown in FIG. 6, the document analysis unit 14 comprises a document input unit 141, a morphological analysis unit 142, a named entity extraction unit 143, a place name extraction unit 144, and a place name output unit 145. FIG. 7 is a diagram showing the operation flow of the document analysis unit 14.

First, in step S102-1, the document input unit 141 accepts as input the text document, the POI name, and the appearance position of the POI name extracted in step S101 (see FIG. 5). It then outputs the text document to the morphological analysis unit 142, and outputs the POI name and the POI notation to the place name extraction unit 144.

The "POI notation" is the string actually written at the appearance position of the POI name in the text document. As described above, the POI-related document extraction unit 13 can extract a text document even when the POI name is written in a slightly different form because of notation variation, so the string as actually written is taken as the POI notation here. For example, if the POI name is "Yokohama Cafe" (横浜Ｃａｆｅ) and the string actually written at the appearance position is "Yokohama cafe" (横浜ｃａｆｅ), then "Yokohama cafe" is output as the POI notation.

Next, in step S102-2, the morphological analysis unit 142 performs morphological analysis on the input text document and converts it into a sequence of sentences segmented into morphemes. A "sequence of sentences" is an ordered collection of one or more sentences. For example, when the text document of FIG. 8(a) is input, a sequence of sentences such as that of FIG. 8(c) is output. Morphological analysis is a known technique.

Next, in step S102-3, the named entity extraction unit 143 extracts named entities such as dates, times, and places from the morphologically analyzed text document (the sequence of sentences), and attaches to each extracted morpheme a mark identifying the type of named entity. For example, when the morpheme-segmented sequence of sentences of FIG. 8(c) is input, a sequence of sentences carrying date/time and place marks such as that of FIG. 8(d) is output. Named entity extraction is also a known technique.

Next, in step S102-4, the place name extraction unit 144 extracts, from the sequence of sentences carrying named entity marks, only the sentences that contain the POI notation. At this point, sentences may also be filtered using conditions such as part of speech, as in the technique disclosed in Non-Patent Document 1. It then extracts the morpheme sequences that carry a place mark and takes them as place names. However, as in the technique disclosed in Non-Patent Document 1, a morpheme sequence whose place mark overlaps the POI notation is not taken.

For example, extracting the sentences that contain the POI notation of FIG. 8(b) from the marked sequence of sentences of FIG. 8(d) yields FIG. 8(e). Extracting the morpheme sequences that carry a place mark from that result yields the place names shown in FIG. 8(f).
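The selection logic of step S102-4 can be sketched as follows, assuming the morphological analysis and named entity extraction have already produced tokens of the form `(surface, tag)`. The function name `extract_place_names`, the tag value `"LOC"`, and the token layout are hypothetical; the actual format of FIG. 8 is not reproduced here, and only the first occurrence of the POI notation in each sentence is considered.

```python
def extract_place_names(sentences, poi_notation):
    """Extract place names from sentences that mention the POI notation.

    sentences: list of sentences, each a list of (surface, tag) tokens,
    where tag is "LOC" for morphemes marked as places.
    """
    places = []
    for tokens in sentences:
        sentence = "".join(surface for surface, _ in tokens)
        poi_start = sentence.find(poi_notation)
        if poi_start == -1:
            continue  # keep only sentences that contain the POI notation
        poi_end = poi_start + len(poi_notation)
        offset = 0  # character offset of the current token in the sentence
        run, run_start = [], 0
        # The trailing sentinel flushes a place run that ends the sentence.
        for surface, tag in tokens + [("", None)]:
            if tag == "LOC":
                if not run:
                    run_start = offset
                run.append(surface)
            elif run:
                # Skip place runs whose span overlaps the POI notation.
                overlaps = run_start < poi_end and poi_start < offset
                if not overlaps:
                    places.append("".join(run))
                run = []
            offset += len(surface)
    return places
```

For a sentence tokenized as 横浜/通り (both marked as places) followed later by 横浜/cafe, with "横浜cafe" as the POI notation, this sketch keeps "横浜通り" and drops the "横浜" that is part of the POI notation.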

Finally, in step S102-5, the place name output unit 145 outputs, for the place names taken in step S102-4, combination data that associates the POI name, the place name, and the number of appearances of the place name. FIG. 9 shows the output.

This concludes the operation of the document analysis unit 14. Steps S102-2 to S102-4 are executed once per text document. When there are multiple text documents, the steps are repeated for each of them, and multiple pieces of combination data are output.
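The combination data of step S102-5 could be produced by counting `(POI name, place name)` pairs, as in the sketch below. The function name `count_combinations` and the sample names are hypothetical, and the sorted output order is a choice made here for reproducibility, not something the source specifies.

```python
from collections import Counter


def count_combinations(extractions):
    """Turn extracted (poi_name, place_name) pairs into combination data.

    extractions: iterable with one (poi_name, place_name) pair per place
    name mention taken in step S102-4.
    Returns a sorted list of (poi_name, place_name, appearance_count).
    """
    counts = Counter(extractions)
    return [(poi, place, n) for (poi, place), n in sorted(counts.items())]


pairs = [
    ("Restaurant AA", "Chinatown"),
    ("Restaurant AA", "Chinatown"),
    ("Restaurant AA", "Yokohama"),
    ("Cafe BB", "Chinatown"),
]
print(count_combinations(pairs))
# [('Cafe BB', 'Chinatown', 1), ('Restaurant AA', 'Chinatown', 2), ('Restaurant AA', 'Yokohama', 1)]
```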

Next, the operation of the place name position estimation unit 15 in step S103 is described. As shown in FIG. 10, the place name position estimation unit 15 comprises a POI/place name input unit 151, a POI/place name integration unit 152, a position calculation unit 153, and a position output unit 154. FIG. 11 is a diagram showing the operation flow of the place name position estimation unit 15.

First, in step S103-1, the POI/place name input unit 151 accepts as input the combination data output in step S102-5, which associates the POI name, the place name, and the number of appearances of the place name (see FIG. 9), and also reads the position information of the POIs (see FIG. 3) from the POI data storage unit 12. It then outputs the combination data to the POI/place name integration unit 152 and the position information of the POIs to the position calculation unit 153.

Next, in step S103-2, the POI/place name integration unit 152 uses the combination data to turn the correspondence between POI names and place names into a graph. Specifically, it merges identical POI names and identical place names, connects each POI name to each corresponding place name with a line, and attaches the number of appearances of the place name to that line, thereby generating a single integrated graph that combines the multiple pieces of combination data. FIG. 12 shows an example of the integrated graph. By following the POIs linked to a given place name on the integrated graph, it is possible to look up all the POI data from which that place name was extracted, together with the number of appearances of the place name in each case.
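One way to picture the integrated graph is as a nested mapping from each place name to the POIs linked to it, with the number of appearances stored on each link. The sketch below is hypothetical: the function name `build_integrated_graph` is not from the source, and summing the counts when the same POI and place name pair appears in more than one piece of combination data is an assumption.

```python
from collections import defaultdict


def build_integrated_graph(combinations):
    """Merge (poi_name, place_name, count) triples into one graph.

    Returns {place_name: {poi_name: count}}. Counts for a repeated
    (poi_name, place_name) pair are summed, which is an assumption.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for poi, place, count in combinations:
        graph[place][poi] += count
    return {place: dict(pois) for place, pois in graph.items()}


combos = [
    ("Restaurant AA", "Chinatown", 2),
    ("Cafe BB", "Chinatown", 1),
    ("Restaurant AA", "Chinatown", 1),
    ("Restaurant AA", "Yokohama", 1),
]
graph = build_integrated_graph(combos)
print(graph["Chinatown"])
# {'Restaurant AA': 3, 'Cafe BB': 1}
```

Indexing by place name first matches the lookup described in the text: starting from a place name and following its links to every related POI and count.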

Next, in step S103-3, the position calculation unit 153 uses the integrated graph to obtain the names of the POIs corresponding to the place name to be estimated, treats the number of appearances of the place name as the weight of the POI data with respect to that place name, converts the weight into a distance from the position of the POI, and calculates the position of the place name using the position information of the POIs. Various concrete calculation methods are conceivable depending on where the technique is applied. For example, the coordinate values of the POI positions may be multiplied by their weights and averaged, or fitting to a suitable probability distribution may be considered.
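The first option mentioned above, averaging the POI coordinates multiplied by their weights, can be sketched as a weighted centroid. The function name `estimate_position` and the sample values are hypothetical. Dividing by the sum of the weights is an assumption about what "averaged" means here, and averaging latitude and longitude directly is only reasonable when the POIs lie close together.

```python
def estimate_position(poi_weights, poi_positions):
    """Estimate a place name's position as a weighted centroid of POIs.

    poi_weights: {poi_name: appearance_count} for one place name.
    poi_positions: {poi_name: (lat, lon)} in degrees.
    Normalizing by the total weight is an assumption of this sketch.
    """
    total = sum(poi_weights.values())
    if total == 0:
        raise ValueError("no weighted POIs for this place name")
    lat = sum(w * poi_positions[p][0] for p, w in poi_weights.items()) / total
    lon = sum(w * poi_positions[p][1] for p, w in poi_weights.items()) / total
    return lat, lon


# Hypothetical weights and coordinates, for illustration only.
weights = {"POI A": 3, "POI B": 1}
positions = {"POI A": (35.0, 139.0), "POI B": (36.0, 140.0)}
print(estimate_position(weights, positions))
# (35.25, 139.25)
```

With these values the estimate lands three quarters of the way toward POI A, reflecting that a POI mentioned more often together with the place name pulls the estimated position toward itself.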

Finally, in step S103-4, the position output unit 154 stores the position information (latitude and longitude values) of the place name calculated in step S103-3 in the place name position data storage unit 16. FIG. 13 shows an image of the estimated position information. The position of "Chinatown" is shown at a location determined relative to the positions of the POIs, calculated on the basis of the weights of the POI data.

This concludes the description of the operation of the place name position estimation apparatus 1 and of the apparatus according to this embodiment.

As described above, according to this embodiment, a plurality of pieces of POI data each associating a POI name with the position information of the POI are stored; document data including a POI name is extracted from a plurality of pieces of document data; place names related to the POI data are extracted from the extracted document data; the number of times each place name appears in the document data is calculated; and, with the calculated number of appearances treated as the weight of the POI data with respect to the place name, the position of the place name is calculated using the weights of the POI data and the position information of the POIs. The position of the place name can therefore be estimated reliably.

In other words, a place name is converted into a single position while the POI data and their importance are taken into account: information on which POIs a place name is related to, and how important each of those relations is, is obtained from text documents. As a result, the place name can be converted into a single position that is closer to the one generally associated with it than when the position is calculated simply from POI positions or when dictionary data prepared in advance is used, and unknown place names that are absent from the dictionary data can also be converted.

Finally, the place name position estimation apparatus 1 described in the present embodiment can be realized by a computer equipped with a memory and a CPU. Each operation of the place name position estimation apparatus 1 can also be implemented as a program, which can be installed on a computer and executed, or distributed via a communication network.

DESCRIPTION OF SYMBOLS
1 … Place name position estimation apparatus
11 … Document data storage unit
12 … POI data storage unit
13 … POI related-document extraction unit
14 … Document analysis unit
141 … Document input unit
142 … Morphological analysis unit
143 … Named entity extraction unit
144 … Place name extraction unit
145 … Place name output unit
15 … Place name position estimation unit
151 … POI / place name input unit
152 … POI / place name integration unit
153 … Position calculation unit
154 … Position output unit
16 … Place name position data storage unit
S101 to S103, S102-1 to S102-5, S103-1 to S103-4 … Steps

Claims

A place name position estimation method performed by a computer, comprising:
a storage step of storing, in storage means, a plurality of point data in which the name of a predetermined point is associated with position information of the point;
an extraction step of reading the name of the point from the storage means and extracting document data including the name of the point from a plurality of document data;
an analysis step of extracting a place name related to the point from the extracted document data and calculating the frequency at which the place name appears in the document data; and
an estimation step of treating the appearance frequency of the place name as the importance of the point with respect to the place name, and calculating the position of the place name using the importance of the point and the position information of the point.

The place name position estimation method according to claim 1, wherein:
in the analysis step, a plurality of combination data, each associating the name of the point, the place name, and the appearance frequency of the place name, are output; and
in the estimation step, an integrated graph that combines the plurality of combination data and associates the place name with the names of the corresponding points is generated, the names of the points corresponding to the place name are acquired using the integrated graph, and the position of the place name is calculated using the importance of the points and the position information of the points.

The place name position estimation method according to claim 1 or 2, wherein, in the estimation step, the position of the place name is calculated by converting the importance of the point into a distance from the point.

A place name position estimation apparatus comprising:
storage means for storing a plurality of point data in which the name of a predetermined point is associated with position information of the point;
extraction means for reading the name of the point from the storage means and extracting document data including the name of the point from a plurality of document data;
analysis means for extracting a place name related to the point from the extracted document data and calculating the frequency at which the place name appears in the document data; and
estimation means for treating the appearance frequency of the place name as the importance of the point with respect to the place name, and calculating the position of the place name using the importance of the point and the position information of the point.

The place name position estimation apparatus according to claim 4, wherein:
the analysis means outputs a plurality of combination data, each associating the name of the point, the place name, and the appearance frequency of the place name; and
the estimation means generates an integrated graph that combines the plurality of combination data and associates the place name with the names of the corresponding points, acquires the names of the points corresponding to the place name using the integrated graph, and calculates the position of the place name using the importance of the points and the position information of the points.

The place name position estimation apparatus according to claim 4, wherein the estimation means calculates the position of the place name by converting the importance of the point into a distance from the point.

A place name position estimation program that causes a computer to execute the place name position estimation method according to claim 1.