JP5068605B2 - Database generation method, database generation device, and computer program

Info

Publication number: JP5068605B2
Application number: JP2007221576A
Authority: JP
Inventors: 正喜奥
Original assignee: Zenrin Datacom Co Ltd
Current assignee: Zenrin Datacom Co Ltd
Priority date: 2007-08-28
Filing date: 2007-08-28
Publication date: 2012-11-07
Anticipated expiration: 2027-08-28
Also published as: JP2009054036A

Description

The present invention relates to a technique for evaluating the quality of information represented by document data.

With the widespread use of the Internet, ordinary consumers now evaluate the various stores that provide products and services and publish those evaluations on the Internet. However, the information published on the Internet also includes inaccurate information that is of no use to third parties. For this reason, it is not easy to extract highly accurate information from what is published on the Internet.

In one prior art technique, an address, place name, building name, company name, organization name, or a phrase from which these can be derived is extracted for the information mainly described on a home page 103, and that information is searched for in a plurality of map databases 108 and information databases 109. The best candidate is then determined, and a pair consisting of the home page 103 and the information on the map is created. In addition, the latitude and longitude, or coordinate values in a map-specific rectangular coordinate system, are obtained from the map selected as best; a combination of the home page 103, the information on the map, and the coordinate values is created and transferred to a coordinate-tagged home page information database 110.

In this technique, the transferred home page 103 is evaluated in terms of its design, the public nature and name recognition of the described information, the update frequency of the information, and the identity and reliability of the home page's creator or sender. The evaluation of the described information is obtained by judging its public nature and name recognition from the frequency of appearance of the information (that is, how many information sources introduce it) and the size of the entries, with reference to a general part-of-speech dictionary and proper noun dictionary 112 and the various information databases 109, and by continuously monitoring the creation time of the home page file. The evaluation of the identity and reliability of the creator or sender is made by scoring the way information is described in the home page 103, its presence or absence in the map databases 108 and the various information databases 109, and the search results for the described information obtained from a search service providing device, and then adding these scores to the score from the design evaluation.

JP 2000-339330 A

In the above technique, however, a home page is evaluated according to whether the information on it exists in other existing databases, how often it appears there, the size of the entries, and so on. As a result, articles such as those found on blogs (weblogs), which contain up-to-date information that ordinary consumers gathered by visiting stores themselves, or which are written in the author's own terms and expressions, do not receive a high evaluation even when they contain meaningful information. This kind of problem can arise broadly whenever the quality of the information represented by a document is evaluated.

The present invention addresses at least part of the above problems, and its object is to make it possible to provide highly reliable information by selecting highly reliable documents from among a plurality of documents.

To achieve the above object, the present invention performs the following processing when generating a database based on a plurality of document data.
(a) A plurality of document data is prepared, including document data that contains an address character string, which is a character string representing an address, and an article character string, which is a description of a store.
(b) The address character string data is analyzed to generate detail level data representing how detailed the address represented by the address character string is.
(c) Based on the detail level data, at least one document data containing an address character string is selected from the plurality of document data.
(d) A database relating to stores is generated based on the selected document data.

An author who describes an address in detail in a document can be presumed to write the other descriptions in the document accurately and objectively as well. With the above aspect, therefore, highly reliable documents can be selected from a plurality of documents, and a database containing highly reliable information can be generated from them.
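Steps (a) to (d) can be sketched roughly as follows. This is only a minimal illustration, not the implementation described in this document: the `Document` fields, the `LEVEL_MARKERS` table, and the `min_level` threshold are all assumptions introduced for the example.

```python
from dataclasses import dataclass

@dataclass
class Document:
    url: str      # hypothetical: where the document was obtained
    address: str  # address character string
    article: str  # article character string describing a store

# Hypothetical markers; a larger level number means a more detailed address.
LEVEL_MARKERS = [
    (1, ("都", "道", "府", "県")),
    (2, ("市", "区", "郡")),
    (3, ("町", "村")),
    (4, ("丁目",)),
    (5, ("番地",)),
    (6, ("号",)),
]

def detail_level(address: str) -> int:
    """Step (b): return the deepest level whose marker appears in the address."""
    levels = [level for level, markers in LEVEL_MARKERS
              if any(m in address for m in markers)]
    return max(levels, default=0)

def select_documents(docs, min_level):
    """Step (c): keep documents whose address is at least min_level detailed."""
    return [d for d in docs if detail_level(d.address) >= min_level]

def build_database(docs):
    """Step (d): build a simple store database from the selected documents."""
    return [{"address": d.address, "url": d.url, "article": d.article} for d in docs]

# Step (a): prepared document data (illustrative values only).
docs = [
    Document("http://example.com/a", "東京都千代田区丸の内1丁目9番地1号", "The pasta was good."),
    Document("http://example.com/b", "東京都千代田区", "Nice place."),
]
database = build_database(select_documents(docs, min_level=4))
```

Here only the first document survives selection, because its address reaches the "号" level while the second stops at the ward level.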

Steps (a) to (d) can be executed by a computer. For example, step (a) can be executed by a first computer acquiring character string data from original document data stored on a second computer (a server) connected to the first computer via a network. Step (b) can be executed by the first computer generating the detail level data and storing it in a predetermined first memory. Further, step (d) can be executed by the first computer generating the database and storing it in a predetermined second memory. The first and second memories may be provided in the first computer, or in another device connected to the first computer via a network.

When selecting at least one document data containing an address character string from the plurality of document data, it is preferable that the document data containing the most detailed address character string be selected as one of the at least one document data.

In (a) above, the plurality of document data is preferably prepared as follows. A server of a service provider, which offers users under a predetermined contract a service for publishing over a network original document data that the users created and that contains character strings, is accessed via the network, and the document data is generated based on the character strings contained in the original document data.

With this aspect, a database can be created that makes use of descriptions of stores written by users who have no stake in those stores.

In (d) above, the database relating to stores is preferably generated as follows.
(d1) The latitude and longitude of the point represented by the address character string are identified.
(d2) A database is created that contains, in association with each other, the latitude and longitude and information on the source from which the address character string and the article character string of the selected document data were obtained.
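As a rough sketch of (d1) and (d2), the record below ties a latitude and longitude to the source of the document. The `GEOCODE_TABLE` lookup, its coordinate values, and the field names are hypothetical stand-ins; this document does not specify how the coordinates are obtained.

```python
from typing import Optional, Tuple

# Hypothetical geocoding table with illustrative coordinates.
# A real system would consult map data instead.
GEOCODE_TABLE = {
    "東京都千代田区丸の内1丁目": (35.6812, 139.7671),
}

def geocode(address: str) -> Optional[Tuple[float, float]]:
    """Step (d1): look up the latitude and longitude for an address string."""
    return GEOCODE_TABLE.get(address)

def make_record(address: str, source_url: str) -> Optional[dict]:
    """Step (d2): associate the coordinates with the source of the document."""
    coords = geocode(address)
    if coords is None:
        return None  # the address could not be located
    lat, lon = coords
    return {"lat": lat, "lon": lon, "address": address, "source_url": source_url}

record = make_record("東京都千代田区丸の内1丁目", "http://example.com/a")
```

Storing the source URL rather than a copy of the article keeps the database small while still letting a user reach both the description and the location of the store.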

With this aspect, a database can be created from which both the description of a store and the location of the store can be obtained easily.

Furthermore, when providing a service that uses the database, the following processing is preferably performed.
(e1) Map data representing a map of a predetermined area is prepared.
(e2) A map based on the map data is displayed.
(e3) A predetermined mark is displayed on the map at the position corresponding to the latitude and longitude of the point represented by the address character string.
(e4) A designation of the mark is received from the user, and data representing the source from which the address character string and the article character string of the document data were obtained is provided.

In (b) above, the detail level data representing how detailed the address is is preferably generated as follows.
(b1) The character string contained in the document data is decomposed into a plurality of words, each consisting of one or more characters.
(b2) The words contained in the document data are examined in order, and the address character string is extracted from the document data.

In (b2) above, the address character string is preferably extracted as follows.
(b3) When a first word contained in the document data satisfies a predetermined first condition, which includes that the word represents a region, the first word is stored at the head of an address buffer.
(b4) When a second word that follows the word stored in the address buffer satisfies a predetermined second condition, the second word is additionally stored in the address buffer.
(b5) When the second word satisfies a predetermined third condition different from the second condition, the character string stored in the address buffer is extracted as the address character string.
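The buffer-based extraction in (b3) to (b5) can be sketched as a small state machine. This is a simplified illustration under stated assumptions: the words are already split as in (b1), `REGION_WORDS` is a hypothetical dictionary, the second condition is reduced to a toy rule, the third condition is taken to be "anything that does not satisfy the second", and flushing the buffer at the end of input is an added convenience.

```python
# Hypothetical dictionary of words that represent regions.
REGION_WORDS = {"東京都", "千代田区", "丸の内", "大阪府", "大阪市"}
# Hypothetical words that may continue an address after a region word.
ADDRESS_PARTS = {"丁目", "番地", "号", "-"}

def satisfies_first_condition(word: str) -> bool:
    return word in REGION_WORDS

def satisfies_second_condition(word: str) -> bool:
    return word in REGION_WORDS or word.isdigit() or word in ADDRESS_PARTS

def extract_addresses(words):
    """Examine words in order and collect address strings via an address buffer."""
    addresses, buffer = [], []
    for word in words:
        if not buffer:
            if satisfies_first_condition(word):      # (b3) start the buffer
                buffer.append(word)
        elif satisfies_second_condition(word):       # (b4) extend the buffer
            buffer.append(word)
        else:                                        # (b5) extract and reset
            addresses.append("".join(buffer))
            buffer = []
    if buffer:                                       # assumption: flush at end
        addresses.append("".join(buffer))
    return addresses

words = ["店", "は", "東京都", "千代田区", "丸の内", "1", "丁目", "に", "ある"]
result = extract_addresses(words)
```

With these sample words the buffer starts at "東京都", grows through "丁目", and is extracted when the particle "に" is reached.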

With this aspect, an address character string can be extracted from document data by analyzing individual words. The address buffer from which the character string is extracted as the address character string is preferably the address buffer "in which a word was stored immediately before".

In (b5) above, the character string stored in the address buffer is preferably extracted as the address character string as follows.
The third condition is treated as satisfied, and the character string in the address buffer is extracted as the address character string, when the second word meets none of the following:
(Sub-condition 1) it is a word representing a region;
(Sub-condition 2) it contains any of the characters "東" (east), "西" (west), "南" (south), "北" (north), or "大字" (oaza);
(Sub-condition 3) it is a word representing a number;
(Sub-condition 4) it contains the character "字" (aza).
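Sub-conditions 1 to 4 can be read as a single predicate on the second word. The sketch below is one possible reading: `REGION_WORDS` is a hypothetical dictionary, and using `str.isnumeric()` to decide whether a word "represents a number" is an assumption made for the example.

```python
# Hypothetical dictionary of words that represent regions.
REGION_WORDS = {"東京都", "千代田区", "丸の内"}
DIRECTION_OR_OAZA = ("東", "西", "南", "北", "大字")

def ends_upper_level_address(word: str) -> bool:
    """Return True when the word meets none of sub-conditions 1 to 4."""
    if word in REGION_WORDS:                        # sub-condition 1
        return False
    if any(c in word for c in DIRECTION_OR_OAZA):   # sub-condition 2
        return False
    if word.isnumeric():                            # sub-condition 3 (assumed test)
        return False
    if "字" in word:                                # sub-condition 4
        return False
    return True
```

A particle such as "に" ends the address under this predicate, while a region word, a direction character, a number, or "字" does not.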

With this aspect, an address character string that contains no information at or below the "丁目" (chome) level can be extracted as an address character string.

In (b5) above, it is also preferable to extract the character string stored in the address buffer as the address character string as follows.
The third condition is treated as satisfied, and the character string in the address buffer is extracted as the address character string, when the second word represents a region at a higher level than the word that was examined in (b2) immediately before the second word.
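The rule above can be illustrated by giving each region word a rank and cutting the buffer whenever the rank moves back up the hierarchy. The `REGION_RANK` table and the decision to skip non-region words are assumptions made only for this sketch.

```python
# Hypothetical ranks: a smaller number means a higher-level region.
REGION_RANK = {"東京都": 1, "千代田区": 2, "丸の内": 3, "大阪府": 1, "大阪市": 2}

def split_consecutive_addresses(words):
    """Split a run of region words wherever a higher-level region reappears."""
    addresses, buffer, prev_rank = [], [], None
    for word in words:
        rank = REGION_RANK.get(word)
        if rank is None:
            continue  # assumption: non-region words are ignored here
        if buffer and rank < prev_rank:
            # The word is at a higher level than the previous one,
            # so the buffered words form a complete address.
            addresses.append("".join(buffer))
            buffer = []
        buffer.append(word)
        prev_rank = rank
    if buffer:
        addresses.append("".join(buffer))
    return addresses

pair = split_consecutive_addresses(["東京都", "千代田区", "大阪府", "大阪市"])
```

Because "大阪府" ranks above "千代田区", the first address is closed before the second begins.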

With this aspect, when two address character strings appear consecutively in the document data, they can be processed as two separate address character strings rather than as a single one.

Furthermore, in (b5) above, it is also preferable to extract the character string stored in the address buffer as the address character string as follows.
The third condition is treated as satisfied, and the character string in the address buffer is extracted as the address character string, when the second word meets none of the following:
(Sub-condition 5) it is a word representing a number;
(Sub-condition 6) it is a predetermined word that appears following a number and represents the unit of the number;
(Sub-condition 7) it is a predetermined word that appears after a noun and represents an attribute of the noun;
(Sub-condition 8) it is a symbol;
(Sub-condition 9) it is a noun;
(Sub-condition 10) it is a predetermined word that appears following a number and represents a connection between numbers;
(Sub-condition 11) it is a predetermined word that appears before a noun, after a noun, or both, and modifies the noun.

With this aspect, an address character string containing information at the "丁目" (chome), "番地" (banchi), or "号" (go) level can be extracted as an address character string.

In (b4) above, the second word is preferably additionally stored in the address buffer as follows.
The second condition is treated as satisfied, and the second word is additionally stored in the address buffer, when the second word meets either of the following:
(Sub-condition 12) it represents a region;
(Sub-condition 13) it contains any of the characters "東" (east), "西" (west), "南" (south), "北" (north), or "大字" (oaza).

With this aspect, a character string representing an address at a level above the "字" (aza) level or the "丁目" (chome) level can be extracted as part of the address character string.

In (b4) above, it is also preferable to additionally store the second word in the address buffer as follows.
The second condition is treated as satisfied, and the second word is additionally stored in the address buffer, when the second word meets either of the following:
(Sub-condition 14) it is a word representing a number;
(Sub-condition 15) it contains the character "字" (aza).

With this aspect, the character "字" (aza), or the leading number of an address at or below the "丁目" (chome) level, can be extracted as part of the address character string.

Furthermore, in (b4) above, it is also preferable to additionally store the second word in the address buffer as follows.
The second condition is treated as satisfied, and the second word is additionally stored in the address buffer, when a word examined before the second word meets either of the following:
(Sub-condition 14) it is a word representing a number;
(Sub-condition 15) it contains the character "字" (aza);
and the second word meets any of the following:
(Sub-condition 5) it is a word representing a number;
(Sub-condition 6) it is a predetermined word that appears following a number and represents the unit of the number;
(Sub-condition 7) it is a predetermined word that appears after a noun and represents an attribute of the noun;
(Sub-condition 8) it is a symbol;
(Sub-condition 9) it is a noun;
(Sub-condition 10) it is a predetermined word that appears following a number and represents a connection between numbers;
(Sub-condition 11) it is a predetermined word that appears before a noun, after a noun, or both, and modifies the noun.
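One way to picture this combined condition is with words that carry a part-of-speech tag. The tag names below (`number`, `number_unit`, and so on) are hypothetical labels chosen to mirror sub-conditions 5 to 11; this document does not name a tag set or a morphological analyzer.

```python
from typing import NamedTuple

class Word(NamedTuple):
    surface: str
    pos: str  # hypothetical part-of-speech tag

# Hypothetical tags corresponding to sub-conditions 5 to 11, in order.
LOWER_LEVEL_POS = {
    "number", "number_unit", "noun_suffix", "symbol",
    "noun", "number_connector", "noun_modifier",
}

def opens_lower_level(word: Word) -> bool:
    """Sub-conditions 14 and 15: a number, or a word containing '字'."""
    return word.pos == "number" or "字" in word.surface

def continues_lower_level(previous_words, word: Word) -> bool:
    """True when an earlier word opened the lower level and this word may follow."""
    opened = any(opens_lower_level(w) for w in previous_words)
    return opened and word.pos in LOWER_LEVEL_POS

earlier = [Word("丸の内", "noun"), Word("1", "number")]
ok = continues_lower_level(earlier, Word("丁目", "number_unit"))
```

In this example "丁目" is accepted because the number "1" was examined before it.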

With this aspect, a character string representing an address at or below the "丁目" (chome) level can be extracted as part of the address character string.

When generating the detail level data in (b) above, the following processing is preferably also performed.
(b7) The address character string is examined to determine whether it contains predetermined words that are part of an address and represent areas, the predetermined words belonging to areas at a plurality of mutually different hierarchy levels.

In (c) above, the at least one document data containing an address character string is preferably selected from the plurality of document data as follows.
(c1) Document data is selected based on the lowest hierarchy level among the predetermined words contained in the address character string.

With this aspect, the level of detail of an address character string can be evaluated objectively. Based on that objective evaluation, document data whose content is likely to be accurate and objective can then be selected. When selecting document data, it is also preferable to take into account the number of predetermined words contained in the address character string.
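A minimal sketch of this evaluation scores each address by the lowest hierarchy level it reaches and uses the number of matched words as a tie-breaker. The `HIERARCHY` table and the tuple-based ordering are assumptions for illustration; the actual levels and weighting are not given in this passage.

```python
# Hypothetical hierarchy: a larger number means a lower (more detailed) level.
HIERARCHY = {
    "都": 1, "道": 1, "府": 1, "県": 1,
    "市": 2, "区": 2,
    "町": 3,
    "丁目": 4, "番地": 5, "号": 6,
}

def evaluate(address: str):
    """Return (lowest level reached, number of predetermined words found)."""
    found = [level for word, level in HIERARCHY.items() if word in address]
    return (max(found, default=0), len(found))

def select_best(addresses):
    """Pick the address with the deepest level, breaking ties by word count."""
    return max(addresses, key=evaluate)

candidates = ["東京都千代田区", "千代田区丸の内1丁目", "東京都千代田区丸の内1丁目"]
best = select_best(candidates)
```

The last two candidates both reach the "丁目" level, so the one that also names the prefecture wins on word count.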

The plurality of hierarchy levels may include a town-area level and a block level. In such an aspect, when examining in (b7) above whether the address character string contains the predetermined words, the following processing is preferably performed.
(b8) The address character string is examined to determine whether it contains the predetermined words representing areas, proceeding in order from words representing higher-level areas to words representing lower-level areas.

In (b8) above, when examining in that order whether the predetermined words representing areas are included, the following processing is preferably performed.
For numbers in the address character string that are joined by full-width or half-width minus signs or hyphens, an evaluation equivalent to the evaluation of whether a predetermined word is included is made based on:
the count of numbers joined by full-width or half-width minus signs or hyphens; and
whether any word belonging to the town-area level has been detected so far, and whether any word belonging to the block level has been detected so far.
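The hyphen handling can be sketched as follows. The mapping used here is an assumption, not a rule stated in this passage: each number in a hyphen-joined run is taken to fill the next level after the deepest explicitly written marker among "丁目", "番地", and "号".

```python
import re

LOWER_LEVELS = ["丁目", "番地", "号"]
# Half-width hyphen-minus, full-width hyphen-minus, minus sign, and hyphen.
HYPHEN_CLASS = "[-－−‐]"
HYPHEN_RUN = re.compile(r"\d+(?:" + HYPHEN_CLASS + r"\d+)+")

def lower_levels_present(address: str):
    """Return flags for [chome, banchi, go], counting hyphen-joined numbers."""
    found = [marker in address for marker in LOWER_LEVELS]
    match = HYPHEN_RUN.search(address)
    if match:
        count = len(re.split(HYPHEN_CLASS, match.group()))
        # Assumption: numbers fill the levels after the deepest explicit marker.
        start = max((i + 1 for i, flag in enumerate(found) if flag), default=0)
        for i in range(start, min(start + count, len(LOWER_LEVELS))):
            found[i] = True
    return found
```

Under this assumption "丸の内1丁目2番地3号", "丸の内1丁目2-3", and "丸の内1-2-3" all receive the same evaluation, while "丸の内1-2" is credited only down to the "番地" level.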

With this aspect, the level of detail of an address can be evaluated accurately whether the part of the address at or below the "丁目" (chome) level is written in the document data as "○丁目×番地△号", "○丁目×−△", or "○−×−△".

When selecting document data in (c) above, the following processing is preferably also performed.
(c2) When the address character string contains the character string "京都府" (Kyoto Prefecture) or "京都市" (Kyoto City) and also contains any of the character strings "上る", "下る", "東入る", "西入る", "南入る", "北入る", "上ル", "下ル", "東入ル", "西入ル", "南入ル", or "北入ル", it is evaluated in the same way as an address character string that contains neither "京都府" nor "京都市" but contains the character string "丁目" (chome).

With this aspect, the level of detail of an address can be evaluated accurately even when the address is written in the notation peculiar to Kyoto. More specifically, "evaluated in the same way" can mean, for example, adding the same evaluation points or setting the same flag in the evaluation of the level of detail.
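The Kyoto rule can be sketched as a flag that is set either by "丁目" or by a Kyoto-style direction word. Treating a Kyoto address that itself contains "丁目" as also qualifying is an assumption of this sketch; the text only defines the two cases it compares.

```python
KYOTO_NAMES = ("京都府", "京都市")
KYOTO_DIRECTIONS = (
    "上る", "下る", "東入る", "西入る", "南入る", "北入る",
    "上ル", "下ル", "東入ル", "西入ル", "南入ル", "北入ル",
)

def has_chome_level_detail(address: str) -> bool:
    """Set the same flag for '丁目' and for Kyoto-style direction words."""
    in_kyoto = any(name in address for name in KYOTO_NAMES)
    if in_kyoto and any(d in address for d in KYOTO_DIRECTIONS):
        return True
    # Assumption: "丁目" counts regardless of whether the address is in Kyoto.
    return "丁目" in address
```

For example, "京都市中京区寺町通御池上る" and "東京都千代田区丸の内1丁目" both set the flag, while "京都市中京区" does not.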

It is also preferable to perform the following processing.
(e) A telephone number is acquired from the document data.
(f) With reference to a database in which telephone numbers and address character strings are stored in association with each other, it is determined whether the address character string associated with the acquired telephone number is contained in the document data.

When selecting at least one document data from the plurality of document data, the following processing is then preferably performed. Document data that contains the address character string associated with the acquired telephone number is selected as the at least one document data in preference to document data that does not contain that address character string.

An author who writes a telephone number accurately can be presumed to write the other descriptions in the document accurately and objectively as well. With the above aspect, therefore, highly reliable documents can be selected from a plurality of documents, and a database containing highly reliable information can be generated from them.
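The telephone check can be sketched as below. The regular expression, the `PHONE_DIRECTORY` table, and the placeholder number are hypothetical; the text only requires some database that associates telephone numbers with address character strings.

```python
import re

# Simplified pattern for hyphenated numbers; an assumption, not a full rule.
PHONE_RE = re.compile(r"0\d{1,4}-\d{1,4}-\d{4}")

# Hypothetical directory associating telephone numbers with addresses.
PHONE_DIRECTORY = {
    "03-0000-0000": "東京都千代田区丸の内1丁目",
}

def phone_confirms_address(text: str) -> bool:
    """True when an address registered for a phone number in the text also appears in it."""
    for phone in PHONE_RE.findall(text):
        address = PHONE_DIRECTORY.get(phone)
        if address is not None and address in text:
            return True
    return False

good = "電話 03-0000-0000 住所 東京都千代田区丸の内1丁目"
bad = "電話 03-0000-0000 住所 大阪府大阪市北区"
```

A document like `good` would be preferred over one like `bad`, whose address does not match the directory entry for its telephone number.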

When the original document data contains link data that associates the original document data with reference document data containing character strings, it is also preferable to perform the following processing.
(e) While referring to the reference document data based on the link data, link destination document data is generated based on the character strings contained in the reference document data.
(f) It is determined whether the link destination document data contains the address character string contained in the document data.

When selecting at least one document data from the plurality of document data, the following processing is then preferably performed. Document data whose link destination document data contains the address character string of the document data is selected as the at least one document data in preference to document data whose link destination document data does not contain that address character string.

An author who accurately copies the address character string contained in the data referred to when writing a document can be presumed to write the other descriptions in the document accurately and objectively as well. With the above aspect, therefore, highly reliable documents can be selected from a plurality of documents, and a database containing highly reliable information can be generated from them.
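The link check can be sketched with the page retrieval passed in as a function, so the example runs without network access. The `fetch_text` parameter and the sample URLs are assumptions for illustration.

```python
def linked_page_confirms_address(address, link_urls, fetch_text):
    """True when any linked page's text contains the document's address string.

    fetch_text is a hypothetical callable that returns the text of a URL,
    or None when the page cannot be obtained.
    """
    for url in link_urls:
        text = fetch_text(url)
        if text and address in text:
            return True
    return False

# Stand-in for real retrieval: a dictionary of already fetched pages.
pages = {
    "http://example.com/shop": "当店の住所は東京都千代田区丸の内1丁目です。",
    "http://example.com/other": "営業時間のご案内",
}
confirmed = linked_page_confirms_address(
    "東京都千代田区丸の内1丁目", ["http://example.com/shop"], pages.get
)
```

Passing `pages.get` as `fetch_text` keeps the comparison logic separate from however the reference document data is actually obtained.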

Here, "preferentially selecting A over B" means selecting A rather than B when the other evaluation parameters considered in the selection are the same for A and B. One way to preferentially select certain document data is to modify its detail level data to a value representing a higher level of detail before executing step (c). For example, a predetermined value may be added to the value representing the level of detail, or that value may be multiplied by a constant. Alternatively, when executing step (c), one or both of "containing the address character string associated with the acquired telephone number" and "the link destination document data containing the address character string of the document data" may be made a necessary condition for selection.
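The additive variant of preferential selection can be sketched as follows. The candidate tuples and the bonus of 0.5 are illustrative assumptions; a bonus smaller than one level keeps the confirmation from outweighing a genuinely more detailed address.

```python
def boosted_level(level: float, confirmed: bool, bonus: float = 0.5) -> float:
    """Raise the detail value when the phone or link check succeeded."""
    return level + bonus if confirmed else level

def pick(candidates, bonus: float = 0.5):
    """Choose the candidate with the highest boosted detail value.

    Each candidate is a (name, detail_level, confirmed) tuple.
    """
    return max(candidates, key=lambda c: boosted_level(c[1], c[2], bonus))[0]

same_level = [("A", 4, False), ("B", 4, True)]
different_level = [("A", 5, False), ("B", 4, True)]
```

With equal detail levels the confirmed document "B" is chosen, and with unequal levels the more detailed document "A" still wins, which matches the meaning of "preferentially" given above.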

In one aspect, the present invention can also be realized as a method for estimating or evaluating the accuracy of the content of document data, for example as follows. In this method, the following processing is performed.
(a) A plurality of document data is prepared, each containing an address character string, which is a character string representing an address, and an article character string consisting of characters other than the character string representing the address.
(b) The address character string data is analyzed to examine how detailed the address represented by the address character string is.
(c) Based on the result of examining the level of detail, an evaluation representing the accuracy of the content of the document data is determined.

"Determining an evaluation" includes, for example, the following: (i) determining a value representing the evaluation; and (ii) determining, for each of a plurality of sets of document data, a rank in descending order of evaluation.
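Form (ii) can be illustrated by turning evaluation values into ranks. The document identifiers and scores below are made up for the example, and ties are broken by input order, which is an assumption.

```python
def rank_documents(scores: dict) -> dict:
    """Map each document id to its rank, where rank 1 has the highest evaluation."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {doc_id: rank for rank, doc_id in enumerate(ordered, start=1)}

ranks = rank_documents({"a": 3, "b": 5, "c": 1})
```

Here the evaluation values correspond to form (i) and the resulting ranks to form (ii).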

In one aspect, the present invention can also be realized as a method for selecting document data, for example as follows. In this method, the following processing is performed.
(a) A plurality of document data is prepared, each containing an address character string, which is a character string representing an address, and an article character string consisting of characters other than the character string representing the address.
(b) The address character string data is analyzed to examine how detailed the address represented by the address character string is.
(c) Based on the result of examining the level of detail, at least one document data containing an address character string is selected from the plurality of document data.

The present invention can be realized in various forms, for example as a document evaluation method and document evaluation device, a document selection method and document selection device, a document data processing method and document data processing device, a database providing method and database providing device, a computer program for realizing the functions of those methods or devices, or a recording medium on which that computer program is recorded.

A. First embodiment:
A1. Overview of device configuration and functions:
FIG. 1 is a diagram showing an outline of the map information providing system of the present invention. Servers AS1 and AS2 of application service providers are connected to the Internet INT. An application service provider offers users under a predetermined contract a service for publishing over the Internet data, including character strings and images, that the users created. The servers AS1 and AS2 store browsing data PD1 and PD2, which are data including text and images created by those users. The browsing data PD1 and PD2 can be viewed from other devices connected to the Internet.

閲覧用データＰＤ１，ＰＤ２は、具体的には、ウェブページやブログのデータである。これらのウェブページやブログには、しばしばユーザが利用したレストランなどの店舗についての評価記事が、その店舗の住所とともに掲載される。閲覧用データＰＤ１，ＰＤ２の少なくとも一部のページのデータは、住所を表す文字列である住所文字列と、住所文字列以外の文字からなる記事文字列と、を含む文字、画像（静止画、動画を含む）、ならびに音声のデータを含む。 Specifically, the browsing data PD1 and PD2 are web page and blog data. On these web pages and blogs, evaluation articles about stores such as restaurants used by users are often posted together with the addresses of the stores. The data of at least some pages of the browsing data PD1 and PD2 includes characters, images (still images, images) including an address character string that is a character string representing an address and an article character string including characters other than the address character string. Including video), as well as audio data.

また、インターネットＩＮＴには、地図アプリケーションサーバ１００が接続されている。地図アプリケーションサーバ１００は、ＣＰＵ１１０、メモリ１２０等の構成を備えている。地図アプリケーションサーバ１００は、サーバＡＳ１，ＡＳ２中の閲覧用データＰＤ１，ＰＤ２のうち、住所の文字列を含むページのデータのｕｒｌ（Uniform Resource Locator）と、地図情報とを関連づけてデータベースＤＢ（以下「記事リンク地図データベースＭＤＢ」と呼ぶ）を生成し、インターネットを介して、有料または無料で閲覧可能とする。 A map application server 100 is also connected to the Internet INT. The map application server 100 includes a CPU 110, a memory 120, and the like. For those pages of the browsing data PD1 and PD2 on the servers AS1 and AS2 that contain an address character string, the map application server 100 associates the url (Uniform Resource Locator) of the page data with map information to generate a database DB (hereinafter called the “article link map database MDB”), and makes the database viewable via the Internet for a fee or free of charge.

さらに、インターネットＩＮＴには、クライアント２００が接続されている。クライアント２００には、出力装置としての液晶ディスプレイ２１０、ならびに入力装置としてのキーボード２２０およびマウス２３０が接続されている。クライアント２００は、インターネットＩＮＴを介して、地図アプリケーションサーバ１００の記事リンク地図データベースＭＤＢを利用することができる。 Furthermore, a client 200 is connected to the Internet INT. The client 200 is connected to a liquid crystal display 210 as an output device, and a keyboard 220 and a mouse 230 as input devices. The client 200 can use the article link map database MDB of the map application server 100 via the Internet INT.

なお、図１では、例示として、アプリケーションサービスプロバイダのサーバを２台、そして、クライアントを１台、示している。しかし、アプリケーションサービスプロバイダのサーバおよびクライアントは、それぞれインターネットＩＮＴに多数接続されている。 In FIG. 1, as an example, two application service provider servers and one client are shown. However, a large number of application service provider servers and clients are connected to the Internet INT.

図２は、クライアント２００で記事リンク地図データベースＭＤＢを利用する際の、ディスプレイ２１０上の表示の一例を示す図である。画面中央の領域Ａ１０には、クライアント２００を介してユーザが指定した場所の地図が表示される。画面上段の領域Ａ２０には、検索用の入力窓Ａ２１と、検索ボタンＡ２２が表示されている。 FIG. 2 is a diagram illustrating an example of a display on the display 210 when the article link map database MDB is used by the client 200. A map of a location designated by the user via the client 200 is displayed in the area A10 in the center of the screen. In the upper area A20, a search input window A21 and a search button A22 are displayed.

ユーザが、記事リンク地図データベースＭＤＢを利用する際には、以下のような処理が行われる。すなわち、ユーザは、マウス２３０およびキーボード２２０を使用して入力窓Ａ２１に所定の文字列を入力し、マウス２３０を使用して検索ボタンＡ２２をクリックする。すると、地図アプリケーションサーバ１００は、サーバＡＳ１，ＡＳ２の閲覧用データＰＤ１，ＰＤ２から収集した文書データから、該当する文字列を含むデータを検索する。そして、閲覧用データＰＤ１，ＰＤ２から収集した文書データ中に、それらの文字列を含むデータが存在する場合には、地図アプリケーションサーバ１００は、それらの文字列を含むデータ（以下、該当文書データという）の存在をクライアント２００のディスプレイ２１０上に表示する。 When the user uses the article link map database MDB, the following processing is performed. The user enters a character string in the input window A21 using the mouse 230 and the keyboard 220, and clicks the search button A22 with the mouse 230. The map application server 100 then searches the document data collected from the browsing data PD1 and PD2 of the servers AS1 and AS2 for data containing that character string. If the collected document data contains such data, the map application server 100 indicates, on the display 210 of the client 200, the existence of the data containing the character string (hereinafter referred to as “corresponding document data”).

具体的には、地図アプリケーションサーバ１００は、ディスプレイ２１０上の画面右側の領域Ａ３０に、該当文書データの一部を表示する。また、領域Ａ１０には、該当文書データ中に存在する住所の文字列が表す地点を、所定のマークで表示する。 Specifically, the map application server 100 displays a part of the corresponding document data in the area A30 on the right side of the screen on the display 210. In the area A10, a point represented by a character string of an address existing in the document data is displayed with a predetermined mark.

図２の例では、「イタリア料理」をいう文字列が入力窓Ａ２１に入力されている。そして、「イタリア料理」をいう文字列を含むブログやホームページの記事から生成された該当文書データの一部が、画面右側の領域Ａ３０に列挙されている。そして、画面中央の領域Ａ１０には、それらのブログの記事中の住所が表す地点が、吹き出しＢＬで表されている。なお、営利目的の広告であると思われる記事については、「ＰＲ」の文字を付した吹き出しが付される。ユーザは、領域Ａ３０内の記事のリスト、または領域Ａ１０内の吹き出しＢＬをクリックすることで、それらのブログやホームページにジャンプすることができる。 In the example of FIG. 2, a character string “Italian cuisine” is input to the input window A21. A part of relevant document data generated from articles on a blog or homepage including a character string “Italian cuisine” is listed in an area A30 on the right side of the screen. And in the area A10 in the center of the screen, the point indicated by the address in the articles of those blogs is represented by a balloon BL. Note that articles that appear to be commercial advertisements are accompanied by a balloon with the letters “PR”. The user can jump to those blogs and homepages by clicking on a list of articles in the area A30 or a balloon BL in the area A10.

このような態様によれば、ユーザは、入力窓Ａ２１から所定の言葉、たとえば「イタリア料理」、「おいしい」、「おすすめ」、などを入力することで、ブログやホームページでそれらの言葉を使って言及された店舗について、簡単な手順で地図上の場所を特定することができる。 According to such an aspect, by entering words such as “Italian cuisine”, “delicious”, or “recommended” in the input window A21, the user can, with a simple procedure, locate on the map the stores that blogs and homepages mention using those words.

また、ユーザは、バルーンに基づいて、地図に表示された所定のエリア内から、ブログやホームページで言及された店舗を容易に発見することができる。そして、それらのブログやホームページにジャンプすることで、その店舗の評価を読むことができる。 Further, the user can easily find a store mentioned in a blog or a homepage from a predetermined area displayed on the map based on the balloon. And you can read the evaluation of the store by jumping to those blogs and homepages.

以下では、地図アプリケーションサーバ１００が、閲覧用データＰＤ１，ＰＤ２から、住所の文字列を含む文書データを抽出し、さらにその中から文書データを選択して、緯度および経度の情報と関連づけて記事リンク地図データベースＭＤＢを生成する処理について、詳細に説明する。 The following describes in detail the process by which the map application server 100 extracts document data including an address character string from the browsing data PD1 and PD2, selects document data from among the extracted data, and associates the selected document data with latitude and longitude information to generate the article link map database MDB.

Ａ２．地図アプリケーションサーバの動作：
図３は、地図アプリケーションサーバの処理を示すフローチャートである。ステップＳ１０では、インターネットＩＮＴに接続されたサーバのうち、あらかじめ定められたサーバＡＳ１，ＡＳ２等から文書データを取得する。ステップＳ１０では、たとえば、サーバＡＳ１の閲覧用データＰＤ１のあるページから、ｈｔｍｌ（HyperText Markup Language）データを収集し、その中から文字のデータのみを抽出して、文書データを生成する。一つの文書データは、閲覧用データＰＤ１，ＰＤ２等をクライアント２００等で閲覧する際に同時にディスプレイ２１０に表示される１ページ分の文字のデータからなる。また、ステップＳ１０では、文書データのもととなったページ（以下「記事ページ」ともいう）のｕｒｌを取得し、メモリ１２０に記憶する。ステップＳ１０の処理を行う地図アプリケーションサーバ１００のＣＰＵ１１０の機能部を、文書データ取得部１１１として図１に示す。 A2. Map application server behavior:
FIG. 3 is a flowchart showing the processing of the map application server. In step S10, document data is acquired from predetermined servers AS1, AS2, etc., among the servers connected to the Internet INT. In step S10, for example, html (HyperText Markup Language) data is collected from a page of the browsing data PD1 on the server AS1, and only the character data is extracted from it to generate document data. One document data consists of the character data of one page, that is, the characters displayed together on the display 210 when the browsing data PD1, PD2, etc. are viewed on the client 200 or the like. Also in step S10, the url of the page from which the document data was generated (hereinafter also referred to as the “article page”) is acquired and stored in the memory 120. The functional unit of the CPU 110 of the map application server 100 that performs the process of step S10 is shown in FIG. 1 as the document data acquisition unit 111.
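By way of non-limiting illustration, the text extraction of step S10 could be sketched in Python as follows. The names `TextExtractor` and `make_document` are hypothetical, and skipping the contents of `script` and `style` elements is an assumption of this sketch; the description above states only that character data is extracted from the html data and that the url of the article page is retained.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects character data from html, ignoring script/style contents (assumption)."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.parts.append(text)


def make_document(url, html):
    """Builds one document data (one page of text) and keeps the article page url."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {"url": url, "text": "\n".join(parser.parts)}
```

In this sketch one call to `make_document` corresponds to one article page, so the returned dictionary pairs the page text with the url that is later registered in the database.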

なお、ステップＳ１０で準備される文書データには、住所の文字列を含まない文書データや、店舗の評価を含まない文書データも含まれる。しかし、図３のステップＳ１０〜Ｓ６０を繰り返し実行し、閲覧用データＰＤ１，ＰＤ２から文字データを取得して複数の文書データを生成することで、そのうちの少なくとも一部の文書データは、住所を表す文字列である住所文字列と、店舗の紹介や評価などの店舗に関する記述である記事文字列と、を含む文書データとなる。 The document data prepared in step S10 includes document data that does not include the address character string and document data that does not include the store evaluation. However, by repeatedly executing Steps S10 to S60 in FIG. 3 to obtain character data from the browsing data PD1 and PD2 and generate a plurality of document data, at least some of the document data represents an address. The document data includes an address character string that is a character string and an article character string that is a description of the store such as introduction and evaluation of the store.

ステップＳ２０では、文書データの文を語に分解する。なお、ここでいう「語」とは、日本語の文法上の「単語」ではなく、文書を解析するために便宜的に定められた、１以上の文字からなる文字群である。本明細書では、この文字群を、便宜的に「語」と呼ぶ。たとえば、「東神奈川」という文字列は、日本語の文法上は一つの固有名詞であるが、ステップＳ２０の処理では「東」という語と「神奈川」という語に分けられる。 In step S20, the sentence of the document data is decomposed into words. The “word” here is not a “word” in Japanese grammar, but a character group composed of one or more characters defined for the purpose of analyzing a document. In this specification, this character group is called a "word" for convenience. For example, the character string “Higashi Kanagawa” is one proper noun in Japanese grammar, but is divided into the words “East” and “Kanagawa” in the process of step S20.

また、ステップＳ２０では、各語の品詞を決定する。なお、ここでいう「品詞」も、日本語の文法上の「品詞」ではなく、文書を解析するために便宜的に定められた、語の属性を表す分類である。 In step S20, the part of speech of each word is determined. Note that the “part of speech” here is not a “part of speech” in Japanese grammar, but is a classification representing the attribute of a word that is defined for the purpose of analyzing a document.

図４は、本明細書における「品詞」の例を示す表である。左端の列には、品詞の分類番号を示す。中央の列には、品詞の分類を示す。右端の列には、品詞の例を示す。本明細書における「品詞」は、分類番号と、１以上の階層の分類とを有している。たとえば、分類番号１１は、「名詞−固有名詞−地域−一般」という「品詞」に与えられた分類番号である。「東京」という語は、この「名詞−固有名詞−地域−一般」という「品詞」に分類される。また、「東京」という語には、分類番号１１番に分類される。なお、図４に示す「品詞」の分類は一例である。図３のステップＳ２０の処理は、様々なエンジンによって実行可能であり、「品詞」の分類はそれらエンジンによって異なる場合がある。 FIG. 4 is a table showing examples of “parts of speech” as used in this specification. The leftmost column shows the classification number of the part of speech, the middle column shows its classification, and the rightmost column shows example words. A “part of speech” in this specification has a classification number and a classification of one or more hierarchical levels. For example, classification number 11 is the number given to the “part of speech” “noun-proper noun-region-general”. The word “Tokyo” is classified into this “part of speech” and therefore has classification number 11. The classification of “parts of speech” shown in FIG. 4 is only an example. The process of step S20 in FIG. 3 can be executed by various engines, and the classification of “parts of speech” may differ from engine to engine.

図５および図６は、文書データ中の住所の文字列について、ステップＳ２０の処理結果の例を示す図である。なお、図５および図６では、各語に対して品詞の分類番号のみを示す。図５および図６では、文書データ中の住所の文字列の例を示すが、ステップＳ２０における処理は、このような文字列に限らず、文書データのすべての文のすべての語に対して行われる。図３のステップＳ２０で行われる処理を「形態素解析」と呼ぶ。 FIGS. 5 and 6 are diagrams showing examples of the result of step S20 for address character strings in document data. In FIGS. 5 and 6, only the part-of-speech classification number is shown for each word. Although FIGS. 5 and 6 show address character strings as examples, the processing in step S20 is not limited to such character strings and is performed on every word of every sentence in the document data. The process performed in step S20 of FIG. 3 is called “morphological analysis”.

図３のステップＳ３０では、ステップＳ２０で語に分解され、それぞれ品詞が対応づけられた文書データから、住所を表す文字列（以下「住所文字列」という）を抽出して、住所リストデータを作成する。 In step S30 of FIG. 3, character strings representing addresses (hereinafter referred to as “address character strings”) are extracted from the document data that was decomposed into words and associated with parts of speech in step S20, and address list data is created.

図３のステップＳ４０では、住所リストデータの各住所文字列の詳細度を決定する。より具体的には、詳細度を表す詳細レベルデータが生成される。ステップＳ２０〜Ｓ４０の処理を行う地図アプリケーションサーバ１００のＣＰＵ１１０の機能部を、住所文字列検討部１１２として図１に示す。 In step S40 of FIG. 3, the degree of detail of each address character string in the address list data is determined. More specifically, detail level data representing the degree of detail is generated. A functional unit of the CPU 110 of the map application server 100 that performs the processes of steps S20 to S40 is shown as an address character string review unit 112 in FIG.

ステップＳ４５では、住所リストデータの住所文字列の中に、所定のしきい値以上の詳細度を有する住所文字列が存在するか否かを判断する。所定のしきい値以上の詳細度を有する住所文字列が存在する場合には、処理はステップＳ５０に進む。所定のしきい値以上の詳細度を有する住所文字列が存在しない場合には、処理はステップＳ６５に進む。 In step S45, it is determined whether or not an address character string having a degree of detail equal to or greater than a predetermined threshold exists in the address character string of the address list data. If there is an address character string having a degree of detail equal to or greater than a predetermined threshold, the process proceeds to step S50. If there is no address character string having a degree of detail equal to or greater than a predetermined threshold, the process proceeds to step S65.

ステップＳ４５においては、所定のしきい値以上の詳細度を有する住所文字列を含むと判断された文書データが選択される。所定のしきい値以上の詳細度を有する住所文字列を含む文書データは、この後、ステップＳ６０においてデータベースに組み込まれる。所定のしきい値以上の詳細度を有する住所文字列を含まないと判断された文書データ、ならびに住所文字列を含まないと判断された文書データについては、ステップＳ５０，Ｓ６０の処理は行われないため、そのような文書データはデータベースに組み込まれない。ステップＳ４５の処理を行う地図アプリケーションサーバ１００のＣＰＵ１１０の機能部を、文書データ選択部１１３として図１に示す。 In step S45, document data determined to include an address character string having a degree of detail equal to or greater than a predetermined threshold is selected. Document data including an address character string having a degree of detail greater than or equal to a predetermined threshold is then incorporated into the database in step S60. For document data determined not to include an address character string having a degree of detail equal to or greater than a predetermined threshold, and document data determined not to include an address character string, the processes in steps S50 and S60 are not performed. Therefore, such document data is not incorporated into the database. A functional unit of the CPU 110 of the map application server 100 that performs the process of step S45 is shown as a document data selection unit 113 in FIG.
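The branch at step S45 can be illustrated by the following minimal sketch. It assumes, purely for illustration, that the detail level data of step S40 is an integer per address character string and that the predetermined threshold is 3; neither the representation nor the value is specified above, and the function name is hypothetical.

```python
# Hypothetical value: the description only refers to a "predetermined threshold".
DETAIL_THRESHOLD = 3


def should_register(detail_levels, threshold=DETAIL_THRESHOLD):
    """Step S45: True if any address character string reaches the threshold.

    True corresponds to proceeding to step S50 (and later S60);
    False corresponds to skipping to step S65, so the document data
    is not incorporated into the database.
    """
    return any(level >= threshold for level in detail_levels)
```

A document data with no address character string at all yields an empty list and therefore `False`, which matches the statement that such document data is not incorporated into the database.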

図３のステップＳ５０では、住所リストデータの中から詳細度の高い住所文字列を選択する。本実施例では、住所リストデータの中で最も詳細度の高い住所文字列が選択される。なお、最も詳細度の高い住所文字列が複数存在する場合には、そのうちで住所リストデータの最初に位置する住所文字列が選択される。 In step S50 of FIG. 3, an address character string with a high degree of detail is selected from the address list data. In this embodiment, the address character string having the highest level of detail is selected from the address list data. When there are a plurality of address character strings having the highest level of detail, the address character string located at the beginning of the address list data is selected.
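The selection rule of step S50, including the tie-breaking by position in the address list data, might be sketched as follows. The pair representation `(address string, detail level)` and the function name are assumptions of this sketch.

```python
def select_address(address_list):
    """Step S50: pick the address character string with the highest detail level.

    address_list is a list of (address_string, detail_level) pairs in the
    order in which they appear in the address list data. Python's max()
    returns the first maximal item, so ties are resolved in favor of the
    entry located earliest in the list, as described for this embodiment.
    """
    if not address_list:
        return None
    return max(address_list, key=lambda entry: entry[1])[0]
```

Relying on the documented first-match behavior of `max()` keeps the tie-breaking implicit; an explicit loop would serve equally well if that behavior is considered too subtle.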

図３のステップＳ６０では、選択された住所文字列に基づいて、その住所文字列が表す緯度と経度を特定する。この処理を「ジオコーディング」という。 In step S60 of FIG. 3, the latitude and longitude represented by the address character string are specified based on the selected address character string. This process is called “geocoding”.

ステップＳ６０では、さらに、ステップＳ１０で取得した記事ページのｕｒｌと、ステップＳ６０で得た緯度と経度と、を関連づけて、記事リンク地図データベースＭＤＢに追加する。そして、更新された記事リンク地図データベースＭＤＢをメモリ１２０に格納する。なお、記事リンク地図データベースＭＤＢには、さらに、閲覧用データＰＤ１，ＰＤ２から収集した文書データ、ならびに文書データ中の同一の種類の同一のキーワードの出現回数が格納される。 In step S60, the url of the article page acquired in step S10 and the latitude and longitude obtained in step S60 are associated with each other and added to the article link map database MDB. Then, the updated article link map database MDB is stored in the memory 120. The article link map database MDB further stores document data collected from the browsing data PD1 and PD2, and the number of appearances of the same keyword of the same type in the document data.

ここで、「同一の種類」とは、以下のような意味である。すなわち、各語は、ステップＳ２０において、さらに、名詞、動詞、形容詞、形容動詞に分類される。そして、その分類が同じである場合は、「同一の種類」の語であるとされる。 Here, “same type” means as follows. That is, each word is further classified into a noun, a verb, an adjective, and an adjective verb in step S20. And when the classification is the same, it is considered as the word of "the same kind".
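One possible shape for an entry of the article link map database MDB, together with the counting of identical keywords of the same type, is sketched below. The record layout, the field names, and the English type labels (`"noun"`, `"verb"`, `"adjective"`, `"adjectival_verb"`) are hypothetical; the description states only which items are stored and that words are grouped by these four types.

```python
from collections import Counter
from dataclasses import dataclass, field

# The four word types named in the description (labels are illustrative).
WORD_TYPES = {"noun", "verb", "adjective", "adjectival_verb"}


def count_keywords(typed_words):
    """Counts occurrences of each (word, type) pair among the four word types."""
    return Counter(
        (word, kind) for word, kind in typed_words if kind in WORD_TYPES
    )


@dataclass
class ArticleRecord:
    """One hypothetical entry of the article link map database MDB."""

    url: str            # url of the article page acquired in step S10
    latitude: float     # obtained by geocoding in step S60
    longitude: float
    text: str           # document data collected from the browsing data
    keyword_counts: Counter = field(default_factory=Counter)
```

Keying the counter on the pair `(word, type)` means that the same spelling used as two different types is counted separately, which is one reading of "the same keyword of the same type".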

ステップＳ６０の処理を行う地図アプリケーションサーバ１００のＣＰＵ１１０の機能部を、データベース生成部１１４として図１に示す。データベース生成部１１４は、記事リンク地図データベースＭＤＢの生成および更新を行う。 The functional unit of the CPU 110 of the map application server 100 that performs the process of step S60 is shown in FIG. 1 as the database generation unit 114. The database generation unit 114 generates and updates the article link map database MDB.

図３のステップＳ６５では、あらかじめ定められたすべてのサーバＡＳ１，ＡＳ２等のすべての閲覧用データについて文書データを生成したか否かが判定される。判定結果がＮｏである場合は、処理はステップＳ１０に戻り、あらかじめ定められたサーバＡＳ１，ＡＳ２等の閲覧用データから新たな文書データが準備される。判定結果がＹｅｓである場合は、処理はステップＳ７０に進む。 In step S65 of FIG. 3, it is determined whether or not the document data has been generated for all browsing data of all the servers AS1, AS2, and the like determined in advance. If the determination result is No, the process returns to step S10, and new document data is prepared from the browsing data of the servers AS1, AS2, and the like determined in advance. If the determination result is Yes, the process proceeds to step S70.

図３のステップＳ７０では、ユーザからの要求に応じて、記事リンク地図データベースＭＤＢに基づいて図２に示すような表示を行うための表示データを作成し、クライアント２００に送信する。ステップＳ７０の処理を行う地図アプリケーションサーバ１００のＣＰＵ１１０の機能部を、サービス提供部１１５として図１に示す。 In step S70 of FIG. 3, in response to a request from the user, display data for producing a display such as that shown in FIG. 2 is created based on the article link map database MDB and transmitted to the client 200. The functional unit of the CPU 110 of the map application server 100 that performs the process of step S70 is shown in FIG. 1 as the service providing unit 115.

この表示データに基づく表示においては、地図上のステップＳ５０で得た緯度と経度に相当する位置に、バルーンＢＬが表示される（図２参照）。また、領域Ａ３０に、ステップＳ１０で準備した文書データの一部（すなわち、記事ページの一部）が表示される。そして、それらのバルーンＢＬ、および記事ページの一部が表示される位置に、ステップＳ１０で取得した記事ページのｕｒｌへのリンクが埋め込まれている。 In the display based on the display data, the balloon BL is displayed at a position corresponding to the latitude and longitude obtained in step S50 on the map (see FIG. 2). Further, a part of the document data prepared in step S10 (that is, a part of the article page) is displayed in the area A30. A link to the url of the article page acquired in step S10 is embedded at a position where the balloon BL and a part of the article page are displayed.

このような態様とすれば、利害関係のない素人が忌憚のない店舗評価を述べているブログやホームページに基づいて、記事リンク地図データベースＭＤＢを生成することができる（特に、図３のステップＳ１０参照）。よって、第三者にとって有益な記事リンク地図データベースＭＤＢを生成し、提供することができる。 With such an aspect, the article link map database MDB can be generated from blogs and homepages in which laypeople with no stake in a store give candid evaluations of it (see, in particular, step S10 in FIG. 3). An article link map database MDB that is useful to third parties can therefore be generated and provided.

また、ブログやホームページの記事において住所を詳細に記述するユーザは、同時に、記事中で高精度の店舗評価を行っていると推定できる。このため、上記のような態様とすることで（特にステップＳ４５参照）、高精度な店舗評価を行っていると推定できるブログやホームページの文書データを選んで、それらに基づいて記事リンク地図データベースＭＤＢを生成することができる。このため、精度の高い記事リンク地図データベースＭＤＢを生成することができる。 Moreover, a user who describes an address in detail in a blog or homepage article can be presumed to be giving a correspondingly accurate store evaluation in that article. With the above aspect (see, in particular, step S45), document data of blogs and homepages that can be presumed to contain accurate store evaluations is therefore selected, and the article link map database MDB is generated from the selected document data. As a result, a highly accurate article link map database MDB can be generated.

Ａ３．住所リストデータの生成：
図７および図８は、図３のステップＳ３０の詳細な処理を示すフローチャートである。ステップＳ３１０では、文書データから語を１個、取得する。なお、文書データからの語の取得は、文書データ中の語の並びの順に行われる。よって、最初にステップＳ３１０が実行されるときには、文書データの先頭の語が取得される。 A3. Generation of address list data:
FIGS. 7 and 8 are flowcharts showing the detailed processing of step S30 in FIG. 3. In step S310, one word is acquired from the document data. Words are acquired in the order in which they appear in the document data, so the first time step S310 is executed, the first word of the document data is acquired.

ステップＳ３１５では、取得した語の品詞が「地域」（分類番号１０〜１２，３０など。図４参照）であり、かつ、品詞が「国」（分類番号１２）でなく、かつ、取得した語がカタカナのみの語でないか、が検討される。 In step S315, it is examined whether all of the following hold: the part of speech of the acquired word is “region” (classification numbers 10 to 12, 30, etc.; see FIG. 4), the part of speech is not “country” (classification number 12), and the acquired word does not consist only of katakana.

なお、前述のとおり、品詞は１以上の階層を有している（図４参照）。「品詞が「地域」であるか」の判定においては、いずれかの階層に「地域」という分類を有している場合には、「その品詞は「地域」である」と判定される。「国」、「助数詞」など、他の品詞の分類について判定を行う場合も同様である。 As described above, the part of speech has one or more layers (see FIG. 4). In the determination of “whether the part of speech is“ region ””, it is determined that “the part of speech is“ region ”” if any hierarchy has the classification “region”. The same applies to the determination of other part of speech classifications such as “country” and “classifier”.

また、語に対する品詞の割り当ての際に参照されるデータベースにおいては、現実にある地名のうち、「地域」の品詞が割り当てられているのは、市町村レベルの名前までである。そして、たとえば「字」のあとに続く「大沢」など、市町村より下のレベルの地名については、「地域」の品詞は割り当てられておらず、「名詞−一般」が割り当てられている。 In the database referred to when assigning parts of speech to words, the “region” part of speech is assigned to real place names only down to the municipality level. Place names below the municipality level, such as “Osawa” following the character “字” (aza), are not assigned the “region” part of speech but are assigned “noun-general”.

文書データのうち、住所を表す語以外の語については、通常、品詞が「地域」ではない。このため、住所を表す語以外の語については、ステップＳ３１５の判断結果はＮｏとなる。また、たとえば、ステップＳ３１０で取得した語が「日本」の場合には、品詞が「国」であるため、ステップＳ３１５の判定結果はＮｏとなる。さらに、ステップＳ３１０で取得した語が「バージニア」の場合は、カタカナのみの語であるため、ステップＳ３１５の判定結果はＮｏとなる。 Of the document data, the part of speech is not “region” for words other than the word representing the address. For this reason, the determination result of step S315 is No for words other than the word representing the address. For example, when the word acquired in step S310 is “Japan”, the part of speech is “country”, and therefore the determination result in step S315 is No. Furthermore, when the word acquired in step S310 is “Virginia”, the determination result in step S315 is “No” because it is only a katakana word.
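The determination of step S315, together with the rule that a part of speech "is region" when any level of its hierarchy is "region", could be expressed roughly as follows. This sketch assumes that a part of speech is given as a hyphen-separated string such as `"名詞-固有名詞-地域-一般"` and that a country name carries both `地域` and `国` in its hierarchy; both are assumptions about the output of the morphological analysis engine, and the function names are hypothetical.

```python
import re

# Matches words made up solely of characters in the Unicode katakana block.
KATAKANA_ONLY = re.compile(r"[\u30A0-\u30FF]+")


def has_class(pos, name):
    """True if any hierarchy level of the part of speech equals `name`."""
    return name in pos.split("-")


def starts_address(word, pos):
    """Step S315: region, not country, and not a katakana-only word."""
    return (
        has_class(pos, "地域")
        and not has_class(pos, "国")
        and KATAKANA_ONLY.fullmatch(word) is None
    )
```

With these assumptions, "東京" passes the check, while "日本" (country), "バージニア" (katakana only), and "大沢" tagged as `名詞-一般` do not, in line with the examples given above.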

判定結果がＮｏの場合には、処理はステップＳ３１０に戻る。そして、文書データから次の語が取得される。ステップＳ３１０で取得した語がステップＳ３１５の条件を満たす語となるまで、ステップＳ３１５〜Ｓ３１０の処理が繰り返される。 If the determination result is No, the process returns to step S310. Then, the next word is acquired from the document data. The processes in steps S315 to S310 are repeated until the word acquired in step S310 becomes a word that satisfies the condition in step S315.

一方、たとえば、語が「東京」の場合には、ステップＳ３１５の判定結果はＹｅｓとなる。判定結果がＹｅｓの場合には、処理はステップＳ３２０に進む。 On the other hand, for example, when the word is “Tokyo”, the determination result in step S315 is Yes. If the determination result is Yes, the process proceeds to step S320.

ステップＳ３２０では、ステップＳ３１０で取得した語を、新たな住所バッファＡＢ（図１参照）に格納する。 In step S320, the word acquired in step S310 is stored in a new address buffer AB (see FIG. 1).

なお、地図アプリケーションサーバ１００（住所リスト作成部１１２）は、メモリ１２０に複数の住所バッファＡＢを有している。図７および図８の処理が行われると、１以上の住所バッファＡＢ内に住所と推定される文字列が格納される。 The map application server 100 (address list creation unit 112) has a plurality of address buffers AB in the memory 120. When the processing of FIGS. 7 and 8 is performed, a character string estimated as an address is stored in one or more address buffers AB.

ステップＳ３３０では、文書データから次の語が取得される。ステップＳ３３５では、ステップＳ３３０で取得された語の品詞が「地域」（分類番号１０〜１２，３０など）であるか、または語の文字が「東」、「西」、「南」、「北」、「大字」であるかが検討される。 In step S330, the next word is acquired from the document data. In step S335, it is examined whether the part of speech of the word acquired in step S330 is “region” (classification numbers 10 to 12, 30, etc.), or whether the word is “east”, “west”, “south”, “north”, or “大字” (oaza).

たとえば、ステップＳ３３０で取得された語が「都」、「道」、「府」、「県」、「区」や「神奈川」、「千代田」、「丸の内」の場合は、品詞が「地域」であるので、ステップＳ３３５の判定結果はＹｅｓとなる。また、ステップＳ３３０で取得された語が、前述の「東神奈川」の一部である「東」である場合にも、ステップＳ３３５の判定結果はＹｅｓとなる。判定結果がＹｅｓの場合には、処理はステップＳ３４０に進む。 For example, if the word acquired in step S330 is an administrative-unit word such as “都” (to), “道” (do), “府” (fu), “県” (ken), or “区” (ku), or a place name such as “Kanagawa”, “Chiyoda”, or “Marunouchi”, its part of speech is “region”, so the determination result of step S335 is Yes. The determination result of step S335 is also Yes if the word acquired in step S330 is “East”, as in the part of “Higashi Kanagawa” described above. If the determination result is Yes, the process proceeds to step S340.

ステップＳ３４０では、ステップＳ３３０で取得した語を、住所バッファＡＢに追加する。ステップＳ３４０で住所バッファＡＢに格納される語は、ステップＳ３２０で住所バッファＡＢに格納された語に続けてその住所バッファＡＢに格納される。以下、「住所バッファＡＢに追加する」と記述する場合、同様の処理が行われる。 In step S340, the word acquired in step S330 is added to the address buffer AB. The words stored in the address buffer AB in step S340 are stored in the address buffer AB following the words stored in the address buffer AB in step S320. Hereinafter, when “add to address buffer AB” is described, the same processing is performed.

ステップＳ３５０では、文書データから次の語が取得される。その後、処理は、ステップＳ３３５に戻る。すなわち、ステップＳ３３５の条件を満たす語が続く限り、ステップＳ３５０，Ｓ３４０の処理にしたがって、それらの語は順に住所バッファＡＢに格納される。 In step S350, the next word is acquired from the document data. Thereafter, the process returns to step S335. That is, as long as words that satisfy the condition of step S335 continue, these words are sequentially stored in the address buffer AB in accordance with the processing of steps S350 and S340.

一方、ステップＳ３３５の判定結果がＮｏとなった場合には、処理は、ステップＳ３５５に進む。ステップＳ３３５の判定結果がＮｏとなる場合とは、ステップＳ３３０で取得された語の品詞が「地域」ではなく、かつ、語の文字が「東」、「西」、「南」、「北」、「大字」でもない場合である。 On the other hand, if the determination result of step S335 is No, the process proceeds to step S355. The determination result of step S335 is No when the part of speech of the word acquired in step S330 is not “region” and the word is none of “east”, “west”, “south”, “north”, and “大字” (oaza).

ステップＳ３５５では、ステップＳ３３０またはＳ３５０で取得した語の品詞が「数」（分類番号１９）であるか、または検討対象の語が「字」（あざ）であるかが検討される。 In step S355, it is examined whether the part of speech of the word acquired in step S330 or S350 is “number” (classification number 19), or whether the word under examination is “字” (aza).

ステップＳ３５５の判定結果がＹｅｓの場合は、処理はステップＳ３６０に進む。ステップＳ３６０では、検討対象の語（数字または「字」（あざ））を、住所バッファＡＢに追加する。ステップＳ３７０では、文書データから次の語が取得される。 If the determination result of step S355 is Yes, the process proceeds to step S360. In step S360, the word under examination (a number or “字” (aza)) is added to the address buffer AB. In step S370, the next word is acquired from the document data.

なお、ステップＳ３５５の判定結果がＹｅｓとなったということは、文書データ中で連続する１以上の語であって、かつ地域を表す１以上の語がすべて住所バッファＡＢに格納された後（ステップＳ３３５，Ｓ３４０，Ｓ３５０）、それらに続いて、数字または「字」（あざ）の字が現れたということである。よって、この後に続く文字列は、たとえば「○丁目×番地△号」や「（字）大沢」といった、町域以下の住所を表す文字列であると推定できる。 A determination result of Yes in step S355 means that, after the one or more consecutive words in the document data that represent a region were all stored in the address buffer AB (steps S335, S340, and S350), a number or the character “字” (aza) appeared next. The character string that follows can therefore be presumed to represent the part of the address at the town-area level and below, such as “○-chome ×-banchi △-go” or “(aza) Osawa”.

なお、「町域」とは、住所を表す語であって、「丁目」レベルの地域を表す語である。これに対して、「街区」とは、住所を表す語であって、「番地」レベルの地域を表す語である。「号」とは、住所を表す語であって、「号」レベルの地域を表す語である。 The “town area” is a word representing an address, and is a word representing an area at the “chome” level. On the other hand, the “block” is a word representing an address, and is a word representing an area at the “address” level. “No.” is a word representing an address, and is a word representing an area of “No.” level.

一方、ステップＳ３５５の判定結果がＮｏの場合は、処理はステップＳ３９０に進む。 On the other hand, if the determination result of step S355 is No, the process proceeds to step S390.

なお、ステップＳ３５５の判定結果がＮｏとなったということは、文書データ中で連続する１以上の語であって、かつ地域を表す１以上の語がすべて住所バッファＡＢに格納された後（ステップＳ３３５〜Ｓ３５０）、数字または「字」（あざ）の字が現れなかったということである。このような場合は、この後に続く文字列は、「字◇◇」、「○丁目」、「×番地」、「△号」といった、町域以下の情報を持っていないと推定できる。すなわち、住所文字列は終了したと推定できる。よって、後述するように、ステップＳ３２０で開始された住所バッファＡＢへの文字列の蓄積は終了する（ステップＳ３９０参照）。 A determination result of No in step S355 means that, after the one or more consecutive words in the document data that represent a region were all stored in the address buffer AB (steps S335 to S350), neither a number nor the character “字” (aza) appeared. In such a case, the character string that follows can be presumed to carry no information at the town-area level or below, such as “aza ◇◇”, “○-chome”, “×-banchi”, or “△-go”. That is, the address character string can be presumed to have ended. Accordingly, as described later, the accumulation of the character string in the address buffer AB that was started in step S320 ends (see step S390).

図８のステップＳ３７５では、検討対象の語、すなわち最後に取得した語の品詞が、「地域」（分類番号１０〜１２，３０など）であり、かつ、「品詞が「国」（分類番号１２）でなく、かつ、取得した語がカタカナのみの語でない、か否かが検討される。ステップＳ３７５における判定の内容は、ステップＳ３１５における判定の内容と同じである。判定結果がＹｅｓの場合は、処理はステップＳ３８０に進む。 In step S375 of FIG. 8, it is examined whether the word under examination, that is, the most recently acquired word, has the part of speech “region” (classification numbers 10 to 12, 30, etc.), does not have the part of speech “country” (classification number 12), and does not consist only of katakana. The content of the determination in step S375 is the same as that in step S315. If the determination result is Yes, the process proceeds to step S380.

なお、ステップＳ３７５の判定結果がＹｅｓとなったということは、以下のような意味を有する。すなわち、すでに文字列中に数字または「字」（あざ）の字が現れ、その後、町域以下の住所を表す文字列が現れると推定されたにもかかわらず（ステップＳ３５５においてＹｅｓ）、数字や「丁目」等ではなく、より広い地域を表す語（たとえば、「東京」）が現れたということである。このような場合としては、たとえば、文字データが複数の住所の列挙を含んでいる場合がある。「丁目」以下の情報を持たない住所の文字列に続いて、次の住所の文字列が並んでいる場合である。 A determination result of Yes in step S375 has the following meaning. A number or the character “字” (aza) had already appeared in the character string, and a character string representing the address at the town-area level and below was therefore expected to follow (Yes in step S355); nevertheless, a word representing a wider region (for example, “Tokyo”) appeared instead of a number, “chome”, or the like. This can happen, for example, when the character data contains a list of several addresses, so that the character string of an address carrying no information at the “chome” level or below is immediately followed by the character string of the next address.

ステップＳ３８０では、住所バッファＡＢに格納されている文字列を住所リストデータＡＬ（図１参照）に追加する。住所リストデータＡＬに追加される文字列は、複数の語によって構成される文字列である。その後、処理はステップＳ３２０に戻る（図７および図８の［Ｃ］参照）。すなわち、ステップＳ３２０において、ステップＳ３７５においてＹｅｓと判定された語、すなわち、次の住所の先頭の語であると推定できる語が、新たな住所バッファＡＢに格納される。 In step S380, the character string stored in the address buffer AB is added to the address list data AL (see FIG. 1). The character string added to the address list data AL is a character string composed of a plurality of words. Thereafter, the process returns to step S320 (see [C] in FIGS. 7 and 8). That is, in step S320, the word determined as Yes in step S375, that is, the word that can be estimated to be the first word of the next address is stored in the new address buffer AB.

一方、ステップＳ３７５の判定結果がＮｏの場合は、処理はステップＳ３８５に進む。 On the other hand, if the determination result of step S375 is No, the process proceeds to step S385.

ステップＳ３８５では、検討対象の語の品詞が「数」（分類番号１９）、「助数詞」（分類番号３５）、「接尾−一般」（分類番号２８）、「記号−一般」（分類番号７７）、もしくは「名詞−一般」（分類番号２）のいずれかであるか、または、検討対象の語が、「の」、「−」（全角マイナス）、「-」（半角マイナス）、「‐」（ハイフン）、「Ｂ」、「Ｆ」、「階」、「東」、「西」、「南」、もしくは「北」のいずれかであるかが検討される。 In step S385, the part of speech of the word to be examined is “number” (classification number 19), “classifier” (classification number 35), “suffix-general” (classification number 28), “symbol-general” (classification number 77). Or “noun-general” (classification number 2), or the words to be examined are “no”, “−” (full-width minus), “-” (half-width minus), “-” (Hyphen), “B”, “F”, “floor”, “east”, “west”, “south”, or “north” are considered.

なお、「助数詞」とは、図４に示すように、「丁目」、「番」などの、数に続いて示され、数の単位を表す語である。「Ｂ」、「Ｆ」、「階」も、同様に、数に続いて示され、数の単位を表す語である。 As shown in FIG. 4, “a classifier” is a word indicating a unit of a number, such as “chome”, “number”, etc., which is shown following the number. Similarly, “B”, “F”, and “floor” are words that follow the number and represent the unit of the number.

「接尾−一般」とは、「号」などのように、名詞の後に表示され名詞の属性を表す所定の語である。「記号−一般」とは、記号を表す語である。「名詞−一般」とは、名詞である。「の」、「−」（全角マイナス）、「-」（半角マイナス）、「‐」（ハイフン）は、数に続いて表示され、数の接続を表す語である。「東」、「西」、「南」、「北」は、名詞の前および後の少なくとも一方に表示され名詞を修飾する語である。 “Suffix-general” denotes a predetermined word, such as 「号」 (go), that appears after a noun and expresses an attribute of that noun. “Symbol-general” denotes a word that is a symbol. “Noun-general” denotes a noun. 「の」 (no), 「−」 (full-width minus), 「-」 (half-width minus), and 「‐」 (hyphen) are words that follow a number and connect it to another number. 「東」 (east), 「西」 (west), 「南」 (south), and 「北」 (north) are words that appear before and/or after a noun and modify it.
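The test performed in step S385 can be sketched in Python as follows. This is only an illustrative reading of the description, not the patented implementation: the function name `continues_address` and the constant names are hypothetical, and the word is assumed to arrive already tagged with its part-of-speech classification number.

```python
# Part-of-speech classification numbers named in step S385:
# number (19), counter (35), suffix-general (28),
# symbol-general (77), noun-general (2).
CONTINUATION_POS_CLASSES = frozenset({19, 35, 28, 77, 2})

# Literal words that step S385 also accepts, copied from the description.
CONTINUATION_WORDS = frozenset(
    {"の", "−", "-", "‐", "Ｂ", "Ｆ", "階", "東", "西", "南", "北"}
)


def continues_address(word: str, pos_class: int) -> bool:
    """Return True when step S385 would answer Yes for this word.

    Hypothetical helper: `pos_class` is the classification number that
    the morphological analysis assigned to `word`.
    """
    return pos_class in CONTINUATION_POS_CLASSES or word in CONTINUATION_WORDS
```

A word such as 「丁目」 tagged as a counter (35) passes on its part of speech, while 「階」 passes on the literal word list regardless of its tag.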

ステップＳ３８５の判定結果がＹｅｓの場合は、処理はステップＳ３６０に戻る（図７および図８の［Ｄ］参照）。ステップＳ３６０、Ｓ３７０，Ｓ３７５，Ｓ３８５，Ｓ３６０のループにおいて、町域以下の住所を表す文字列が、順に住所バッファＡＢに蓄積される。 If the determination result of step S385 is Yes, the process returns to step S360 (see [D] in FIGS. 7 and 8). In the loop of steps S360, S370, S375, S385, and S360, character strings representing addresses below the town area are sequentially stored in the address buffer AB.

一方、ステップＳ３８５の判定結果がＮｏの場合は、処理はステップＳ３９０に進む。 On the other hand, if the determination result of step S385 is No, the process proceeds to step S390.

なお、ステップＳ３８５の判定結果がＮｏとなったということは、町域以下の住所を表す文字列が、すべて住所バッファＡＢに蓄積され、住所を表す文字列が終了したと推定できる。 Note that when the determination result of step S385 is No, it can be presumed that the entire character string representing the address below the town area level has been accumulated in the address buffer AB and that the character string representing the address has ended.

ステップＳ３９０では、住所バッファＡＢに格納されている文字列を住所リストデータＡＬに追加する。 In step S390, the character string stored in the address buffer AB is added to the address list data AL.

ステップＳ３９５では、文書データ中のすべての語についてステップＳ３１５〜Ｓ３８５の検討を終了したか否かを検討する。判定結果がＮｏである場合には、処理は、ステップＳ３１０に戻る（図７および図８の［Ｅ］参照）。判定結果がＹｅｓである場合には、図７および図８の処理（図３のステップＳ３０の処理）は、終了する。 In step S395, it is examined whether or not the examinations in steps S315 to S385 have been completed for all words in the document data. If the determination result is No, the process returns to step S310 (see [E] in FIGS. 7 and 8). If the determination result is Yes, the processing in FIGS. 7 and 8 (the processing in step S30 in FIG. 3) ends.

図７および図８の処理を行うことで、文字からなる文書データ中からさまざまな詳細さおよび表記方法で記載された住所文字列を抽出することができる。 By performing the processing of FIGS. 7 and 8, it is possible to extract address character strings described in various details and notation methods from document data made up of characters.

Ａ４．住所文字列の詳細さの評価： A4. Evaluation of the level of detail of address character strings:
図９〜図１１は、図３のステップＳ４０の詳細な処理を示すフローチャートである。図３のステップＳ４０では、住所リストデータＡＬ（図１参照）の住所文字列について詳細度が決定される。図３のステップＳ４０では、図９〜図１１の処理が、各住所文字列について実行される。図９〜図１１の処理では、住所文字列について、住所の一部であり区域を表す所定の語であって、互いに異なる複数の階層の区域に属する所定の語を含むか否かが、順に検討される。 FIGS. 9 to 11 are flowcharts showing the detailed processing of step S40 in FIG. 3. In step S40 of FIG. 3, a level of detail is determined for the address character strings in the address list data AL (see FIG. 1); the processing of FIGS. 9 to 11 is executed for each address character string. In this processing, the address character string is examined, in order, for whether it contains predetermined words that form part of an address, represent areas, and belong to mutually different levels of the area hierarchy.

図９のステップＳ４０５では、住所文字列中に「都」、「道」、「府」、「県」のいずれかの文字が含まれるか、が判定される。判定結果がＹｅｓの場合は、ステップＳ４１０において都道府県フラグがＯＮにされる。都道府県フラグがＯＮであるということは、住所文字列中に都道府県レベルの住所の情報が含まれていることを表す。 In step S405 of FIG. 9, it is determined whether the address character string contains any of the characters 「都」 (to), 「道」 (do), 「府」 (fu), or 「県」 (ken), the suffixes of prefecture-level units. If the result is Yes, the prefecture flag is turned ON in step S410. The prefecture flag being ON indicates that the address character string contains prefecture-level address information.

ステップＳ４１０の後、処理はステップＳ４１５に進む。ステップＳ４０５において判定結果がＮｏであった場合も、同様に処理はステップＳ４１５に進む。 After step S410, the process proceeds to step S415. Similarly, when the determination result is No in step S405, the process proceeds to step S415.

なお、図９〜図１１で説明する各フラグは、メモリ１２０内に格納される。また、初期状態において各フラグはＯＦＦである。 Each flag described in FIGS. 9 to 11 is stored in the memory 120. In the initial state, each flag is OFF.

ステップＳ４１５では、住所文字列中に「京都府」の文字が含まれるか、が判定される。判定結果がＹｅｓの場合は、ステップＳ４２０において京都フラグがＯＮにされる。京都フラグがＯＮであるということは、住所文字列が表す住所が京都府または京都市の住所であることを表す。京都市の住所は、「四条河原町西入る」等の、他の地域とは異なる独自の表記がなされることがある。 In step S415, it is determined whether the address character string contains the characters 「京都府」 (Kyoto Prefecture). If the result is Yes, the Kyoto flag is turned ON in step S420. The Kyoto flag being ON indicates that the address represented by the address character string is an address in Kyoto Prefecture or Kyoto City. Addresses in Kyoto City are sometimes written in a notation of their own that differs from other regions, such as 「四条河原町西入る」 (Shijo Kawaramachi nishi-iru).

ステップＳ４２０の後、処理はステップＳ４２５に進む。ステップＳ４１５において判定結果がＮｏであった場合も、同様に処理はステップＳ４２５に進む。 After step S420, the process proceeds to step S425. If the determination result in step S415 is No, the process similarly proceeds to step S425.

ステップＳ４２５では、住所文字列中に「市」、「町」、「村」のいずれかの文字が含まれるか、が判定される。判定結果がＹｅｓの場合は、ステップＳ４３０において市町村フラグがＯＮにされる。市町村フラグがＯＮであるということは、住所文字列中に市町村レベルの住所の情報が含まれていることを表す。 In step S425, it is determined whether any one of the characters “city”, “town”, and “village” is included in the address character string. If the determination result is Yes, the municipality flag is turned ON in step S430. The fact that the municipality flag is ON indicates that the address character string includes address information at the municipality level.

ステップＳ４３０の後、処理はステップＳ４３５に進む。ステップＳ４２５において判定結果がＮｏであった場合も、同様に処理はステップＳ４３５に進む。 After step S430, the process proceeds to step S435. If the determination result in step S425 is No, the process similarly proceeds to step S435.

ステップＳ４３５では、住所文字列中に「京都市」の文字が含まれるか、が判定される。判定結果がＹｅｓの場合は、ステップＳ４４０において京都フラグがＯＮにされる。 In step S435, it is determined whether or not the characters “Kyoto City” are included in the address character string. If the determination result is Yes, the Kyoto flag is turned ON in step S440.

ステップＳ４４０の後、処理はステップＳ４４５に進む。ステップＳ４３５において判定結果がＮｏであった場合も、同様に処理はステップＳ４４５に進む。 After step S440, the process proceeds to step S445. If the determination result is No in step S435, the process similarly proceeds to step S445.

ステップＳ４４５では、住所文字列中に「丁目」の文字が含まれるか、が判定される。判定結果がＹｅｓの場合は、ステップＳ４５０において町域フラグがＯＮにされる。町域フラグがＯＮであるということは、住所文字列中に、「○丁目×番地△号」のうちの丁目レベルの住所の情報が含まれていることを表す。 In step S445, it is determined whether the address character string includes the character “chome”. If the determination result is Yes, the town area flag is turned ON in step S450. The fact that the town area flag is ON indicates that the address character string includes address information at the chome level of “○ chome × address Δ number”.

ステップＳ４５０の後、処理はステップＳ４５５に進む。ステップＳ４４５において判定結果がＮｏであった場合も、同様に処理はステップＳ４５５に進む。 After step S450, the process proceeds to step S455. If the determination result in step S445 is No, the process likewise proceeds to step S455.

ステップＳ４５５では、京都フラグがＯＮであるか、が判定される。判定結果がＹｅｓである場合には、処理は図１０のステップＳ５０５に進む。一方、判定結果がＮｏである場合には、処理は図９のステップＳ４６５に進む。 In step S455, it is determined whether the Kyoto flag is ON. If the determination result is Yes, the process proceeds to step S505 in FIG. On the other hand, if the determination result is No, the process proceeds to step S465 in FIG.

ステップＳ４６５では、住所文字列中に「番」の文字が含まれるか、が判定される。判定結果がＹｅｓの場合は、ステップＳ４７０において街区フラグがＯＮにされる。街区フラグがＯＮであるということは、住所文字列中に、「○丁目×番地△号」のうちの番地レベルの住所の情報が含まれていることを表す。 In step S465, it is determined whether the address character string contains the character 「番」 (ban). If the result is Yes, the block flag is turned ON in step S470. The block flag being ON indicates that the address character string contains the banchi-level (block-level) information of an address of the form 「○丁目×番地△号」 (○-chome, ×-banchi, △-go).

ステップＳ４７０の後、処理はステップＳ４７５に進む。ステップＳ４６５において判定結果がＮｏであった場合も、同様に処理はステップＳ４７５に進む。 After step S470, the process proceeds to step S475. If the determination result is No in step S465, the process similarly proceeds to step S475.

ステップＳ４７５では、住所文字列中に「号」の文字が含まれるか、が判定される。判定結果がＹｅｓの場合は、ステップＳ４８０において号フラグがＯＮにされる。号フラグがＯＮであるということは、住所文字列中に、「○丁目×番地△号」のうちの号レベルの住所の情報が含まれていることを表す。 In step S475, it is determined whether the address character string contains the character 「号」 (go). If the result is Yes, the number flag is turned ON in step S480. The number flag being ON indicates that the address character string contains the go-level (building-number-level) information of an address of the form 「○丁目×番地△号」 (○-chome, ×-banchi, △-go).

ステップＳ４８０の後、処理はステップＳ４８５に進む。ステップＳ４７５において判定結果がＮｏであった場合も、同様に処理はステップＳ４８５に進む。 After step S480, the process proceeds to step S485. If the determination result in step S475 is No, the process proceeds to step S485 in the same manner.

以上で説明したステップＳ４０５，Ｓ４２５，Ｓ４４５，Ｓ４６５，Ｓ４７５では、上位の区域を表す語（たとえば、「都」、「道」、「府」、「県」）から下位の区域を表す語（たとえば、「号」）に向かう順番で、順に、区域を表す所定の語を含むか否かが検討され、各判定結果がフラグのＯＮ／ＯＦＦで記憶される。 In steps S405, S425, S445, S465, and S475 described above, the address character string is examined for predetermined area words in order from words representing higher-level areas (for example, 「都」, 「道」, 「府」, 「県」) toward words representing lower-level areas (for example, 「号」), and each determination result is recorded as the ON/OFF state of a flag.

ステップＳ４８５では、住所文字列中において、数字と、「−」（全角マイナス）、「-」（半角マイナス）、または「‐」（ハイフン）とが、２個以上続いているか否か判定される。たとえば、住所文字列中において「３−」、「3-」、「３‐」などの文字列があれば、判定結果はＹｅｓとなる。ステップＳ４８５の判定結果がＹｅｓである場合は、処理は、図１１のステップＳ６１０に進む。 In step S485, it is determined whether the address character string contains a run of two or more consecutive characters made up of digits and 「−」 (full-width minus), 「-」 (half-width minus), or 「‐」 (hyphen). For example, if the address character string contains a string such as 「３−」, 「3-」, or 「３‐」, the result is Yes. If the determination result of step S485 is Yes, the process proceeds to step S610 in FIG. 11.

ステップＳ４８５の判定結果がＹｅｓであるという場合には、住所文字列において、「○丁目×番地△号」の情報の少なくとも一部が、数字をハイフンやマイナスでつなげた表記で表されていると推定できる。たとえば、「４丁目３−１」と表記されている場合である。 If the determination result in step S485 is Yes, in the address character string, at least a part of the information of “○ chome × address △ number” is represented by a notation in which numbers are connected with hyphens or minuses. Can be estimated. For example, it is a case where “4-chome 3-1” is described.

一方、ステップＳ４８５の判定結果がＮｏである場合は、処理は、図９のステップＳ４９０に進む。 On the other hand, if the determination result of step S485 is No, the process proceeds to step S490 of FIG.

ステップＳ４８５の判定結果がＮｏであるという場合には、住所文字列において、「○丁目×番地△号」の情報は、数字をハイフンやマイナスでつなげた表記で表されていないと推定できる。すなわち、ステップＳ４７５までの処理で、住所の情報のすべてについて検討され、検討結果が各フラグに反映されていると考えることができる。 If the determination result in step S485 is No, it can be estimated that the information of “○ chome × address Δ number” in the address character string is not represented by a notation in which numbers are connected with hyphens or minuses. That is, in the process up to step S475, it can be considered that all address information is examined and the examination result is reflected in each flag.
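The check in step S485 lends itself to a regular expression. The sketch below is one possible reading of the description, not the implementation used in the embodiment: it assumes that a qualifying run is two or more consecutive characters drawn from digits and the three connector characters, with at least one connector present. The names `HYPHEN_RUN` and `find_hyphen_run` are hypothetical.

```python
import re
from typing import Optional

# Connectors named in step S485: full-width minus (U+2212),
# half-width minus (U+002D), and hyphen (U+2010).
# The lookahead requires at least one connector in the run, so a plain
# multi-digit number such as "12" is not reported (an assumption).
HYPHEN_RUN = re.compile(r"(?=[0-9０-９]*[−\-‐])[0-9０-９−\-‐]{2,}")


def find_hyphen_run(address: str) -> Optional[str]:
    """Return the first digit/connector run, or None if step S485 is No."""
    match = HYPHEN_RUN.search(address)
    return match.group() if match else None
```

Under this reading, 「芝４丁目３−１」 yields the run 「３−１」, while 「芝４丁目３番１号」 yields no run and the flow continues to step S490.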

ステップＳ４９０では、都道府県フラグ、市町村フラグ、町域フラグ、街区フラグ、号フラグのＯＮ／ＯＦＦに基づいて、住所文字列の詳細度が決定される。 In step S490, the degree of detail of the address character string is determined based on ON / OFF of the prefecture flag, the municipality flag, the town area flag, the city block flag, and the number flag.

具体的には、号フラグがＯＮである場合には、その住所文字列の詳細度は５である。街区フラグがＯＮである場合には、その住所文字列の詳細度は４である。町域フラグがＯＮである場合には、その住所文字列の詳細度は３である。市町村フラグがＯＮである場合には、その住所文字列の詳細度は２である。都道府県フラグがＯＮである場合には、その住所文字列の詳細度は１である。複数のフラグがＯＮである場合には、対応する詳細度のうちのより高い詳細度がその住所文字列に割り当てられる。このように決定された詳細度を表す詳細レベルデータがメモリ１２０に格納される。 Specifically, when the number flag is ON, the detail level of the address character string is 5. When the block flag is ON, the detail level of the address character string is 4. When the town area flag is ON, the detail level of the address character string is 3. When the municipality flag is ON, the detail level of the address character string is 2. When the prefecture flag is ON, the detail level of the address character string is 1. If the plurality of flags are ON, a higher level of detail among the corresponding levels of detail is assigned to the address character string. Detail level data representing the level of detail determined in this way is stored in the memory 120.

すなわち、住所文字列の詳細度は、住所文字列に含まれる地域を表す語の最も下位の階層に基づいて決定される。このような処理を行うことで、住所文字列に対する詳細度を客観的に決定することができる。 That is, the level of detail of the address character string is determined based on the lowest hierarchy of words representing regions included in the address character string. By performing such processing, the level of detail for the address character string can be objectively determined.
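The keyword checks of steps S405 to S475 and the level assignment of step S490 can be sketched as follows. This is a minimal illustration under stated assumptions: the `Flags` class and function names are hypothetical, the Kyoto branch and the hyphen handling of FIG. 11 are omitted, and a level of 0 for an address with no flag ON is an assumption, since the description does not say what happens in that case.

```python
from dataclasses import dataclass


@dataclass
class Flags:
    prefecture: bool = False    # 都道府県フラグ
    municipality: bool = False  # 市町村フラグ
    town: bool = False          # 町域フラグ
    block: bool = False         # 街区フラグ
    number: bool = False        # 号フラグ


def keyword_flags(address: str) -> Flags:
    """Plain substring checks corresponding to S405, S425, S445, S465, S475."""
    return Flags(
        prefecture=any(ch in address for ch in "都道府県"),
        municipality=any(ch in address for ch in "市町村"),
        town="丁目" in address,
        block="番" in address,
        number="号" in address,
    )


def detail_level(flags: Flags) -> int:
    """Step S490: the highest level among the flags that are ON."""
    levels = [
        (flags.number, 5),
        (flags.block, 4),
        (flags.town, 3),
        (flags.municipality, 2),
        (flags.prefecture, 1),
    ]
    # default=0 is an assumption for the case where no flag is ON.
    return max((level for on, level in levels if on), default=0)
```

For example, 「東京都港区芝４丁目３番１号」 turns the prefecture, town area, block, and number flags ON and receives level 5, whereas 「北海道」 alone receives level 1.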

図１０は、住所文字列において京都の住所表記がされている場合（図９のステップＳ４１５，Ｓ４３５，Ｓ４５５参照）の、その住所文字列の詳細度の検討の処理を示すフローチャートである。図１０の処理は、図９のステップＳ４５５の判定結果がＹｅｓである場合に実行される（図９，図１０の［Ｉ］参照）。 FIG. 10 is a flowchart showing the processing for examining the level of detail of the address character string when the Kyoto address is described in the address character string (see steps S415, S435, and S455 in FIG. 9). The process in FIG. 10 is executed when the determination result in step S455 in FIG. 9 is Yes (see [I] in FIGS. 9 and 10).

ステップＳ５０５では、住所文字列に「上る」、「下る」、「東入る」、「西入る」、「南入る」、「北入る」、「上ル」、「下ル」、「東入ル」、「西入ル」、「南入ル」、「北入ル」の文字があるか否かが判定される。判定結果がＹｅｓである場合には、ステップＳ５１０において、町域フラグがＯＮにされる。 In step S505, it is determined whether the address character string contains any of 「上る」 (agaru), 「下る」 (sagaru), 「東入る」 (higashi-iru), 「西入る」 (nishi-iru), 「南入る」 (minami-iru), 「北入る」 (kita-iru), or their katakana-suffixed variants 「上ル」, 「下ル」, 「東入ル」, 「西入ル」, 「南入ル」, and 「北入ル」. If the result is Yes, the town area flag is turned ON in step S510.

すなわち、住所文字列の詳細さの検討において、「上る」、「下る」、「東入る」、「西入る」、「南入る」、「北入る」、「上ル」、「下ル」、「東入ル」、「西入ル」、「南入ル」、「北入ル」の表記があるという事実は、京都フラグがＯＮでないときに「丁目」の表記があるという事実と同じ程度に評価される。その後、処理は、図９のステップＳ４６５に戻る（図９，図１０の［Ｋ］参照）。 That is, when the level of detail of an address character string is examined, the presence of 「上る」, 「下る」, 「東入る」, 「西入る」, 「南入る」, 「北入る」, 「上ル」, 「下ル」, 「東入ル」, 「西入ル」, 「南入ル」, or 「北入ル」 is given the same weight as the presence of 「丁目」 (chome) when the Kyoto flag is not ON. The process then returns to step S465 in FIG. 9 (see [K] in FIGS. 9 and 10).

ステップＳ４５５，Ｓ５０５，Ｓ５１０の処理を行うことで、住所文字列において京都独自の住所表記がされている場合にも、その住所文字列の詳細さを評価することができる。 By performing the processing of steps S455, S505, and S510, the level of detail of an address character string can be evaluated even when the address is written in the notation unique to Kyoto.

なお、京都の住所を表記する場合にも、通常の「○丁目×番地△号」のような方式で表記することがある。本実施例においては、ステップＳ４５５，Ｓ５０５，Ｓ５１０の処理の後、通常のステップＳ４６５が行われる。このため、京都の住所が通常の「○丁目×番地△号」のような方式で表記されている場合にも、住所文字列の詳細さを評価することができる。 In addition, even when an address in Kyoto is written, it may be written in a system like the usual “○ chome × address △ number”. In the present embodiment, a normal step S465 is performed after the processing of steps S455, S505, and S510. For this reason, the details of the address character string can be evaluated even when the address in Kyoto is represented by a method such as a normal “○ chome × address Δ number”.
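The Kyoto branch (steps S415, S435, S455, S505, and S510) can be sketched as a small extension of the town area check. The function names below are hypothetical, and the sketch assumes simple substring matching, which is how the description words each test.

```python
# Direction expressions listed in step S505, copied from the description.
KYOTO_DIRECTION_WORDS = (
    "上る", "下る", "東入る", "西入る", "南入る", "北入る",
    "上ル", "下ル", "東入ル", "西入ル", "南入ル", "北入ル",
)


def kyoto_flag(address: str) -> bool:
    """Steps S415 and S435: ON when 京都府 or 京都市 appears."""
    return "京都府" in address or "京都市" in address


def town_area_flag(address: str) -> bool:
    """Step S445, plus steps S505/S510 when the Kyoto flag is ON."""
    if "丁目" in address:
        return True
    if kyoto_flag(address):
        return any(word in address for word in KYOTO_DIRECTION_WORDS)
    return False
```

Because the 「丁目」 test runs first and the later 「番」 and 「号」 tests are unchanged, a Kyoto address written in the ordinary chome/banchi/go style is still evaluated as described above.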

図１１は、住所文字列において丁目、番地、号が、ハイフン等でつながれて表記されている場合の、その住所文字列の詳細度の検討の処理を示すフローチャートである。図１１の処理は、ステップＳ４８５の判定結果がＹｅｓである場合に実行される（図９，図１１の［Ｊ］参照）。 FIG. 11 is a flowchart showing the processing for examining the level of detail of an address character string in which the chome, banchi, and go are written joined by hyphens or the like. The processing of FIG. 11 is executed when the determination result of step S485 is Yes (see [J] in FIGS. 9 and 11).

ステップＳ６１０では、図９のステップＳ４８５で検出された、数字と、「−」、「-」または「‐」等（全角または半角のマイナスまたはハイフン）と、で構成される２個以上の文字列が、マイナスやハイフンの部分で分割される。 In step S610, two or more character strings composed of numbers and "-", "-", "-", etc. (full-width or half-width minus or hyphen) detected in step S485 of FIG. Is divided at the minus and hyphen part.

ステップＳ６１５では、分割結果が３個であるか、が判定される。判定結果がＹｅｓである場合には、処理は、ステップＳ６２０に進む。ステップＳ６１５の判定結果がＹｅｓであるということは、住所文字列において、「○丁目×番地△号」が「○−×−△」等と表記されているということである。なお、以下、全角または半角のマイナスまたはハイフンのうち、全角マイナスを例として使用して、住所表記の説明を行う。 In step S615, it is determined whether the split produced three parts. If the result is Yes, the process proceeds to step S620. A Yes result in step S615 means that, in the address character string, 「○丁目×番地△号」 is written in a form such as 「○−×−△」. In the following, the full-width minus is used as the representative of the full-width and half-width minus signs and hyphens when address notations are described.

ステップＳ６２０では、町域フラグ、街区フラグ、および号フラグがＯＮとされる。その後、処理は、図９のステップＳ４９０に進む（図９，図１１の［Ｌ］参照）。 In step S620, the town area flag, the city block flag, and the number flag are turned ON. Thereafter, the processing proceeds to step S490 in FIG. 9 (see [L] in FIGS. 9 and 11).

ステップＳ６１５の判定結果がＮｏである場合には、処理は、ステップＳ６２５に進む。 If the determination result of step S615 is No, the process proceeds to step S625.

ステップＳ６２５では、分割結果が２個であるか、が判定される。判定結果がＹｅｓである場合には、処理は、ステップＳ６２７に進む。ステップＳ６２５の判定結果がＹｅｓであるということは、住所文字列において、「○丁目×番地」が「○−×」と表記されているか、「○丁目×番地△号」が「○丁目×−△」等と表記されているということである。 In step S625, it is determined whether there are two division results. If the determination result is Yes, the process proceeds to step S627. If the determination result in step S625 is Yes, it means that “○ chome × address” is written as “○ −x” in the address character string, or “○ chome × address △ number” is “○ chome × −”. It means that it is written as “Δ” or the like.

ステップＳ６２７では、住所文字列に「丁目」レベルの情報が含まれるか否かを表す町域フラグがＯＮとなっているか、が判定される。判定結果がＹｅｓである場合には、処理は、ステップＳ６３０に進む。ステップＳ６２７の判定結果がＹｅｓであるということは、住所文字列において、「○丁目×番地△号」が「○丁目×−△」等と表記されているということである。 In step S627, it is determined whether or not the town area flag indicating whether or not the address character string includes “chome” level information is ON. If the determination result is Yes, the process proceeds to step S630. The determination result in step S627 being Yes means that “○ chome × address Δ number” is described as “○ chome × −Δ” or the like in the address character string.

ステップＳ６３０において、「番地」レベルの情報に対応する街区フラグ、および「号」レベルの情報に対応する号フラグがＯＮとされる。その後、処理は、図９のステップＳ４９０に進む（図９，図１１の［Ｌ］参照）。 In step S630, the block flag corresponding to the “address” level information and the number flag corresponding to the “number” level information are turned ON. Thereafter, the processing proceeds to step S490 in FIG. 9 (see [L] in FIGS. 9 and 11).

ステップＳ６２７の判定結果がＮｏである場合には、処理は、ステップＳ６４０に進む。ステップＳ６２７の判定結果がＮｏであるということは、住所文字列において、「号」レベルの情報は含まれておらず、「○丁目×番地」が「○−×」と表記されている、ということである。 If the determination result of step S627 is No, the process proceeds to step S640. If the determination result in step S627 is No, it means that the address character string does not include information on the “No.” level, and “○ chome × address” is written as “◯ − ×”. That is.

ステップＳ６４０において、「丁目」レベルの情報に対応する町域フラグ、および「番地」レベルの情報に対応する街区フラグがＯＮとされる。その後、処理は、図９のステップＳ４９０に進む（図９，図１１の［Ｌ］参照）。 In step S640, the town area flag corresponding to the “chome” level information and the city block flag corresponding to the “address” level information are turned ON. Thereafter, the processing proceeds to step S490 in FIG. 9 (see [L] in FIGS. 9 and 11).

一方、ステップＳ６２５の判定結果がＮｏである場合には、処理は、ステップＳ６４５に進む。ステップＳ６２５の判定結果がＮｏであるということは、ステップＳ４８５で検出された、数字と、全角または半角のマイナスまたはハイフンと、で構成される２個以上の文字列が、一つの数字と、一つの全角または半角のマイナスまたはハイフンとで構成されることを意味する。 On the other hand, if the determination result of step S625 is No, the process proceeds to step S645. A No result in step S625 means that the string of two or more characters detected in step S485, made up of digits and full-width or half-width minus signs or hyphens, consists of a single number and a single full-width or half-width minus sign or hyphen.

ステップＳ６４５では、住所文字列に「番地」レベルの情報が含まれるか否かを表す街区フラグがＯＮとなっているか、が判定される。判定結果がＹｅｓである場合には、処理は、ステップＳ６５０に進む。ステップＳ６４５の判定結果がＹｅｓであるということは、住所文字列において、「×番地△号」が「×番地−△」等と表記されているということである。 In step S645, it is determined whether or not a block flag indicating whether or not the address character string includes “address” level information is ON. If the determination result is Yes, the process proceeds to step S650. If the determination result in step S645 is Yes, it means that “× address Δ number” is written as “× address−Δ” or the like in the address character string.

ステップＳ６５０において、「号」レベルの情報に対応する号フラグがＯＮとされる。その後、処理は、図９のステップＳ４９０に進む（図９，図１１の［Ｌ］参照）。 In step S650, the number flag corresponding to the “number” level information is turned ON. Thereafter, the processing proceeds to step S490 in FIG. 9 (see [L] in FIGS. 9 and 11).

一方、ステップＳ６４５の判定結果がＮｏである場合には、処理は、ステップＳ６５５に進む。 On the other hand, if the determination result of step S645 is No, the process proceeds to step S655.

ステップＳ６５５では、住所文字列に「丁目」レベルの情報が含まれるか否かを表す町域フラグがＯＮとなっているか、が判定される。判定結果がＹｅｓである場合には、処理は、ステップＳ６６０に進む。ステップＳ６５５の判定結果がＹｅｓであるということは、住所文字列において、「○丁目×番地」が「○丁目−×」等と表記されているということである。 In step S655, it is determined whether the town area flag indicating whether or not the address character string includes information of the “chome” level is ON. If the determination result is Yes, the process proceeds to step S660. The determination result in step S655 being Yes means that “○ chome × address” is written as “○ chome-x” or the like in the address character string.

ステップＳ６６０において、「番地」レベルの情報に対応する街区フラグがＯＮとされる。その後、処理は、図９のステップＳ４９０に進む（図９，図１１の［Ｌ］参照）。 In step S660, the block flag corresponding to the “address” level information is turned ON. Thereafter, the processing proceeds to step S490 in FIG. 9 (see [L] in FIGS. 9 and 11).

一方、ステップＳ６５５の判定結果がＮｏである場合には、処理は、ステップＳ６７０に進む。ステップＳ６５５の判定結果がＮｏであるということは、住所文字列において、「○丁目」が「−○」等と表記されているということである。 On the other hand, if the determination result of step S655 is No, the process proceeds to step S670. The determination result in step S655 being No means that “○ chome” is written as “− ○” or the like in the address character string.

ステップＳ６７０において、「丁目」レベルの情報に対応する町域フラグがＯＮとされる。その後、処理は、図９のステップＳ４９０に進む（図９，図１１の［Ｌ］参照）。 In step S670, the town area flag corresponding to the “chome” level information is turned ON. Thereafter, the processing proceeds to step S490 in FIG. 9 (see [L] in FIGS. 9 and 11).

図９のステップＳ４０５〜Ｓ４７５の処理においては、住所文字列について、上位の階層の区域を表す語から下位の階層の区域を表す語に向かう順番で、順に、区域を表す所定の語を含むか否かを検討される。そして、図１１の処理においては、住所文字列中の全角または半角のマイナスまたはハイフンで結ばれた数については、それらの個数（図１１のステップＳ６１５，Ｓ６２５参照）、ならびにそれまでに検出された町域の階層に含まれる語の有無を表す町域フラグのＯＮ／ＯＦＦ、およびそれまでに検出された街区の階層に含まれる語の有無を表す街区フラグのＯＮ／ＯＦＦに基づいて（図１１のステップＳ６２７，Ｓ６４５，Ｓ６５５参照）、町域フラグ、街区フラグ、および号フラグをＯＮまたはＯＦＦとする（同、ステップＳ６３０，Ｓ６４０，Ｓ６５０，Ｓ６６０，Ｓ６７０参照）。このため、「○丁目×番地△号」のような情報の一部または全部が、ハイフン等でつながれた数字で表されている住所文字列についても、正確に詳細さを評価することができる。 In steps S405 to S475 of FIG. 9, the address character string is examined for predetermined area words in order from words representing higher-level areas toward words representing lower-level areas. Then, in the processing of FIG. 11, for numbers in the address character string that are joined by full-width or half-width minus signs or hyphens, the town area flag, the block flag, and the number flag are set ON or OFF (see steps S630, S640, S650, S660, and S670 of FIG. 11) based on how many such numbers there are (see steps S615 and S625 of FIG. 11), on the ON/OFF state of the town area flag, which indicates whether a town-area-level word has been detected so far, and on the ON/OFF state of the block flag, which indicates whether a block-level word has been detected so far (see steps S627, S645, and S655 of FIG. 11). As a result, the level of detail can be evaluated accurately even for an address character string in which part or all of information such as 「○丁目×番地△号」 is expressed as numbers joined by hyphens or the like.
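The decision tree of FIG. 11 (steps S610 to S670) can be sketched as a single function. This is an illustrative reading only: the function name `apply_hyphen_rules` and the regular expressions are hypothetical, only the first digit/connector run is considered, and any part count other than three or two is sent down the step S645 path, following the flow as literally described.

```python
import re
from typing import Tuple

# A run of digits and connectors (U+2212, U+002D, U+2010) that contains
# at least one connector; one possible reading of step S485.
HYPHEN_RUN = re.compile(r"(?=[0-9０-９]*[−\-‐])[0-9０-９−\-‐]{2,}")
CONNECTOR = re.compile(r"[−\-‐]")


def apply_hyphen_rules(
    address: str, town: bool, block: bool, number: bool
) -> Tuple[bool, bool, bool]:
    """Return updated (town, block, number) flags after FIG. 11."""
    match = HYPHEN_RUN.search(address)
    if match is None:               # step S485 is No: flags unchanged
        return town, block, number
    # Step S610: split the run at each minus sign or hyphen.
    parts = [p for p in CONNECTOR.split(match.group()) if p]
    if len(parts) == 3:             # S615 Yes -> S620
        return True, True, True
    if len(parts) == 2:             # S625 Yes
        if town:                    # S627 Yes -> S630
            return town, True, True
        return True, True, number   # S627 No -> S640
    if block:                       # S645 Yes -> S650
        return town, block, True
    if town:                        # S655 Yes -> S660
        return town, True, number
    return True, block, number      # S655 No -> S670
```

With the town area flag already ON from 「丁目」, the address 「芝４丁目３−１」 ends with all three flags ON, while 「芝４−３」 with no flags ON ends with only the town area and block flags ON.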

Ｂ．第２実施例： B. Second embodiment:
第２実施例においては、図３のステップＳ４０で住所文字列の詳細度を決定した後、ステップＳ４５の処理を行う前に、さらに、他の要素を考慮して詳細度を改変する。また、地図アプリケーションサーバ１００（図１参照）は、電話番号と、その電話番号の所有者または店舗名、およびその住所と、を関連づけて保持する電話番号データベースを有する。第２実施例の他の点は、第１実施例と同じである。 In the second embodiment, after the level of detail of the address character string is determined in step S40 of FIG. 3 and before the processing of step S45 is performed, the level of detail is further modified in consideration of other factors. In addition, the map application server 100 (see FIG. 1) has a telephone number database that holds each telephone number in association with the owner or store name for that number and with its address. In all other respects, the second embodiment is the same as the first embodiment.

図１２は、地図アプリケーションサーバが詳細度を改変する処理Ｓ４２を示すフローチャートである。図１２の処理Ｓ４２は、図３のステップＳ４０とステップＳ４５の間に行われる。なお、図１２の処理Ｓ４２は、ＣＰＵ１１０の機能部としての住所文字列検討部１１２が実現する。 FIG. 12 is a flowchart showing the process S42 in which the map application server modifies the level of detail. The process S42 of FIG. 12 is performed between step S40 and step S45 of FIG. 3. Note that the process S42 of FIG. 12 is realized by the address character string review unit 112 as a functional unit of the CPU 110.

ステップＳ７１０では、図３のステップＳ１０で取得した文書データから、電話番号が取得される。電話番号は、たとえば、全角または半角の「‐」（ハイフン）、「−」（マイナス）または「）」（カッコ）以外の文字を間に含まない、全角または半角の１０個の連続する数字の文字列を検索することで、文書データから抽出することができる。 In step S710, a telephone number is acquired from the document data acquired in step S10 of FIG. 3. A telephone number can be extracted from the document data by, for example, searching for a string of ten consecutive full-width or half-width digits that contains no intervening characters other than a full-width or half-width 「‐」 (hyphen), 「−」 (minus), or 「）」 (closing parenthesis).
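The search described for step S710 can be sketched with a regular expression. The sketch rests on several assumptions that are not in the description: the exact set of separator code points, the lookarounds that reject longer digit strings (so an 11-digit number is skipped rather than truncated), and the normalization of full-width digits to ASCII. The names `PHONE` and `extract_phone_numbers` are hypothetical.

```python
import re
from typing import List

DIGIT = "0-9０-９"
# Separators allowed between digits: half-width minus, hyphen (U+2010),
# minus (U+2212), full-width hyphen-minus (U+FF0D), and closing
# parentheses in half and full width. The exact set is an assumption.
SEPARATORS = "-‐−－)）"

# Exactly ten digits, each pair optionally separated by the characters
# above, and not embedded in a longer run of digits.
PHONE = re.compile(
    rf"(?<![{DIGIT}])[{DIGIT}](?:[{SEPARATORS}]*[{DIGIT}]){{9}}(?![{DIGIT}])"
)


def extract_phone_numbers(text: str) -> List[str]:
    """Return each 10-digit number found, normalized to ASCII digits."""
    return [
        "".join(str(int(ch)) for ch in match.group() if ch.isdigit())
        for match in PHONE.finditer(text)
    ]
```

Normalizing to ASCII digits makes the extracted value usable directly as a lookup key for the telephone number database consulted in the next step.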

ステップＳ７２０では、地図アプリケーションサーバ１００は、電話番号データベースを参照して、ステップＳ７１０で取得した電話番号に対応する店舗名または所有者、ならびに住所を取得する。 In step S720, the map application server 100 refers to the telephone number database and acquires the store name or owner corresponding to the telephone number acquired in step S710, and the address.

ステップＳ７２５では、図３のステップＳ１０で取得した文書データ中に、図１２のステップＳ７２０で取得した店舗名または所有者、ならびに住所が存在するか否かを判定する。文書データ中に店舗名または所有者、ならびに住所が存在する場合は、処理はステップＳ７３０に進む。文書データ中に店舗名も所有者も存在しない場合、および文書データ中に住所が存在しない場合は、処理は、終了する。なお、図１２には示していないが、ステップＳ７２０で、電話番号に対応する電話番号の店舗名または所有者、ならびに住所を取得できなかった場合も、処理は終了する。 In step S725, it is determined whether the store name or owner acquired in step S720 of FIG. 12 and the address exist in the document data acquired in step S10 of FIG. If the store name or owner and the address exist in the document data, the process proceeds to step S730. If neither the store name nor the owner exists in the document data, and the address does not exist in the document data, the process ends. Although not shown in FIG. 12, if the store name or owner of the telephone number corresponding to the telephone number and the address cannot be acquired in step S720, the process is also terminated.

ステップＳ７３０では、図３のステップＳ４０で決定した住所文字列の詳細度を改変する。具体的には、住所文字列の詳細度をより高い詳細度に改変する。第２実施例においては、詳細度を２だけ上げる。その後、処理を終了する。 In step S730, the level of detail of the address character string determined in step S40 of FIG. 3 is modified. Specifically, the detail level of the address character string is changed to a higher detail level. In the second embodiment, the level of detail is increased by 2. Thereafter, the process ends.
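Steps S720 to S730 amount to a database lookup followed by a conditional increase. The sketch below models the telephone number database as an in-memory dictionary, which is an assumption made for illustration; `PhoneEntry`, `adjusted_detail_level`, and the sample data in the usage note are all hypothetical, and presence in the document is tested by exact substring match.

```python
from typing import Dict, NamedTuple, Optional


class PhoneEntry(NamedTuple):
    store_name: Optional[str]
    owner: Optional[str]
    address: str


def adjusted_detail_level(
    document: str,
    phone: str,
    level: int,
    phone_db: Dict[str, PhoneEntry],
) -> int:
    """Raise the level by 2 when the database entry is confirmed by the text."""
    entry = phone_db.get(phone)          # step S720
    if entry is None:                    # nothing registered: no change
        return level
    # Step S725: the store name or the owner must appear, and so must the address.
    name_found = any(
        name and name in document
        for name in (entry.store_name, entry.owner)
    )
    if name_found and entry.address in document:
        return level + 2                 # step S730
    return level
```

For instance, with a hypothetical entry `PhoneEntry("喫茶さくら", None, "東京都港区芝４丁目")` registered under `"0312345678"`, an article that mentions both the store name and that address has its level raised from 3 to 5.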

ブログやホームページの記事において電話番号を正確に記述するユーザは、同時に、記事中で高精度の店舗評価を行っていると推定できる。このため、第２実施例のような態様とすることで、高精度な店舗評価を行っていると推定できるブログやホームページの文書データを選んで、それらに基づいて記事リンク地図データベースＭＤＢを生成することができる。このため、精度の高い記事リンク地図データベースＭＤＢを生成することができる。 A user who writes a telephone number accurately in a blog or website article can be presumed to also be giving an accurate store evaluation in that article. With the second embodiment, document data from blogs and websites that can be presumed to contain accurate store evaluations can therefore be selected, and the article link map database MDB can be generated from that data. As a result, a highly accurate article link map database MDB can be generated.

Ｃ．第３実施例： C. Third embodiment:
第３実施例においても、図３のステップＳ４０で住所文字列の詳細度を決定した後、ステップＳ４５の処理を行う前に、さらに、他の要素を考慮して詳細度を改変する。また、第３実施例においては、図３のステップＳ１０で、閲覧用データ中からリンク先を表すリンク先データが取得される。第３実施例の他の点は、第１実施例と同じである。 In the third embodiment as well, after the level of detail of the address character string is determined in step S40 of FIG. 3 and before the processing of step S45 is performed, the level of detail is further modified in consideration of other factors. In addition, in the third embodiment, link destination data representing a link destination is acquired from the browsing data in step S10 of FIG. 3. In all other respects, the third embodiment is the same as the first embodiment.

図１３は、地図アプリケーションサーバが詳細度を改変する処理Ｓ４３を示すフローチャートである。図１３の処理Ｓ４３は、図３のステップＳ４０とステップＳ４５の間に行われる。なお、図１３の処理Ｓ４３は、ＣＰＵ１１０の機能部としての住所文字列検討部１１２が実現する。 FIG. 13 is a flowchart showing the process S43 in which the map application server modifies the level of detail. Process S43 in FIG. 13 is performed between step S40 and step S45 in FIG. Note that the process S43 of FIG. 13 is realized by the address character string review unit 112 as a functional unit of the CPU 110.

ステップＳ８１０では、ステップＳ１０で取得されたリンク先データが表すリンク先のページからリンク先文書データを取得する。なお、第３実施例においては、図３のステップＳ１０で、あらかじめ、閲覧用データの中からリンク先を表すリンク先データが取得されている。 In step S810, link destination document data is acquired from the link destination page represented by the link destination data acquired in step S10. In the third embodiment, link destination data representing a link destination is acquired in advance from the browsing data in step S10 of FIG.

ステップＳ８１０では、具体的には、リンク先のページであってインターネット上のサーバに格納されているページから、ｈｔｍｌデータを収集する。そして、その中から文字のデータのみを抽出して、リンク先文書データを生成する。ステップＳ８１０の処理の内容は、リンク先のページからデータを取得する点以外は、図３のステップＳ１０の処理と同様である。 In step S810, specifically, html data is collected from a linked page that is stored in a server on the Internet. Then, only character data is extracted from them, and linked document data is generated. The content of the process in step S810 is the same as the process in step S10 of FIG. 3 except that data is acquired from the linked page.

ステップＳ８２０では、リンク先文書データの文を語に分解する。ステップＳ８２０の処理は、リンク先文書データを対象とする点以外は、図３のステップＳ２０の処理と同様である。 In step S820, the sentence of the link destination document data is decomposed into words. The process of step S820 is the same as the process of step S20 in FIG. 3 except that the target document data is the target.

ステップＳ８３０では、ステップＳ８２０で語に分解され、それぞれ品詞が対応づけられたリンク先文書データから、住所文字列を抽出して、リンク先住所リストデータを作成する。ステップＳ８３０の処理は、リンク先文書データを対象としてリンク先住所リストデータを作成する点以外は、図３のステップＳ３０の処理と同様である。 In step S830, an address character string is extracted from the linked document data that is decomposed into words in step S820 and each part of speech is associated with each other, thereby creating linked address list data. The process of step S830 is the same as the process of step S30 of FIG. 3 except that linked address list data is created for the linked document data.

ステップＳ８３５では、ステップＳ８３０で生成したリンク先住所リストデータと、図３のステップＳ３０で生成した住所リストデータとを照合して、両者に同じ住所文字列が含まれているか否かを判定する。リンク先住所リストデータと住所リストデータに同じ住所文字列が含まれている場合は、処理はステップＳ８４０に進む。同じ住所文字列が含まれていない場合は、処理は終了する。なお、図１３には示していないが、ステップＳ８３０で、住所文字列が抽出されなかった場合も、処理は終了する。 In step S835, the linked address list data generated in step S830 and the address list data generated in step S30 of FIG. 3 are collated to determine whether or not the same address character string is included in both. If the same address character string is included in the linked address list data and the address list data, the process proceeds to step S840. If the same address character string is not included, the process ends. Although not shown in FIG. 13, the process is also terminated when an address character string is not extracted in step S830.

ステップＳ８４０では、図３のステップＳ４０で決定した住所文字列の詳細度を改変する。具体的には、住所文字列の詳細度をより高い詳細度に改変する。第３実施例においては、詳細度を２だけ上げる。その後、処理を終了する。 In step S840, the level of detail of the address character string determined in step S40 of FIG. 3 is modified. Specifically, the detail level of the address character string is changed to a higher detail level. In the third embodiment, the level of detail is increased by 2. Thereafter, the process ends.
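The comparison in step S835 and the increase in step S840 can be sketched as a set intersection. The function name and the `boost` parameter are hypothetical, and the sketch assumes that "the same address character string" means an exact string match between the two lists.

```python
from typing import Iterable


def boost_if_shared_address(
    level: int,
    page_addresses: Iterable[str],
    linked_addresses: Iterable[str],
    boost: int = 2,
) -> int:
    """Raise the level when the article and its link target share an address."""
    # Step S835: look for any address string present in both lists.
    shared = set(page_addresses) & set(linked_addresses)
    # Step S840: raise the level (by 2 in the third embodiment).
    return level + boost if shared else level
```

An empty linked list, which corresponds to the case where step S830 extracts no address character string, leaves the level unchanged.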

ブログやホームページの記事においてリンク先に記載された住所を正確に引き写すユーザは、同時に、記事中で高精度の店舗評価を行っていると推定できる。このため、第３実施例のような態様とすることで、高精度な店舗評価を行っていると推定できるブログやホームページの文書データを選んで、それらに基づいて記事リンク地図データベースＭＤＢを生成することができる。このため、精度の高い記事リンク地図データベースＭＤＢを生成することができる。 A user who accurately copies into a blog or website article the address given at a link destination can be presumed to also be giving an accurate store evaluation in that article. With the third embodiment, document data from blogs and websites that can be presumed to contain accurate store evaluations can therefore be selected, and the article link map database MDB can be generated from that data. As a result, a highly accurate article link map database MDB can be generated.

D. Variations:
The present invention is not limited to the examples and embodiments described above and can be implemented in various forms without departing from its gist. For example, the following modifications are possible. The modifications below and the first to third embodiments above can be combined as appropriate to implement the invention. For example, performing both the processing of FIG. 12 of the second embodiment and the processing of FIG. 13 of the third embodiment when determining the level of detail yields a more accurate database.

D1. Modification 1:
In the embodiments above, the document data is data of articles posted on blogs and homepages published on the Internet. However, the document data may be other data, such as data available on an in-house network. That is, the document data can be any document data obtainable by some means. Such document data preferably includes an address character string, which is a character string representing an address, and preferably includes, as an article character string, an article introducing or evaluating a store.

D2. Modification 2:
In the embodiments above, the article pages of the browsing data PD1 and PD2 were described using a restaurant review as an example. However, the article pages of the browsing data PD1 and PD2 may be articles about stores that sell groceries or daily necessities, or reviews of films showing at a movie theater. That is, the article pages, and the document data that can be generated from them, can include articles about any store whose location can be specified by an address.

D3. Modification 3:
In the embodiments above, the level of detail of the address character string is determined, and document data is selected based on that level of detail (see steps S40 and S45 in FIG. 3). However, document data may instead be selected directly from data such as a flag indicating how detailed the address character string is. That is, the selection is not limited to the level of detail and can be based on detail level data in any format that represents the result of examining how detailed the address character string is.

D4. Modification 4:
In the embodiments above, the address character string with the highest level of detail is selected from a single document data (see step S50 in FIG. 3). However, one or more address character strings whose level of detail is at or above a predetermined threshold may be selected instead. Alternatively, a predetermined number (two or more) of address character strings may be selected in descending order of level of detail. It is also possible to select all address character strings that share the highest level of detail.
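The selection variants in this modification can be compared side by side in a short sketch. The function names and the dictionary that maps each address character string to its level of detail are assumptions made for illustration; the specification does not prescribe a data layout. Each function assumes a non-empty dictionary.

```python
def select_highest(detail_levels):
    """Embodiment (step S50): a single address with the highest level."""
    return max(detail_levels, key=detail_levels.get)


def select_at_or_above(detail_levels, threshold):
    """Variant: every address whose level is at or above the threshold."""
    return [a for a, d in detail_levels.items() if d >= threshold]


def select_top_n(detail_levels, n):
    """Variant: the n most detailed addresses (n >= 2 in the text)."""
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(detail_levels, key=detail_levels.get, reverse=True)
    return ranked[:n]


def select_all_highest(detail_levels):
    """Variant: all addresses that share the highest level."""
    best = max(detail_levels.values())
    return [a for a, d in detail_levels.items() if d == best]
```

With levels `{"A": 3, "B": 5, "C": 5, "D": 1}`, the embodiment picks one of the tied entries, whereas `select_all_highest` keeps both `"B"` and `"C"`.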

D5. Modification 5:
In the embodiments above, the URL of the document data is acquired (step S10 in FIG. 3) and stored in the database in association with the latitude and longitude of the address character string (step S60 in FIG. 3). However, the data stored in the database in association with the latitude and longitude may use another notation, such as a URI (Uniform Resource Identifier). That is, the data stored in association with the latitude and longitude information can take any form as long as it indicates where the document data is located.

D6. Modification 6:
When the level of detail of the address character string is determined in the embodiments above, it is preferable to examine the address character string in order from the front for predetermined words of different hierarchy levels ("都" (to), "道" (do), "府" (fu), "県" (ken), and then "市" (shi), "町" (cho), "村" (son)). That is, when an address character string contains the character "都" (Yes in step S405 of FIG. 9), the part of the address character string after that "都" is then preferably examined for the next-level characters "市", "町", or "村" (see step S425).
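The front-to-back examination described above can be sketched as follows. This is a simplified, character-level illustration: the embodiments work on words with parts of speech, whereas this sketch scans raw characters, so place names that themselves contain one of these characters would be misread. The names `find_first` and `examine_hierarchy`, and the fallback of scanning from the start when no prefecture-level character is found, are assumptions.

```python
PREFECTURE_MARKERS = "都道府県"   # upper hierarchy level
MUNICIPALITY_MARKERS = "市町村"   # next hierarchy level


def find_first(text, markers, start=0):
    """Return (index, char) of the first marker at or after start, or None."""
    for i in range(start, len(text)):
        if text[i] in markers:
            return i, text[i]
    return None


def examine_hierarchy(address):
    """Look for a prefecture-level marker, then a municipality-level
    marker only in the part of the string that follows it."""
    pref = find_first(address, PREFECTURE_MARKERS)
    # Assumption: with no prefecture-level marker, scan from the start.
    start = pref[0] + 1 if pref else 0
    muni = find_first(address, MUNICIPALITY_MARKERS, start)
    return pref, muni
```

Restricting the second search to the text after the first match is what keeps a lower-level character that appears earlier in the string from being counted out of order.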

In the embodiments above, the part of speech "region" is assigned only down to municipality-level names. Place names below the municipality level, such as "大沢" (Osawa) following "字" (aza), are not assigned the part of speech "region". However, the part of speech "region" may also be assigned to addresses below the municipality level. In that case, step S385 of FIG. 8 preferably includes "the part of speech is 'region'" as one of its alternative conditions.

D7. Modification 7:
In the embodiments above, document data is generated from the browsing data of predetermined servers, the database is generated, and a service using that database is provided afterwards. However, generating and updating the database can also proceed in parallel with providing the service. In that case, the browsing data on a server is preferably accessed at a frequency that does not exceed a threshold number of accesses (for example, 100, 500, or 1000) permitted within a predetermined time (for example, 5, 10, or 30 minutes).
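One way to keep accesses under such a threshold is a sliding-window counter, sketched below. The class name `AccessLimiter`, its parameters, and the injectable `clock` are assumptions introduced for illustration; the specification states only the goal (not exceeding a count within a time window), not a mechanism.

```python
import time
from collections import deque


class AccessLimiter:
    """Allow at most max_accesses within any window_seconds interval."""

    def __init__(self, max_accesses, window_seconds, clock=time.monotonic):
        self.max_accesses = max_accesses
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so tests can fake time
        self._stamps = deque()      # times of accesses still in the window

    def try_acquire(self):
        """Record an access and return True if it stays within the limit."""
        now = self.clock()
        # Forget accesses that have left the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) < self.max_accesses:
            self._stamps.append(now)
            return True
        return False
```

A crawler would call `try_acquire()` before each request to a server and wait or skip that server when it returns `False`; for example, `AccessLimiter(100, 5 * 60)` corresponds to 100 accesses per 5 minutes.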

D8. Modification 8:
In the embodiments above, only the CPU 110 and the memory 120 are mentioned as the hardware configuration of the server 100. The CPU 110 may consist of a single CPU or of a plurality of CPUs. The memory 120 may be a semiconductor memory, a fixed disk, or another writable recording medium.
The CPU 110 and the memory 120 may be housed in a single server or may be provided as a plurality of devices connected via a network.

Similarly, the functional units of the CPU may be realized by the CPU of a single server, or the CPUs of a plurality of devices connected via a network may each realize predetermined functional units. Each functional unit may itself be realized by a single CPU or by a plurality of CPUs.

Likewise, on the servers of the application service providers, one set of browsing data may be stored on a single server or distributed across two or more servers.

D9. Modification 9:
In the embodiments above, part of the configuration realized by hardware may be replaced with software, and conversely, part of the configuration realized by software may be replaced with hardware.

A computer program that realizes these functions is provided recorded on a computer-readable recording medium such as a floppy disk, CD-ROM, or DVD. The host computer reads the computer program from the recording medium and transfers it to an internal or external storage device. Alternatively, the computer program may be supplied to the host computer from a program supply device via a communication path. When the functions of the computer program are realized, the computer program stored in the internal storage device is executed by the microprocessor of the host computer. The host computer may also execute the computer program directly from the recording medium.

In this specification, a host computer is a concept that includes a hardware device and an operating system, and means a hardware device that operates under the control of the operating system. The computer program causes such a host computer to realize the functions of the units described above. Some of these functions may be realized by the operating system rather than by an application program.

In the present invention, a "computer-readable recording medium" is not limited to portable recording media such as flexible disks and CD-ROMs. It also includes internal storage devices in a computer, such as various kinds of RAM and ROM, and external storage devices fixed to a computer, such as hard disks.

FIG. 1 is a diagram showing an outline of the map information providing system of the present invention.
FIG. 2 is a diagram showing an example of the display on the display 210 when the article link map database MDB is used on the client 200.
FIG. 3 is a flowchart showing the processing of the map application server.
FIG. 4 is a table showing examples of "parts of speech" in this specification.
FIG. 5 is a diagram showing an example of the result of step S20 for an address character string in document data.
FIG. 6 is a diagram showing an example of the result of step S20 for an address character string in document data.
FIG. 7 is a flowchart showing the detailed processing of step S30 in FIG. 3.
FIG. 8 is a flowchart showing the detailed processing of step S30 in FIG. 3.
FIG. 9 is a flowchart showing the detailed processing of step S40 in FIG. 3.
FIG. 10 is a flowchart showing how the level of detail of an address character string is examined when the address is written in the Kyoto address notation.
FIG. 11 is a flowchart showing how the level of detail of an address character string is examined when the chome, block number, and house number are written joined by hyphens or the like.
FIG. 12 is a flowchart showing processing in which the map application server modifies the level of detail.
FIG. 13 is a flowchart showing processing in which the map application server modifies the level of detail.

Explanation of symbols

100 ... Map application server
110 ... CPU
111 ... Document data acquisition unit
112 ... Address character string examination unit
113 ... Document data selection unit
114 ... Database generation unit
115 ... Service providing unit
120 ... Memory
200 ... Client
210 ... Liquid crystal display
220 ... Keyboard
230 ... Mouse
A10 ... Area where the map is displayed
A20 ... Area where the input window and search button are displayed
A21 ... Input window
A22 ... Search button
A30 ... Area where part of the document data is displayed
AB ... Address buffer
AL ... Address list data
AS1, AS2 ... Servers
BL ... Balloon displayed on the map
INT ... Internet
MDB ... Article link map database
PD1, PD2 ... Browsing data

Claims

1. A method for generating a database based on a plurality of document data, the method comprising:
(a) preparing a plurality of document data, each including an address character string, which is a character string representing an address, and an article character string, which is a description of a store;
(b) analyzing the data of the address character string to generate detail level data representing how detailed the address represented by the address character string is;
(c) selecting, based on the detail level data, at least one document data including the address character string from the plurality of document data; and
(d) generating a database relating to the store based on the selected document data.

2. The method of claim 1, wherein
step (c) includes selecting, as one of the at least one document data, the document data that includes the most detailed address character string.

3. The method according to claim 1 or 2, wherein
step (b) includes:
analyzing the data of the address character string and examining how detailed the address represented by the address character string is; and
determining, based on the result of that examination, an evaluation representing the accuracy of the content of the document data.

4. The method of claim 3, wherein
the step of determining the evaluation representing the accuracy includes at least one of determining a value representing the evaluation and determining a rank representing the order of the evaluation among the plurality of document data.

5. The method of claim 1, wherein
step (a) includes accessing, via a network, a server of a service provider that provides users who have entered into a predetermined contract with a service for publishing, via the network, original document data that includes a character string and is created by the user, and generating the document data based on the character string included in the original document data,
the original document data further includes link data that associates reference document data including a character string with the original document data,
the method further comprises:
(e) referring to the reference document data based on the link data and generating link destination document data based on the character string included in the reference document data; and
(f) determining whether the link destination document data includes the address character string included in the document data, and
step (c) includes selecting, as the at least one document data, document data whose link destination document data includes the address character string of that document data in preference to document data whose link destination document data does not.

6. The method according to any one of claims 1 to 5, wherein
step (d) includes:
(d1) specifying the latitude and longitude of the point represented by the address character string; and
(d2) creating the database so that it contains, in association with each other, the latitude and longitude and information on where the address character string and the article character string of the selected document data were acquired from.

7. The method according to any one of claims 1 to 6, wherein
step (b) includes:
(b1) decomposing a character string included in the document data into a plurality of words, each consisting of one or more characters; and
(b2) examining the words included in the document data in order and extracting the address character string from the document data, and
step (b2) includes:
(b3) storing a first word included in the document data at the head of an address buffer when the first word satisfies a predetermined first condition that includes being a word representing a region;
(b4) additionally storing, in the address buffer, a second word that follows the word stored in the address buffer when the second word satisfies a predetermined second condition; and
(b5) extracting the character string stored in the address buffer as the address character string when the second word satisfies a predetermined third condition different from the second condition.

8. The method of claim 7, wherein
step (b5) includes extracting the character string in the address buffer as the address character string, the third condition being regarded as satisfied, when the second word meets none of the following:
being a word representing a region;
containing any of the characters "東" (east), "西" (west), "南" (south), "北" (north), and "大" (large);
being a word representing a number; and
containing the word "字" (aza).

9. The method according to claim 7 or 8, wherein
step (b5) further includes extracting the character string in the address buffer as the address character string, the third condition being regarded as satisfied, when the second word is a word representing a region of a higher level than the word examined in step (b2) immediately before the second word.

10. The method according to any one of claims 7 to 9, wherein
step (b5) further includes extracting the character string in the address buffer as the address character string, the third condition being regarded as satisfied, when the second word meets none of the following:
being a word representing a number;
being a predetermined word that appears after a number and represents the unit of the number;
being a predetermined word that appears after a noun and represents an attribute of the noun;
being a symbol;
being a noun;
being a predetermined word that appears after a number and represents a connection between numbers; and
being a predetermined word that appears before and/or after a noun and modifies the noun.

11. The method according to any one of claims 7 to 10, wherein
step (b4) includes additionally storing the second word in the address buffer, the second condition being regarded as satisfied, when the second word meets either of the following:
being a word representing a region; or
containing any of the characters "東" (east), "西" (west), "南" (south), "北" (north), and "大" (large).

12. The method according to any one of claims 7 to 11, wherein
step (b4) includes additionally storing the second word in the address buffer, the second condition being regarded as satisfied, when the second word meets either of the following:
being a word representing a number; or
containing the word "字" (aza).

13. The method according to any one of claims 7 to 12, wherein
step (b4) includes additionally storing the second word in the address buffer, the second condition being regarded as satisfied, when
the word examined before the second word meets either of the following:
being a word representing a number; or
containing the word "字" (aza),
and the second word meets any of the following:
being a word representing a number;
being a predetermined word that appears after a number and represents the unit of the number;
being a predetermined word that appears after a noun and represents an attribute of the noun;
being a symbol;
being a noun;
being a predetermined word that appears after a number and represents a connection between numbers; or
being a predetermined word that appears before and/or after a noun and modifies the noun.

14. The method according to any one of claims 1 to 13, wherein
step (b) further includes:
(b7) examining whether the address character string contains predetermined words that each form part of an address, represent an area, and belong to areas of a plurality of different hierarchy levels, and
step (c) includes:
(c1) selecting the document data based on the lowest hierarchy level among the predetermined words contained in the address character string.

15. The method of claim 14, wherein
the plurality of hierarchy levels include a town level and a block level,
step (b7) includes:
(b8) examining whether the address character string contains the predetermined words representing areas, in order from words representing higher-level areas to words representing lower-level areas, and
step (b8) includes performing an evaluation equivalent to evaluating whether the predetermined words are included, based on:
the number of full-width or half-width minus signs or hyphens in the address character string;
the number of numerals connected by those full-width or half-width minus signs or hyphens; and
whether a word belonging to the town level and a word belonging to the block level have been detected so far.

16. The method of claim 1, further comprising:
(e) obtaining a telephone number from the document data; and
(f) determining, by referring to a database in which telephone numbers and address character strings are stored in association with each other, whether the document data includes the address character string associated with the obtained telephone number, wherein
step (c) includes selecting, as the at least one document data, document data that includes the address character string associated with the obtained telephone number in preference to document data that does not.

17. A database generation device that generates a database based on a plurality of document data, the device comprising:
(a) a document data acquisition unit that prepares a plurality of document data, each including an address character string, which is a character string representing an address, and an article character string, which is a description of a store;
(b) an address character string examination unit that analyzes the data of the address character string and generates detail level data representing how detailed the address represented by the address character string is;
(c) a document data selection unit that selects, based on the detail level data, at least one document data including the address character string from the plurality of document data; and
(d) a database generation unit that generates a database relating to the store based on the selected document data.

18. A computer program for generating a database based on a plurality of document data, the computer program causing a computer to realize:
(a) a function of preparing a plurality of document data, each including an address character string, which is a character string representing an address, and an article character string, which is a description of a store;
(b) a function of analyzing the data of the address character string and generating detail level data representing how detailed the address represented by the address character string is;
(c) a function of selecting, based on the detail level data, at least one document data including the address character string from the plurality of document data; and
(d) a function of generating a database relating to the store based on the selected document data.