JP4667823B2

JP4667823B2 - Table monitoring device, Web page monitoring system, computer program

Info

Publication number: JP4667823B2
Application number: JP2004309291A
Authority: JP
Inventors: 元服部; 一則松本; 史昭菅谷
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2004-10-25
Filing date: 2004-10-25
Publication date: 2011-04-13
Anticipated expiration: 2024-10-25
Also published as: JP2006120048A

Description

The present invention relates to a table monitoring apparatus, a Web page monitoring system, and a computer program for implementing the table monitoring apparatus on a computer.

In recent years, Web pages hosted on Internet websites have been used to provide information services such as account information services from financial companies, stock information services, and weather forecast services, allowing users to obtain constantly changing data in a timely manner. When using such a service, however, it is a heavy burden for users to keep browsing the Web page themselves and checking whether the provided data has been updated. There is therefore a demand for monitoring specific data of interest to the user, automatically detecting when that data has been updated, and notifying the user.

As a conventional technique for dividing multiple pieces of data contained in a table in a Web page by content, the technique described in Patent Document 1 is known, for example. This prior art analyzes a table in a Web page to generate cell position data, which indicates the position of each cell storing data, and cell vectors, which express the features of each cell, and then determines the table type (a data table or a layout table). If the table is a data table, the division direction (vertical or horizontal) is determined by referring to the cell position data and the cell vectors, and the table is divided along that direction to generate segments. If the table is a layout table, the cells are clustered by referring to the cell vectors, and segments are generated by referring to the cell position data and the cell cluster information. In this way, a table in a Web page is divided by content.
Patent Document 1: JP 2000-339301 A

However, with the prior art described above, when the position of the cell storing specific data is not fixed within the table but varies, it is difficult to uniquely identify that cell, so the technique cannot be applied to monitoring specific data contained in a table.

For example, in a stock price ranking table (see FIG. 9) that lists information on each stock (code, market, name, trading price, and so on) in order of its rate of price increase, the position at which each stock is listed changes as the ranking of its rate of increase changes from moment to moment. Therefore, to monitor information on a specific stock of interest to the user, the position of that stock in the table (in FIG. 9, its row in the table) must be reliably identified. The stock information shown in FIG. 9 (code, name, trading price, and so on) is illustrative only.

Furthermore, when multiple stocks in the stock price ranking table are monitored and their data are collected and displayed, it is desirable that the user can easily tell which stock each displayed item belongs to. For example, as the information displayed to identify each item, the "name" in FIG. 9 is easier for the user to understand than the "code".

The present invention has been made in view of these circumstances. Its object is to provide a table monitoring apparatus that can uniquely identify specific data in a table in which the position of the cell storing that data varies, and that can select identification information that is easy for the user to understand.

Another object of the present invention is to provide a Web page monitoring system that uses the table monitoring apparatus of the present invention to monitor specific data contained in a table in a Web page, thereby automatically detecting updates to the specific data of interest to the user, collecting that data, and displaying it to the user in an easily understandable form.

A further object of the present invention is to provide a computer program for implementing the table monitoring apparatus of the present invention on a computer.

To solve the above problems, a table monitoring apparatus according to the present invention monitors a table in which a plurality of cells, each storing data, form a matrix of rows and columns, and comprises: input means for receiving an example table, an analysis target table, and designation data specifying attention data in the example table; identifier/object extraction means for selecting, from the example table, an identifier row or identifier column based on the similarity between the data of the example table and the analysis target table, extracting from the example table objects that include data of the identifier row or identifier column, and defining the data of the identifier row or identifier column as the identifiers of the objects; object identification means for extracting, from the example table, a positive example object that includes the attention data, and extracting, from the analysis target table, an attention object that includes an identifier matching the identifier of the positive example object; attention data identification means for identifying, as the attention data, the data in the attention object located at the same position as the attention data in the positive example object; search means for, when a plurality of identifier row candidates or identifier column candidates are found in the identifier extraction process performed by the identifier/object extraction means, selecting keywords from the text contained in the cells of those candidates, transmitting the keywords to a Web search site, and receiving the responses; and identifier selection means for obtaining the number of search results for each keyword from the responses received by the search means, comparing the numbers of search results of the keywords, selecting the optimal identifier row candidate or identifier column candidate based on the comparison, and notifying the identifier/object extraction means of the selected candidate.

In the table monitoring apparatus according to the present invention, the search means excludes from the Web search any candidate, among the plurality of candidates, that includes a cell whose text is longer than a predetermined number of characters.

In the table monitoring apparatus according to the present invention, the search means excludes from the Web search any candidate, among the plurality of candidates, that includes a cell whose text consists only of a number and exceeds a predetermined number of digits.
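The two exclusion rules above (overly long text, and long purely numeric text) can be sketched as a simple pre-filter applied before any Web search is issued. This is a minimal illustration, not the patented implementation: the function names, the dictionary layout of the candidates, and the thresholds `max_chars` and `max_digits` are all assumptions, since the source only speaks of "predetermined" limits.

```python
def is_searchable(cell_texts, max_chars=20, max_digits=4):
    """Return False if any cell would disqualify the candidate.

    max_chars and max_digits are hypothetical thresholds; the source
    only says "a predetermined number" of characters or digits.
    """
    for text in cell_texts:
        stripped = text.strip()
        # Rule 1: text longer than the character limit.
        if len(stripped) > max_chars:
            return False
        # Rule 2: text made up only of digits and longer than the digit limit.
        if stripped.isdigit() and len(stripped) > max_digits:
            return False
    return True


def filter_candidates(candidates, max_chars=20, max_digits=4):
    """Keep only the identifier row/column candidates worth searching.

    `candidates` maps a hypothetical candidate label to its cell texts.
    """
    return {
        label: cells
        for label, cells in candidates.items()
        if is_searchable(cells, max_chars, max_digits)
    }
```

With candidates such as a five-digit code column, a company-name column, and a long free-text column, only the name column would survive this filter under the assumed thresholds.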

In the table monitoring apparatus according to the present invention, the identifier selection means calculates, for each candidate, the average number of search results, compares the averages of the candidates, and selects the candidate with the smallest average.

In the table monitoring apparatus according to the present invention, when calculating the average number of search results, the identifier selection means excludes the maximum and minimum values from the numbers of search results and calculates the average of the remaining values.
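The averaging and selection rule can be illustrated with a short sketch. It assumes the hit counts for each candidate's keywords have already been obtained from a search site; the function names and the dictionary layout are hypothetical, and the fallback to a plain mean when two or fewer counts are available is an assumption, since the source does not say what happens in that case.

```python
def trimmed_mean(counts):
    """Average the hit counts after dropping one maximum and one minimum."""
    if not counts:
        raise ValueError("at least one hit count is required")
    if len(counts) <= 2:
        # Assumption: too few values to trim, so use the plain mean.
        return sum(counts) / len(counts)
    kept = sorted(counts)[1:-1]  # drop the smallest and the largest value
    return sum(kept) / len(kept)


def select_candidate(hit_counts):
    """Return the candidate label whose trimmed mean hit count is smallest.

    `hit_counts` maps a hypothetical candidate label to the list of
    search-result counts of its keywords.
    """
    return min(hit_counts, key=lambda label: trimmed_mean(hit_counts[label]))
```

Trimming the extremes keeps a single unusually common or unusually rare keyword from dominating a candidate's average.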

A Web page monitoring system according to the present invention monitors a Web page having a table in which a plurality of cells, each storing data, form a matrix of rows and columns, and comprises: Web page acquisition means for acquiring an example Web page and an analysis target Web page via a communication network; attention data designation input means for designating attention data in the example Web page; the table monitoring apparatus according to any one of claims 1 to 5, which identifies the attention data in the table of the analysis target Web page from the table in the example Web page, the table in the analysis target Web page, and the designation data of the attention data; storage means for storing the attention data identified by the table monitoring apparatus; determination means for determining, from the attention data stored in the storage means and the attention data obtained by the table monitoring apparatus, whether the attention data in the analysis target Web page has been updated; notification means for notifying the user of the determination result of the determination means; and display data creation means for creating display data that displays the attention data identified by the table monitoring apparatus in association with the identifier used for the identification.

A computer program according to the present invention is a computer program for monitoring a table in which a plurality of cells, each storing data, form a matrix of rows and columns, and causes a computer to implement: a function of receiving an example table, an analysis target table, and designation data specifying attention data in the example table; an identifier/object extraction function of selecting, from the example table, an identifier row or identifier column based on the similarity between the data of the example table and the analysis target table, extracting from the example table objects that include data of the identifier row or identifier column, and defining the data of the identifier row or identifier column as the identifiers of the objects; a function of extracting, from the example table, a positive example object that includes the attention data, and extracting, from the analysis target table, an attention object that includes an identifier matching the identifier of the positive example object; a function of identifying, as the attention data, the data in the attention object located at the same position as the attention data in the positive example object; a function of, when a plurality of identifier row candidates or identifier column candidates are found in the identifier extraction process performed by the identifier/object extraction function, selecting keywords from the text contained in the cells of those candidates, transmitting the keywords to a Web search site, and receiving the responses; and a function of obtaining the number of search results for each keyword from the received responses, comparing the numbers of search results of the keywords, selecting the optimal identifier row candidate or identifier column candidate based on the comparison, and notifying the identifier/object extraction function of the selected candidate.
This allows the table monitoring apparatus described above to be implemented on a computer.

According to the present invention, by focusing on the similarity of data between tables that contain data changing or moving from moment to moment, and by identifying the attention data only after identification has been performed on a per-object basis, the attention data in the analysis target table can be identified merely by presenting one example of the attention data in the example table. This makes it possible to monitor specific data and automatically detect updates. Furthermore, the identifier selection process using Web search makes it possible to select a meaningful identifier that is easy for the user to understand and that can be used as is as the title of the display data.

In addition, by using the table monitoring apparatus of the present invention to monitor specific data contained in a table in a Web page, updates to the specific data of interest to the user can be detected automatically, and that data can be collected and displayed to the user in an easily understandable form.

First, the preconditions on the tables handled by the present invention are described.
A table has a plurality of cells, each storing data, and the cells form a matrix of rows and columns.
A table contains multiple objects of the same kind. Examples include tables listing rankings such as stock trading volume, or weekly weather forecasts. An object consists of a plurality of cells.
Data of the same kind are arranged in the column direction and related data are arranged in the row direction, with one or more objects arranged in each row. Alternatively, data of the same kind are arranged in the row direction and related data are arranged in the column direction, with one or more objects arranged in each column.
Creation, deletion, and movement within the table take place in units of objects.

An embodiment of the present invention is described below with reference to the drawings. This embodiment takes as an example an apparatus that monitors specific data contained in a table in a Web page.
FIG. 1 is a block diagram showing the configuration of a table monitoring apparatus 1 according to an embodiment of the present invention. In FIG. 1, the table monitoring apparatus 1 includes an input unit 11, an identifier/object extraction unit 12, an object identification unit 13, an attention data identification unit 14, an output unit 15, an example Web page update unit 16, a search unit 17, and an identifier selection unit 18.

The input unit 11 receives Web page data. The Web page data is, for example, HTML (HyperText Markup Language) data, and the Web page stores data in table form. Two kinds of Web page are input: an example page and an analysis target page. The example Web page is the page that was used when the user designated the data of interest (the attention data). The input unit 11 also receives designation data that specifies the attention data; the designation data may be the attention data itself. The analysis target Web page is the page in which the attention data is actually to be identified. The example Web page and the analysis target Web page are pages at the same URL (Uniform Resource Locator).

The identifier/object extraction unit 12 compares the tables of the example Web page and the analysis target Web page row against corresponding row and column against corresponding column, and selects from the table of the example Web page one row or column in which no text occurs two or more times with an exact match within that row or column. If a row is selected, it becomes the identifier row; if a column is selected, it becomes the identifier column. Then, from the cells of the table in the example Web page, all or some of the cells in the column direction (for an identifier row) or in the row direction (for an identifier column) are cut out as objects, and the cell where each object intersects the identifier row or identifier column is taken as that object's identifier.

The search unit 17 and the identifier selection unit 18 are the blocks that perform the identifier selection process using Web search. When the identifier extraction process performed by the identifier/object extraction unit 12 finds a plurality of identifier row or identifier column candidates, this process selects from among them the candidate whose identifiers are easiest for the user to understand.

The search unit 17 transmits queries (the keywords to search for) to a Web search site and receives the responses. The keywords are selected from the text contained in the cells of the identifier row or identifier column candidates found by the identifier/object extraction unit 12.

The identifier selection unit 18 obtains the number of search results (the number of search hits) for each keyword from the responses received by the search unit 17. It then compares the numbers of search results of the keywords and, based on the comparison, selects the optimal identifier row or identifier column candidate. The selected candidate is reported to the identifier/object extraction unit 12.

The object identification unit 13 extracts from the example Web page the object that contains the attention data (the positive example object). It then cuts out from the analysis target Web page the object whose identifier matches the identifier of the positive example object, and treats this object as the object containing the attention data (the attention object). If the analysis target Web page contains no object matching the positive example object, an error is raised and identification is deemed impossible.

The attention data identification unit 14 identifies and cuts out, as the attention data, the data in the attention object located at the same position as the attention data in the positive example object.
The output unit 15 outputs the attention data of the analysis target Web page cut out by the attention data identification unit 14.
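The object identification and attention data identification steps can be sketched as follows. This is a simplified illustration under stated assumptions: each table is a list of rows of strings, each object is one whole row, and the identifier column index `id_col` has already been chosen. The function name and parameters are hypothetical.

```python
def identify_attention_data(t1, t2, target_row, target_col, id_col):
    """Locate in t2 the value corresponding to cell (target_row, target_col) of t1.

    t1 is the example table and t2 the analysis target table, both as
    lists of rows. Assumes row-shaped objects and a known identifier column.
    """
    # Positive example object: the row of t1 that holds the attention data.
    positive_object = t1[target_row]
    identifier = positive_object[id_col]

    # Attention object: the row of t2 whose identifier matches.
    for row in t2:
        if row[id_col] == identifier:
            # Attention data: same position inside the object.
            return row[target_col]

    # No matching object: identification is not possible.
    raise LookupError(f"no object with identifier {identifier!r} in target table")


example = [["1", "1001", "Alpha", "500"],
           ["2", "1002", "Beta", "300"]]
target = [["1", "1002", "Beta", "320"],
          ["2", "1001", "Alpha", "490"]]
# The user pointed at the price of "Beta" (row 1, column 3) in the example table.
latest_price = identify_attention_data(example, target, 1, 3, id_col=2)
```

Because the lookup goes through the identifier rather than the row index, the value is still found after "Beta" moves from the second row to the first.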

To extract object identifiers correctly, the example Web page update unit 16 repeats the processing described above (identifier and object extraction, object identification, and attention data identification) at a period that is a constant multiple of the information update period of the Web page (for example, one day or one hour). If the attention data could be identified, the analysis target Web page becomes the new example Web page. The value of the constant multiple is set according to communication cost and load.

As described above, the table monitoring apparatus 1 uses the features of the data structure of a table in a Web page to extract sets of related data (objects), identifies the object that contains the attention data, and then identifies the attention data. As a result, even if the value of the attention data designated by the user changes, or its position moves within the table, the attention data can be identified in the actual analysis target Web page merely by presenting one example of the attention data in the example Web page, so that the attention data can be acquired from the Web page automatically and continuously.

The table monitoring apparatus according to this embodiment may be implemented by dedicated hardware, or it may consist of a general-purpose computer system such as a personal computer that provides the functions of the apparatus shown in FIG. 1 by executing a program implementing those functions.

Peripheral devices such as an input device and a display device (neither shown) are connected to the table monitoring apparatus. Here, the input device is a device such as a keyboard or mouse, and the display device is a CRT (Cathode Ray Tube), a liquid crystal display, or the like.
These peripheral devices may be connected to the table monitoring apparatus directly or via a communication line.

Next, the operation of the table monitoring apparatus 1 shown in FIG. 1 is described in detail. FIGS. 2 to 7 are processing flow diagrams of the table monitoring apparatus 1 shown in FIG. 1.
First, in FIG. 2, the user designates the attention data while viewing the example Web page (Web1) on a user terminal (step S101), for example by entering the cell number of the cell containing the attention data with the keyboard, or by clicking the cell with a mouse or the like. This designation data is input to the table monitoring apparatus 1. Next, based on the designation data, the table monitoring apparatus 1 extracts from the example Web page (Web1) the table T1 that contains the designated attention data (step S102). The table monitoring apparatus 1 then acquires the analysis target Web page (Web2) (step S103) and extracts from it the table T2 that corresponds to table T1 (step S104). Next, the identifier column or identifier row of the objects is extracted from tables T1 and T2 (step S105).
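Steps S102 and S104 both require turning HTML into a grid of cell texts. The source does not say how this is done; the following is one possible sketch using Python's standard `html.parser` module. The class and function names are hypothetical, and the sketch assumes simple, non-nested tables without `rowspan` or `colspan`.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in a page as a list of rows of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished and in-progress tables
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    """Return all tables found in an HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Applying `extract_tables` to Web1 and Web2 and picking the table at the same index in both results would be one simple way to obtain T1 and T2, though how the corresponding table is chosen is an assumption here.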

The identifier extraction process of step S105 is now described with reference to FIGS. 4 and 5.
First, it is determined whether the identifiers lie in the row direction or the column direction.
In FIG. 4, the cell at (row, column) = (i, j) of table T1 is compared with the cells of table T2 (step S201). For every (i, j), the number of cells in table T2 whose character string exactly matches that of cell (i, j) of table T1 is counted under conditions A1 and A2 below, and two tables (a row-direction table and a column-direction table) are generated (step S202).
Condition A1: Count how many times the character string of cell (i, j) of table T1 appears in row i of table T2, and store the count in cell (i, j) of the row-direction table. This yields a row-direction table with the same numbers of rows and columns as table T1.
Condition A2: Count how many times the character string of cell (i, j) of table T1 appears in column j of table T2, and store the count in cell (i, j) of the column-direction table. This yields a column-direction table with the same numbers of rows and columns as table T1.
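Conditions A1 and A2 amount to two counting passes over the tables. The sketch below is a minimal illustration, assuming both tables are lists of rows of strings with identical dimensions; the function name is hypothetical.

```python
def build_count_tables(t1, t2):
    """Build the row-direction and column-direction count tables.

    Entry (i, j) of the row-direction table is the number of cells in
    row i of t2 equal to t1[i][j] (condition A1); entry (i, j) of the
    column-direction table is the number of cells in column j of t2
    equal to t1[i][j] (condition A2). Assumes t1 and t2 have the same shape.
    """
    columns_t2 = list(zip(*t2))  # transpose t2 so columns can be indexed
    row_table = [
        [t2[i].count(cell) for cell in row]
        for i, row in enumerate(t1)
    ]
    col_table = [
        [columns_t2[j].count(cell) for j, cell in enumerate(row)]
        for row in t1
    ]
    return row_table, col_table


t1 = [["1", "A", "100"],
      ["2", "B", "200"]]
t2 = [["1", "B", "210"],
      ["2", "A", "100"]]
row_table, col_table = build_count_tables(t1, t2)
# row_table == [[1, 0, 0], [1, 0, 0]]
# col_table == [[1, 1, 1], [1, 1, 0]]
```

In this small ranking-style example, names and values move between rows but stay within their columns, so the column-direction table collects more matches than the row-direction table.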

Next, either the row-direction table or the column-direction table is selected by the procedure of steps S203 to S209. In step S203, it is determined whether either table contains a value of 2 or more. If the result is YES, the table that contains a value of 2 or more is selected (step S204). If the result of step S203 is NO, the sum of the values in each table is calculated, and the sums are compared to determine whether they differ (step S205). If the result is YES, the table with the larger sum is selected (step S206).

If the result of step S205 is NO, the table is selected according to conditions B1 to B4 below.
Condition B1: If cells containing the same character string exist in the same row of table T1, select the row-direction table.
Condition B2: If cells containing the same character string exist in the same column of table T1, select the column-direction table.
Condition B3: If cells containing the same character string exist in the same row of table T2, select the row-direction table.
Condition B4: If cells containing the same character string exist in the same column of table T2, select the column-direction table.
It is determined whether any of conditions B1 to B4 is met (step S207), and if so, the table is selected according to the condition that was met (step S208).
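The selection cascade of steps S203 to S208 can be sketched as a single function. Several points are assumptions rather than statements from the source: the function and helper names are hypothetical, conditions B1 to B4 are tried in the order listed, and the case where both count tables contain a value of 2 or more is treated as undecided and passed on to the sum comparison. The similarity-based fallback of step S209 is not covered, so the function returns `None` when no rule decides.

```python
def has_duplicate(lines):
    """True if any row (or column) contains the same string twice."""
    return any(len(set(line)) < len(line) for line in lines)


def choose_direction(row_table, col_table, t1, t2):
    """Return "row", "column", or None if steps S203 to S208 cannot decide."""
    # S203/S204: prefer the table that contains a count of 2 or more.
    row_big = any(v >= 2 for line in row_table for v in line)
    col_big = any(v >= 2 for line in col_table for v in line)
    if row_big != col_big:
        return "row" if row_big else "column"

    # S205/S206: otherwise prefer the table with the larger total.
    row_sum = sum(map(sum, row_table))
    col_sum = sum(map(sum, col_table))
    if row_sum != col_sum:
        return "row" if row_sum > col_sum else "column"

    # S207/S208: conditions B1 to B4, tried in listed order (assumption).
    for table in (t1, t2):
        if has_duplicate(table):                # B1 / B3: same row
            return "row"
        if has_duplicate(list(zip(*table))):    # B2 / B4: same column
            return "column"

    return None  # left to the similarity-based step S209
```

A result of `"column"` means the column-direction table was selected, so an identifier column is sought next; `"row"` means an identifier row is sought.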

If none of conditions B1 to B4 is met (NO in step S207), the similarity between cells in the table is calculated using table T1 or T2, it is determined whether the similarity is higher in the row direction or in the column direction, and the table whose direction matches the direction of higher similarity is selected (step S209). The method of calculating the similarity is described later.
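The selection cascade of steps S203 to S209 can be sketched as follows. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, the two direction tables are taken to be grids of integers, conditions B1 to B4 are checked in the order listed (the text does not state a priority), and the case where both tables contain a value of 2 or more is passed on to the sum comparison because the text does not say how to break that tie. The similarity-based step S209 is supplied by the caller.

```python
from typing import Callable, List

Table = List[List[str]]
Counts = List[List[int]]


def _has_duplicate(cells) -> bool:
    """True if at least two cells hold the same character string."""
    cells = list(cells)
    return len(set(cells)) < len(cells)


def select_direction(row_counts: Counts, col_counts: Counts,
                     t1: Table, t2: Table,
                     fallback: Callable[[Table, Table], str]) -> str:
    """Return "row" or "column" following steps S203 to S209 (sketch)."""
    # S203/S204: a table containing a value of 2 or more wins.
    row_hit = any(v >= 2 for row in row_counts for v in row)
    col_hit = any(v >= 2 for row in col_counts for v in row)
    if row_hit != col_hit:  # assumption: a tie falls through to S205
        return "row" if row_hit else "column"
    # S205/S206: otherwise the table with the larger total wins.
    row_sum = sum(map(sum, row_counts))
    col_sum = sum(map(sum, col_counts))
    if row_sum != col_sum:
        return "row" if row_sum > col_sum else "column"
    # S207/S208: conditions B1 to B4, checked in the listed order (assumption).
    for table in (t1, t2):
        if any(_has_duplicate(row) for row in table):        # B1 / B3
            return "row"
        if any(_has_duplicate(col) for col in zip(*table)):  # B2 / B4
            return "column"
    # S209: similarity-based decision, delegated to the caller.
    return fallback(t1, t2)
```

The `fallback` argument keeps the sketch independent of any particular similarity measure.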

Once the row direction table or the column direction table has been selected by the above procedure, an identifier row or an identifier column is obtained from the selected table by the process of FIG. 5. That is, if the row direction table was selected, an identifier row is obtained; if the column direction table was selected, an identifier column is obtained. The following description takes the case of obtaining an identifier row from the row direction table as an example. The case of obtaining an identifier column from the column direction table is the same, reading "row" as "column", "top" as "left", "bottom" as "right", "right" as "bottom", and "left" as "top". The text in parentheses in FIG. 5 corresponds to the case of obtaining an identifier column from the column direction table.

In FIG. 5, first, any row of the row direction table that contains a value of 2 or more is excluded from the identifier row candidates (step S210). If two or more identifier row candidates still remain (YES in step S211), any row of table T1 that contains multiple cells with the same character string is further excluded from the identifier row candidates (step S212). If two or more identifier row candidates still remain (YES in step S213), any row of table T2 that contains multiple cells with the same character string is further excluded from the identifier row candidates (step S214).

If two or more identifier row candidates still remain (YES in step S215), the multiple identifier handling process is executed (step S216). If two or more identifier row candidates still remain (YES in step S217), the row containing the point of interest (the attention data designated by the user) is further excluded from the identifier row candidates (step S218). If two or more identifier row candidates still remain (YES in step S219), the multiple identifier handling process is executed again (step S220). If two or more identifier row candidates still remain (YES in step S221), the identifier selection process using Web search is executed (step S222). The process then proceeds to step S242. The multiple identifier handling process (S216, S220) and the identifier selection process using Web search (S222, S241) are described later.

If, on the other hand, fewer than two identifier row candidates remain at step S211 (NO in step S211), it is determined whether the number of identifier row candidates is zero (step S230). If it is zero (YES in step S230), the process ends with no identifier (step S231). If there is exactly one identifier row candidate (NO in step S230), the process proceeds to step S242.
If fewer than two identifier row candidates remain at any of steps S213, S215, and S219 (NO in any of steps S213, S215, and S219), it is determined whether the number of identifier row candidates is zero (step S240). If it is zero (YES in step S240), the immediately preceding candidate exclusion is undone and the identifier selection process using Web search is executed (step S241). If the result of step S240 is NO, the process proceeds to step S242.
If fewer than two identifier row candidates remain at step S217 or S221 (NO in step S217 or S221), the process proceeds to step S242.
In step S242, the identifier row selected as a result of the above processing is set.
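The narrowing logic of FIG. 5 can be illustrated with a simplified sketch. It is not the full flow: the multiple identifier handling process (S216, S220) is omitted, and the exclusion rules and the Web-search selection are passed in as hypothetical callables (`exclusions`, `web_select`), so only the branching on "two or more", "exactly one", and "zero" candidates is shown.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

C = TypeVar("C")


def narrow_candidates(candidates: Sequence[C],
                      exclusions: List[Callable[[C], bool]],
                      web_select: Callable[[List[C]], C]) -> Optional[C]:
    """Apply exclusion rules in order until one candidate is left (sketch).

    exclusions[0] plays the role of S210; later entries stand for S212,
    S214, and S218. Each returns True for a candidate to be excluded.
    """
    remaining = list(candidates)
    for step, exclude in enumerate(exclusions):
        kept = [c for c in remaining if not exclude(c)]
        if not kept:
            if step == 0:
                return None               # S230 YES -> S231: no identifier
            return web_select(remaining)  # S240 YES -> undo, then S241
        if len(kept) == 1:
            return kept[0]                # single candidate -> S242
        remaining = kept
    return web_select(remaining)          # still two or more -> S222
```

Undoing an exclusion is expressed here simply by handing the pre-exclusion list to `web_select`.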

The identifier extraction process of step S105 described above thus extracts an identifier row or an identifier column.

The description now returns to FIG. 2. In step S106, it is determined whether an identifier row or an identifier column has been set. If not (NO in step S106), an error is output and the process ends. If one has been set (YES in step S106), the process proceeds to step S107.

The following description takes the case where an identifier column was extracted by the identifier extraction process as an example. The case where an identifier row was extracted is the same, reading "column" as "row".
In step S107, all or some of the cells along the row direction are extracted from table T1 as objects, and for each object the text of the cell that intersects the identifier column is assigned as the identifier of that object (the object identifier).

Next, in step S110 of FIG. 3, table T2 is searched for a row-direction object containing the same text as the object identifier of the object in table T1 that contains the point of interest. If no such object is found (NO in step S111), an error is output and the process ends. If such a row-direction object is found, the cell in that object located at the same position (the same column) as the point-of-interest cell in the table T1 object is output to the user terminal as the point-of-interest cell of table T2 (the cell containing the attention data) (step S112).
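Steps S107 and S110 to S112 amount to a keyed lookup, which the following minimal sketch illustrates. The function name and arguments are hypothetical, tables are assumed to be lists of rows, and the match is made on the identifier column only (an assumption; the text speaks more loosely of an object "containing the same text").

```python
from typing import List, Optional, Tuple

Table = List[List[str]]


def locate_attention_cell(t1: Table, t2: Table, id_col: int,
                          focus_row: int, focus_col: int
                          ) -> Optional[Tuple[int, int, str]]:
    """Find in t2 the cell that corresponds to t1[focus_row][focus_col]."""
    # S107: the object identifier is the text where the row meets the identifier column.
    object_id = t1[focus_row][id_col]
    # S110: search t2 for the row-direction object with the same identifier.
    for r, row in enumerate(t2):
        if row[id_col] == object_id:
            # S112: same column as the point of interest in t1.
            return r, focus_col, row[focus_col]
    return None  # S111 NO: the caller reports an error


example = [["A Corp", "100"], ["B Corp", "200"]]
target = [["B Corp", "250"], ["A Corp", "110"]]
print(locate_attention_cell(example, target, 0, 1, 1))  # (0, 1, '250')
```

Even though "B Corp" moved from the second row to the first, its value is still found, which is the point of identifying the object by its identifier rather than by position.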

Next, the multiple identifier handling process (steps S216 and S220 in FIG. 5) is described with reference to FIG. 6. The following description takes the case of multiple identifier column candidates as an example. The case of multiple identifier row candidates is the same, reading "column" as "row", "right" as "below", and "vertical" as "horizontal". The text in parentheses in FIG. 6 corresponds to the case of multiple identifier row candidates.

In FIG. 6, first, the similarity is calculated for every combination of identifier columns (step S301). The similarity calculation method is described later. Each similarity is then compared with a predetermined threshold. If at least one similarity is equal to or greater than the threshold (YES in step S302), the identifier columns are similar to each other and the object may be divided into multiple objects, so the process proceeds to step S303 for further processing.
If all similarities are below the threshold (NO in step S302), the identifier columns are not similar to each other, and the process ends.

In step S303, the variable i is set to the initial value 1. Next, for each of the similar identifier columns, the similarity with the column i positions to the right is calculated (step S304) and compared with the threshold. If it is below the threshold, the process ends (NO in step S305). If it is equal to or greater than the threshold, 1 is added to i (step S306). If the i-th column is an identifier column, or if there is no column i positions to the right (YES in step S307), the process proceeds to step S308; otherwise (NO in step S307), it returns to step S304. In this way, for each of the similar identifier columns, the similarity with every column lying between adjacent identifier columns is calculated. In step S308, it is determined whether all of these similarities are equal to or greater than the threshold; if so, the object is set as being divided by two or more identifier columns. Table T1 is then split between the (i-1)-th and i-th columns to the right of the identifier column, and the two resulting tables are joined vertically in order from the left to generate a new table T1 (step S309). Table T2 is likewise split and joined vertically to generate a new table T2 (step S309).
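The split-and-join of step S309 can be pictured with a short sketch. The helper name `split_and_stack` and the zero-based `split_at` index are assumptions for illustration; the similarity loop of steps S304 to S308 that determines the split position is not reproduced here.

```python
from typing import List

Table = List[List[str]]


def split_and_stack(table: Table, split_at: int) -> Table:
    """Cut a table between two columns and stack the parts vertically.

    Columns [0, split_at) form the left part and the rest form the right
    part; the left part goes on top, matching "in order from the left".
    """
    left = [row[:split_at] for row in table]
    right = [row[split_at:] for row in table]
    return left + right


side_by_side = [["A Corp", "100", "C Corp", "300"],
                ["B Corp", "200", "D Corp", "400"]]
for row in split_and_stack(side_by_side, 2):
    print(row)
```

After the call, every object occupies its own row and a single identifier column remains, so the lookup by identifier can proceed as for an ordinary table.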

Next, the similarity calculation process is described with reference to FIG. 7.
In FIG. 7, first, for one of the two items being compared, the character string in the cell is extracted and the variable i is set to the initial value 1 (step S401). Next, the i-th character of the extracted string is taken and replaced with "N" if it is a numeric character, "I" if it is an img tag, "A" if it is an a tag, or "T" if it is any other text (steps S403 to S409). For example, the character string "３月２３日（火）" (March 23 (Tue)) is converted into the replacement string "NTNNTTTT". Then 1 is added to i (step S410). If the extracted string has an i-th character, the process returns to step S402 (YES in step S411); if not, steps S401 to S411 are performed for the other item being compared (NO in step S411, step S412). Finally, for the replacement strings of the two items, the similarity is calculated by DP matching from how the symbols pair up and from the lengths of the symbol strings.
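The encoding step and a DP-based comparison can be sketched as below. The symbol mapping follows the text; everything else is an assumption made for illustration: tags are recognized with a simple regular expression, closing `</a>` tags are skipped, and the similarity is taken to be one minus the edit distance divided by the longer length, since the text does not give the exact DP matching formula.

```python
import re

# Tokens: an <img ...> tag, an <a ...> tag, a closing </a>, or any single character.
_TOKEN = re.compile(r"<img\b[^>]*>|<a\b[^>]*>|</a>|.", re.IGNORECASE | re.DOTALL)


def encode(cell: str) -> str:
    """Replace digits with N, img tags with I, a tags with A, other text with T."""
    symbols = []
    for token in _TOKEN.findall(cell):
        low = token.lower()
        if low.startswith("<img"):
            symbols.append("I")
        elif low.startswith("<a"):
            symbols.append("A")
        elif low == "</a>":
            continue  # assumption: closing tags carry no symbol
        elif token.isdigit():
            symbols.append("N")
        else:
            symbols.append("T")
    return "".join(symbols)


def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming (Levenshtein) distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(x: str, y: str) -> float:
    """Assumed score in [0, 1]: 1 means identical symbol strings."""
    a, b = encode(x), encode(y)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


print(encode("３月２３日（火）"))  # NTNNTTTT
```

Because only the character classes are compared, two cells such as "12" and "34" count as identical in shape even though their contents differ.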

Next, the identifier selection process using Web search (steps S222 and S241 in FIG. 5) is described with reference to FIG. 8. The following description takes the case of multiple identifier row candidates as an example. The case of multiple identifier column candidates is the same, reading "row" as "column" and "top" as "left". The text in parentheses in FIG. 8 corresponds to the case of multiple identifier column candidates.

In FIG. 8, first, among the identifier row candidates, any row containing a cell whose text is longer than L characters is excluded from the identifier row candidates (step S501). L is the character-count limit (for example, 10). This exclusion removes from the candidates rows whose cells hold text unsuitable as an identifier, such as explanatory text.

If two or more identifier row candidates still remain (YES in step S502), any row containing a cell whose text consists only of a numerical value and exceeds M digits is excluded from the identifier row candidates (step S503). M is the digit-count limit (for example, 5). This exclusion removes rows whose cells hold numbers with so many digits that they are hard for a person to tell apart and are therefore unsuitable as identifiers.

If two or more identifier row candidates still remain (YES in step S504), the process proceeds to step S508. If the candidate exclusion (S501, S503) has left fewer than two identifier row candidates (NO in step S502 or S504), it is determined whether the number of identifier row candidates is zero (step S505). If it is zero (YES in step S505), the immediately preceding candidate exclusion is undone and the process proceeds to step S508. If the result of step S505 is NO, that is, if the candidates have been narrowed down to one, the process proceeds to step S507 and that identifier row candidate is selected (step S507).
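The two exclusion rules of steps S501 to S507, including the undo when a rule would remove every candidate, can be sketched as follows. The function name, the dictionary layout (candidate name mapped to its cell texts), and the use of `str.isdigit` to detect "numbers only" are assumptions for illustration.

```python
from typing import Dict, List


def prefilter(candidates: Dict[str, List[str]],
              max_chars: int = 10, max_digits: int = 5) -> List[str]:
    """Return the names of the candidates that survive S501 and S503."""

    def too_long(cells: List[str]) -> bool:       # rule of S501 (limit L)
        return any(len(c) > max_chars for c in cells)

    def long_number(cells: List[str]) -> bool:    # rule of S503 (limit M)
        return any(c.isdigit() and len(c) > max_digits for c in cells)

    remaining = list(candidates)
    for rule in (too_long, long_number):
        kept = [name for name in remaining if not rule(candidates[name])]
        if not kept:
            return remaining   # S505 YES: undo this exclusion, go on to S508
        remaining = kept
        if len(remaining) == 1:
            return remaining   # S505 NO: the single candidate is selected (S507)
    return remaining           # two or more remain: go on to S508


rows = {
    "code": ["123456", "234567"],
    "name": ["A Corp", "B Corp"],
    "note": ["a very long explanatory sentence", "x"],
}
print(prefilter(rows))  # ['name']
```

A result with more than one name means the Web-search comparison of step S508 onward is still needed.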

In step S508, the number of cells in the row is counted for each remaining identifier row candidate. The count n is then compared with a threshold N (for example, N = 10) to determine whether n is less than N (step S509). Depending on the result, one of the following two cases (case A or case B) applies, and a Web search is performed using the text in the cells of the identifier row candidate as keywords.

Case A:
If the count n is less than the threshold N (YES in step S509), all cells in the identifier row candidate become keyword selection targets, and the text in each target cell is selected as one keyword. As many keywords as there are cells in the candidate are thus selected. These keywords are sent to the Web search site, and a search result is received for each keyword (one per cell) (step S510). This keyword search is performed for each identifier row candidate, yielding search results for each candidate.

Case B:
If the count n is equal to or greater than the threshold N (NO in step S509), the first N cells of the identifier row candidate, counted from the top, become keyword selection targets, and the text in each target cell is selected as one keyword. N keywords are thus selected. These N keywords are sent to the Web search site, and a search result is received for each keyword (N results) (step S511). This keyword search is performed for each identifier row candidate, yielding search results for each candidate.

Next, the average number of search results is calculated for each identifier row candidate (step S512). For a given candidate, the maximum and minimum are removed from the search result counts (hit counts) of its keywords, and the average of the remaining counts is calculated. Removing the maximum and minimum eliminates outlying counts and yields a suitable average.

Next, the averages of the search result counts of the identifier row candidates are compared, and the candidate with the smallest average is selected (step S513). The candidate with the smallest average is selected because a keyword with fewer search results (hits) is used in a narrower range of contexts and therefore has fewer meanings. In other words, the fewer meanings an identifier carries, the easier it is for the user to associate the keyword with its meaning and to tell it apart. Numbers mean different things in different contexts, so it is hard to tell what a given number means. Proper nouns such as company names, by contrast, have an almost unique meaning and are easy to tell apart.
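Steps S508 to S513 can be summarized in a small sketch. The Web search itself is not modeled: `hit_count` is a hypothetical callable that returns the number of hits for a keyword, so any search backend (or a stub, as here) can be plugged in. Falling back to a plain average when two or fewer counts are available is an assumption, since the text does not cover that case.

```python
from typing import Callable, Dict, List


def keywords_for(cells: List[str], limit: int = 10) -> List[str]:
    """Case A (fewer than `limit` cells): all cells. Case B: the first `limit` cells."""
    return list(cells[:limit])


def trimmed_mean(counts: List[int]) -> float:
    """Average after dropping one maximum and one minimum (S512)."""
    if len(counts) <= 2:  # assumption: too few values to drop both extremes
        return sum(counts) / len(counts)
    inner = sorted(counts)[1:-1]
    return sum(inner) / len(inner)


def choose_identifier(candidates: Dict[str, List[str]],
                      hit_count: Callable[[str], int],
                      limit: int = 10) -> str:
    """Pick the candidate whose keywords have the smallest average hit count (S513)."""
    averages = {
        name: trimmed_mean([hit_count(k) for k in keywords_for(cells, limit)])
        for name, cells in candidates.items()
    }
    return min(averages, key=averages.get)


# Stubbed hit counts, invented purely for the demonstration.
fake_hits = {"1001": 900, "1002": 1000, "1003": 1100, "1004": 5000,
             "A Corp": 10, "B Corp": 20, "C Corp": 30, "D Corp": 40000}
table_rows = {"code": ["1001", "1002", "1003", "1004"],
              "name": ["A Corp", "B Corp", "C Corp", "D Corp"]}
print(choose_identifier(table_rows, fake_hits.__getitem__))  # name
```

With the stub above, the outlier of 40000 hits for one name is dropped before averaging, which is why the "name" candidate still wins.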

For example, in the stock price ranking table of FIG. 9, the identifier extraction process of FIGS. 4 and 5 yields the "Code" row and the "Name" row as identifier row candidates. The identifier selection process using Web search of FIG. 8 then performs a Web search using the text in the cells of each identifier row candidate as keywords. The number of search results (hits) is smaller when "Name" is used as the keyword than when "Code" is used. The "Name" row is therefore selected as the identifier row, and "Name" becomes the identifier. As a result, the stock name, an identifier that is easy for the user to understand, is selected.

As described above, according to this embodiment, specific data can be uniquely identified in a table in which the position of the cell storing that data changes. In addition, the identifier selection process using Web search makes it possible to select identification information that is easy for the user to understand.

FIG. 10 is a block diagram showing the configuration of a Web page monitoring system 20 according to an embodiment of the present invention. In FIG. 10, the Web page monitoring system 20 includes the table monitoring apparatus 1 of FIG. 1, a Web page acquisition unit 21, an attention data designation input unit 22, an attention data storage unit 23, a comparison unit 24, a notification unit 25, and a display data creation unit 26.

The Web page acquisition unit 21 uses URLs designated by the user to acquire an example Web page and an analysis target Web page via a communication network. The example Web page is output to the attention data designation input unit 22 and the table monitoring apparatus 1. The analysis target Web page is output to the table monitoring apparatus 1.
The attention data designation input unit 22 allows the user to designate attention data in the example Web page acquired by the Web page acquisition unit 21, and outputs data designating the attention data to the table monitoring apparatus 1.

The attention data storage unit 23 stores the attention data in the table of the analysis target Web page identified by the table monitoring apparatus 1.
The comparison unit 24 compares the attention data in the table of the analysis target Web page obtained this time by the table monitoring apparatus 1 with the attention data previously stored in the attention data storage unit 23, and outputs the comparison result to the notification unit 25. If the attention data has changed, it also outputs the new attention data to the display data creation unit 26.

Based on the comparison result from the comparison unit 24, the notification unit 25 notifies the user whether the attention data in the analysis target Web page has been updated. For example, it sends an e-mail notification when there is an update. The e-mail may include the latest updated attention data. In this way, the table monitoring apparatus 1 is used to monitor specific data contained in a table in a Web page, and an update of the specific data the user is interested in can be detected automatically.
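The interplay of the attention data storage unit 23, the comparison unit 24, and the notification unit 25 can be pictured with a minimal sketch. All names are hypothetical: the store is a plain dictionary keyed by identifier, `notify` stands in for whatever delivery channel is used (the text mentions e-mail), and treating the first observation as "no update" is an assumption.

```python
from typing import Callable, Dict, List, Tuple


def check_update(store: Dict[str, str], key: str, current: str,
                 notify: Callable[[str, str, str], None]) -> bool:
    """Compare the new value with the stored one, store it, and notify on change."""
    previous = store.get(key)   # value saved on the previous run, if any
    store[key] = current        # keep the latest value for the next comparison
    if previous is not None and previous != current:
        notify(key, previous, current)
        return True
    return False


sent: List[Tuple[str, str, str]] = []
history: Dict[str, str] = {}
for value in ("100", "100", "120"):
    check_update(history, "A Corp", value, lambda k, old, new: sent.append((k, old, new)))
print(sent)  # [('A Corp', '100', '120')]
```

Only the third observation triggers a notification, because it is the first one that differs from the stored value.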

The display data creation unit 26 creates display data using the attention data received from the comparison unit 24. In this process, the display data is structured so that the corresponding identifier is displayed in association with the data the user is interested in. This identifier is the one used by the table monitoring apparatus 1 to identify the attention data, and it is easy for the user to understand.

For example, if the cell designated by the user in the stock price ranking table of FIG. 9 is a "Volume" cell, the attention data is the volume value. The table monitoring apparatus 1 then outputs the pair of identifier and volume value. When the display data creation unit 26 receives attention data from the comparison unit 24, it obtains the date and time of receipt (the data acquisition date and time). It then creates display data for displaying the identifier, the data acquisition date and time, and the volume value in association with one another. FIG. 11 shows a display example of the attention data. In the example of FIG. 11, the screen 100 of the terminal displaying the display data shows the stock name (for example, ○○ Industry Co., Ltd.) as the identifier inside angle brackets <>. In association with the identifier (the stock name), the volume value and its acquisition date and time are displayed in the order acquisition date and time, then volume value, for example "○月×日 11:30, 12300" (month ○, day ×).

Thus, in this embodiment, the table monitoring apparatus 1 selects an identifier that is easy for the user to understand (the stock name in FIG. 11). In the example of FIG. 11, that identifier is displayed in association with the volume value, so the user can easily tell which stock the displayed volume value belongs to.

As described above, according to this embodiment, monitoring specific data contained in a table in a Web page with the table monitoring apparatus 1 of FIG. 1 makes it possible to automatically detect an update of the specific data the user is interested in. Furthermore, the specific data can be collected and displayed in a way that is easy for the user to understand.

The table monitoring process may also be performed by recording a program for implementing the steps shown in FIGS. 2 to 8 on a computer-readable recording medium and having a computer system read and execute the program recorded on that medium. The "computer system" here may include an OS and hardware such as peripheral devices.
If a WWW system is used, the "computer system" also includes the homepage providing environment (or display environment).
The "computer-readable recording medium" refers to a flexible disk, a magneto-optical disk, a ROM, a writable nonvolatile memory such as a flash memory, a portable medium such as a CD-ROM, or a storage device such as a hard disk built into a computer system.

Furthermore, the "computer-readable recording medium" also includes media that hold a program for a certain period of time, such as the volatile memory (for example, DRAM (Dynamic Random Access Memory)) inside a computer system that serves as a server or client when the program is transmitted over a network such as the Internet or a communication line such as a telephone line.
The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium or by transmission waves in the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium that has the function of transmitting information, such as a network (communication network) like the Internet or a communication line like a telephone line.
The program may implement only part of the functions described above. It may also be a so-called difference file (difference program), which implements the functions described above in combination with a program already recorded in the computer system.

An embodiment of the present invention has been described in detail above with reference to the drawings, but the specific configuration is not limited to this embodiment and includes design changes and the like that do not depart from the gist of the present invention.

FIG. 1 is a block diagram showing the configuration of a table monitoring apparatus 1 according to an embodiment of the present invention.
FIGS. 2 to 8 are processing flow diagrams of the table monitoring apparatus 1 shown in FIG. 1.
FIG. 9 is a diagram showing a specific example of a table to be monitored.
FIG. 10 is a block diagram showing the configuration of a Web page monitoring system 20 according to an embodiment of the present invention.
FIG. 11 is a display example of attention data based on the display data created by the Web page monitoring system 20 shown in FIG. 10.

Explanation of symbols

1 ... table monitoring apparatus, 11 ... input unit, 12 ... identifier/object extraction unit, 13 ... object identification unit, 14 ... attention data identification unit, 15 ... output unit, 16 ... example Web page update unit, 17 ... search unit, 18 ... identifier selection unit, 20 ... Web page monitoring system, 21 ... Web page acquisition unit, 22 ... attention data designation input unit, 23 ... attention data storage unit, 24 ... comparison unit, 25 ... notification unit, 26 ... display data creation unit.

Claims

A table monitoring device for monitoring a table in which a plurality of cells, each storing data, form a matrix, the device comprising:
an input unit for inputting an example table, an analysis target table, and designation data specifying attention data in the example table;
identifier/object extraction means for selecting, from the example table, an identifier row or identifier column based on the similarity between the data of the example table and the data of the analysis target table, extracting from the example table objects that each include data of the identifier row or identifier column, and defining the data of the identifier row or identifier column as the identifier of each object;
object identification means for extracting, from the example table, a positive example object that includes the attention data, and extracting, from the analysis target table, an attention object whose identifier matches the identifier of the positive example object;
attention data identification means for identifying, as the attention data, the data in the attention object at the same position as the attention data in the positive example object;
search means for, when a plurality of identifier row candidates or identifier column candidates are found in the identifier extraction process performed by the identifier/object extraction means, selecting keywords from the text contained in the cells of the plurality of candidates, sending the keywords to a Web search site, and receiving the responses; and
identifier selection means for obtaining the number of search results for each keyword from the responses received by the search means, comparing the numbers of search results for the keywords, selecting the optimal identifier row candidate or identifier column candidate based on the comparison result, and notifying the identifier/object extraction means of the selected identifier row candidate or identifier column candidate.
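
The identification steps recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation: it assumes that tables are lists of rows of cell text, that the identifier is a column (so each row is one object), and that similarity is measured as the share of example-table values that reappear in the same column of the analysis target table. The function names `column_similarity`, `select_identifier_column`, and `identify_attention_data` are hypothetical.

```python
from typing import List, Optional, Tuple

Table = List[List[str]]  # assumed representation: rows of cell text


def column_similarity(example: Table, target: Table, col: int) -> float:
    """Share of example-table values in column `col` that also occur
    in the same column of the analysis target table (assumed measure)."""
    example_values = [row[col] for row in example if col < len(row)]
    target_values = {row[col] for row in target if col < len(row)}
    if not example_values:
        return 0.0
    matches = sum(value in target_values for value in example_values)
    return matches / len(example_values)


def select_identifier_column(example: Table, target: Table) -> int:
    """Pick the column whose data is most similar across both tables.
    Ties go to the leftmost column in this sketch."""
    n_cols = min(len(row) for row in example)
    return max(range(n_cols),
               key=lambda c: column_similarity(example, target, c))


def identify_attention_data(example: Table, target: Table,
                            attention_pos: Tuple[int, int]) -> Optional[str]:
    """Return the target-table cell that corresponds to the attention
    data designated at (row, column) in the example table."""
    row_idx, col_idx = attention_pos
    id_col = select_identifier_column(example, target)
    positive_example = example[row_idx]      # object containing the attention data
    identifier = positive_example[id_col]    # identifier of that object
    for row in target:                       # look for the attention object
        if id_col < len(row) and row[id_col] == identifier:
            return row[col_idx] if col_idx < len(row) else None
    return None
```

In this sketch, two columns can be equally similar (for example, a code column and a name column). That tie corresponds to the case of a plurality of identifier candidates, which the claim resolves with the search means and the identifier selection means rather than by column order.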

The table monitoring device according to claim 1, wherein the search means excludes from the Web search any candidate, among the plurality of candidates, that includes a cell whose text exceeds a predetermined number of characters.

The table monitoring device according to claim 1, wherein the search means excludes from the Web search any candidate, among the plurality of candidates, that includes a cell whose text consists solely of a numerical value exceeding a predetermined number of digits.

The table monitoring device according to claim 1, wherein the identifier selection means calculates, for each candidate, the average number of search results, compares the averages across the candidates, and selects the candidate with the smallest average.

The table monitoring device according to claim 4, wherein, in calculating the average number of search results, the identifier selection means excludes the maximum value and the minimum value from the numbers of search results and calculates the average of the remaining numbers of search results.
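
The candidate filtering and selection described in the dependent claims above can be sketched as follows. This is a hedged illustration only: the thresholds `MAX_CHARS` and `MAX_DIGITS`, the function names, and the `hit_counts` dictionary (standing in for the result counts returned by a Web search site) are all assumptions, and no real search API is called. Falling back to a plain average when a candidate has two or fewer counts is likewise an assumption of the sketch, since the claims do not address that case.

```python
from statistics import mean
from typing import Dict, List, Optional

MAX_CHARS = 20   # assumed "predetermined number of characters"
MAX_DIGITS = 6   # assumed "predetermined number of digits"


def is_searchable(cells: List[str]) -> bool:
    """A candidate is excluded from the Web search if any cell is too long
    or is a digits-only value with too many digits."""
    for text in cells:
        if len(text) > MAX_CHARS:
            return False
        if text.isdigit() and len(text) > MAX_DIGITS:
            return False
    return True


def trimmed_mean(counts: List[int]) -> float:
    """Average after dropping one maximum and one minimum value.
    With two or fewer counts, fall back to the plain average (assumption)."""
    if len(counts) <= 2:
        return mean(counts)
    return mean(sorted(counts)[1:-1])


def select_candidate(candidates: Dict[str, List[str]],
                     hit_counts: Dict[str, int]) -> Optional[str]:
    """Return the candidate with the smallest average number of search results."""
    scores: Dict[str, float] = {}
    for name, cells in candidates.items():
        if not is_searchable(cells):
            continue  # excluded before searching
        counts = [hit_counts[text] for text in cells if text in hit_counts]
        if counts:
            scores[name] = trimmed_mean(counts)
    if not scores:
        return None
    return min(scores, key=scores.get)
```

Dropping the maximum and minimum before averaging keeps a single unusually common or unusually rare keyword from dominating the comparison between candidates.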

A Web page monitoring system for monitoring a Web page having a table in which a plurality of cells, each storing data, form a matrix, the system comprising:
Web page acquisition means for acquiring an example Web page and an analysis target Web page via a communication network;
attention data designation input means for designating attention data in the example Web page;
the table monitoring device according to the section, which identifies the attention data in the table of the analysis target Web page from the table in the example Web page, the table in the analysis target Web page, and the designation data of the attention data;
storage means for storing the attention data identified by the table monitoring device;
determination means for determining, from the attention data stored in the storage means and the attention data obtained by the table monitoring device, whether the attention data in the analysis target Web page has been updated;
notification means for notifying the user of the determination result of the determination means; and
display data creation means for creating display data that displays the attention data identified by the table monitoring device in association with the identifier used for the identification.
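
The storage, determination, and notification means recited above can be pictured with a small sketch. It assumes an in-memory dictionary as the storage means and a caller-supplied callback as the notification means; the class name `AttentionDataMonitor` and its method `check` are hypothetical and only illustrate the comparison between stored and newly identified attention data.

```python
from typing import Callable, Dict


class AttentionDataMonitor:
    """Stores the last attention data per identifier and reports updates."""

    def __init__(self, notify: Callable[[str], None]) -> None:
        self._stored: Dict[str, str] = {}   # assumed storage means
        self._notify = notify               # assumed notification means

    def check(self, identifier: str, attention_data: str) -> bool:
        """Compare newly identified attention data with the stored value,
        store the new value, and notify when the value has changed."""
        previous = self._stored.get(identifier)
        self._stored[identifier] = attention_data
        updated = previous is not None and previous != attention_data
        if updated:
            # The message shows the identifier together with the data,
            # in the spirit of the display data creation means.
            self._notify(f"{identifier}: {previous} -> {attention_data}")
        return updated
```

The first observation of an identifier is stored without a notification in this sketch, because there is no earlier value to compare against.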

A computer program for monitoring a table in which a plurality of cells, each storing data, form a matrix, the computer program causing a computer to realize:
a function of inputting an example table, an analysis target table, and designation data specifying attention data in the example table;
an identifier/object extraction function of selecting, from the example table, an identifier row or identifier column based on the similarity between the data of the example table and the data of the analysis target table, extracting from the example table objects that each include data of the identifier row or identifier column, and defining the data of the identifier row or identifier column as the identifier of each object;
a function of extracting, from the example table, a positive example object that includes the attention data, and extracting, from the analysis target table, an attention object whose identifier matches the identifier of the positive example object;
a function of identifying, as the attention data, the data in the attention object at the same position as the attention data in the positive example object;
a function of, when a plurality of identifier row candidates or identifier column candidates are found in the identifier extraction processing performed by the identifier/object extraction function, selecting keywords from the text contained in the cells of the plurality of candidates, sending the keywords to a Web search site, and receiving the responses; and
a function of obtaining the number of search results for each keyword from the received responses, comparing the numbers of search results for the keywords, selecting the optimal identifier row candidate or identifier column candidate based on the comparison result, and notifying the identifier/object extraction function of the selected identifier row candidate or identifier column candidate.