JP2008250941A

JP2008250941A - Document collecting method, document collecting program and document collecting device

Info

Publication number: JP2008250941A
Application number: JP2007094947A
Authority: JP
Inventors: Katsutoshi Iifushi; 勝俊飯伏; Minako Hashimoto; 三奈子橋本
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2007-03-30
Filing date: 2007-03-30
Publication date: 2008-10-16
Anticipated expiration: 2027-03-30
Also published as: JP4825717B2

Abstract

PROBLEM TO BE SOLVED: To make it possible to collect a wide variety of documents for mobile terminals.

SOLUTION: A document collection server first collects documents for personal computers from PC sites, which are densely linked to one another. Exploiting the fact that mobile sites are announced on PC sites, the server applies a predetermined pattern matching rule (for example, "if the text reads 'mobile version URL http://xxx', extract that URL") to the collected PC documents and extracts only the URLs (uniform resource locators) (A) to (D) of documents for mobile terminals. It then collects the documents for mobile terminals from those URLs, so that the link structure between mobile sites (black circles) is interpolated with information from PC sites.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a document collection method, a document collection program, and a document collection apparatus that collect documents for mobile terminals by following website links.

In recent years, mobile terminals such as mobile phones, PHS handsets, and PDAs have become widespread, and the Internet is now frequently accessed through them. As a result, a wide variety of sites for mobile terminals are offered on the Internet.

Accordingly, methods have been proposed for collecting documents published on the Internet by using the hyperlinks in hypertext as clues, for purposes such as gathering documents to be searched by a search engine or gathering documents for information extraction.

For example, Patent Document 1 discloses a web page collection system that determines whether a web page can be displayed on a mobile phone and, if it can, saves the page and acquires its link information. Based on this link information, the system fetches the next web page and recursively repeats the displayability check, thereby successively collecting web pages that are highly compatible with mobile phones.

JP 2003-216525 A

However, although the prior art described above (Patent Document 1) determines whether a document to be collected is intended for mobile terminals, it presupposes, just like a web crawler that collects documents for fixed terminals, that websites are linked to one another. Consequently, when it is used to collect documents from mobile sites, where links between websites are sparse, the number of documents it can collect is inherently limited.

More specifically, hypertext published for fixed terminals actively links not only to documents on its own site but also to documents on other sites, so new websites can be detected and collected by using these links as clues. For hypertext published for mobile terminals, on the other hand, carriers have built micropayment mechanisms, and keeping users within one's own site is important to document providers. Providers are therefore reluctant to link to documents on other sites.

Because mobile sites thus contain few links to other sites, a web crawler that merely tours websites has difficulty detecting unknown websites and cannot collect a wide variety of documents for mobile terminals.

For these reasons, when collecting documents provided on mobile sites, how to expand the range of collectable documents and gather a wide variety of documents for mobile terminals is an extremely important issue.

The present invention has been made to solve the above problems of the prior art. Its object is to provide a document collection method, a document collection program, and a document collection apparatus that, when collecting documents provided on mobile sites, expand the range of collectable documents and can thereby collect a wide variety of documents for mobile terminals.

To solve the above problems and achieve the object, a document collection method according to the present invention is a method in which a computer collects documents for mobile terminals by following website links, wherein the computer executes: a fixed-terminal document collection step of collecting documents for fixed terminals from sites for fixed terminals among the websites; a document analysis step of analyzing position information of documents for mobile terminals from the documents for fixed terminals collected in the fixed-terminal document collection step; and a mobile-terminal document collection step of collecting the documents for mobile terminals based on the position information analyzed in the document analysis step.

In the document collection method according to the present invention, the fixed-terminal document collection step further collects images included in the documents for fixed terminals, and, when an image has been collected as a document for fixed terminals, the document analysis step analyzes the position information of a document for mobile terminals from a predetermined code embedded in the image.

In the document collection method according to the present invention, the document analysis step analyzes, from the documents for fixed terminals collected in the fixed-terminal document collection step, an input form for a mobile terminal e-mail address, and the computer further executes: a mail address transmission step of transmitting a predetermined e-mail address to the fixed-terminal site from which the document was collected; a mail acquisition step of acquiring the reply mail from the mobile terminal to which the fixed-terminal site sent it; and a mail analysis step of analyzing the position information of a document for mobile terminals from the reply mail acquired in the mail acquisition step.

In the document collection method according to the present invention, the mobile-terminal document collection step uses a user agent that represents a mobile terminal to access the position information of the documents for fixed terminals collected in the fixed-terminal document collection step.

In the document collection method according to the present invention, the computer further executes a duplication determination step of determining whether a document collected in the mobile-terminal document collection step using the user agent representing a mobile terminal has the same content as the document for fixed terminals collected in the fixed-terminal document collection step, and the mobile-terminal document collection step discards the document when the duplication determination step determines that the content is the same.

A document collection program according to the present invention causes a computer to execute document collection processing that collects documents for mobile terminals by following website links. The program causes the computer to execute: a fixed-terminal document collection procedure of collecting documents for fixed terminals from sites for fixed terminals among the websites; a document analysis procedure of analyzing position information of documents for mobile terminals from the documents for fixed terminals collected by the fixed-terminal document collection procedure; and a mobile-terminal document collection procedure of collecting the documents for mobile terminals based on the position information analyzed by the document analysis procedure.

A document collection apparatus according to the present invention collects documents for mobile terminals by following website links and comprises: fixed-terminal document collection means for collecting documents for fixed terminals from sites for fixed terminals among the websites; document analysis means for analyzing position information of documents for mobile terminals from the documents for fixed terminals collected by the fixed-terminal document collection means; and mobile-terminal document collection means for collecting the documents for mobile terminals based on the position information analyzed by the document analysis means.

According to the present invention, documents for fixed terminals are collected from sites for fixed terminals among the websites, the position information of documents for mobile terminals is analyzed from the collected documents, and the documents for mobile terminals are collected based on the analyzed position information. The link structure between mobile sites can therefore be interpolated with information from fixed-terminal sites, which makes it possible to collect a wide variety of documents for mobile terminals from a broad range of mobile sites that are scattered without links between them.

Further, according to the present invention, images included in the documents for fixed terminals are also collected, and, when an image has been collected as a document for fixed terminals, the position information of a document for mobile terminals is analyzed from a predetermined code embedded in the image. The position information can therefore be extracted even when a fixed-terminal site embeds its announcement of a mobile site in an image, which effectively expands the range of collectable documents for mobile terminals.

Further, according to the present invention, an input form for a mobile terminal e-mail address is analyzed from the collected documents for fixed terminals, a predetermined e-mail address is transmitted to the fixed-terminal site from which the document was collected, the reply mail is acquired from the mobile terminal that received it, and the position information of a document for mobile terminals is analyzed from the acquired reply mail. The position information can therefore be extracted automatically even when the fixed-terminal site's announcement of a mobile site requires an input operation by the user, which expands the range of collectable documents for mobile terminals in practice.

Further, according to the present invention, the position information of the collected documents for fixed terminals is accessed using a user agent that represents a mobile terminal. Documents for mobile terminals can therefore be collected even from websites that serve different documents depending on the type of the accessing device, which expands the range of collectable documents for mobile terminals in several directions.
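The access described above could look roughly like the following minimal sketch, which assumes Python's standard `urllib`. The User-Agent string and the URL are hypothetical placeholders, not values taken from this specification. The sketch only builds the request for a PC document's location with a mobile User-Agent and leaves the actual fetch to the caller.

```python
from urllib.request import Request

# Hypothetical User-Agent string standing in for a mobile handset's browser.
MOBILE_USER_AGENT = "ExampleMobileBrowser/1.0 (handset)"


def build_mobile_request(url: str, user_agent: str = MOBILE_USER_AGENT) -> Request:
    """Build an HTTP request that identifies the client as a mobile terminal."""
    return Request(url, headers={"User-Agent": user_agent})


# Placeholder URL of an already collected PC document.
request = build_mobile_request("http://example.com/")
# Passing `request` to urllib.request.urlopen() would perform the fetch.
```

A site that switches its response on the requesting device type would then be expected to return its mobile version for this request.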

Further, according to the present invention, it is determined whether a document collected using the user agent representing a mobile terminal has the same content as the collected document for fixed terminals, and the document is discarded when the content is determined to be the same. Redundant information that merely duplicates the document for fixed terminals can therefore be discarded even though it was fetched as a mobile terminal.
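One way to picture the duplication check is the sketch below. It assumes that "the same content" means byte-identical bodies and compares SHA-256 digests; the specification does not state how identity is decided, so both the criterion and the function names are illustrative assumptions.

```python
import hashlib
from typing import Optional


def content_digest(body: bytes) -> str:
    """Return a digest used to compare two document bodies."""
    return hashlib.sha256(body).hexdigest()


def keep_if_mobile_specific(mobile_body: bytes, pc_body: bytes) -> Optional[bytes]:
    """Return the body fetched with the mobile user agent,
    or None when it merely duplicates the PC document."""
    if content_digest(mobile_body) == content_digest(pc_body):
        return None  # same content as the PC document: discard
    return mobile_body
```

A looser criterion (for example, ignoring whitespace or session tokens) would need a normalization step before hashing.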

Preferred embodiments of the document collection apparatus (document collection method) according to the present invention are described in detail below with reference to the accompanying drawings. The document collection server according to the present invention is first described as Embodiments 1 and 2, after which other embodiments included in the present invention are described as Embodiment 3.

Embodiment 1 below describes, in order, the overview and features of the document collection server according to Embodiment 1, the configuration of the server, and the flow of its processing.

[Overview and Features (Embodiment 1)]
First, the overview and features of the document collection server according to Embodiment 1 are described with reference to FIGS. 1 and 2. FIG. 1 is an explanatory diagram of the overview of the document collection server according to Embodiment 1, and FIG. 2 is a conceptual diagram of its features.

As shown in FIG. 1, the document collection server 10 collects documents for mobile terminals by following website links. It is connected via the Internet 1 to document providing servers 30, which provide the wide variety of documents published on the web, and via a cable 3 to a mobile phone 50, which supports the collection of documents for mobile terminals.

The document providing servers 30 comprise a document providing server 30A that provides documents for fixed terminals (hereinafter, PC documents), a document providing server 30B that provides documents for mobile terminals (hereinafter, mobile documents), and a document providing server 30C that provides both PC documents and mobile documents.

The document collection server 10 according to Embodiment 1 is characterized in that it collects PC documents from PC sites among the websites published on the web, analyzes the position information of mobile documents from the collected PC documents, and collects the mobile documents based on the analyzed position information.

That is, Embodiment 1 focuses on the fact that mobile sites are announced on PC sites. By collecting mobile documents through the URLs of mobile sites written in PC sites, it interpolates the link structure that is sparse between mobile sites.

This is explained with reference to FIG. 2. PC sites (white circles) are densely linked to one another, so PC documents are easy to collect with an existing web crawler or the like; the document collection server 10 therefore first collects PC documents from PC sites. Mobile sites (black circles), on the other hand, try to keep users within their own site, so links between sites are sparse, and simply following the links within a mobile site does not reach documents on other sites.

The document collection server 10 therefore exploits the fact that mobile sites are announced on PC sites. It applies a predetermined pattern matching rule (for example, "if the text reads 'mobile version URL http://...', extract that URL") to the collected PC documents, extracts only the URLs (A) to (D) of mobile documents, and collects the mobile documents from the document providing servers 30 based on those URLs. For example, a PC site may announce to users the document position information (an address such as a URL) of the mobile version of the same site, and a news site may carry articles about new or popular services for mobile terminals together with their URLs.

In Embodiment 1, therefore, PC documents are collected from PC sites, and mobile documents are collected through the document position information of mobile sites written in those PC sites. The link structure between mobile sites can thus be interpolated with information from PC sites, which makes it possible to collect a wide variety of mobile documents not only from particular mobile sites but also from a broad range of mobile sites that are scattered without links between them.

[Configuration of the Document Collection Server]
Next, the configuration of the document collection server according to Embodiment 1 is described with reference to FIG. 3. FIG. 3 is a functional block diagram showing the configuration of the document collection server according to Embodiment 1. As shown in the figure, the document collection server 10 includes a communication control IF unit 11, which is an interface for controlling communication with the Internet 1 or the cable 3, a PC document management DB 12, a mobile document management DB 13, and a control unit 14.

The PC document management DB 12 is a database for managing web pages for fixed terminals such as personal computers (PC documents). Specifically, for each PC document ID it stores, in association with one another, the document position information of the PC document, a collection status flag, an analysis status flag, and the source code of the document. The example in FIG. 4 shows that the source code of the document with PC document ID "00001" has been collected but not yet analyzed. The collection status flag indicates whether the PC document has been collected ("1"), has not been collected ("0"), or could not be acquired ("-1"). The analysis status flag indicates whether the PC document has been analyzed ("1"), has not been analyzed ("0"), or does not need analysis ("-1").
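A record of the PC document management DB 12 could be modeled as in the following sketch. The flag values follow the description above; the class names, field names, the placeholder URL, and the placeholder source string are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum


class CollectionStatus(IntEnum):
    FAILED = -1         # acquisition failed
    NOT_COLLECTED = 0   # not yet collected
    COLLECTED = 1       # collected


class AnalysisStatus(IntEnum):
    NOT_REQUIRED = -1   # analysis unnecessary
    NOT_ANALYZED = 0    # not yet analyzed
    ANALYZED = 1        # analyzed


@dataclass
class PcDocumentRecord:
    document_id: str
    location: str       # document position information, e.g. a URL
    collection_status: CollectionStatus = CollectionStatus.NOT_COLLECTED
    analysis_status: AnalysisStatus = AnalysisStatus.NOT_ANALYZED
    source: str = ""    # source code of the document


# The state described for FIG. 4: document "00001" collected, not yet analyzed.
record = PcDocumentRecord(
    "00001",
    "http://example.com/",  # placeholder location
    CollectionStatus.COLLECTED,
    AnalysisStatus.NOT_ANALYZED,
    "<html>...</html>",      # placeholder source
)
```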

The mobile document management DB 13 is a database for managing web pages for mobile terminals such as mobile phones, PHS handsets, and PDAs (mobile documents). Specifically, for each mobile document ID it stores, in association with one another, the document position information of the mobile document, a collection status flag, a User-Agent, and the source code of the document. The collection status flag indicates whether the mobile document has been collected ("1"), has not been collected ("0"), or could not be acquired ("-1").

In the example shown in FIG. 5, the source code of the documents with mobile document IDs "00001" and "00002" has been collected, while that of the documents with mobile document IDs "00003" and "00009" has not yet been collected. In the example shown in FIG. 6, the source code of the documents with mobile document IDs "00001", "00002", "00004", "00005", "00008", and "00009" has been collected, while acquisition failed for the documents with mobile document IDs "00003", "00006", and "00007".

The control unit 14 controls the document collection server 10 as a whole and includes a PC document collection unit 14a, a document analysis unit 14b, a mail address transmission processing unit 14c, a mail acquisition/analysis unit 14d, and a mobile document collection unit 14e. In practice, programs corresponding to these functional units are stored in a ROM or nonvolatile memory (not shown) and are loaded and executed by a CPU, which runs the processes corresponding to the PC document collection unit 14a, the document analysis unit 14b, the mail address transmission processing unit 14c, the mail acquisition/analysis unit 14d, and the mobile document collection unit 14e.

The PC document collection unit 14a is a processing unit that collects PC documents from the document providing server 30A or 30C. Specifically, it follows the links in HTML to collect PC documents from the document providing server 30A or 30C and registers the collected PC documents in the PC document management DB 12.
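Following the links in HTML starts with pulling the link targets out of a collected page. The sketch below shows one way to do that with Python's standard `html.parser`; the class and function names are assumptions, and only `<a href>` links are considered.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs of <a href="..."> links in one HTML document."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url: str, html: str) -> List[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

Each returned URL would then be registered as a not-yet-collected entry so that the collection unit can visit it later.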

The document analysis unit 14b is a processing unit that analyzes the position information of mobile documents from the PC documents stored in the PC document management DB 12. Specifically, it applies the pattern matching rules shown in FIG. 7 to the PC documents stored in the PC document management DB 12 and extracts the position information of mobile documents. In the example shown in FIG. 8, the text matches the pattern matching rule "携帯サイトURL.{0,20}(http[s]:\/\/..." (携帯サイトURL means "mobile site URL"), so the URL "http://jp.hujitsu..." is extracted.
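A rule of this kind can be tried out with the sketch below. The rule quoted in the text is truncated, so the pattern here is an assumed completion: the label, up to 20 arbitrary characters, then a URL captured up to the next whitespace, quote, or angle bracket. It also uses `https?` so that both http and https URLs match, which differs from the `http[s]` printed in the rule. The function name is likewise an assumption.

```python
import re
from typing import List

# Assumed completion of the truncated rule (see the lead-in above).
MOBILE_URL_RULE = re.compile(r"携帯サイトURL.{0,20}?(https?://[^\s\"'<>]+)")


def extract_mobile_urls(pc_document_source: str) -> List[str]:
    """Return the URLs that follow the label "携帯サイトURL" within 20 characters."""
    return MOBILE_URL_RULE.findall(pc_document_source)
```

URLs that appear without the label nearby are ignored, which is how the rule separates mobile-site announcements from ordinary links.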

When an image has been collected as a PC document in the PC document management DB 12, the document analysis unit 14b also analyzes the position information of a mobile document from a predetermined code embedded in the image. For example, it decodes a QR code using the image read from the PC document management DB 12 as input. If decoding succeeds and the decoded character string contains "http", it additionally registers that URL in the mobile document management DB 13 as the position information of a mobile document and sets the not-collected flag.
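The decision logic around the QR decoding can be sketched as follows. The decoder itself is injected as a callable because the specification names no library; `decode_qr`, the dictionary standing in for the mobile document management DB 13, and the choice not to overwrite an existing entry are all assumptions of this sketch.

```python
from typing import Callable, Dict, Optional

NOT_COLLECTED = 0  # collection status flag value for "not yet collected"


def register_url_from_image(
    image_bytes: bytes,
    decode_qr: Callable[[bytes], Optional[str]],  # hypothetical decoder; None on failure
    mobile_db: Dict[str, int],                    # URL -> collection status flag
) -> bool:
    """Register the decoded URL as not yet collected; return True if registered."""
    decoded = decode_qr(image_bytes)
    if decoded is None or "http" not in decoded:
        return False  # decoding failed or the payload is not a URL
    # Keep the existing status if the URL is already known (assumption).
    mobile_db.setdefault(decoded, NOT_COLLECTED)
    return True
```

In a real deployment `decode_qr` would wrap whatever QR decoding library is available.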

The reason for this image analysis is that a PC site announcing a mobile site does not always write out the URL itself. As shown in FIG. 9, it may display a QR code instead, and the position information of the mobile document should be extracted in that case as well.

The mail address transmission processing unit 14c is a processing unit that transmits a predetermined e-mail address to a PC site when the document analysis unit 14b has found an input form for a mobile terminal e-mail address in a PC document from that site. For example, as shown in FIG. 10, when the document analysis unit 14b finds, in a form in a PC document, a free-text input element and a selection element for mobile terminal mail domains, the unit takes from a list of e-mail addresses (not shown) the one that corresponds to the mail domain selection element in the form and transmits it to the document providing server 30A or 30C. Although the example here has an input form containing both a free-text input element and a mail domain selection element, the present invention can be applied in the same way when the input form contains only a free-text input element.
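The form analysis and address selection could be sketched as below with Python's standard `html.parser`. The address list, the domain names, and the handling of a free-input-only form (all prepared addresses are treated as candidates) are assumptions; the specification does not spell these out.

```python
from html.parser import HTMLParser
from typing import List


class MailFormAnalyzer(HTMLParser):
    """Record free-text inputs and <option> values that appear inside forms."""

    def __init__(self) -> None:
        super().__init__()
        self.in_form = False
        self.text_inputs: List[str] = []
        self.domain_options: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self.in_form = True
        elif self.in_form and tag == "input" and attributes.get("type", "text") == "text":
            self.text_inputs.append(attributes.get("name") or "")
        elif self.in_form and tag == "option" and attributes.get("value"):
            self.domain_options.append(attributes["value"])

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def select_addresses(html: str, address_list: List[str]) -> List[str]:
    """Return the prepared addresses that the form would accept."""
    analyzer = MailFormAnalyzer()
    analyzer.feed(html)
    if not analyzer.text_inputs:
        return []                  # no mail address form found
    if not analyzer.domain_options:
        return list(address_list)  # free-input-only form (assumption)
    return [a for a in address_list
            if a.split("@", 1)[-1] in analyzer.domain_options]
```

Submitting the selected address to the form's action URL is a separate step that is not shown here.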

The mail acquisition/analysis unit 14d is a processing unit that acquires the reply mail from the mobile phone 50, to which the document providing server 30A or 30C has replied, and analyzes the position information of mobile documents. Specifically, it obtains a backup of the reply mail from the mobile phone 50 and extracts from the mail the character strings that begin with "http" and end at a space or line break.
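The extraction rule for reply mails maps directly onto a short regular expression. In this sketch the function name is an assumption, and the end of the mail body is treated like a terminating space or line break.

```python
import re
from typing import List

# "http" followed by everything up to the next whitespace character.
URL_IN_MAIL = re.compile(r"http\S*")


def extract_urls_from_mail(body: str) -> List[str]:
    """Return every string that starts with "http" and runs to a space or line break."""
    return URL_IN_MAIL.findall(body)
```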

The reason for transmitting an e-mail address and then acquiring and analyzing the reply mail is as follows. When a PC site announces a mobile site, it may write out the URL itself, but, as shown in FIG. 11, there are also cases in which the position information of the mobile site is obtained only after the user enters his or her own e-mail address. The position information of the mobile document should be extracted automatically in such cases as well.

The portable document collection unit 14e is a processing unit that collects documents for mobile terminals based on the position information of portable documents stored in the portable document management DB 13. Specifically, among the portable document IDs stored in the portable document management DB 13, it accesses the document position information of each ID whose collection status flag holds the uncollected flag "0", and acquires the portable document from the document providing server 30B or 30C.

[Process flow]
Next, the procedures of the various processes of the document collection server 10 according to the first embodiment are described, in the following order: (1) PC document collection processing, (2) document analysis processing, (3) image analysis processing, (4) mail address transmission processing, (5) mail acquisition/analysis processing, and (6) portable document collection processing.

(1) PC document collection processing
As noted above, the PC document collection processing according to the first embodiment is described first, with reference to FIG. 12. FIG. 12 is a flowchart illustrating the procedure of the PC document collection processing according to the first embodiment. As shown in the figure, the PC document collection unit 14a first adds the origin document position information to the PC document management DB 12 (step S1201).

Next, the PC document collection unit 14a acquires, from the PC document management DB 12, document position information whose uncollected flag is on (step S1202). If no such document position information exists, the PC documents for all document position information have already been collected, so the acquisition fails (Yes in step S1203) and the processing ends.

If document position information whose uncollected flag is on has been acquired (No in step S1203), the PC document collection unit 14a accesses the acquired document position information via the Internet 1 (step S1204).

If the PC document corresponding to the document position information can be acquired from the document providing server 30A or 30C (Yes in step S1205), the PC document collection unit 14a stores the acquired PC document together with a collected flag and an unanalyzed flag (step S1206). It then acquires the position information of the documents linked from, and the images referenced by, that PC document (step S1207), and additionally registers in the PC document management DB 12 any acquired document position information that is not yet registered, together with an uncollected flag (step S1208).

If, on the other hand, the PC document corresponding to the document position information cannot be acquired from the document providing server 30A or 30C (No in step S1205), the PC document collection unit 14a stores an acquisition failure flag and an analysis unnecessary flag in the PC document management DB 12 (step S1209).

In this "PC document collection processing", the PC document collection unit 14a repeats steps S1202 to S1209 until no document position information whose uncollected flag is on remains (that is, while the determination in step S1203 is negative).
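The loop of steps S1201 to S1209 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the patent: the in-memory dictionary standing in for the PC document management DB 12, the string-valued flags, and the injected `fetch` callable (which returns a document body plus its linked or referenced locations, or `None` on failure) are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical flag values standing in for the flags named in the text.
UNCOLLECTED, COLLECTED, FAILED = "uncollected", "collected", "failed"

# A fetcher returns (body, linked or referenced locations), or None on failure.
Fetcher = Callable[[str], Optional[Tuple[str, List[str]]]]


def collect_pc_documents(origin: str, fetch: Fetcher) -> Dict[str, dict]:
    db: Dict[str, dict] = {origin: {"status": UNCOLLECTED}}        # S1201
    while True:
        pending = [loc for loc, rec in db.items()
                   if rec["status"] == UNCOLLECTED]                # S1202
        if not pending:                                            # S1203: Yes
            return db
        location = pending[0]
        result = fetch(location)                                   # S1204
        if result is None:                                         # S1205: No
            db[location] = {"status": FAILED,
                            "analysis": "unnecessary"}             # S1209
            continue
        body, links = result
        db[location] = {"status": COLLECTED,
                        "analysis": "unanalyzed", "body": body}    # S1206
        for link in links:                                         # S1207
            # S1208: register only locations that are not yet in the DB.
            db.setdefault(link, {"status": UNCOLLECTED})


# Hypothetical two-page site with one dead link, used in place of the Internet.
site = {
    "http://a/": ("A", ["http://a/b", "http://a/missing"]),
    "http://a/b": ("B", ["http://a/"]),
}
db = collect_pc_documents("http://a/", site.get)
print({loc: rec["status"] for loc, rec in db.items()})
```

Injecting the fetcher keeps the sketch runnable without network access; an actual collector would issue HTTP requests at step S1204.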

(2) Document analysis processing
Next, the document analysis processing according to the first embodiment is described with reference to FIG. 13. FIG. 13 is a flowchart illustrating the procedure of the document analysis processing according to the first embodiment. As shown in the figure, the document analysis unit 14b acquires, from the PC document management DB 12, a PC document whose unanalyzed flag is on (step S1301).

If no PC document whose unanalyzed flag is on exists, all PC documents have already been analyzed, so the acquisition fails (Yes in step S1302) and the processing ends.

If a PC document whose unanalyzed flag is on has been acquired (No in step S1302), the document analysis unit 14b determines whether the acquired PC document is an image. If it is not an image (No in step S1303), the unit applies the pattern matching rules shown in FIG. 7 to the PC document stored in the PC document management DB 12 and extracts the position information of portable documents (step S1304). If it is an image (Yes in step S1303), the unit performs the "image analysis processing" described later and extracts the position information of a portable document embedded in a QR code or the like (step S1305).

The document analysis unit 14b then adds the position information of the portable document, extracted either with the pattern matching rules or through the "image analysis processing", to the portable document management DB 13 together with an uncollected flag (step S1306).

Next, the document analysis unit 14b determines whether a form in the PC document contains a free-text input element and a mobile-terminal mail domain selection element (step S1307). If it does (Yes in step S1307), the unit performs the "mail address transmission processing" described later, in which a predetermined mail address is transmitted to the document providing server 30 that provides the PC document (step S1308).

The document analysis unit 14b then sets an analyzed flag in the record of the document position information analyzed in steps S1301 to S1308 (step S1309), and again acquires from the PC document management DB 12 a PC document whose unanalyzed flag is on (step S1301).

The document analysis unit 14b repeats steps S1301 to S1309 until no PC document whose unanalyzed flag is on remains (that is, while the determination in step S1302 is negative).

(3) Image analysis processing
Next, the image analysis processing according to the first embodiment is described with reference to FIG. 14. FIG. 14 is a flowchart illustrating the procedure of the image analysis processing according to the first embodiment. This "image analysis processing" corresponds to step S1305 shown in FIG. 13.

As shown in FIG. 14, the document analysis unit 14b decodes the QR code, taking as input an image read from the PC document management DB 12 (step S1401). If the decoding succeeds and the decoded character string contains "http" (No in step S1402 and Yes in step S1403), the unit identifies the decoded result as the position information of a portable document (step S1404) and proceeds to step S1306 shown in FIG. 13.

If the decoding succeeds but the decoded character string does not contain "http" (No in step S1402 and No in step S1403), or if the decoding fails (Yes in step S1402), the document analysis unit 14b determines that no position information of a portable document has been obtained (step S1405) and proceeds to step S1306 shown in FIG. 13.
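The branching of steps S1401 to S1405 can be sketched as below. The QR decoder itself is outside the scope of this sketch: `decode_qr` is a hypothetical callable, assumed to return the decoded string or `None` when decoding fails, and `locate_from_image` is an illustrative name.

```python
from typing import Callable, Optional


def locate_from_image(
    image: bytes,
    decode_qr: Callable[[bytes], Optional[str]],
) -> Optional[str]:
    """Return the portable document location found in the image, or None."""
    decoded = decode_qr(image)       # S1401
    if decoded is None:              # S1402: Yes (decoding failed)
        return None                  # S1405: no position information obtained
    if "http" not in decoded:        # S1403: No
        return None                  # S1405
    return decoded                   # S1404: decoded result is the location


# Stand-in decoders used only to exercise each branch.
print(locate_from_image(b"...", lambda img: "http://example.com/m/"))
print(locate_from_image(b"...", lambda img: "tel:0123456789"))
print(locate_from_image(b"...", lambda img: None))
```

In every branch the caller would continue with step S1306, adding a location to the portable document management DB 13 only when one was returned.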

In this way, when an image is collected as a PC document, the position information of the portable document is analyzed from a predetermined code embedded in the image. The position information of portable documents can therefore be extracted even when a PC site embeds its announcement of a portable site in an image, which effectively expands the range of portable documents that can be collected.

(4) Mail address transmission processing
Next, the mail address transmission processing according to the first embodiment is described with reference to FIG. 15. FIG. 15 is a flowchart illustrating the procedure of the mail address transmission processing according to the first embodiment. This "mail address transmission processing" corresponds to step S1308 shown in FIG. 13 and starts when the document analysis unit 14b determines that a form in the PC document contains a free-text input element and a mobile-terminal mail domain selection element (Yes in step S1307).

As shown in FIG. 15, the mail address transmission processing unit 14c acquires the action attribute of the form tag and sets it as the mail address transmission destination (step S1501), and then reads a mail address list (not shown) (step S1502).

Immediately after the mail address list is read, processing is not yet complete for all mail addresses (No in step S1503), so the mail address transmission processing unit 14c identifies an unprocessed mail address from the list as the processing target (step S1504) and determines whether the mail domain selection element in the form of the PC document contains an option that matches the domain of the target mail address (step S1505).

If a matching domain exists (Yes in step S1505), the mail address transmission processing unit 14c sets the user name of the target mail address as the value of the form's free-text input element (step S1506), sets the form's selection element to the value corresponding to the domain of the target mail address (step S1507), transmits the values of the free-text input element and the selection element to the mail address transmission destination (step S1508), and marks the target mail address as processed (step S1509).

If no matching domain exists (No in step S1505), the mail address transmission processing unit 14c marks the target mail address as processed without transmitting it (step S1509).

The mail address transmission processing unit 14c repeats steps S1504 to S1509 until processing is complete for every mail address in the mail address list (that is, while the determination in step S1503 is negative).
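The matching of listed mail addresses against the form's domain selection element (steps S1503 to S1509) might look like the sketch below. The field names `user` and `domain`, the option values, and the example addresses are hypothetical; the sketch returns the values that would be submitted rather than actually transmitting them at step S1508.

```python
from typing import Dict, List


def build_submissions(
    addresses: List[str],
    domain_options: Dict[str, str],
) -> List[Dict[str, str]]:
    """Build one form submission per listed address whose domain the form offers.

    domain_options maps a mail domain offered by the form's selection
    element to the option value the form expects for it.
    """
    submissions = []
    for address in addresses:                        # S1503 / S1504
        user, _, domain = address.partition("@")
        if domain in domain_options:                 # S1505: Yes
            submissions.append({
                "user": user,                        # S1506: free-text element
                "domain": domain_options[domain],    # S1507: selection element
            })                                       # S1508 would send these values
        # S1509: the address counts as processed whether or not it matched.
    return submissions


# Hypothetical mail address list and form options.
addresses = ["crawler@carrier-a.example", "crawler@carrier-z.example"]
options = {"carrier-a.example": "1", "carrier-b.example": "2"}
print(build_submissions(addresses, options))
```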

(5) Mail acquisition/analysis processing
Next, the mail acquisition/analysis processing according to the first embodiment is described with reference to FIG. 16. FIG. 16 is a flowchart illustrating the procedure of the mail acquisition/analysis processing according to the first embodiment. This processing works in conjunction with the "mail address transmission processing" described above: if the mail address transmission processing unit 14c has transmitted a mail address to the document providing server 30A or 30C, the response mail from that server accumulates as unread mail.

As shown in the figure, the mail acquisition/analysis unit 14d acquires a backup of the mail from the mobile phone 50 (step S1601) and checks whether any unread mail exists (step S1602).

If unread mail exists (No in step S1602), the mail acquisition/analysis unit 14d identifies one unread mail as the processing target mail (step S1603), extracts from it every character string that starts with "http" and ends at a blank or a line feed and identifies each as the position information of a portable document (step S1604), adds this position information to the portable document management DB 13 together with an uncollected flag (step S1605), and marks the target mail as read (step S1606).

The mail acquisition/analysis unit 14d repeats steps S1603 to S1606 until no unread mail remains (that is, while the determination in step S1602 is negative).

In this way, an input form for the mail address of a mobile terminal is analyzed from a PC document, a predetermined mail address is transmitted to the PC site from which the PC document was collected, the response mail is acquired from the mobile terminal to which that PC site sent it, and the position information of portable documents is analyzed from the acquired response mail. The position information of portable documents can therefore be extracted automatically even when a PC site's announcement of its portable site requires an input operation by the user, which effectively expands the range of portable documents that can be collected.

(6) Portable document collection processing
Next, the portable document collection processing according to the first embodiment is described with reference to FIG. 17. FIG. 17 is a flowchart illustrating the procedure of the portable document collection processing according to the first embodiment. As shown in the figure, the portable document collection unit 14e acquires, from the portable document management DB 13, document position information whose uncollected flag is on (step S1701).

If no document position information whose uncollected flag is on exists, the portable documents for all document position information have already been collected, so the acquisition fails (Yes in step S1702) and the processing ends.

If document position information whose uncollected flag is on has been acquired (No in step S1702), the portable document collection unit 14e accesses the acquired document position information via the Internet 1 (step S1703).

If the portable document corresponding to the document position information can be acquired from the document providing server 30B or 30C (Yes in step S1704), the portable document collection unit 14e stores the acquired portable document together with a collected flag (step S1705). It then acquires the position information of the documents linked from that portable document (step S1706), and additionally registers in the portable document management DB 13 any acquired document position information that is not yet registered, together with an uncollected flag (step S1707).

If, on the other hand, the portable document corresponding to the document position information cannot be acquired from the document providing server 30B or 30C (No in step S1704), the portable document collection unit 14e stores an acquisition failure flag in the portable document management DB 13 (step S1708).

In this "portable document collection processing", the portable document collection unit 14e repeats steps S1701 to S1708 until no document position information whose uncollected flag is on remains (that is, while the determination in step S1702 is negative).

As described above, in the first embodiment, PC documents are collected from PC sites among the websites published on the web, the position information of portable documents is analyzed from the collected PC documents, and the portable documents are collected based on the analyzed position information. The link structure between portable sites can thus be interpolated with information from PC sites, so a wide variety of portable documents can be collected from the many portable sites that are scattered across the web without links between them.

Next, the document collection server according to the second embodiment is described. Components and functions that are the same as in the first embodiment are not described again; only the differences are described.

The second embodiment is characterized in that the position information of the PC documents collected in the PC document management DB 12 is accessed with a "User-Agent" that represents a mobile terminal. This makes it possible to collect portable documents even from websites that serve different documents depending on the device type of the access source.

That is, when a URL whose uncollected flag is on in the portable document management DB 13 is accessed, changing the User-Agent name in the HTTP request header to the model name of a mobile terminal makes the document providing server 30 treat the request as if it came from a mobile terminal. If the server serves different documents depending on the device type of the access source, it can thus be made to provide the portable document.
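Setting the User-Agent request header can be illustrated with Python's standard `urllib.request` module. This is a sketch only: the User-Agent string below is a made-up placeholder rather than a real terminal's model name, and the request is constructed but not sent.

```python
import urllib.request

# Hypothetical value; the text only says that the User-Agent name is
# changed to the model name of a mobile terminal.
MOBILE_USER_AGENT = "ExampleCarrier/1.0 ExamplePhone100"


def build_mobile_request(
    url: str,
    user_agent: str = MOBILE_USER_AGENT,
) -> urllib.request.Request:
    """Build an HTTP request that presents itself as a mobile terminal."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_mobile_request("http://example.com/index.html")
# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
# Passing the request to urllib.request.urlopen() would actually send it.
```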

Consequently, even for a document collected as a PC document, the possibility that a portable document would be obtained by accessing the same document position information from a mobile terminal cannot be ruled out. The position information of the PC document therefore needs to be accessed again with a "User-Agent" that represents a mobile terminal.

Against this background, the document analysis processing of the second embodiment differs from that of the first embodiment in that, as shown in FIG. 18, step S1801 replaces step S1301 shown in FIG. 13. In step S1801, one PC document whose unanalyzed flag is on is acquired from the PC document management DB 12, and the document position information of that PC document is added as is to the portable document management DB 13 together with an uncollected flag.

As a result, the position of a PC document can serve as the position information of a portable document to be collected even when no portable document position information can be analyzed from the PC document. On the other hand, re-accessing that position with a user agent that represents a mobile terminal may return only a document with the same content.

The second embodiment therefore adds step S1901, as shown in FIG. 19, which determines whether a document collected with a "User-Agent" that represents a mobile terminal has the same content (source code) as the PC document stored in the PC document management DB 12. If the content is the same, an acquisition failure flag is set in the portable document management DB 13 and the document is discarded. In this respect the processing differs from the portable document collection processing of the first embodiment.
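The duplicate check of step S1901 can be sketched as a comparison between the newly fetched body and the stored PC document. The dictionaries standing in for the two management DBs, the flag strings, and the use of exact string equality as the test for "same content" are assumptions of this sketch.

```python
from typing import Dict


def store_if_distinct(
    location: str,
    mobile_body: str,
    pc_db: Dict[str, str],
    portable_db: Dict[str, dict],
) -> bool:
    """Keep the document fetched with a mobile User-Agent only if it differs
    from the PC document stored for the same location."""
    if pc_db.get(location) == mobile_body:                 # S1901: same content
        # Set the acquisition failure flag and discard the fetched body.
        portable_db[location] = {"status": "failed"}
        return False
    portable_db[location] = {"status": "collected", "body": mobile_body}
    return True


# Hypothetical stored PC document and two possible mobile responses.
pc_db = {"http://example.com/": "<html>PC page</html>"}
portable_db: Dict[str, dict] = {}
print(store_if_distinct("http://example.com/", "<html>PC page</html>", pc_db, portable_db))
print(store_if_distinct("http://example.com/", "<html>mobile page</html>", pc_db, portable_db))
```

Exact equality would miss pages that differ only in incidental details such as timestamps; the source does not say how such cases are handled.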

As described above, the second embodiment accesses, with a "User-Agent" that represents a mobile terminal, the position information of the PC documents collected in the PC document management DB 12. Portable documents can therefore be collected even from websites that serve different documents depending on the device type of the access source, which expands the range of collectable portable documents in multiple respects.

Embodiments of the present invention have been described so far, but the present invention may be implemented in various forms other than those embodiments. Other embodiments included in the present invention are described below.

(1) Application example
For example, the present invention may hold, for each mobile terminal, the capabilities of that terminal, and access the position information of documents for mobile terminals stored in the portable document management DB 13 using both the user agent that represents the terminal and terminal information that represents its capabilities.

When the portable document served differs according to the capabilities of the mobile terminal (for example, the document size it can process, display size, resolution, number of colors, or number of sound-source tones), the portable document for each set of capabilities can thus be collected, which expands the range of collectable portable documents in multiple respects.

(2) Distribution and integration of functions
In the first and second embodiments above, all of the functional units, namely the PC document collection unit 14a, the document analysis unit 14b, the mail address transmission processing unit 14c, the mail acquisition/analysis unit 14d, and the portable document collection unit 14e, are consolidated in the document collection server 10. The present invention is not limited to this arrangement, and some of these functional units may be distributed to external devices.

For example, some document providing servers 30 judge that an access comes from a mobile terminal using the IP address of the request source in addition to the user agent name in the request header. If the process of the portable document collection unit 14e is incorporated into the carrier server for mobile terminals, such a server can also be made to identify the request as one from a mobile terminal.

(3) Program
The first embodiment above describes a document collection server (document collection device), but a document collection program with the same functions can be obtained by implementing the configuration of the document collection device in software. A computer that executes the document collection program is therefore described here.

FIG. 20 is a functional block diagram illustrating the configuration of a computer that executes the document collection program according to the third embodiment. As shown in the figure, the computer 300 includes a RAM 310, a CPU 320, an HDD 330, a wireless LAN interface 340, and an input/output interface 350.

The RAM 310 is a memory that stores programs, intermediate results of program execution, and the like, and the CPU 320 is a central processing unit that reads programs from the RAM 310 and executes them. The HDD 330 is a disk device that stores programs and data, the wireless LAN interface 340 connects the computer 300 to other computers via a wireless LAN, and the input/output interface 350 connects input/output devices such as a display.

The document collection program 311 executed on the computer 300 is stored in, for example, a database of another computer system connected via the wireless LAN interface 340, and is read from that database and installed on the computer 300. The installed document collection program 311 is stored on the HDD 330, read into the RAM 310, and executed by the CPU 320.

(Supplementary Note 1) A document collection method in which a computer collects documents for mobile terminals by following links between websites, wherein the computer executes:
a fixed-terminal document collection step of collecting documents for fixed terminals from sites for fixed terminals among the websites;
a document analysis step of analyzing the position information of documents for mobile terminals from the documents for fixed terminals collected in the fixed-terminal document collection step; and
a mobile-terminal document collection step of collecting the documents for mobile terminals based on the position information analyzed in the document analysis step.

(Supplementary Note 2) The document collection method according to Supplementary Note 1, wherein the fixed-terminal document collection step further collects images included in the documents for fixed terminals, and
the document analysis step, when an image is collected as a document for fixed terminals in the fixed-terminal document collection step, analyzes the position information of a document for mobile terminals from a predetermined code embedded in the image.

(Appendix 3) The document collection method according to Appendix 1, wherein the document analysis step analyzes an input form for a mail address of the mobile terminal from the documents for fixed terminals collected in the fixed-terminal document collection step, and
the computer further executes:
a mail address transmission step of transmitting a predetermined mail address to the site for fixed terminals from which the document for fixed terminals was collected in the fixed-terminal document collection step;
a mail acquisition step of acquiring a response mail from the mobile terminal to which the site for fixed terminals sent the response mail; and
a mail analysis step of analyzing the location information of the document for mobile terminals from the response mail acquired in the mail acquisition step.
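The mail analysis step of Appendix 3 can be illustrated with Python's standard `email` package. This is a sketch under assumptions: the source does not say how the response mail is parsed, so the URL pattern, the restriction to `text/plain` parts, and the function name `analyze_reply_mail` are hypothetical choices made here.

```python
import re
from email import message_from_string
from typing import List

# Assumed rule: any http(s) URL in the body is a candidate location.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def analyze_reply_mail(raw_mail: str) -> List[str]:
    """Extract candidate mobile-document URLs from a response mail."""
    message = message_from_string(raw_mail)
    urls: List[str] = []
    for part in message.walk():
        if part.get_content_type() != "text/plain":
            continue  # skip multipart containers and non-text parts
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset name in the header
            text = payload.decode("utf-8", errors="replace")
        for url in URL_PATTERN.findall(text):
            if url not in urls:
                urls.append(url)
    return urls
```

The returned URLs would then be handed to the mobile-terminal document collection step in the same way as URLs found by the document analysis step.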

(Appendix 4) The document collection method according to Appendix 1, 2, or 3, wherein the mobile-terminal document collection step accesses, using a user agent representing the mobile terminal, the location information of the documents for fixed terminals collected in the fixed-terminal document collection step.
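Accessing a fixed-terminal URL with a user agent that represents a mobile terminal, as in Appendix 4, amounts to setting the HTTP `User-Agent` request header. The following sketch uses the standard `urllib.request.Request` class; the user-agent string and the function name `build_mobile_request` are hypothetical, and real handset strings differ by carrier and model.

```python
from urllib.request import Request

# Hypothetical user-agent string, used for illustration only.
MOBILE_USER_AGENT = "ExampleMobileBrowser/1.0 (hypothetical handset)"


def build_mobile_request(
    pc_document_url: str, user_agent: str = MOBILE_USER_AGENT
) -> Request:
    """Build a request for a fixed-terminal URL that presents itself
    as coming from a mobile terminal."""
    return Request(pc_document_url, headers={"User-Agent": user_agent})
```

The resulting object can be passed to `urllib.request.urlopen` to perform the fetch. A site that switches content on the user agent may then return its mobile version at the same URL.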

(Appendix 5) The document collection method according to Appendix 4, wherein the computer further executes a duplication determination step of determining whether a document collected in the mobile-terminal document collection step using the user agent representing the mobile terminal has the same content as the document for fixed terminals collected in the fixed-terminal document collection step, and
the mobile-terminal document collection step discards the document when the duplication determination step determines that the contents are the same.
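The duplication determination of Appendix 5 can be sketched as a content comparison. The source does not state how sameness is judged, so the byte-exact SHA-256 comparison below is an assumption, and the function names are hypothetical. A practical system might need a looser comparison, for example one that ignores whitespace or markup differences.

```python
import hashlib
from typing import Optional


def content_digest(document: bytes) -> str:
    """Return a hex digest that identifies the exact byte content."""
    return hashlib.sha256(document).hexdigest()


def keep_if_not_duplicate(
    mobile_ua_document: bytes, pc_document: bytes
) -> Optional[bytes]:
    """Return the document fetched with the mobile user agent, or None
    (discard) when it is byte-identical to the fixed-terminal document."""
    if content_digest(mobile_ua_document) == content_digest(pc_document):
        return None  # same content: the site did not serve a mobile version
    return mobile_ua_document
```

Storing digests rather than full documents keeps the comparison cheap when the fixed-terminal documents are already held in a database.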

(Appendix 6) The document collection method according to any one of Appendices 1 to 5, wherein the computer further executes a performance information holding step of holding, in a predetermined storage unit, the performance of each mobile terminal, and
the mobile-terminal document collection step accesses the location information of the documents for mobile terminals analyzed in the document analysis step, using a user agent representing a mobile terminal held in the performance information holding step and terminal information representing the performance of that mobile terminal.
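The performance information holding step of Appendix 6 can be pictured as a per-terminal table that supplies both the user agent and the terminal information for each access. The sketch below is illustrative only: the `TerminalProfile` fields, the profile names, and the `X-Example-Display` header are hypothetical, since the source does not specify which capabilities are held or how they are sent.

```python
from dataclasses import dataclass
from typing import Dict
from urllib.request import Request


@dataclass(frozen=True)
class TerminalProfile:
    """Held performance information for one mobile terminal (assumed fields)."""
    user_agent: str
    screen_width: int
    screen_height: int


# Hypothetical storage unit contents.
PROFILES: Dict[str, TerminalProfile] = {
    "model_a": TerminalProfile("ExampleMobileA/1.0", 240, 320),
    "model_b": TerminalProfile("ExampleMobileB/2.0", 480, 854),
}


def build_requests(
    mobile_url: str, profiles: Dict[str, TerminalProfile]
) -> Dict[str, Request]:
    """Build one request per held terminal for the same mobile URL."""
    requests: Dict[str, Request] = {}
    for name, profile in profiles.items():
        headers = {
            "User-Agent": profile.user_agent,
            # Hypothetical header carrying the terminal information.
            "X-Example-Display": f"{profile.screen_width}x{profile.screen_height}",
        }
        requests[name] = Request(mobile_url, headers=headers)
    return requests
```

Issuing one request per profile lets the collector gather the variants a site serves to different handsets from the same location.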

(Appendix 7) A document collection program that causes a computer to execute document collection processing for collecting documents for mobile terminals by following links on websites, the program causing the computer to execute:
a fixed-terminal document collection procedure of collecting documents for fixed terminals from sites for fixed terminals among the websites;
a document analysis procedure of analyzing location information of the documents for mobile terminals from the documents for fixed terminals collected by the fixed-terminal document collection procedure; and
a mobile-terminal document collection procedure of collecting the documents for mobile terminals based on the location information of the documents for mobile terminals analyzed by the document analysis procedure.

(Appendix 8) The document collection program according to Appendix 7, wherein the fixed-terminal document collection procedure further collects images included in the documents for fixed terminals, and
the document analysis procedure, when an image has been collected as a document for fixed terminals by the fixed-terminal document collection procedure, analyzes the location information of the document for mobile terminals from a predetermined code embedded in the image.

(Appendix 9) The document collection program according to Appendix 7, wherein the document analysis procedure analyzes an input form for a mail address of the mobile terminal from the documents for fixed terminals collected by the fixed-terminal document collection procedure, and
the program further causes the computer to execute:
a mail address transmission procedure of transmitting a predetermined mail address to the site for fixed terminals from which the document for fixed terminals was collected by the fixed-terminal document collection procedure;
a mail acquisition procedure of acquiring a response mail from the mobile terminal to which the site for fixed terminals sent the response mail; and
a mail analysis procedure of analyzing the location information of the document for mobile terminals from the response mail acquired by the mail acquisition procedure.

(Appendix 10) The document collection program according to Appendix 7, 8, or 9, wherein the mobile-terminal document collection procedure accesses, using a user agent representing the mobile terminal, the location information of the documents for fixed terminals collected by the fixed-terminal document collection procedure.

(Appendix 11) The document collection program according to Appendix 10, wherein the program further causes the computer to execute a duplication determination procedure of determining whether a document collected by the mobile-terminal document collection procedure using the user agent representing the mobile terminal has the same content as the document for fixed terminals collected by the fixed-terminal document collection procedure, and
the mobile-terminal document collection procedure discards the document when the duplication determination procedure determines that the contents are the same.

(Appendix 12) The document collection program according to any one of Appendices 7 to 11, wherein the program further causes the computer to execute a performance information holding procedure of holding, in a predetermined storage unit, the performance of each mobile terminal, and
the mobile-terminal document collection procedure accesses the location information of the documents for mobile terminals analyzed by the document analysis procedure, using a user agent representing a mobile terminal held by the performance information holding procedure and terminal information representing the performance of that mobile terminal.

(Appendix 13) A document collection device that collects documents for mobile terminals by following links on websites, the device comprising:
fixed-terminal document collection means for collecting documents for fixed terminals from sites for fixed terminals among the websites;
document analysis means for analyzing location information of the documents for mobile terminals from the documents for fixed terminals collected by the fixed-terminal document collection means; and
mobile-terminal document collection means for collecting the documents for mobile terminals based on the location information of the documents for mobile terminals analyzed by the document analysis means.

(Appendix 14) The document collection device according to Appendix 13, wherein the fixed-terminal document collection means further collects images included in the documents for fixed terminals, and
the document analysis means, when an image has been collected as a document for fixed terminals by the fixed-terminal document collection means, analyzes the location information of the document for mobile terminals from a predetermined code embedded in the image.

(Appendix 15) The document collection device according to Appendix 13, wherein the document analysis means analyzes an input form for a mail address of the mobile terminal from the documents for fixed terminals collected by the fixed-terminal document collection means, the device further comprising:
mail address transmission means for transmitting a predetermined mail address to the site for fixed terminals from which the document for fixed terminals was collected by the fixed-terminal document collection means;
mail acquisition means for acquiring a response mail from the mobile terminal to which the site for fixed terminals sent the response mail; and
mail analysis means for analyzing the location information of the document for mobile terminals from the response mail acquired by the mail acquisition means.

(Appendix 16) The document collection device according to Appendix 13, 14, or 15, wherein the mobile-terminal document collection means accesses, using a user agent representing the mobile terminal, the location information of the documents for fixed terminals collected by the fixed-terminal document collection means.

(Appendix 17) The document collection device according to Appendix 16, further comprising duplication determination means for determining whether a document collected by the mobile-terminal document collection means using the user agent representing the mobile terminal has the same content as the document for fixed terminals collected by the fixed-terminal document collection means, wherein
the mobile-terminal document collection means discards the document when the duplication determination means determines that the contents are the same.

(Appendix 18) The document collection device according to any one of Appendices 13 to 17, further comprising performance information holding means for holding, in a predetermined storage unit, the performance of each mobile terminal, wherein
the mobile-terminal document collection means accesses the location information of the documents for mobile terminals analyzed by the document analysis means, using a user agent representing a mobile terminal held by the performance information holding means and terminal information representing the performance of that mobile terminal.

As described above, the document collection method, document collection program, and document collection device according to the present invention are suitable for collecting documents provided on sites for mobile terminals.

An explanatory diagram for explaining an overview of the document collection server according to Embodiment 1.
A conceptual diagram for explaining features of the document collection server according to Embodiment 1.
A functional block diagram showing the configuration of the document collection server according to Embodiment 1.
A diagram showing an example of the structure stored in the PC document management DB.
A diagram (1) showing an example of the structure stored in the mobile document management DB.
A diagram (2) showing an example of the structure stored in the mobile document management DB.
A diagram showing an example of a pattern matching rule.
A diagram (1) showing an example of a mobile site notice posted on a PC site.
A diagram (2) showing an example of a mobile site notice posted on a PC site.
A diagram showing an example of a document that is a target of mail address transmission.
A diagram (3) showing an example of a mobile site notice posted on a PC site.
A flowchart showing the procedure of PC document collection processing according to Embodiment 1.
A flowchart showing the procedure of document analysis processing according to Embodiment 1.
A flowchart showing the procedure of image analysis processing according to Embodiment 1.
A flowchart showing the procedure of mail address transmission processing according to Embodiment 1.
A flowchart showing the procedure of mail acquisition and analysis processing according to Embodiment 1.
A flowchart showing the procedure of mobile document collection processing according to Embodiment 1.
A flowchart showing the procedure of document analysis processing according to Embodiment 2.
A flowchart showing the procedure of mobile document collection processing according to Embodiment 2.
A functional block diagram showing the configuration of a computer that executes the document collection program according to Embodiment 3.

Explanation of symbols

1 Internet
3 Cable
10 Document collection server
11 Communication control IF unit
12 PC document management database
13 Mobile document management database
14 Control unit
14a PC document collection unit
14b Document analysis unit
14c Mail address transmission processing unit
14d Mail acquisition and analysis unit
14e Mobile document collection unit
30 Document providing server
50 Mobile phone

Claims

A document collection method in which a computer collects documents for mobile terminals by following links on websites, wherein the computer executes:
a fixed-terminal document collection step of collecting documents for fixed terminals from sites for fixed terminals among the websites;
a document analysis step of analyzing location information of the documents for mobile terminals from the documents for fixed terminals collected in the fixed-terminal document collection step; and
a mobile-terminal document collection step of collecting the documents for mobile terminals based on the location information of the documents for mobile terminals analyzed in the document analysis step.

The document collection method according to claim 1, wherein the fixed-terminal document collection step further collects images included in the documents for fixed terminals, and
the document analysis step, when an image has been collected as a document for fixed terminals in the fixed-terminal document collection step, analyzes the location information of the document for mobile terminals from a predetermined code embedded in the image.

The document collection method according to claim 1, wherein the document analysis step analyzes an input form for a mail address of the mobile terminal from the documents for fixed terminals collected in the fixed-terminal document collection step, and
the computer further executes:
a mail address transmission step of transmitting a predetermined mail address to the site for fixed terminals from which the document for fixed terminals was collected in the fixed-terminal document collection step;
a mail acquisition step of acquiring a response mail from the mobile terminal to which the site for fixed terminals sent the response mail; and
a mail analysis step of analyzing the location information of the document for mobile terminals from the response mail acquired in the mail acquisition step.

The document collection method according to claim 1, 2, or 3, wherein the mobile-terminal document collection step accesses, using a user agent representing the mobile terminal, the location information of the documents for fixed terminals collected in the fixed-terminal document collection step.

The document collection method according to claim 4, wherein the computer further executes a duplication determination step of determining whether a document collected in the mobile-terminal document collection step using the user agent representing the mobile terminal has the same content as the document for fixed terminals collected in the fixed-terminal document collection step, and
the mobile-terminal document collection step discards the document when the duplication determination step determines that the contents are the same.

A document collection program that causes a computer to execute document collection processing for collecting documents for mobile terminals by following links on websites, the program causing the computer to execute:
a fixed-terminal document collection procedure of collecting documents for fixed terminals from sites for fixed terminals among the websites;
a document analysis procedure of analyzing location information of the documents for mobile terminals from the documents for fixed terminals collected by the fixed-terminal document collection procedure; and
a mobile-terminal document collection procedure of collecting the documents for mobile terminals based on the location information of the documents for mobile terminals analyzed by the document analysis procedure.

A document collection device that collects documents for mobile terminals by following links on websites, the device comprising:
fixed-terminal document collection means for collecting documents for fixed terminals from sites for fixed terminals among the websites;
document analysis means for analyzing location information of the documents for mobile terminals from the documents for fixed terminals collected by the fixed-terminal document collection means; and
mobile-terminal document collection means for collecting the documents for mobile terminals based on the location information of the documents for mobile terminals analyzed by the document analysis means.