JP2004110151A

JP2004110151A - Apparatus, program, and method for searching unauthorized utilization of content

Info

Publication number: JP2004110151A
Application number: JP2002268632A
Authority: JP
Inventors: Takeya Fujii; 藤井　毅也
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2002-09-13
Filing date: 2002-09-13
Publication date: 2004-04-08

Abstract

PROBLEM TO BE SOLVED: To reduce the time required to collect content and check for digital watermarks when searching for unauthorized use of content.

SOLUTION: An apparatus for searching for unauthorized use of content comprises: a text analysis section that analyzes link information included in text information described in a hierarchy level specified by address information; a search target content check section that detects an identifier embedded in collected content as a digital watermark and determines whether the content is being used without authorization; and a priority setting section that, based on an input keyword, assigns a priority to the link information analyzed by the text analysis section. The priority setting section assigns the priority for searching lower hierarchy levels based on a plus keyword, which raises the priority, a minus keyword, which lowers it, or both.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、例えば著作権情報等の不正利用を防止するための情報が電子透かし情報として埋め込まれたコンテンツを探索し、埋め込まれた情報に基づいてコンテンツが不正利用されているものかどうかを判定するコンテンツ不正利用探索装置、コンテンツ不正利用探索プログラム、コンテンツ不正利用探索方法に関し、特に、コンテンツの収集や電子透かしの検出チェックに要する時間を短縮し、悪意あるユーザによるリンク構造やコンテンツの差し替えに容易に対応することを可能にする技術に係わる。
【０００２】
【従来の技術】
近年、コンテンツの著作権を保護する目的として、コンテンツの著作権情報を電子透かし等でコンテンツに埋め込む技術が精力的に研究されており、様々な情報の埋め込み方式が現在までに提案されている。例えば、ＭＰＥＧ符号、とりわけＤＣＴ係数、動きベクトル、量子化特性を変更することによる、情報の埋め込み方式が提案されている（例えば、非特許文献１参照）。また、直接拡散方式に従って、ＰＮ系列で画像信号を拡散し、画像に署名情報を合成する方式が提案されている（例えば、非特許文献２参照）。
【０００３】
このようなコンテンツへの情報の埋め込み方式に関する研究に伴い、最近では、コンテンツの不正利用を判定するための様々なシステムが提供されるようになっており、例えば、電子透かしを利用して購入者情報を予めコンテンツに埋め込んでおき、不正利用と思われるコンテンツに埋め込まれている情報を読み出してコンテンツが不正利用されているものかどうかを判定するシステムが提案されている。このようなシステムでは、不正利用と思われるコンテンツを探索、入手することが必要となるために、ウェブ上のコンテンツを収集するコンピュータプログラムである「ウェブロボット」を利用して収集したコンテンツの電子透かし情報のチェックを行う。このウェブロボットは、世界中のウェブサイトを常時巡回し、画像等のコンテンツが不正利用されていないかどうかを監視するものであり、監視の結果、コンテンツが不正利用されていると判断した場合には、ウェブロボットはコンテンツの不正利用を行っているウェブサイトに対し何らかの通告を行う。
【０００４】
また、入力されたキーワード又はコンテンツの指定情報に応じて収集する調査対象パターンを決定し、決定した調査対象パターンに応じてネットワークを介して調査対象コンテンツを収集し、収集した調査対象コンテンツが不正利用であるかどうかを判定する技術が開示されている（例えば、特許文献１参照）。
【０００５】
【特許文献１】
特開平２００１−７６０００号公報
【非特許文献１】
日本電信電話株式会社，”ＤＣＴを用いたデジタル動画像における著作権情報埋め込み方式”，電子情報通信学会１９９７年暗号と情報セキュリティシンポジウム，ＳＣＩＳ　’９７−３１Ｇ
【非特許文献２】
防衛大学校，”ＰＮ系列による画像への透かし署名法”，電子情報通信学会１９９７年暗号と情報セキュリティシンポジウム，ＳＣＩＳ　’９７＿２６Ｂ
【０００６】
【発明が解決しようとする課題】
しかしながら、上記のようなコンテンツの不正利用を判定するためのシステムでは、インターネット上の全てのコンテンツが調査対象となるために、コンテンツの収集や電子透かしのチェックが完了するまでに膨大な時間を要し、コンテンツを効率的に探索することができない。特に、インターネットのような開放系のネットワークは特定の団体により管理されるものではないために、悪意を持った人間が容易に管理者となり、コンテンツの内容やリンク構造を動的に差し替えることができることを考えると、このような状況は早急に解決すべきである。
【０００７】
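Note that the technique disclosed in JP-A-2001-76000 searches only content that meets given conditions and can therefore shorten, to some extent, the time required to collect content and check digital watermarks. However, because it treats all content that meets the conditions equally, it is not necessarily an efficient search method.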
【０００８】
本発明は、以上に述べた状況を鑑みて成されたものであり、コンテンツの収集や電子透かしの検出チェックに要する時間を短縮し、悪意あるユーザによるリンク構造やコンテンツの差し替えに容易に対応することを可能にする、コンテンツ不正利用探索装置、コンテンツ不正利用探索プログラム、及びコンテンツ不正利用探索方法を提供することにある。
【０００９】
【課題を解決するための手段】
本発明によるコンテンツ不正利用探索装置（２ａ）は、電子ネットワークにおけるコンテンツの不正利用を探索するコンテンツ不正利用探索装置（２ａ）であって、キーワードと探索するコンテンツの最上位階層のアドレス情報とが入力される入力部（４）と、前記入力部（４）から入力された前記アドレス情報により特定される階層に記述されているテキスト情報を収集し、収集した当該テキスト情報内にリンクされているコンテンツを収集する探索対象収集部（３１）と、前記探索対象収集部（３１）が収集した当該コンテンツに電子透かしによって埋め込まれた識別子を検出し、検出した識別子により当該コンテンツが不正利用されているか否かを判定する探索対象コンテンツチェック部（３４）と、前記探索対象収集部（３１）が収集した前記テキスト情報に含まれる、前記テキスト情報の階層から下位階層へ探索するためのリンク情報を解析するテキスト解析部（３２）と、前記入力部（４）から入力された前記キーワードに基づいて、前記テキスト解析部（３２）が解析した前記リンク情報に優先順位を設定する優先順位設定部（３３）とを有し、前記優先順位設定部（３３）は、前記キーワードに含まれる、前記優先順位を上げるプラスキーワードと前記優先順位を下げるマイナスキーワードのどちらか一方、もしくは両方に基づいて、前記下位階層へ探索するためのリンク情報に前記優先順位を設定し、前記探索対象収集部（３１）は、前記優先順位設定部（３３）により設定された前記優先順位に従って、前記リンク情報により特定される階層に記述されているテキスト情報を収集し、収集した当該テキスト情報内にリンクされているコンテンツを収集し、前記探索対象コンテンツチェック部（３４）は、前記探索対象収集部（３１）が収集した当該コンテンツに電子透かしによって埋め込まれた識別子を検出し、検出した識別子により当該コンテンツが不正利用されているか否かを判定することを特徴とする。
【００１０】
また、本発明によるコンテンツ不正利用探索プログラムは、コンテンツ不正利用探索装置（２ａ）に、電子ネットワークにおけるコンテンツの不正利用の探索を実行させるためのコンテンツ不正利用探索プログラムであって、キーワードと探索するコンテンツの最上位階層のアドレス情報とが入力部（４）から入力されるステップ（Ｓ１，Ｓ２）と、前記入力部（４）から入力された前記アドレス情報により特定される階層に記述されているテキスト情報を探索対象収集部（３１）により収集し（Ｓ３）、収集した当該テキスト情報内にリンクされているコンテンツを前記探索対象収集部（３１）により収集するステップと（Ｓ５）、前記探索対象収集部（３１）が収集した当該コンテンツに電子透かしによって埋め込まれた識別子を検出し、検出した識別子により当該コンテンツが不正利用されているか否かを、探索対象コンテンツチェック部（３４）により判定するステップ（Ｓ６，Ｓ７，Ｓ８）と、前記探索対象収集部（３１）が収集した前記テキスト情報に含まれる、前記テキスト情報の階層から下位階層へ探索するためのリンク情報を、テキスト解析部（３２）により解析するステップ（Ｓ１２）と、前記入力部（４）から入力された前記キーワードに基づいて、前記テキスト解析部（３２）が解析した前記リンク情報に優先順位設定部（３３）により優先順位を設定するステップ（Ｓ１３）と、前記優先順位設定部（３３）が設定した前記優先順位に従って、前記リンク情報により特定される階層に記述されているテキスト情報を前記探索対象収集部（３１）により収集し（Ｓ１５）、収集した当該テキスト情報内にリンクされているコンテンツを前記探索対象収集部（３１）により収集するステップ（Ｓ５）と、前記探索対象収集部（３１）が収集した当該コンテンツに電子透かしによって埋め込まれた識別子を検出し、検出した識別子により当該コンテンツが不正利用されているか否かを、前記探索対象コンテンツチェック部（３４）により判定するステップ（Ｓ６，Ｓ７，Ｓ８）とを前記コンテンツ不正利用探索装置に実行させ（２ａ）、前記優先順位を設定するステップ（Ｓ１３）において、前記優先順位設定部（３３）が、前記キーワードに含まれる、前記優先順位を上げるプラスキーワードと優先順位を下げるマイナスキーワードのどちらか一方、もしくは両方に基づいて、前記下位階層へ探索するためのリンク情報に前記優先順位を設定するように機能させることを特徴とする。
【００１１】
また、本発明によるコンテンツ不正利用探索方法は、電子ネットワークにおけるコンテンツの不正利用を探索するコンテンツ不正利用探索装置（２ａ）によるコンテンツ不正利用探索方法であって、キーワードと探索するコンテンツの最上位階層のアドレス情報とが入力部（４）から入力されるステップ（Ｓ１，Ｓ２）と、前記入力部（４）から入力された前記アドレス情報により特定される階層に記述されているテキスト情報を探索対象収集部（３１）により収集し（Ｓ３）、収集した当該テキスト情報内にリンクされているコンテンツを前記探索対象収集部（３１）により収集するステップと（Ｓ５）、前記探索対象収集部（３１）が収集した当該コンテンツに電子透かしによって埋め込まれた識別子を検出し、検出した識別子により当該コンテンツが不正利用されているか否かを、探索対象コンテンツチェック部（３４）により判定するステップ（Ｓ６，Ｓ７，Ｓ８）と、前記探索対象収集部（３１）が収集した前記テキスト情報に含まれる、前記テキスト情報の階層から下位階層へ探索するためのリンク情報を、テキスト解析部（３２）により解析するステップ（Ｓ１２）と、前記入力部（４）から入力された前記キーワードに基づいて、前記テキスト解析部（３２）が解析した前記リンク情報に優先順位設定部（３３）により優先順位を設定するステップ（Ｓ１３）と、前記優先順位設定部（３３）が設定した前記優先順位に従って、前記リンク情報により特定される階層に記述されているテキスト情報を前記探索対象収集部（３１）により収集し（Ｓ１５）、収集した当該テキスト情報内にリンクされているコンテンツを前記探索対象収集部（３１）により収集するステップ（Ｓ５）と、前記探索対象収集部（３１）が収集した当該コンテンツに電子透かしによって埋め込まれた識別子を検出し、検出した識別子により当該コンテンツが不正利用されているか否かを、前記探索対象コンテンツチェック部（３４）により判定するステップ（Ｓ６，Ｓ７，Ｓ８）とを有し、前記優先順位を設定するステップ（Ｓ１３）において、前記優先順位設定部（３３）は、前記キーワードに含まれる、前記優先順位を上げるプラスキーワードと優先順位を下げるマイナスキーワードのどちらか一方、もしくは両方に基づいて、前記下位階層へ探索するためのリンク情報に前記優先順位を設定することを特徴とする。
【００１２】
このような構成によれば、電子ネットワーク上の全てのコンテンツを探索対象にするのではなく、入力されたキーワードに基づいて各リンク情報に探索の優先順位を付与し、この優先順位に基づいてコンテンツを探索する。このような構成によれば、注目するコンテンツを効率良く探索し、コンテンツの収集や電子透かしの検出チェックに要する時間を短縮することができる。
【００１３】
また、本発明によるコンテンツ不正利用探索装置（２ｂ）は、電子ネットワークにおけるコンテンツの不正利用を探索するコンテンツ不正利用探索装置（２ｂ）であって、コンテンツを特定可能なアドレス情報と当該アドレス情報の優先度と前記コンテンツを取得した取得日時とを組にして管理するコンテンツデータベース（８）と、キーワードと当該キーワードの重要度とを組にして管理するキーワードデータベース（９）と、探索対象のコンテンツを特定する起点アドレス情報が入力される入力部（４）と、前記コンテンツデータベース（８）が管理する前記アドレス情報を前記優先度の高い順に取り出して取得リストとして出力する巡回部（４１）と、前記取得リストに含まれる前記アドレス情報に基づいて前記探索対象のコンテンツを取得するコンテンツ取得部（４２）と、前記コンテンツ取得部（４２）が取得した前記コンテンツがリンク情報を含むハイパーテキストであるか否かを判定するハイパーテキスト判定部（４３）と、前記コンテンツ取得部（４２）が取得した前記コンテンツがリンク情報を含むハイパーテキストである場合に、当該ハイパーテキストが含む前記リンク情報を解析して、２次ノードアドレス情報と当該リンク情報から所定の範囲内にある近傍キーワードとを組にして形態素テーブルとして出力するテキスト解析部（４４）と、前記形態素テーブルに含まれる前記近傍キーワードを前記キーワードデータベース（９）から検索し、検索結果に応じて前記コンテンツデータベース（８）内の優先度を変動させる変動値を生成し、生成した前記変動値を前記アドレス情報と共に前記コンテンツデータベース（８）に登録するコンテンツ登録部（４５）と、前記コンテンツ取得部（４２）が取得した前記コンテンツが不正利用されているか否かを判定する不正利用判定部（３５）とを有し、前記巡回部（４１）は、前記優先度が所定の範囲内にある時、当該所定の範囲を所定の定数で割ることにより生成される複数の範囲の各々について、前記所定の範囲の最大値から近い順に前記コンテンツデータベース（８）から前記アドレス情報を取り出し、取り出した前記アドレス情報を前記取得日時が古い順にソートし、古いアドレス情報から優先的に前記取得リストへ追加し、前記取得リストに追加する前記アドレス情報の数が所定の巡回最大数に達した時点で当該取得リストを出力することを特徴とする。
【００１４】
また、本発明によるコンテンツ不正利用探索プログラムは、コンテンツ不正利用探索装置（２ｂ）に、電子ネットワークにおけるコンテンツの不正利用の探索を実行させるためのコンテンツ不正利用探索プログラムであって、探索対象のコンテンツを特定する起点アドレス情報が入力部（４）から入力されるステップと、コンテンツを特定可能なアドレス情報と当該アドレス情報の優先度と前記コンテンツを取得した取得日時とを組にして管理するコンテンツデータベース（８）が管理する前記アドレス情報を、巡回部（４１）により前記優先度の高い順に取り出して取得リストとして出力するステップと、前記取得リストに含まれる前記アドレス情報に基づいて前記探索対象のコンテンツをコンテンツ取得部（４２）により取得するステップと、前記コンテンツ取得部（４２）が取得した前記コンテンツがリンク情報を含むハイパーテキストであるか否かをハイパーテキスト判定部（４３）により判定するステップと、前記コンテンツ取得部（４２）が取得した前記コンテンツがリンク情報を含むハイパーテキストである場合に、テキスト解析部（４４）により当該ハイパーテキストが含む前記リンク情報を解析して、２次ノードアドレス情報と当該リンク情報から所定の範囲内にある近傍キーワードとを組にして形態素テーブルとして出力するステップと、コンテンツ登録部（４５）により、前記形態素テーブルに含まれる前記近傍キーワードをキーワードと当該キーワードの重要度とを組にして管理するキーワードデータベース（９）から検索し、検索結果に応じて前記コンテンツデータベース内の優先度を変動させる変動値を生成し、生成した前記変動値を前記アドレス情報と共に前記コンテンツデータベース（８）に登録するステップと、不正利用判定部（３５）により、前記コンテンツ取得部（４２）が取得した前記コンテンツが不正利用されているか否かを判定するステップとを前記コンテンツ不正利用探索装置に実行させ（２ｂ）、前記取得リストを出力するステップにおいて、前記巡回部（４１）が、前記優先度が所定の範囲内にある時、当該所定の範囲を所定の定数で割ることにより生成される複数の範囲の各々について、前記所定の範囲の最大値から近い順に前記コンテンツデータベース（８）から前記アドレス情報を取り出し、取り出した前記アドレス情報を前記取得日時が古い順にソートし、古いアドレス情報から優先的に前記取得リストへ追加し、前記取得リストに追加する前記アドレス情報の数が所定の巡回最大数に達した時点で当該取得リストを出力するように機能させることを特徴とする。
【００１５】
また、本発明によるコンテンツ不正利用探索方法は、電子ネットワークにおけるコンテンツの不正利用を探索するコンテンツ不正利用探索装置（２ｂ）によるコンテンツ不正利用探索方法であって、探索対象のコンテンツを特定する起点アドレス情報が入力部（４）から入力されるステップと、コンテンツを特定可能なアドレス情報と当該アドレス情報の優先度と前記コンテンツを取得した取得日時とを組にして管理するコンテンツデータベース（８）が管理する前記アドレス情報を、巡回部（４１）により前記優先度の高い順に取り出して取得リストとして出力するステップと、前記取得リストに含まれる前記アドレス情報に基づいて前記探索対象のコンテンツをコンテンツ取得部（４２）により取得するステップと、前記コンテンツ取得部（４２）が取得した前記コンテンツがリンク情報を含むハイパーテキストであるか否かをハイパーテキスト判定部（４３）により判定するステップと、前記コンテンツ取得部（４２）が取得した前記コンテンツがリンク情報を含むハイパーテキストである場合に、テキスト解析部（４４）により当該ハイパーテキストが含む前記リンク情報を解析して、２次ノードアドレス情報と当該リンク情報から所定の範囲内にある近傍キーワードとを組にして形態素テーブルとして出力するステップと、コンテンツ登録部（４５）により、前記形態素テーブルに含まれる前記近傍キーワードをキーワードと当該キーワードの重要度とを組にして管理するキーワードデータベース（９）から検索し、検索結果に応じて前記コンテンツデータベース内の優先度を変動させる変動値を生成し、生成した前記変動値を前記アドレス情報と共に前記コンテンツデータベース（８）に登録するステップと、不正利用判定部（３５）により、前記コンテンツ取得部（４２）が取得した前記コンテンツが不正利用されているか否かを判定するステップとを有し、前記取得リストを出力するステップにおいて、前記巡回部（４１）は、前記優先度が所定の範囲内にある時、当該所定の範囲を所定の定数で割ることにより生成される複数の範囲の各々について、前記所定の範囲の最大値から近い順に前記コンテンツデータベース（８）から前記アドレス情報を取り出し、取り出した前記アドレス情報を前記取得日時が古い順にソートし、古いアドレス情報から優先的に前記取得リストへ追加し、前記取得リストに追加する前記アドレス情報の数が所定の巡回最大数に達した時点で当該取得リストを出力することを特徴とする。
【００１６】
このような構成によれば、収集したコンテンツがハイパーテキストである場合、リンク情報近傍のキーワードを検索し、キーワードの有無に従って優先度を変動させ、優先度が高く、且つ、古いコンテンツから優先的に収集する。また、優先度が高いサイトからリンクされたサイトは優先度を継承する。このような構成によれば、悪意を持った人間によるリンク構造やコンテンツの差し替えに容易に対応することができる。
【００１７】
【発明の実施の形態】
以下、図１〜図１５を参照しながら、本発明の実施の形態について説明する。
【００１８】
尚、各図面を通じて同一もしくは同等の部位や構成要素には、同一もしくは同等の参照符号を付し、その説明を省略もしくは簡略化する。
【００１９】
［第１の実施形態］
〔コンテンツ不正利用探索装置の構成〕
本発明の第１の実施形態となるコンテンツ不正利用探索装置２ａは、例えばパーソナルコンピュータ、ワークステーション、汎用コンピュータ等のコンピュータ装置上に構成される。具体的には、コンテンツ不正利用探索装置２ａは、図２に示すように、ＣＰＵ１、ＲＡＭ２、ＲＯＭ３、入力Ｉ／Ｆ部４、通信制御部５、ＩＤデータ／ＵＲＬデータベース６を備え、インターネット７に接続可能な構成となっている。
【００２０】
ＣＰＵ１は、ＲＯＭ３内に記憶されたコンピュータプログラムに従ってコンテンツ不正利用探索装置２ａの動作制御を行う。また、ＲＡＭ２は、ＣＰＵ１が実行する各種処理に関するコンピュータプログラムやデータを一時的に格納するワークエリアを提供する。
【００２１】
ＲＯＭ３は、コンテンツ不正利用探索プログラム３ａ等の各種コンピュータプログラムやプログラムの実行に必要なデータを記憶する。なお、ＲＯＭ３は、磁気的、光学的記録媒体若しくは半導体メモリ等といった、ＣＰＵ１が読み取り可能な記録媒体を含んだ構成となっている。また、この記録媒体に格納されるコンピュータプログラムやデータは、インターネット７を介してその一部若しくは全部を受信するような構成にしても良い。
【００２２】
入力インタフェイス（Ｉ／Ｆ）部４は、後述する不正利用探索処理を実行する際に必要となる各種情報（キーワードと探索するコンテンツの最上位階層のアドレス情報等）を入力する際のインタフェイスの役割を担う。
【００２３】
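The communication control unit 5 implements data communication protocols such as HTTP (Hyper Text Transfer Protocol) and TCP/IP (Transmission Control Protocol / Internet Protocol), and e-mail protocols such as SMTP (Simple Mail Transfer Protocol) and POP (Post Office Protocol). Using these protocols, the communication control unit 5 transmits various data over the Internet 7 and converts received data into a format that the CPU 1 can process.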
【００２４】
ＩＤデータ／ＵＲＬデータベース６は、電子透かしによって管理対象のコンテンツに記録されているＩＤデータと、そのコンテンツの正当なＵＲＬアドレス情報及び正当な持ち主の連絡先となる電子メイルアドレスを格納する。
【００２５】
コンテンツ不正利用探索プログラム３ａは、図１に示すように、探索対象収集部３１、テキスト解析部３２と、優先順位設定部３３、探索対象コンテンツチェック部３４、不正使用判定部３５、警告メイル送信部３６を有する。
【００２６】
探索対象収集部３１は、入力Ｉ／Ｆ部４から入力されたアドレス情報により特定される階層に記述されているテキスト情報を収集し、収集した当該テキスト情報内にリンクされているコンテンツを収集する。
【００２７】
探索対象コンテンツチェック部３４は、探索対象収集部３１が収集した当該コンテンツに電子透かしによって埋め込まれた識別子を検出し、検出した識別子により当該コンテンツが不正利用されているか否かを判定する。
【００２８】
テキスト解析部３２は、探索対象収集部３１が収集したテキスト情報に含まれる、テキスト情報の階層から下位階層へ探索するためのリンク情報を解析する。
【００２９】
優先順位設定部３３は、入力Ｉ／Ｆ部４から入力されたキーワードに基づいて、テキスト解析部３２が解析したリンク情報に優先順位を設定する。
【００３０】
そして、探索対象収集部３１は、優先順位設定部３３により設定された優先順位に従って、リンク情報により特定される下位階層に記述されているテキスト情報を収集し、収集した下位階層の当該テキスト情報内にリンクされているコンテンツを収集し、探索対象コンテンツチェック部３４は、探索対象収集部３１が収集した下位階層の当該コンテンツに電子透かしによって埋め込まれた識別子を検出し、検出した識別子により当該コンテンツが不正利用されているか否かを判定する。
【００３１】
また、上記キーワードは、優先順位を上げるプラスキーワードと優先順位を下げるマイナスキーワードのどちらか一方、もしくは両方を含み、優先順位設定部３３は、プラスキーワードとマイナスキーワードのどちらか一方、もしくは両方を含むキーワードに基づいて、下位階層へ探索するためのリンク情報に前記優先順位を設定する。
【００３２】
〔コンテンツ不正利用探索装置の処理動作〕
次に、図３に示すフローチャートを参照して、コンテンツの不正利用を探索する際の、第１の実施形態によるコンテンツ不正利用探索装置２ａの処理動作について説明する。
【００３３】
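In steps S1 and S2, the user enters, via the input I/F unit 4, the address information of the top-level URL (Uniform Resource Locator) and the keywords on which the priority of the URLs to be searched is based. The following description assumes that the user has entered, for example, "idol", "Singer A-ko", "Group B", "Singer C-ko", "Singer D-ko", "CG", and "painting" as plus keywords.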
【００３４】
ステップＳ３の処理において、探索対象収集部３１は、通信制御部５を介して、入力された最上位ＵＲＬのアドレス位置にあるホームページのＨＴＭＬ文章をＲＡＭ２内にダウンロードする。
【００３５】
ステップＳ４の処理において、テキスト解析部３２は、探索対象収集部３１がダウンロードしたＨＴＭＬ文章内に画像等のコンテンツがリンクされているか否かを判別する。そして、判別の結果、コンテンツがリンクされている場合、ステップＳ５の処理に進む。一方、ステップＳ４の判別の結果、探索対象収集部３１がダウンロードしたＨＴＭＬ文章内にコンテンツがリンクされていない場合には、ステップＳ１２の処理に進む。
【００３６】
ステップＳ５の処理において、探索対象収集部３１は、ＨＴＭＬ文章にリンクされているコンテンツをＲＡＭ２にダウンロードする。
【００３７】
ステップＳ６の処理では、探索対象コンテンツチェック部３４は、ＲＡＭ２にダウンロードされたコンテンツの中から電子透かしを検出し、検出した電子透かしにより記録されているＩＤデータを認識する。なお、この電子透かしの方法は様々な方法が想定されるが、例えば、コンテンツが画像データの場合、画素の輝度を表すビット列を操作して電子透かしを埋め込んだり、コンテンツが音楽データの場合は、波形を周波数成分に分解して、位相をずらすなどの処理を施して電子透かしを埋め込んだりする方法などがある。
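As a hedged illustration of the bit-manipulation approach mentioned above for image data, the following sketch hides an ID in the least significant bit of luminance values and reads it back. The least-significant-bit scheme, the 16-bit ID width, and the function names `embed_id` and `extract_id` are assumptions made for illustration; the specification does not say which embedding method is actually used.

```python
def embed_id(luma, content_id, bits=16):
    """Write content_id into the lowest bit of the first `bits` luminance values."""
    if len(luma) < bits:
        raise ValueError("not enough pixels to hold the identifier")
    out = list(luma)
    for i in range(bits):
        bit = (content_id >> (bits - 1 - i)) & 1  # most significant bit first
        out[i] = (out[i] & ~1) | bit              # replace only the lowest bit
    return out


def extract_id(luma, bits=16):
    """Rebuild the identifier from the lowest bit of the first `bits` values."""
    value = 0
    for i in range(bits):
        value = (value << 1) | (luma[i] & 1)
    return value
```

Because only the lowest bit of each value changes, every luminance value moves by at most 1, which is why schemes of this kind are hard to see but also fragile under recompression.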
【００３８】
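In steps S7 and S8, the unauthorized use determination unit 35 reads the ID data stored in the ID data/URL database 6, compares it with the ID data recorded by the digital watermark, and determines whether the downloaded content is managed content. If it is, then in step S9 the unauthorized use determination unit 35 compares the URL address stored in the ID data/URL database 6 with the URL address at which the content was detected and determines whether the URL address from which the content was downloaded is legitimate. If the URL address is legitimate, processing proceeds to step S12. If it is not, processing proceeds to step S10.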
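The two-stage decision described above (is the ID managed, and if so, is the source URL the registered one) can be sketched as follows. The dictionary layout of the registry and the three result labels are hypothetical; the specification only states that the ID data/URL database 6 holds the ID, the legitimate URL, and the owner's e-mail address.

```python
def judge(content_id, source_url, registry):
    """Classify one downloaded item.

    registry maps a watermark ID to {"url": legitimate URL, "mail": owner address}.
    """
    entry = registry.get(content_id)
    if entry is None:
        return "unmanaged"        # ID not in the database: nothing to protect
    if source_url == entry["url"]:
        return "legitimate"       # found at its registered location
    return "unauthorized"         # managed content found somewhere else
```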
【００３９】
ステップＳ１０の処理において、不正使用判定部３５は、当該コンテンツは不正利用されているものと判断し、当該コンテンツの情報と送信先のメイルアドレス等を警告メイル送信部３６へ送る。
【００４０】
ステップＳ１１の処理において、警告メイル送信部３６は、当該コンテンツの情報、コンテンツのＵＲＬアドレス情報、不正発見日時等を記載した警告メイルを作成し、通信制御部５を介して送信先の電子メイルアドレス宛てに警告メイルを送信し、ステップＳ１２の処理に進む。
【００４１】
ステップＳ１２の処理において、テキスト解析部３２は、ダウンロードしたＨＴＭＬ文章中の、図４に示すような画像ソースを示すテキスト部分『ｉｍｇ　ｓｒｃ＝』（下線部Ａ）と、次の階層へのリンク先ＵＲＬアドレスを示すテキスト部分『Ａ　ＨＲＥＦ＝』（下線部Ｂ）の近傍（例えば、前後１行の計３行のテキスト部分Ｃ）において、入力Ｉ／Ｆ部４を介して入力されたキーワード（「アイドル」、「歌手Ａ子」、「グループＢ」、「歌手Ｃ子」、「歌手Ｄ子」、「ＣＧ」、「絵画」）が存在するか否かを解析するために、パターン検索を実行する。
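A minimal sketch of the pattern search in step S12, assuming the "neighborhood" of a link is the line containing it plus one line before and after (the three-line example given above). The regular expression, the function name, and the sample keywords are illustrative assumptions, not the patent's implementation.

```python
import re

# Matches img src="..." and a href="..." in either letter case.
LINK_PATTERN = re.compile(r'(?:img\s+src|a\s+href)\s*=\s*"([^"]+)"', re.IGNORECASE)


def find_links_with_keywords(html, keywords, window=1):
    """Return (address, keywords found nearby) for every link in the text."""
    lines = html.splitlines()
    results = []
    for idx, line in enumerate(lines):
        for match in LINK_PATTERN.finditer(line):
            start = max(0, idx - window)
            nearby = "\n".join(lines[start:idx + window + 1])
            hits = [kw for kw in keywords if kw in nearby]
            results.append((match.group(1), hits))
    return results
```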
【００４２】
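In step S13, the priority setting unit 33 sets to +1 the search priority of each image source or link destination URL address indicated by a text portion in which an input keyword is present. Once search priorities have been set for the text portions indicating link destination URL addresses in the downloaded HTML text, processing proceeds to step S14.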
【００４３】
ステップＳ１４の処理において、優先順位設定部３３は、設定した探索優先順位が所定の値（例えば１）以上であって、未処理の画像ソースやリンク先ＵＲＬアドレスがあるか否か判別する。そして、判別の結果、探索優先順位が所定の値以上の未処理の画像ソースやリンク先ＵＲＬアドレスがある場合、優先順位設定部３３は、探索優先順位が所定の値以上の画像ソースやリンク先ＵＲＬアドレス、若しくは、探索優先順にソートした画像ソースやリンク先ＵＲＬアドレスの全部又は一部を探索対象収集部３１に送り、ステップＳ１５の処理に進む。一方、判別の結果、探索優先順位が所定の値以上の未処理の画像ソースやリンク先ＵＲＬアドレスがない場合には、一連の探索処理は終了する。
【００４４】
ステップＳ１５の処理において、探索対象収集部３１は、送られたリンク先ＵＲＬアドレスが示すＨＴＭＬ文章を、通信制御部５を介してＲＡＭ２内にダウンロードし、ステップＳ４の処理に戻る。
【００４５】
そして、ステップＳ１５の処理にてダウンロードしたＨＴＭＬ文章内に画像などのコンテンツがリンクされている場合、あるいはステップＳ１４の処理において探索優先順位が所定の値以上の画像ソースがあった場合には、前述のステップＳ５〜ステップＳ１１の処理にてコンテンツをダウンロードし、不正利用されているか否かを判別する。
【００４６】
また、ステップＳ１５の処理にてダウンロードしたＨＴＭＬ文章内にコンテンツがリンクされていない場合、且つステップＳ１４の処理において探索優先順位が所定の値以上の画像ソースがなかった場合には、前述のステップＳ１２以降の、さらに次の下位階層を探索するための処理を行う。
【００４７】
なお、上記探索処理においては、ユーザは、優先的に探索するコンテンツを示すプラスキーワードを入力したものとして説明したが、探索の対象としないコンテンツを示すマイナスキーワードをプラスキーワードと共に若しくは単独で入力してもよい。この場合、ＣＰＵ１は、図５のフローチャートに示すように、入力されたプラスキーワード及びマイナスキーワードに基づいてパターン検索を実行し（ステップＳ１２Ａ）、プラスキーワードが存在するテキスト部分が示す画像ソースやリンク先ＵＲＬアドレスの探索優先順位を＋１に設定し、マイナスキーワードが存在するテキスト部分が示す画像ソースやリンク先ＵＲＬアドレスの探索優先順位を−１に設定する（ステップＳ１３Ａ，ステップＳ１３Ｂ）。
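The plus/minus rule of steps S13A and S13B, followed by the threshold test of step S14, might look like the sketch below. The specification does not say what happens when both a plus and a minus keyword appear near the same link; letting the minus keyword win is an assumption here, as are the function names and the default priority of 0.

```python
def set_priorities(link_texts, plus_keywords, minus_keywords):
    """link_texts: list of (address, nearby text). Returns {address: priority}."""
    priorities = {}
    for address, nearby in link_texts:
        if any(kw in nearby for kw in minus_keywords):
            priorities[address] = -1   # assumption: a minus keyword overrides
        elif any(kw in nearby for kw in plus_keywords):
            priorities[address] = 1
        else:
            priorities[address] = 0
    return priorities


def select_targets(priorities, threshold=1):
    """Keep addresses at or above the threshold, highest priority first."""
    chosen = [a for a, p in priorities.items() if p >= threshold]
    return sorted(chosen, key=lambda a: -priorities[a])
```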
【００４８】
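With this configuration, if, for example, "idol", "Singer A-ko", "Group B", "Singer C-ko", and "Singer D-ko" are entered as plus keywords and "CG" and "painting" as minus keywords, image content related to "CG" or "painting" can be excluded from the search, so that only photographic images are searched efficiently. The other processing steps in the flowchart of FIG. 5 are the same as those in the flowchart of FIG. 3, and their description is omitted here.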
【００４９】
［第２の実施形態］
〔コンテンツ不正利用探索装置の構成〕
第２の実施形態によるコンテンツ不正利用探索装置２ｂは、図６及び図７に示すように、第１の実施形態のコンテンツ不正利用探索装置２ａが備えるＩＤデータ／ＵＲＬデータベース６の代わりに、コンテンツデータベース８とキーワードデータベース９を備える。
【００５０】
コンテンツデータベース８は、図８に示すような、コンテンツのアドレス情報に相当するｎ個のＵＲＬが探索の優先度を付与されて記載されたテーブルデータを記憶する。なお、各アドレス情報はテーブル中でユニークなメインキーとなっており、また、優先度は比較可能な値により表現されている。
【００５１】
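The keyword database 9 stores data, as shown in FIG. 9, in which p keywords are listed in text form, each with an assigned importance. Specifically, the keywords are text akin to proper nouns, such as the artist names "Singer A-ko", "Group B", "Singer C-ko", and "Singer D-ko", and are registered by the user with a weighted importance. The importance may be positive or negative: a positive importance raises the importance of the search, and a negative importance lowers it. If specifying importance values in detail is too burdensome for the user, a large number of hypertexts may be morphologically analyzed in advance, statistics taken on the distribution of word occurrences across all documents and on the word frequency within each document, word importance calculated by a method such as TF (Term Frequency)-IDF (Inverse Document Frequency), and a word importance table created, which the user then consults when registering importance values. For example, if the word importance table gives the word "Singer A-ko" an importance of 0.17, multiplying this by a suitable constant (2 in this example) yields 0.34 as the importance to register in the keyword database 9. The user then need not be aware of how often the word itself occurs, which reduces the effort of registering importance values.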
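To make the TF-IDF step concrete, here is a small sketch that scores a word in one tokenized document against a corpus and then scales a table value for registration. The particular TF-IDF variant (relative term frequency times the natural log of inverse document frequency), the function names, and the scale factor of 2 (chosen to match the 0.17 to 0.34 example) are assumptions; the specification only names the TF-IDF family of methods.

```python
import math


def tf_idf(term, doc_tokens, corpus):
    """Score `term` in one tokenized document against a corpus of token lists."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for doc in corpus if term in doc)       # documents containing the term
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf


def registered_importance(word_importance, scale=2.0):
    """Scale a table value into the importance stored in the keyword database."""
    return round(word_importance * scale, 2)
```

For example, `registered_importance(0.17)` gives 0.34, the value used in the paragraph above.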
【００５２】
コンテンツ不正利用探索プログラム３ｂは、巡回部４１、コンテンツ取得部４２、ハイパーテキスト判定部４３、テキスト解析部４４、コンテンツ登録部４５、不正使用判定部３５、警告メイル送信部３６とから構成される。
【００５３】
入力Ｉ／Ｆ部４からは探索対象のコンテンツを特定する起点アドレス情報がユーザにより入力される。
【００５４】
巡回部４１は、コンテンツデータベース８が管理するアドレス情報を優先度の高い順に取り出して取得リストとして出力する。なお、巡回部４１は、優先度が所定の範囲内にある時、当該所定の範囲を所定の定数で割ることにより生成される複数の範囲の各々について、所定の範囲の最大値から近い順に前記コンテンツデータベース８からアドレス情報を取り出し、取り出したアドレス情報を取得日時が古い順にソートし、古いアドレス情報から優先的に取得リストへ追加し、取得リストに追加するアドレス情報の数が所定の巡回最大数に達した時点で当該取得リストを出力する。
【００５５】
コンテンツ取得部４２は、取得リストに含まれるアドレス情報に基づいて探索対象のコンテンツを取得する。
【００５６】
ハイパーテキスト判定部４３は、コンテンツ取得部４２が取得したコンテンツがリンク情報を含むハイパーテキストであるか否かを判定する。
【００５７】
テキスト解析部４４は、コンテンツ取得部４２が取得したコンテンツがリンク情報を含むハイパーテキストである場合に、当該ハイパーテキストが含むリンク情報を解析して、２次ノードアドレス情報と当該リンク情報から所定の範囲内にある近傍キーワードとを組にして形態素テーブルとして出力する。
【００５８】
コンテンツ登録部４５は、形態素テーブルに含まれる近傍キーワードをキーワードデータベース９から検索し、検索結果に応じてコンテンツデータベース８内の優先度を変動させる変動値を生成し、生成した変動値をアドレス情報と共にコンテンツデータベース８に登録する。
【００５９】
〔コンテンツ不正利用探索装置の処理動作〕
次に、図１０に示すフローチャートを参照しながら、コンテンツの不正利用を判定する際の、コンテンツ不正利用探索装置２ｂの処理動作について説明する。
【００６０】
ステップＳ２０の処理において、ユーザは、入力Ｉ／Ｆ部４を介して起点ノードのアドレス情報を入力する。なお、ここで入力される起点ノードとは、ディレクトリサービス型のポータルサイトを示し、起点ノードのアドレス情報はそのポータルサイトのＵＲＬを意味する。
【００６１】
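In step S21, the crawling unit 41 retrieves address information from the content database 8 in descending order of priority and generates a content acquisition list. This processing by the crawling unit 41 can be expressed in the database language SQL (Structured Query Language) as follows.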
【００６２】
【数１】
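The SQL statement referred to here does not survive in this text. As a stand-in, the sketch below shows one query that would return addresses in descending order of priority, run through Python's `sqlite3` module. The table name `content_db`, the column names `address` and `priority`, and the sample rows are all hypothetical.

```python
import sqlite3


def fetch_acquisition_list(conn):
    """Return every address, highest priority first."""
    cursor = conn.execute(
        "SELECT address FROM content_db ORDER BY priority DESC"
    )
    return [row[0] for row in cursor]


# Small in-memory demonstration with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content_db (address TEXT PRIMARY KEY, priority REAL)")
conn.executemany(
    "INSERT INTO content_db VALUES (?, ?)",
    [("http://a.example/", 1.0), ("http://b.example/", 3.5), ("http://c.example/", 2.0)],
)
acquisition_list = fetch_acquisition_list(conn)
```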

ステップＳ２２の処理において、コンテンツ取得部４２は、巡回部４１が生成した取得リストから順次アドレス情報を取り出し、通信制御部５を介してインターネット７にアクセスし、アドレス情報が示すコンテンツを取得する。
【００６３】
ステップＳ２３の処理において、コンテンツ取得部４２は、コンテンツデータベース８を参照して、コンテンツを取得したアドレス情報ｂの優先度から予め定められた減衰値Ｒを減算する。なお、このコンテンツ取得部４２による処理は、データベース言語ＳＱＬにより次のように表現することができる。尚、アドレス情報ｂはコンテンツを取得したアドレス情報の識別子を示す。
【００６４】
【数２】
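The SQL statement referred to here is likewise missing from this text. A plausible form of the decay update, subtracting the decay value R from the priority of the address b that was just fetched, is sketched below with `sqlite3`. The table and column names and the sample values are hypothetical.

```python
import sqlite3


def decay_priority(conn, address, decay):
    """Subtract the decay value R from the priority of one fetched address."""
    conn.execute(
        "UPDATE content_db SET priority = priority - ? WHERE address = ?",
        (decay, address),
    )


# Small in-memory demonstration with a made-up row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content_db (address TEXT PRIMARY KEY, priority REAL)")
conn.execute("INSERT INTO content_db VALUES ('http://b.example/', 3.5)")
decay_priority(conn, "http://b.example/", 0.5)
remaining = conn.execute("SELECT priority FROM content_db").fetchone()[0]
```

Lowering the priority after each fetch keeps a page from monopolizing the list unless nearby keywords keep raising it again.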

ステップＳ２４の処理において、ハイパーテキスト判定部４３は、コンテンツ取得部４２が取得したコンテンツに含まれるバイナリ・フィンガープリントやヘッダ文字列等を用いてコンテンツ種別を解析することにより、取得したコンテンツがハイパーテキストであるか否かを判別する。そして、判別の結果、ハイパーテキストでない場合、ステップＳ２９の不正発見処理に進む。一方、判別の結果、ハイパーテキストである場合には、ステップＳ２５の処理に進む。
【００６５】
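In step S25, the text analysis unit 44 analyzes the link information contained in the hypertext and plain text analyzed by the hypertext determination unit 43. When the hypertext is written in HTML, "link information" here corresponds to text tags such as the A tag (<a href="secondary node address information">anchor text</a>) and the IMG tag (<img src="secondary node address information" ALT="supplementary text">), and includes the secondary node address information needed to access content and other Internet sites.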
【００６６】
ステップＳ２６の処理において、テキスト解析部４４は、例えば行単位等の適当な範囲内でリンク情報近傍のテキストを切り出し、アンカーテキスト及び補足テキストと合わせて近傍キーワード群を生成する。
【００６７】
ステップＳ２７の処理において、テキスト解析部４４は、２次ノードアドレス情報と近傍キーワード群をセットにして、例えば図１１に示すような、任意の行を指定した読み出しが可能な形態素テーブルとして出力する。ここで、形態素テーブルにおいて、２次ノードアドレス情報はＵＲＬに類するアドレス情報であり、総数はｏ個となっている。また、２次ノードアドレス情報は同一アドレスが集団と成るようにソートされている。一方、近傍キーワードは、テキストとなっており、重複のないユニークな構成となっている。なお、図１１に示す形態素テーブルは、２次ノードアドレス情報１には近傍キーワード１，２，３が存在し、２次ノードアドレス情報２には近傍キーワード１のみが存在し、２次ノードアドレス情報３には近傍キーワード４のみが存在することを例示している。
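A rough sketch of how steps S25 to S27 could produce a morpheme table of (secondary node address, nearby keyword) rows, using Python's standard `html.parser`. Two simplifications are assumptions of this sketch: only anchor text and ALT text are used as nearby keywords (the surrounding lines are ignored), and words are split on whitespace rather than by morphological analysis. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (address, word) pairs from A and IMG tags."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current = None  # href of the <a> element currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._current = attrs["href"]
        elif tag == "img" and attrs.get("src"):
            for word in (attrs.get("alt") or "").split():
                self.rows.append((attrs["src"], word))

    def handle_endtag(self, tag):
        if tag == "a":
            self._current = None

    def handle_data(self, data):
        if self._current:
            for word in data.split():
                self.rows.append((self._current, word))


def morpheme_table(html):
    """Unique rows, sorted so that identical addresses sit together."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return sorted(set(parser.rows))
```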
【００６８】
ステップＳ２８の処理において、コンテンツ登録部４５は、テキスト解析部４４が出力した形態素テーブルを解析して変動値ｈを算出し、算出した変動値ｈをコンテンツデータベース８の優先度に加算する。なお、このコンテンツ登録処理の詳細については後述する。
【００６９】
ステップＳ２９の処理において、不正使用判定部３５は、第１の実施形態で説明したように、電子透かし抽出アルゴリズム等を利用してコンテンツが不正利用されているか否かを判別する。そして、判別の結果、不正利用でない場合には、ステップＳ２２の処理に戻り、判別の結果、不正利用である場合には、ステップＳ３０の処理に進む。
【００７０】
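In step S30, the unauthorized use determination unit 35 stores the information on the content being used without authorization and its address information in the RAM 2, the warning mail transmission unit 36 notifies the user of that information by e-mail, and processing returns to step S22.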
【００７１】
〔コンテンツ登録処理〕
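Here, the operation of the content unauthorized use search device 2b when performing the content registration process in step S28 is described in detail with reference to the flowchart shown in FIG. 12.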
【００７２】
図１２に示すフローチャートは、上記ステップＳ２７からステップＳ２８の処理に移行することで処理が開始される。
【００７３】
ステップＳ４１の処理において、コンテンツ登録部４５は、ループ変数ｉに『１』、２次ノードアドレス情報ａに『φ』、変動値ｈに『０』を設定することで、各値を初期化する。
【００７４】
ステップＳ４２の処理において、コンテンツ登録部４５は、形態素テーブルのｉ行目のデータを読み込み、２次ノードアドレス情報を作業用変数ｕに、近傍キーワードを作業用変数ｋにセットする。
【００７５】
ステップＳ４３の処理において、コンテンツ登録部４５は、ステップＳ４２における読み込み処理が成功したか否かを判定する。そして、判定の結果、読み込みが成功した場合には、ステップＳ４７の処理に進む。一方、判定の結果、読み込みが失敗した場合には、ステップＳ４４の処理に進む。
【００７６】
ステップＳ４４の処理では、コンテンツ登録部４５は、コンテンツデータベース８内に『アドレス情報＝ａ』となる行が存在するか否かを判定する。なお、このコンテンツ登録部４５による処理は、データベース言語ＳＱＬにより次のように表現することができる。
【００７７】
【数３】

また、『ａ＝φ』である場合には、コンテンツデータベース８内には対応する行が存在しないものとする。そして、判定の結果、『アドレス情報＝ａ』となる行が存在しない場合には、ステップＳ４６の処理に進む。一方、判定の結果、『アドレス情報＝ａ』となる行が存在する場合には、ステップＳ４５の処理に進む。
【００７８】
ステップＳ４５の処理において、コンテンツ登録部４５は、アドレス情報ａに関連付けされた優先度を更新する。なお、このコンテンツ登録部４５による処理は、データベース言語ＳＱＬにより次のように表現することができる。
【００７９】
【数４】

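In step S46, the content registration unit 45 adds to the content database 8 a new row whose priority is the priority of the upper-layer link information plus the variation value h. Specifically, the upper layer of address information a is the address information b held since the processing of step S22, so the content registration unit 45 obtains the priority w of address information b from the content database 8 with, for example, an SQL statement such as the following.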
【００８０】
【数５】

そして、コンテンツ登録部４５は、例えば、次のようなデータベース言語ＳＱＬにより、上位層が持つ優先度に変動値ｈを加算した優先度を有し、アドレス情報ａを主キーとした新規行を追加する。
【００８１】
【数６】

このように処理することにより、「怪しいサイトからリンクを張られた先は怪しいサイトである可能性が高い」という仮定のもとにアドレスの優先度を予め上げておくことが可能となり、一連の登録処理は終了する。
【００８２】
一方、ステップＳ４７の処理において、コンテンツ登録部４５は、２次ノードアドレス情報ａと作業用変数ｕが等しいか否かを判定することにより、同一の２次ノードアドレス情報が形態素テーブル上で連続しているか否かを判別する。
【００８３】
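If the determination finds that the secondary node address information a and the work variable u are equal, then in step S48 the content registration unit 45 searches the keyword database 9 using the nearby keyword set in the work variable k as the search key and extracts the importance of that keyword. The content registration unit 45 then adds the extracted importance to the variation value h, increments the loop variable i by 1, and returns to step S42. In this way, when the same secondary node address information appears on consecutive rows, the variation value h is accumulated per secondary node address.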
【００８４】
一方、ステップＳ４７の判別処理の結果、２次ノードアドレス情報ａと作業用変数ｕが等しくない場合には、ステップＳ４９の処理として、コンテンツ登録部４５は、２次ノードアドレス情報ａが空文字列φであるか否かを判別する。そして、判別の結果、２次ノードアドレス情報ａが空文字列φでない場合には、ステップＳ４４ａの処理に進む。
【００８５】
ステップＳ４４ａの処理では、コンテンツ登録部４５は、コンテンツデータベース８内に『アドレス情報＝ａ』となる行が存在するか否かを判定する。
【００８６】
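If the determination in step S44a finds no row with "address information = a", processing proceeds to step S46a, where the content registration unit 45 adds to the content database 8 a new row whose priority is the priority of the upper-layer link information plus the variation value h, resets the variation value h to 0, and proceeds to step S50.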
【００８７】
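If the determination in step S44a finds a row with "address information = a", processing proceeds to step S45a, where the content registration unit 45 updates the priority associated with address information a, resets the variation value h to 0, and proceeds to step S50.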
【００８８】
一方、ステップＳ４９の判定処理の結果、２次ノードアドレス情報ａが空文字列φである場合には、ステップＳ５０の処理に進む。ステップＳ５０の処理において、コンテンツ登録部４５は、２次ノードアドレス情報ａに作業用変数ｕの内容を代入し、ステップＳ４８に進む。なお、上記ステップＳ４９の処理は、形態素テーブルを最初に読み始めた時の処理に相当する。
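The net effect of the loop in steps S41 to S50 is to group morpheme-table rows by address, sum the importance of each group's keywords into a variation value h, and then either adjust an existing priority or create a new entry that inherits the parent's priority. The sketch below expresses that with `itertools.groupby` and plain dictionaries instead of the databases. The function name, the dictionary representation, the treatment of unknown keywords as importance 0, and the reading of "update" as "add h" are assumptions.

```python
from itertools import groupby


def register_links(morpheme_rows, keyword_importance, priorities, parent_address):
    """Update `priorities` in place from (address, keyword) rows sorted by address."""
    parent_priority = priorities[parent_address]
    for address, group in groupby(morpheme_rows, key=lambda row: row[0]):
        # Variation value h: summed importance of every nearby keyword.
        h = sum(keyword_importance.get(keyword, 0.0) for _, keyword in group)
        if address in priorities:
            priorities[address] += h                   # existing row: adjust
        else:
            priorities[address] = parent_priority + h  # new row inherits from parent
    return priorities
```

Inheriting the parent's priority is what gives a page linked from a suspicious page a head start in the crawl order.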
【００８９】
［第３の実施形態］
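In the content unauthorized use search device 2b of the second embodiment, the acquisition list generated during search processing was a sorted list of all address information registered in the content database 8 (see step S21 above). With this arrangement, however, as the number of sites to be searched grows, it can take longer to process a single acquisition list and longer for the continually changing priorities in the content database 8 to be reflected in actual content acquisition.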
【００９０】
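The content unauthorized use search device 2b of the third embodiment therefore operates as described below so that the acquisition list is reviewed every q acquisitions, where q is a predetermined upper crawl limit. The configuration and operation of the content unauthorized use search device 2b according to the third embodiment of the present invention are described below.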
【００９１】
〔コンテンツ不正利用探索装置の構成〕
第３の実施形態によるコンテンツ不正利用探索装置２ｂでは、コンテンツデータベース８内に格納されるテーブルデータのデータ形式が、第２の実施形態によるコンテンツ不正利用探索装置２ｂと異なる。
【００９２】
コンテンツデータベース８内に格納されるテーブルデータは、具体的には図１３に示すように、アドレス情報と優先度に加えて更新日時を格納するようにしている。ここで「更新日時」は、比較可能な数値を持つ日付型のデータであり、ここでは単純化して、累積秒（１９７０年０１月０１日を０とした秒数の積算）のデータを日付型のデータとして定義する。
【００９３】
なお、第３の実施形態によるコンテンツ不正利用探索装置２ｂのその他の構成は、第２の実施の形態によるコンテンツ不正利用探索装置２ｂと同じであるので、その説明を省略する。
【００９４】
〔コンテンツ不正利用探索装置の処理動作〕
図１４に示すように、ステップＳ６０の処理において、ユーザは、入力Ｉ／Ｆ部４を介して起点ノードのアドレス情報を入力する。
【００９５】
ステップＳ６１の処理では、巡回部４１は、以下に示す４つの条件に従って、コンテンツデータベース８からコンテンツの取得リストを生成する。なお、この巡回部４１による取得リスト生成処理については、後程詳しく説明する。
【００９６】
(1) The number s of address information entries in the acquisition list does not exceed the upper crawl limit q.
【００９７】
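(2) Address information with a higher priority is added to the acquisition list first.
(3) Address information with an older update date and time is added to the acquisition list first.
(4) Address information whose content was acquired recently is not added.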
ステップＳ６２の処理において、コンテンツ取得部４２は、巡回部４１が生成した取得リストから順次アドレス情報を取り出し、通信制御部５を介してインターネット７にアクセスし、アドレス情報が示すコンテンツを取得する。また、コンテンツ取得部４２は、システムクロックを参照して、コンテンツを取得した際の現在日時ｎｏｗを日付型データとして取得し、コンテンツデータベース８内のアドレス情報に対応した更新日時を更新する。なお、この更新日時の更新処理は、データベース言語ＳＱＬにより次のように表現することができる。尚、ｂはコンテンツを取得したアドレス情報の識別子を示す。
【００９８】
【数７】

なお、ステップＳ６３以後の処理は、コンテンツデータベース８に新規行を追加する際（ステップＳ４６の処理に対応）に、コンテンツ登録部４５が、例えば『ＩＮＳＥＲＴ　ＩＮＴＯ　コンテンツデータベース　ＶＡＬＵＥＳ　（ａ，　ｗ＋ｈ，　０）；』のようなデータベース言語ＳＱＬにより、更新日時を日付型における最も古い値（０）に設定する以外、第２の実施形態によるコンテンツ不正利用探索装置２ｂのステップＳ２４以後の処理と同じ内容であるので、ここではその説明を省略する。
【００９９】
〔取得リスト生成処理〕
次に、図１５に示すフローチャートを参照して、ステップＳ６１の取得リスト生成処理を実行する際のコンテンツ不正利用探索装置２ｂの処理動作について説明する。
【０１００】
図１５のフローチャートに示す処理は、ユーザが、ステップＳ６０からステップＳ６１の処理に移行することで処理が開始される。
【０１０１】
ステップＳ８１の処理において、巡回部４１は、コンテンツデータベース８を検索し、優先度が最大値ＭＡＸ以上のコンテンツの優先度を最大値ＭＡＸに、最小値ＭＩＮ以下のコンテンツの優先度を最小値ＭＩＮに設定する（クリッピング処理）。なお、この巡回部４１による処理は、データベース言語ＳＱＬにより次のように表現することができる。
【０１０２】
【数８】

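The maximum value MAX and the minimum value MIN may be predetermined constants, or they may be variables computed statistically each time. For example, using the standard deviation obtained from the mean and variance of all priorities in the content database 8, and assuming that the priorities follow a normal distribution, the priority values corresponding to deviation values of 75 and 25 may be taken as MAX and MIN, respectively.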
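A deviation value T is conventionally defined as T = 50 + 10(x − μ)/σ, so the priority at a given deviation value is x = μ + σ(T − 50)/10. The sketch below derives clipping bounds this way and applies the clipping of step S81 to a list of priorities. It assumes the upper bound corresponds to the higher deviation value, uses the population standard deviation, and uses hypothetical function names.

```python
import statistics


def priority_at_deviation(priorities, deviation_value):
    """Priority whose deviation value (50 + 10 * z-score) equals the argument."""
    mean = statistics.mean(priorities)
    stdev = statistics.pstdev(priorities)
    return mean + stdev * (deviation_value - 50) / 10


def clip_priorities(priorities, low=25, high=75):
    """Clamp every priority into [MIN, MAX] derived from the deviation values."""
    p_min = priority_at_deviation(priorities, low)
    p_max = priority_at_deviation(priorities, high)
    clipped = [min(max(p, p_min), p_max) for p in priorities]
    return clipped, p_min, p_max
```

With deviation values of 25 and 75 the bounds sit 2.5 standard deviations either side of the mean, so only extreme outliers are clipped.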
【０１０３】
ステップＳ８２の処理において、巡回部４１は、ループ変数ｉの値をｄに設定し、取得リストｓｔの値をφに設定することで、各値を初期化する。
【０１０４】
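In step S83, the crawling unit 41 calculates the acquisition interval start point sa (= sd × i) and the acquisition interval end point sb (= sd × (i + 1)). The variable sd is the interval width and is calculated by the following equation, where the variable d is the division constant.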
【０１０５】
【数９】
ｓｄ＝（最大値ＭＡＸ−最小値ＭＩＮ）／ｄ
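In step S84, the crawling unit 41 refers to the system clock to obtain the current date and time as a date-type value and then, for example with the SQL statement shown below, refers to the content database 8 and creates a candidate list of the address information whose priority satisfies interval start point ≤ priority < interval end point, sorted in ascending order of update date and time. Here, cut is a cutoff date-time constant with a predefined value.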
【０１０６】
【数１０】

このような処理によれば、ある区間に収まった優先度を有するアドレス情報の内、古い順に並んだ候補リストｉｔを生成することができる。
【０１０７】
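In step S85, the crawling unit 41 counts the number sm of address information entries in the acquisition list st and the number im in the candidate list it, and takes im entries from the candidate list it if (q − sm) > im, or q − sm entries if im ≥ (q − sm), where the variable q is the maximum crawl count. In other words, it takes as many candidates as fit in the space remaining in the acquisition list.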
【０１０８】
そして、巡回部４１は、取り出したアドレス情報を取得リストｓｔに追加する。
【０１０９】
ステップＳ８６の処理において、巡回部４１は、取得リストｓｔ内のアドレス情報数ｓｍと最大巡回数ｑが等しいか否かを判別する。そして、判別の結果、等しい場合には、ステップＳ８９の処理に進む。一方、判別の結果、等しくない場合には、ステップＳ８７の処理に進む。
【０１１０】
ステップＳ８７の処理において、巡回部４１は、ループ変数ｉの値を１減数する。
【０１１１】
ステップＳ８８の処理において、巡回部４１は、ループ変数ｉの値が負であるか否かを判別する。そして、判別の結果、負でない場合には、ステップＳ８３の処理に戻る。一方、負である場合には、ステップＳ８９の処理に進む。
【０１１２】
ステップＳ８９の処理において、巡回部４１は、取得リストｓｔを出力する。なお、ステップＳ８９の処理完了後、ＣＰＵ１は、コンテンツの取得が完了したか否かを監視し、コンテンツの取得が完了するに応じて、ＣＰＵ１は再び上記生成処理を実行する。
【０１１３】
このような処理によれば、コンテンツの取得と解析によってコンテンツデータベース８内の優先度は変動し、ＣＰＵ１は、変動した優先度を参考にしながら、次回に取得すべき優先度の高いサイトを最大巡回数ｑの範囲内で選択するので、優先度を迅速にコンテンツ探索に適用し、戦略性の高い巡回処理が可能となる。
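The acquisition list generation of steps S82 to S89 can be sketched as a loop over priority bands from the top down. This is an illustration under stated assumptions rather than the patent's exact procedure: bands are offset from MIN (the text writes the start point as sd × i), the top band includes MAX itself, "recently acquired" is taken to mean `now - updated <= cut`, and rows are plain tuples instead of database records. All names are hypothetical.

```python
def generate_acquisition_list(rows, q, d, p_min, p_max, now, cut):
    """rows: (address, priority, updated) tuples; returns at most q addresses."""
    sd = (p_max - p_min) / d                 # width of one priority band
    selected = []
    for i in range(d - 1, -1, -1):           # highest band first
        lo = p_min + sd * i
        hi = p_min + sd * (i + 1)
        top = i == d - 1
        band = [
            r for r in rows
            if lo <= r[1] and (r[1] < hi or (top and r[1] <= hi))
            and now - r[2] > cut             # skip recently fetched addresses
        ]
        band.sort(key=lambda r: r[2])        # oldest update first
        room = q - len(selected)
        selected.extend(r[0] for r in band[:room])
        if len(selected) >= q:               # list is full: stop early
            break
    return selected
```

Working band by band means a slightly lower-priority page that has not been checked for a long time can still be picked ahead of a fresher page in the same band.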
【０１１４】
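As is clear from the above description, the content unauthorized use search device 2a of the first embodiment does not treat all content on the Internet 7 as search targets. Instead, it assigns a search priority to each piece of link information based on the input keywords and searches for content according to that priority, so it can search efficiently for the content of interest and shorten the time required to collect content and check for digital watermarks.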
【０１１５】
また、第２の実施形態あるいは第３の実施形態によるコンテンツ不正利用探索装置２ｂによれば、あるインターネットサイトを管理する悪意を持つ人間がコンテンツの不正利用を行った場合、不正利用コンテンツへのリンク情報を含んだハイパーテキストは優先度が高いまま残るため、同一サイト内の別のハイパーテキスト若しくは他のインターネットサイトに不正利用コンテンツを移動したとしても、不特定多数の閲覧ユーザを導くためには、元のハイパーテキストに「移転先はこちら」等のアンカーテキストを含んだリンク情報を記述しなければならない。また、リンク情報を含んだページは既に優先度が高いために、迅速にチェックされ、移転先を容易に発見することができる。さらに、優先度の継承によって移転先には最初から高い優先度が与えられるので、迅速にチェックすることができる。すなわち、本発明の第２及び第３の実施の形態となるコンテンツ不正利用探索装置２ｂによれば、悪意を持った人間によるリンク構造やコンテンツの差し替えに容易に対応することができる。
【０１１６】
以上、第１実施形態ないし第３実施形態について詳細に説明したが、本発明は、その精神または主要な特徴から逸脱することなく、他の色々な形で実施することができる。そのため、前述の実施例はあらゆる点で単なる例示に過ぎず、限定的に解釈してはならない。本発明の範囲は、特許請求の範囲によって示すものであって、明細書本文には何ら拘束されない。さらに、特許請求の範囲の均等範囲に属する変形や変更は、全て本発明の範囲内のものである。
【０１１７】
【発明の効果】
本発明によれば、コンテンツの収集や電子透かしの検出チェックに要する時間を短縮し、悪意あるユーザによるリンク構造やコンテンツの差し替えに容易に対応することができる。
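[Brief description of the drawings]
[FIG. 1] A block diagram illustrating the functional configuration of the content unauthorized use search device according to the first embodiment.
[FIG. 2] A block diagram illustrating the hardware configuration of the content unauthorized use search device shown in FIG. 1.
[FIG. 3] A flowchart illustrating the flow of the content unauthorized use search process performed by the device shown in FIG. 1.
[FIG. 4] A diagram illustrating HTML text analyzed by the text analysis unit in the content unauthorized use search process shown in FIG. 3.
[FIG. 5] A flowchart showing an application example of the content unauthorized use search process shown in FIG. 3.
[FIG. 6] A block diagram illustrating the functional configuration of the content unauthorized use search device according to the second embodiment.
[FIG. 7] A block diagram illustrating the hardware configuration of the content unauthorized use search device shown in FIG. 6.
[FIG. 8] A diagram illustrating the data format of the content database of the device shown in FIG. 6.
[FIG. 9] A diagram illustrating the data format of the keyword database of the device shown in FIG. 6.
[FIG. 10] A flowchart illustrating the flow of the content unauthorized use search process performed by the device shown in FIG. 6.
[FIG. 11] A diagram illustrating the morpheme table used in the content unauthorized use search process shown in FIG. 10.
[FIG. 12] A flowchart illustrating the flow of the content registration process in the content unauthorized use search process shown in FIG. 10.
[FIG. 13] A diagram illustrating the data format of the content database in the content unauthorized use search device according to the third embodiment.
[FIG. 14] A flowchart illustrating the flow of the content unauthorized use search process performed by the content unauthorized use search device of the third embodiment.
[FIG. 15] A flowchart illustrating the flow of the acquisition list generation process in the content unauthorized use search process shown in FIG. 14.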
【符号の説明】
１…ＣＰＵ
２…ＲＡＭ
２ａ…コンテンツ不正利用探索装置
２ｂ…コンテンツ不正利用探索装置
３…ＲＯＭ
３ａ…コンテンツ不正利用探索プログラム
３ｂ…コンテンツ不正利用探索プログラム
４…入力Ｉ／Ｆ部
５…通信制御部
６…ＩＤデータ／ＵＲＬデータベース
７…インターネット
８…コンテンツデータベース
９…キーワードデータベース
１１…コンテンツデータベース
１２…キーワードデータベース
３１…探索対象収集部
３２…テキスト解析部
３３…優先順位設定部
３４…探索対象コンテンツチェック部
３５…不正使用判定部
３６…警告メイル送信部
４１…巡回部
４２…コンテンツ取得部
４３…ハイパーテキスト判定部
４４…テキスト解析部
４５…コンテンツ登録部

[0001]
[Technical field of the invention]
The present invention relates to a content unauthorized use search device, a content unauthorized use search program, and a content unauthorized use search method that search for content in which information for preventing unauthorized use, such as copyright information, is embedded as digital watermark information and determine, based on the embedded information, whether the content is being used without authorization. In particular, it relates to a technology that shortens the time required to collect content and check for digital watermarks and makes it easy to respond when a malicious user replaces the link structure or the content.
[0002]
[Prior art]
In recent years, for the purpose of protecting the copyright of content, techniques for embedding copyright information of the content into the content with a digital watermark or the like have been energetically studied, and various information embedding methods have been proposed to date. For example, a method of embedding information by changing an MPEG code, especially a DCT coefficient, a motion vector, and a quantization characteristic has been proposed (for example, see Non-Patent Document 1). In addition, a method has been proposed in which an image signal is spread using a PN sequence according to the direct spreading method and signature information is combined with the image (for example, see Non-Patent Document 2).
[0003]
Along with this research on embedding information in content, various systems for detecting unauthorized use of content have recently been offered. For example, a system has been proposed in which purchaser information is embedded in content in advance using a digital watermark, and the information embedded in content suspected of unauthorized use is read out to determine whether the content is in fact being used without authorization. Because such a system must search for and obtain the suspect content, it uses a "web robot", a computer program that collects content on the Web, and checks the digital watermark information of the collected content. The web robot constantly crawls websites around the world and monitors whether content such as images is being used without authorization. When the monitoring finds that content is being used without authorization, the web robot issues some form of notice to the website responsible.
[0004]
In addition, a technique has been disclosed that determines an investigation target pattern to be collected according to an input keyword or content designation information, collects investigation target content over a network according to the determined pattern, and determines whether the collected content constitutes unauthorized use (for example, see Patent Document 1).
[0005]
[Patent Document 1]
JP-A-2001-76000
[Non-patent document 1]
Nippon Telegraph and Telephone Corporation, "Copyright Information Embedding Method in Digital Video Using DCT", IEICE 1997 Symposium on Cryptography and Information Security, SCIS '97 -31G
[Non-patent document 2]
National Defense Academy, "Watermark Signature Method for Images Using PN Sequence", IEICE 1997 Symposium on Cryptography and Information Security, SCIS '97_26B
[0006]
[Problems to be solved by the invention]
However, in a system for detecting unauthorized use of content as described above, all content on the Internet is subject to investigation, so collecting the content and checking the digital watermarks takes an enormous amount of time, and content cannot be searched efficiently. In particular, because an open network such as the Internet is not managed by any specific organization, a malicious person can easily become a site administrator and dynamically replace content and link structures, so this situation should be resolved promptly.
[0007]
Note that the technique disclosed in JP-A-2001-76000 searches only content that meets given conditions and can therefore shorten, to some extent, the time required to collect content and check digital watermarks. However, because it treats all content that meets the conditions equally, it is not necessarily an efficient search method.
[0008]
The present invention has been made in view of the circumstances described above. Its object is to provide a content unauthorized use search device, a content unauthorized use search program, and a content unauthorized use search method that shorten the time required to collect content and check for digital watermarks and make it easy to respond when a malicious user replaces the link structure or the content.
[0009]
[Means for Solving the Problems]
The content unauthorized use search device (2a) according to the present invention is a content unauthorized use search device (2a) that searches for unauthorized use of content on an electronic network, comprising: an input unit (4) through which a keyword and address information of the top hierarchy level of the content to be searched are input; a search target collection unit (31) that collects text information described in the hierarchy level specified by the address information input from the input unit (4) and collects content linked within the collected text information; a search target content check unit (34) that detects an identifier embedded by a digital watermark in the content collected by the search target collection unit (31) and determines from the detected identifier whether the content is being used without authorization; a text analysis unit (32) that analyzes link information, included in the text information collected by the search target collection unit (31), for searching from the hierarchy level of that text information to a lower hierarchy level; and a priority setting unit (33) that, based on the keyword input from the input unit (4), sets a priority for the link information analyzed by the text analysis unit (32). The priority setting unit (33) sets the priority for the link information for searching the lower hierarchy level based on a plus keyword that raises the priority, a minus keyword that lowers the priority, or both, as included in the keyword. The search target collection unit (31) collects, in accordance with the priority set by the priority setting unit (33), the text information described in the hierarchy level specified by the link information and collects the content linked within that text information, and the search target content check unit (34) detects the identifier embedded by a digital watermark in the content so collected and determines from the detected identifier whether the content is being used without authorization.
[0010]
Further, a content unauthorized use search program according to the present invention causes a content unauthorized use search device (2a) to search for unauthorized use of content in an electronic network. The program causes the content unauthorized use search device (2a) to execute: steps (S1, S2) of inputting a keyword and the address information of the highest hierarchy of the content to be searched from an input unit (4); a step (S3) of collecting, by a search target collection unit (31), text information described in the hierarchy specified by the address information input from the input unit (4); a step (S5) of collecting, by the search target collection unit (31), content linked in the collected text information; steps (S6, S7, S8) of detecting an identifier embedded by a digital watermark in the content collected by the search target collection unit (31) and determining, by a search target content check unit (34), whether the content is illegally used based on the detected identifier; a step (S12) of analyzing, by a text analysis unit (32), link information which is included in the text information collected by the search target collection unit (31) and is used for searching from the hierarchy of the text information to a lower hierarchy; a step (S13) of setting, by a priority setting unit (33), a priority on the link information analyzed by the text analysis unit (32) based on the keyword input from the input unit (4); a step (S15) of collecting, by the search target collection unit (31), the text information described in the hierarchy specified by the link information according to the priority set by the priority setting unit (33); the step (S5) of collecting, by the search target collection unit (31), the content linked in the collected text information; and the steps (S6, S7, S8) of detecting the identifier embedded by a digital watermark in the collected content and determining, by the search target content check unit (34), whether the content is illegally used based on the detected identifier.
In the step (S13) of setting the priority, the priority setting unit (33) sets the priority on the link information for searching the lower hierarchy based on one or both of a plus keyword that raises the priority and a minus keyword that lowers the priority, each included in the keyword.
[0011]
Further, a content unauthorized use search method according to the present invention is performed by a content unauthorized use search device (2a) that searches for unauthorized use of content in an electronic network. The method comprises: steps (S1, S2) in which a keyword and the address information of the highest hierarchy of the content to be searched are input from an input unit (4); a step (S3) of collecting, by a search target collection unit (31), text information described in the hierarchy specified by the address information input from the input unit (4); a step (S5) of collecting, by the search target collection unit (31), content linked in the collected text information; steps (S6, S7, S8) of detecting an identifier embedded by a digital watermark in the content collected by the search target collection unit (31) and determining, by a search target content check unit (34), whether the content is illegally used based on the detected identifier; a step (S12) of analyzing, by a text analysis unit (32), link information which is included in the text information collected by the search target collection unit (31) and is used for searching from the hierarchy of the text information to a lower hierarchy; a step (S13) of setting, by a priority setting unit (33), a priority on the link information analyzed by the text analysis unit (32) based on the keyword input from the input unit (4); a step (S15) of collecting, by the search target collection unit (31), the text information described in the hierarchy specified by the link information according to the priority set by the priority setting unit (33); the step (S5) of collecting, by the search target collection unit (31), the content linked in the collected text information; and the steps (S6, S7, S8) of detecting the identifier embedded by a digital watermark in the collected content and determining, by the search target content check unit (34), whether the content is illegally used based on the detected identifier.
In the step (S13) of setting the priority, the priority setting unit (33) sets the priority on the link information for searching the lower hierarchy based on one or both of a plus keyword that raises the priority and a minus keyword that lowers the priority, each included in the keyword.
[0012]
According to such a configuration, not all content on the electronic network is searched. Instead, a search priority is assigned to each item of link information based on the input keyword, and content is searched according to that priority. The content of interest can therefore be searched efficiently, which reduces the time required for collecting content and checking for digital watermarks.
[0013]
Further, a content unauthorized use search device (2b) according to the present invention searches for unauthorized use of content in an electronic network and comprises: a content database (8) that manages sets of address information capable of specifying content, a priority of the address information, and an acquisition date and time at which the content was acquired; a keyword database (9) that manages sets of a keyword and an importance of the keyword; an input unit (4) to which starting address information specifying the content to be searched is input; a traveling unit (41) that extracts the address information managed by the content database (8) in descending order of priority and outputs it as an acquisition list; a content acquisition unit (42) that acquires the content to be searched based on the address information included in the acquisition list; a hypertext determination unit (43) that determines whether the content acquired by the content acquisition unit (42) is hypertext including link information; a text analysis unit (44) that, when the content acquired by the content acquisition unit (42) is hypertext including link information, analyzes the link information included in the hypertext and outputs, as a morpheme table, sets of secondary node address information and nearby keywords within a predetermined range from the link information; a content registration unit (45) that searches the keyword database (9) for the nearby keywords included in the morpheme table, generates, according to the search result, a variation value that changes the priority in the content database (8), and registers the generated variation value together with the address information in the content database (8); and an unauthorized use determination unit (35) that determines whether the content acquired by the content acquisition unit (42) is illegally used.
The device is characterized in that, where the priority lies within a predetermined range, the traveling unit (41) processes each of a plurality of ranges generated by dividing the predetermined range by a predetermined constant, in order starting from the range closest to the maximum value of the predetermined range: it extracts the address information from the content database (8), sorts the extracted address information by acquisition date and time, and adds it to the acquisition list oldest first. The traveling unit (41) outputs the acquisition list when the number of items of address information added to the acquisition list reaches a predetermined maximum number of tours.
[0014]
Further, a content unauthorized use search program according to the present invention causes a content unauthorized use search device (2b) to search for unauthorized use of content in an electronic network. The program causes the content unauthorized use search device (2b) to execute: a step of inputting starting address information specifying the content to be searched from an input unit (4); a step of extracting, by a traveling unit (41), the address information managed by a content database (8), which manages sets of address information capable of specifying content, a priority of the address information, and an acquisition date and time at which the content was acquired, in descending order of priority and outputting it as an acquisition list; a step of acquiring, by a content acquisition unit (42), the content to be searched based on the address information included in the acquisition list; a step of determining, by a hypertext determination unit (43), whether the content acquired by the content acquisition unit (42) is hypertext including link information; a step of, when the content acquired by the content acquisition unit (42) is hypertext including link information, analyzing the link information included in the hypertext and outputting, by a text analysis unit (44), sets of secondary node address information and nearby keywords within a predetermined range from the link information as a morpheme table;
a step of searching, by a content registration unit (45), a keyword database (9), which manages sets of a keyword and an importance of the keyword, for the nearby keywords included in the morpheme table, generating, according to the search result, a variation value that changes the priority in the content database (8), and registering the generated variation value together with the address information in the content database (8); and a step of determining, by an unauthorized use determination unit (35), whether the content acquired by the content acquisition unit (42) is illegally used. In the step of outputting the acquisition list, where the priority lies within a predetermined range, the traveling unit (41) processes each of a plurality of ranges generated by dividing the predetermined range by a predetermined constant, in order starting from the range closest to the maximum value of the predetermined range: it extracts the address information from the content database (8), sorts the extracted address information by acquisition date and time, and adds it to the acquisition list oldest first. The acquisition list is output when the number of items of address information added to the acquisition list reaches a predetermined maximum number of tours.
[0015]
Further, a content unauthorized use search method according to the present invention is performed by a content unauthorized use search device (2b) that searches for unauthorized use of content in an electronic network. The method comprises: a step in which starting address information specifying the content to be searched is input from an input unit (4); a step of extracting, by a traveling unit (41), the address information managed by a content database (8), which manages sets of address information capable of specifying content, a priority of the address information, and an acquisition date and time at which the content was acquired, in descending order of priority and outputting it as an acquisition list; a step of acquiring, by a content acquisition unit (42), the content to be searched based on the address information included in the acquisition list; a step of determining, by a hypertext determination unit (43), whether the content acquired by the content acquisition unit (42) is hypertext including link information; a step of, when the content acquired by the content acquisition unit (42) is hypertext including link information, analyzing the link information included in the hypertext and outputting, by a text analysis unit (44), sets of secondary node address information and nearby keywords within a predetermined range from the link information as a morpheme table; a step of searching, by a content registration unit (45), a keyword database (9), which manages sets of a keyword and an importance of the keyword, for the nearby keywords included in the morpheme table, generating, according to the search result, a variation value that changes the priority in the content database (8), and registering the generated variation value together with the address information in the content database (8); and a step of determining, by an unauthorized use determination unit (35), whether the content acquired by the content acquisition unit (42) is illegally used.
In the step of outputting the acquisition list, where the priority lies within a predetermined range, the traveling unit (41) processes each of a plurality of ranges generated by dividing the predetermined range by a predetermined constant, in order starting from the range closest to the maximum value of the predetermined range: it extracts the address information from the content database (8), sorts the extracted address information by acquisition date and time, and adds it to the acquisition list oldest first. The acquisition list is output when the number of items of address information added to the acquisition list reaches a predetermined maximum number of tours.
[0016]
According to such a configuration, when the collected content is hypertext, keywords near the link information are searched for, the priority is changed according to the presence or absence of those keywords, and content is collected accordingly. In addition, sites linked from sites with a higher priority inherit that priority. Replacement of a link structure or content by a malicious person can therefore be handled easily.
[0017]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, an embodiment of the present invention will be described with reference to FIGS.
[0018]
It should be noted that the same or equivalent parts and components are denoted by the same or equivalent reference symbols throughout the drawings, and the description thereof will be omitted or simplified.
[0019]
[First Embodiment]
[Configuration of content unauthorized use search device]
The content unauthorized use search device 2a according to the first embodiment of the present invention is configured on a computer device such as a personal computer, a workstation, or a general-purpose computer. Specifically, as shown in FIG. 2, the content unauthorized use search device 2a includes a CPU 1, a RAM 2, a ROM 3, an input I/F unit 4, a communication control unit 5, and an ID data / URL database 6, and is configured to be connectable to the Internet 7.
[0020]
The CPU 1 controls the operation of the content unauthorized use search device 2a according to a computer program stored in the ROM 3. The RAM 2 provides a work area for temporarily storing computer programs and data relating to the various processes executed by the CPU 1.
[0021]
The ROM 3 stores various computer programs, such as the content unauthorized use search program 3a, and the data necessary for executing them. The ROM 3 consists of a recording medium readable by the CPU 1, such as a magnetic or optical recording medium or a semiconductor memory. Part or all of the computer programs and data stored on the recording medium may be received via the Internet 7.
[0022]
The input interface (I/F) unit 4 serves as the interface for inputting the various information (keywords and the address information of the highest hierarchy of the content to be searched) necessary for executing the unauthorized use search process described later.
[0023]
The communication control unit 5 implements, for example, data communication protocols such as HTTP (Hypertext Transfer Protocol) and TCP/IP (Transmission Control Protocol / Internet Protocol), and e-mail communication protocols such as SMTP (Simple Mail Transfer Protocol) and POP (Post Office Protocol). Using these protocols, the communication control unit 5 transmits various data via the Internet 7 and converts received data into a format that can be processed by the CPU 1.
[0024]
The ID data / URL database 6 stores the ID data recorded by digital watermark in the content to be managed, the valid URL address information of that content, and an e-mail address serving as contact information for the valid owner.
[0025]
As shown in FIG. 1, the content unauthorized use search program 3a includes a search target collection unit 31, a text analysis unit 32, a priority setting unit 33, a search target content check unit 34, an unauthorized use determination unit 35, and a warning mail transmission unit 36.
[0026]
The search target collection unit 31 collects text information described in the hierarchy specified by the address information input from the input I/F unit 4, and collects content linked in the collected text information.
[0027]
The search target content check unit 34 detects an identifier embedded in the content collected by the search target collection unit 31 with a digital watermark, and determines whether the content is illegally used based on the detected identifier.
[0028]
The text analysis unit 32 analyzes the link information that is included in the text information collected by the search target collection unit 31 and is used for searching from the hierarchy of that text information to a lower hierarchy.
[0029]
The priority setting unit 33 sets a priority on the link information analyzed by the text analysis unit 32 based on the keyword input from the input I/F unit 4.
[0030]
Then, the search target collection unit 31 collects the text information described in the lower hierarchy specified by the link information according to the priority set by the priority setting unit 33, and collects the content linked in the collected lower-hierarchy text information. The search target content check unit 34 detects an identifier embedded by a digital watermark in the lower-hierarchy content collected by the search target collection unit 31, and determines from the detected identifier whether the content is illegally used.
[0031]
The keyword includes one or both of a plus keyword for raising the priority and a minus keyword for lowering the priority. Based on the keyword including one or both of the plus keyword and the minus keyword, the priority setting unit 33 sets the priority on the link information for searching the lower hierarchy.
[0032]
[Processing operation of content unauthorized use search device]
Next, with reference to a flowchart shown in FIG. 3, a description will be given of a processing operation of the content unauthorized use search device 2a according to the first embodiment when searching for unauthorized use of content.
[0033]
In the processing in steps S1 and S2, the user inputs, via the input I/F unit 4, the address information of the highest-level URL (Uniform Resource Locator) and the keywords on which the priority of the URLs to be searched is based. The following description assumes that the user has input, for example, “idol”, “singer A child”, “group B”, “singer C child”, “singer D child”, “CG”, and “painting” as plus keywords.
[0034]
In the process of step S3, the search target collection unit 31 downloads, via the communication control unit 5, the HTML text of the homepage located at the input highest-level URL into the RAM 2.
[0035]
In the process of step S4, the text analysis unit 32 determines whether or not a content such as an image is linked in the HTML text downloaded by the search target collection unit 31. If the content is linked as a result of the determination, the process proceeds to step S5. On the other hand, as a result of the determination in step S4, if the content is not linked in the HTML text downloaded by the search target collection unit 31, the process proceeds to step S12.
[0036]
In the process of step S5, the search target collection unit 31 downloads the content linked to the HTML text to the RAM 2.
[0037]
In the process of step S6, the search target content check unit 34 detects a digital watermark in the content downloaded to the RAM 2 and recognizes the ID data recorded by the detected digital watermark. Various digital watermarking methods are conceivable. For example, when the content is image data, a digital watermark can be embedded by manipulating the bit string representing the luminance of a pixel; when the content is music data, a digital watermark can be embedded by decomposing the waveform into frequency components and applying processing such as shifting the phase.
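As an illustration of the image case only, the following minimal sketch embeds ID data in the least significant bit of pixel luminance values and reads it back. The least-significant-bit scheme, the 16-bit ID width, and the function names `embed_id` and `extract_id` are assumptions chosen for illustration; the text above does not specify which bits are manipulated or how.

```python
def embed_id(luminance, id_value, n_bits=16):
    """Write id_value (n_bits wide) into the LSBs of the first n_bits pixels.

    Illustrative assumption: one ID bit per pixel, most significant bit first.
    """
    if len(luminance) < n_bits:
        raise ValueError("not enough pixels to hold the ID")
    marked = list(luminance)
    for i in range(n_bits):
        bit = (id_value >> (n_bits - 1 - i)) & 1
        marked[i] = (marked[i] & ~1) | bit  # clear the LSB, then set it to the ID bit
    return marked


def extract_id(luminance, n_bits=16):
    """Read the ID back from the LSBs of the first n_bits pixels."""
    value = 0
    for i in range(n_bits):
        value = (value << 1) | (luminance[i] & 1)
    return value


pixels = list(range(100, 132))      # hypothetical 8-bit luminance values
marked = embed_id(pixels, 0xBEEF)
recovered = extract_id(marked)
```

Each luminance value changes by at most 1, which is why this kind of manipulation is hard to notice visually. A watermark meant to survive re-encoding would need a more robust scheme than this sketch.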
[0038]
In the processing of steps S7 to S8, the unauthorized use determination unit 35 reads the ID data stored in the ID data / URL database 6, compares the read ID data with the ID data recorded by the digital watermark, and determines whether the downloaded content is content to be managed. If the content is determined to be content to be managed, then in the process of step S9 the unauthorized use determination unit 35 compares the URL address stored in the ID data / URL database 6 with the detected URL address to determine whether the URL address from which the content was downloaded is valid. If the URL address is valid, the process proceeds to step S12. On the other hand, if the result of the determination in step S9 is that the URL address is not valid, the process proceeds to step S10.
[0039]
In the process of step S10, the unauthorized use determining unit 35 determines that the content is being illegally used, and sends information on the content and a mail address of the transmission destination to the warning mail transmitting unit 36.
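The decision made in steps S7 to S10 can be sketched as a lookup followed by a URL comparison. This is a minimal sketch, not the patented implementation: the `ManagedContent` record, the `judge_content` function, the dictionary standing in for the ID data / URL database 6, and the exact string comparison of URLs are all assumptions. The `"unmanaged"` outcome for content with no watermark or an unknown ID is also an assumption, since the text does not describe that branch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ManagedContent:
    """One hypothetical row of the ID data / URL database 6."""
    valid_url: str     # URL where the content is legitimately published
    owner_email: str   # contact address of the valid owner


def judge_content(detected_id: Optional[str], source_url: str,
                  id_database: Dict[str, ManagedContent]) -> str:
    """Return 'unmanaged', 'valid' or 'unauthorized' for one downloaded item."""
    if detected_id is None:
        return "unmanaged"            # no watermark detected (assumed branch)
    record = id_database.get(detected_id)
    if record is None:
        return "unmanaged"            # ID is not one of the managed IDs
    if source_url == record.valid_url:
        return "valid"                # downloaded from the registered URL
    return "unauthorized"             # managed content found elsewhere


db = {"ID-001": ManagedContent("http://owner.example/a.jpg", "owner@example.com")}
```

In a fuller version, the `"unauthorized"` result would be what triggers the warning mail, using `owner_email` from the matching record.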
[0040]
In the processing of step S11, the warning mail transmitting unit 36 creates a warning mail describing information on the content, the URL address information of the content, the date and time at which the unauthorized use was detected, and the like, and transmits the warning mail to the destination mail address. The process then proceeds to step S12.
[0041]
In the process of step S12, the text analysis unit 32 performs a pattern search on the downloaded HTML text to analyze whether any of the input keywords (“idol”, “singer A child”, “group B”, “singer C child”, “singer D child”, “CG”, “painting”) exists in the vicinity of a text part “img src =” indicating an image source (underlined part A in FIG. 4) or a text part “A HREF =” indicating the URL address of a link destination to the next hierarchy (underlined part B). The vicinity is, for example, text part C, a total of three lines consisting of the line itself and one line before and after it.
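The neighborhood search of step S12 can be sketched as follows. This is a minimal sketch under stated assumptions: the regular expression `LINK_PATTERN`, the function name `find_links_with_nearby_keywords`, and the plain substring match are illustrative choices, and a line-based window only makes sense when the HTML text keeps its original line breaks.

```python
import re

# Matches the target of an `img src=` or `a href=` attribute (case-insensitive).
LINK_PATTERN = re.compile(r'(?:img\s+src|a\s+href)\s*=\s*"?([^"\s>]+)', re.IGNORECASE)


def find_links_with_nearby_keywords(html_text, keywords, window=1):
    """Return (target, matched_keywords) for each image source or link.

    `window=1` means the link's own line plus one line before and after,
    i.e. the three-line neighborhood described for text part C.
    """
    lines = html_text.splitlines()
    results = []
    for i, line in enumerate(lines):
        for match in LINK_PATTERN.finditer(line):
            lo = max(0, i - window)
            hi = min(len(lines), i + window + 1)
            context = "\n".join(lines[lo:hi])
            hits = [kw for kw in keywords if kw in context]
            results.append((match.group(1), hits))
    return results


sample = ('idol photos\n'
          '<A HREF="http://example.com/a.html">link</A>\n'
          'foo\n'
          'bar\n'
          '<img src="pic.jpg">\n'
          'CG art')
found = find_links_with_nearby_keywords(sample, ["idol", "CG"])
```

The returned list pairs each link target with the keywords found near it, which is the input the priority setting in step S13 needs.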
[0042]
In the process of step S13, the priority setting unit 33 sets to +1 the search priority of the image source or link destination URL address indicated by a text part in whose vicinity an input keyword exists. When the search priority has been set for the text parts indicating link destination URL addresses in the downloaded HTML text, the process proceeds to step S14.
[0043]
In the process of step S14, the priority setting unit 33 determines whether there is an unprocessed image source or link destination URL address whose search priority is equal to or greater than a predetermined value (for example, 1). If there is, the priority setting unit 33 sends to the search target collection unit 31 either the image source or link destination URL addresses whose search priority is equal to or greater than the predetermined value, or all or part of the image source or link destination URL addresses sorted in order of search priority, and the process proceeds to step S15. If there is no unprocessed image source or link destination URL address whose search priority is equal to or greater than the predetermined value, the series of search processing ends.
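The selection in step S14 amounts to a filter on the threshold followed by a sort. The sketch below assumes that priorities are held in a dictionary keyed by URL address and that already-visited addresses are tracked in a set; the function name `select_next_targets` is hypothetical.

```python
def select_next_targets(priorities, processed, threshold=1):
    """Return unprocessed addresses with priority >= threshold, highest first."""
    candidates = [(url, p) for url, p in priorities.items()
                  if p >= threshold and url not in processed]
    # Python's sort is stable, so addresses with equal priority keep insertion order.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [url for url, _ in candidates]


priorities = {"a.html": 1, "b.html": 0, "c.html": 2, "d.html": 1}
targets = select_next_targets(priorities, processed={"a.html"})
```

An empty result corresponds to the end of the series of search processing.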
[0044]
In the process of step S15, the search target collection unit 31 downloads the HTML text indicated by the transmitted link destination URL address into the RAM 2 via the communication control unit 5, and returns to the process of step S4.
[0045]
If content such as an image is linked in the HTML text downloaded in the process of step S15, or if the process of step S14 finds an image source whose search priority is equal to or greater than the predetermined value, the content is downloaded and checked for illegal use in steps S5 to S11.
[0046]
If no content is linked in the HTML text downloaded in the process of step S15, and the process of step S14 finds no image source whose search priority is equal to or greater than the predetermined value, the processing from step S12 onward is performed to search the next lower hierarchy.
[0047]
The search processing above was described for the case where the user inputs plus keywords indicating content to be searched with priority. However, the user may also input minus keywords indicating content not to be searched, either together with the plus keywords or alone. In this case, as shown in the flowchart of FIG. 5, the CPU 1 executes a pattern search based on the input plus keywords and minus keywords (step S12A), sets to +1 the search priority of the image source or link destination URL address indicated by a text part where a plus keyword exists, and sets to -1 the search priority of the image source or link destination URL address indicated by a text part where a minus keyword exists (steps S13A and S13B).
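A minimal sketch of the plus/minus scoring in steps S13A and S13B follows. The text does not say what happens when both a plus and a minus keyword appear near the same link; this sketch assumes the two contributions are added, giving 0. The function name `score_links` and the `(address, nearby_text)` input shape are also assumptions.

```python
def score_links(links_with_context, plus_keywords, minus_keywords=()):
    """Map each address to +1, 0 or -1 from the keywords found near it."""
    priorities = {}
    for address, context in links_with_context:
        score = 0
        if any(kw in context for kw in plus_keywords):
            score += 1   # a plus keyword raises the search priority
        if any(kw in context for kw in minus_keywords):
            score -= 1   # a minus keyword lowers it
        priorities[address] = score
    return priorities


links = [("a.jpg", "idol photo"), ("b.jpg", "idol CG"), ("c.jpg", "painting")]
scores = score_links(links, plus_keywords=["idol"], minus_keywords=["CG", "painting"])
```

With a threshold of 1 in step S14, only `a.jpg` would be followed here, so the CG and painting images drop out of the search.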
[0048]
According to such a configuration, for example, when “idol”, “singer A child”, “group B”, “singer C child”, “singer D child”, and the like are input as plus keywords and “CG” and “painting” are input as minus keywords, image content related to “CG” and “painting” can be excluded from the search targets, and only photographic images can be searched efficiently. The other processing steps in the flowchart of FIG. 5 are the same as those shown in the flowchart of FIG. 3, and their description is omitted here.
[0049]
[Second embodiment]
[Configuration of content unauthorized use search device]
As shown in FIGS. 6 and 7, the content unauthorized use search device 2b according to the second embodiment includes a content database 8 and a keyword database 9 in place of the ID data / URL database 6 of the content unauthorized use search device 2a according to the first embodiment.
[0050]
As shown in FIG. 8, the content database 8 stores table data in which n URLs corresponding to content address information are listed, each with a search priority assigned. Each item of address information is a unique primary key in the table, and the priority is represented by a comparable value.
[0051]
As shown in FIG. 9, the keyword database 9 stores data in which p keywords, each assigned an importance, are described in text format. Specifically, each keyword is text such as a proper noun, for example an artist name such as “Singer A child”, “Group B”, “Singer C child”, or “Singer D child”, and is registered by the user with a weighting importance. The importance may be positive or negative: a positive value raises the importance of the search, and a negative value lowers it. When specifying the importance in detail is troublesome for the user, a large number of hypertexts may be morphologically analyzed in advance to obtain statistics on the distribution of word appearances across all documents and the frequency of word appearances within one sentence. The word importance may then be calculated by a method such as TF (Term Frequency) / IDF (Inverse Document Frequency) to create a word importance table, and the user may register the importance by referring to that table. For example, when the importance of the word “Singer A child” in the word importance table is 0.17, the value 0.34 obtained by multiplying that importance by an appropriate constant is registered in the keyword database 9. Registering the importance in this way means the user does not need to be aware of the appearance frequency of the word itself, which reduces the labor required for registration.
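One common formulation of TF/IDF is sketched below to show how such a word importance table might be built. The text names the method but not its exact form, so the formula here (relative term frequency times the natural logarithm of the inverse document frequency, keeping each word's maximum over all documents), the function name `word_importance_table`, and the scaling constant 2.0 (which reproduces the 0.17 to 0.34 example) are all assumptions.

```python
import math
from collections import Counter


def word_importance_table(documents):
    """Compute a TF-IDF style importance per word.

    `documents` is a list of token lists, e.g. the output of morphological analysis.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))        # count each word once per document
    table = {}
    for doc in documents:
        counts = Counter(doc)
        for word, count in counts.items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[word])
            table[word] = max(table.get(word, 0.0), tf * idf)
    return table


docs = [["singer", "a", "live"], ["singer", "b", "live"], ["news", "live"]]
table = word_importance_table(docs)

SCALE = 2.0                               # the "appropriate constant" (assumed value)
registered = 0.17 * SCALE                 # importance 0.17 from the table becomes 0.34
```

A word that appears in every document, such as `live` here, gets an importance of 0, while rarer words score higher.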
[0052]
The content unauthorized use search program 3b includes a traveling unit 41, a content acquisition unit 42, a hypertext determination unit 43, a text analysis unit 44, a content registration unit 45, an unauthorized use determination unit 35, and a warning mail transmission unit 36.
[0053]
From the input I / F unit 4, the user inputs start address information specifying the content to be searched.
[0054]
The traveling unit 41 extracts the address information managed by the content database 8 in descending order of priority and outputs it as an acquisition list. Where the priority lies within a predetermined range, the traveling unit 41 processes each of a plurality of ranges generated by dividing the predetermined range by a predetermined constant, in order starting from the range closest to the maximum value of the predetermined range: it extracts the address information from the content database 8, sorts the extracted address information by acquisition date and time, and adds it to the acquisition list oldest first. The acquisition list is output when the number of items of address information added to it reaches a predetermined maximum number of tours.
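The band-by-band extraction can be sketched as follows. This is a minimal sketch assuming that records are `(address, priority, acquired_at)` tuples, that the predetermined range is split into `n_bands` equal-width bands, and that acquisition dates are ISO-format strings so that string order matches chronological order. The function name and parameters are hypothetical.

```python
def build_acquisition_list(records, p_min, p_max, n_bands, max_count):
    """Walk priority bands from highest to lowest, oldest acquisitions first."""
    width = (p_max - p_min) / n_bands

    def band_of(priority):
        # Clamp so that priority == p_max falls into the top band.
        index = int((priority - p_min) / width)
        return min(max(index, 0), n_bands - 1)

    acquisition = []
    for band in range(n_bands - 1, -1, -1):       # start from the band nearest p_max
        in_band = [r for r in records if band_of(r[1]) == band]
        in_band.sort(key=lambda r: r[2])          # oldest acquisition date first
        for address, _, _ in in_band:
            if len(acquisition) >= max_count:     # maximum number of tours reached
                return acquisition
            acquisition.append(address)
    return acquisition


records = [
    ("a", 9.5, "2002-09-03"),
    ("b", 9.0, "2002-09-01"),
    ("c", 5.0, "2002-08-30"),
    ("d", 1.0, "2002-09-02"),
]
tour = build_acquisition_list(records, p_min=0.0, p_max=10.0, n_bands=5, max_count=3)
```

Sorting by acquisition date within a band means that, among addresses of similar priority, the one that has gone longest without a visit is fetched first.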
[0055]
The content acquisition unit 42 acquires the search target content based on the address information included in the acquisition list.
[0056]
The hypertext determination unit 43 determines whether the content acquired by the content acquisition unit 42 is a hypertext including link information.
[0057]
When the content acquired by the content acquisition unit 42 is hypertext including link information, the text analysis unit 44 analyzes the link information included in the hypertext and outputs, as a morpheme table, sets of secondary node address information and nearby keywords within a predetermined range from the link information.
[0058]
The content registration unit 45 searches the keyword database 9 for nearby keywords included in the morpheme table, generates a variation value that changes the priority in the content database 8 according to the search result, and stores the generated variation value together with the address information. Register in the content database 8.
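The registration step can be sketched as a keyword lookup that accumulates a variation value per address. This is a sketch under assumptions: the text does not state how several keyword matches are combined or how the variation value is applied, so summing the importances and adding the sum to the stored priority are illustrative choices. Dictionaries stand in for the keyword database 9 and the content database 8, and `register_contents` is a hypothetical name.

```python
def register_contents(morpheme_table, keyword_db, content_db, base_priority=0.0):
    """Update content_db priorities from (address, nearby_keyword) rows.

    Returns the variation value computed for each address.
    """
    variations = {}
    for address, keyword in morpheme_table:
        # Keywords absent from the keyword database contribute nothing.
        importance = keyword_db.get(keyword, 0.0)
        variations[address] = variations.get(address, 0.0) + importance
    for address, delta in variations.items():
        content_db[address] = content_db.get(address, base_priority) + delta
    return variations


morpheme_table = [("u1", "Singer A"), ("u1", "live"), ("u2", "CG")]
keyword_db = {"Singer A": 0.34, "CG": -0.5}
content_db = {"u2": 1.0}
variations = register_contents(morpheme_table, keyword_db, content_db)
```

A negative importance lowers the priority of the linked address, so links surrounded by unwanted keywords sink toward the end of later acquisition lists.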
[0059]
[Processing operation of content unauthorized use search device]
Next, the processing operation of the content unauthorized use search device 2b when determining unauthorized use of content will be described with reference to the flowchart shown in FIG.
[0060]
In the process of step S20, the user inputs the address information of the origin node via the input I / F unit 4. Note that the origin node input here indicates a directory service type portal site, and the address information of the origin node means the URL of the portal site.
[0061]
In the process of step S21, the traveling unit 41 extracts address information from the content database 8 in descending order of priority and generates a content acquisition list. The processing by the traveling unit 41 can be expressed in the database language SQL (Structured Query Language) as follows.
[0062]
(Equation 1)

In the process of step S22, the content acquisition unit 42 sequentially extracts address information from the acquisition list generated by the traveling unit 41, accesses the Internet 7 via the communication control unit 5, and acquires the content indicated by the address information.
[0063]
In the process of step S23, the content acquisition unit 42 refers to the content database 8 and subtracts a predetermined attenuation value R from the priority of the address information b from which the content has been acquired. The processing by the content acquisition unit 42 can be expressed as follows in the database language SQL. Note that the address information b indicates the identifier of the address information from which the content was acquired.
[0064]
(Equation 2)
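The SQL statements of Equations 1 and 2 are not reproduced in this text. As a minimal sketch of what steps S21 and S23 describe, the following Python example uses an in-memory SQLite database. The table name `content`, the column names `address` and `priority`, the sample URLs, and the attenuation value 3.0 are all assumptions made for illustration and do not come from the specification.

```python
import sqlite3


def build_acquisition_list(conn):
    """Step S21 (sketch): address information in descending order of priority."""
    rows = conn.execute(
        "SELECT address FROM content ORDER BY priority DESC"
    ).fetchall()
    return [row[0] for row in rows]


def attenuate(conn, address, r):
    """Step S23 (sketch): subtract the attenuation value R from the priority of b."""
    conn.execute(
        "UPDATE content SET priority = priority - ? WHERE address = ?",
        (r, address),
    )


# Hypothetical stand-in for the content database 8.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (address TEXT PRIMARY KEY, priority REAL)")
conn.executemany(
    "INSERT INTO content VALUES (?, ?)",
    [
        ("http://a.example/", 5.0),
        ("http://b.example/", 9.0),
        ("http://c.example/", 1.0),
    ],
)

acquisition_list = build_acquisition_list(conn)
# After the first entry has been fetched, its priority is lowered by R.
attenuate(conn, acquisition_list[0], 3.0)
```

Lowering the priority of an address once its content has been acquired keeps the same page from staying at the head of the next acquisition list.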

In the process of step S24, the hypertext determination unit 43 analyzes the content type using the binary fingerprint, the header character string, and the like included in the content acquired by the content acquisition unit 42, and determines whether the acquired content is hypertext. If the content is not hypertext, the process proceeds to the unauthorized use determination processing of step S29. If the content is hypertext, the process proceeds to step S25.
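The determination in step S24 can be pictured with the following rough Python sketch. The specification does not say which fingerprints or header strings are examined, so the list of magic numbers and the test for an `<html` marker in the first 512 bytes are assumptions chosen only for illustration.

```python
def looks_like_hypertext(data: bytes) -> bool:
    """Rough guess at whether fetched bytes are HTML (assumed heuristics)."""
    # Binary fingerprints ("magic numbers") of a few common image formats.
    binary_magics = (b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n", b"GIF87a", b"GIF89a")
    if data.startswith(binary_magics):
        return False
    # Header character strings typical of an HTML document.
    head = data[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or b"<html" in head
```

Content for which this returns False would go straight to the watermark check of step S29, while content for which it returns True would be passed to the link analysis of step S25.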
[0065]
In the process of step S25, the text analysis unit 44 analyzes the link information contained in the hypertext identified by the hypertext determination unit 43 and in plain text. Here, when the hypertext is written in HTML, "link information" corresponds to tags such as the A tag (<a href="secondary node address information">anchor text</a>) and the IMG tag (<img src="secondary node address information" alt="supplementary text">), and includes the secondary node address information needed to access content and other Internet sites.
[0066]
In the process of step S26, the text analysis unit 44 cuts out the text near the link information within an appropriate range such as a line unit, and generates a nearby keyword group together with the anchor text and the supplementary text.
[0067]
In the process of step S27, the text analysis unit 44 pairs the secondary node address information with the nearby keyword group and outputs the result as a morpheme table from which any row can be read by specifying it, as shown for example in FIG. 11. In the morpheme table, the secondary node address information is address information similar to a URL, and the total number of entries is o. The secondary node address information is sorted so that identical addresses form a group. The nearby keywords, on the other hand, are text and contain no duplicates. In the morpheme table shown in FIG. 11, secondary node address information 1 has nearby keywords 1, 2, and 3, secondary node address information 2 has only nearby keyword 1, and secondary node address information 3 has only nearby keyword 4.
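Steps S25 to S27 can be sketched with Python's standard `html.parser` module. This is a simplified illustration, not the specification's analysis: it takes keywords only from the anchor text and the ALT text, splits them on whitespace instead of performing morphological analysis, and ignores the surrounding line of text. The class name `LinkCollector` and the function name `morpheme_table` are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (secondary node address, nearby keyword) pairs from HTML."""

    def __init__(self):
        super().__init__()
        self.rows = []     # (address, keyword) pairs, without duplicates
        self._href = None  # href of the <a> element currently open, if any

    def _add(self, address, text):
        for word in text.split():
            if (address, word) not in self.rows:
                self.rows.append((address, word))

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._href = attrs["href"]
        elif tag == "img" and attrs.get("src"):
            self._add(attrs["src"], attrs.get("alt") or "")

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None

    def handle_data(self, data):
        if self._href:
            self._add(self._href, data)


def morpheme_table(html):
    """Return rows sorted so that identical addresses are adjacent."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    # sorted() is stable, so keywords keep their order within each address.
    return sorted(parser.rows, key=lambda row: row[0])
```

For example, `morpheme_table('<a href="http://x.example/a">free movie</a>')` yields one row per keyword, both with the same address, which is the grouping that the content registration process relies on.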
[0068]
In the process of step S28, the content registration unit 45 analyzes the morpheme table output by the text analysis unit 44 to calculate the variation value h, and adds the calculated variation value h to the priority of the content database 8. The details of the content registration process will be described later.
[0069]
In the process of step S29, the unauthorized use determination unit 35 determines whether or not the content has been illegally used by using a digital watermark extraction algorithm or the like as described in the first embodiment. If the result of the determination is not unauthorized use, the process returns to step S22. If the result of determination is unauthorized use, the process proceeds to step S30.
[0070]
In the process of step S30, the unauthorized use determination unit 35 stores the information on the illegally used content and its address information in the RAM 3, the warning mail transmitting unit 36 notifies the user of the information on the illegally used content and its address information by electronic mail, and the process returns to step S22.
[0071]
[Content registration process]
Here, with reference to the flowchart shown in FIG. 12, the processing operation of the content unauthorized use search device 2b when performing the content registration process in step S28 will be described in detail.
[0072]
The process shown in the flowchart of FIG. 12 starts when processing moves from step S27 to step S28.
[0073]
In the process of step S41, the content registration unit 45 initializes each value by setting the loop variable i to "1", the secondary node address information a to "φ", and the variation value h to "0".
[0074]
In the process of step S42, the content registration unit 45 reads the data of the i-th row of the morpheme table, and sets the secondary node address information to the work variable u and the nearby keyword to the work variable k.
[0075]
In the process of step S43, the content registration unit 45 determines whether the reading process in step S42 has been successful. Then, as a result of the determination, when the reading is successful, the process proceeds to step S47. On the other hand, as a result of the determination, if the reading has failed, the process proceeds to step S44.
[0076]
In the process of step S44, the content registration unit 45 determines whether or not there is a row of “address information = a” in the content database 8. The processing by the content registration unit 45 can be expressed as follows in a database language SQL.
[0077]
[Equation 3]

If “a = φ”, it is assumed that no corresponding row exists in the content database 8. Then, as a result of the determination, if there is no line where “address information = a” exists, the process proceeds to step S46. On the other hand, as a result of the determination, if there is a row where “address information = a”, the process proceeds to step S45.
[0078]
In the process of step S45, the content registration unit 45 updates the priority associated with the address information a. The processing by the content registration unit 45 can be expressed as follows in a database language SQL.
[0079]
(Equation 4)

In the process of step S46, the content registration unit 45 adds to the content database 8 a new row whose priority is obtained by adding the variation value h to the priority of the link information of the upper layer. More specifically, since the upper layer of the address information a is the address information b held by the content registration unit 45 in the process of step S22, the content registration unit 45 acquires the priority w of the address information b from the content database 11, for example by the following statement in the database language SQL.
[0080]
(Equation 5)

Then, the content registration unit 45 adds a new row that has the address information a as its primary key and a priority obtained by adding the variation value h to the priority of the upper layer, for example by the following statement in the database language SQL.
[0081]
(Equation 6)

By processing in this way, the priority of an address can be raised in advance on the assumption that a destination linked from a suspicious site is itself likely to be a suspicious site. The series of registration processes then ends.
[0082]
On the other hand, in the process of step S47, the content registration unit 45 determines whether the secondary node address information a and the work variable u are equal, and thereby determines whether the same secondary node address information appears consecutively in the morpheme table.
[0083]
If the determination shows that the secondary node address information a is equal to the work variable u, the content registration unit 45, in the process of step S48, searches the keyword database 12 using the nearby keyword set in the work variable k as a search key and extracts the importance of that nearby keyword. The content registration unit 45 then adds the extracted importance to the variation value h, increments the loop variable i by 1, and returns to the process of step S42. With this processing, when the same secondary node address information appears consecutively, the variation value h is accumulated for each piece of secondary node address information.
[0084]
On the other hand, if the determination in step S47 shows that the secondary node address information a is not equal to the work variable u, the content registration unit 45 determines in step S49 whether the secondary node address information a is the empty character string φ. If the secondary node address information a is not the empty character string φ, the process proceeds to step S44a.
[0085]
In the process of step S44a, the content registration unit 45 determines whether or not there is a row in which “address information = a” exists in the content database 8.
[0086]
If the determination in step S44a shows that there is no row with "address information = a", the process proceeds to step S46a, where the content registration unit 45 adds to the content database 8 a new row whose priority is obtained by adding the variation value h to the priority of the link information of the upper layer, sets the variation value h to 0, and proceeds to step S50.
[0087]
If the determination in step S44a shows that there is a row with "address information = a", the process proceeds to step S45a, where the content registration unit 45 updates the priority associated with the address information a, sets the variation value h to 0, and proceeds to step S50.
[0088]
On the other hand, if the result of the determination processing in step S49 is that the secondary node address information a is an empty character string φ, the process proceeds to step S50. In the process of step S50, the content registration unit 45 substitutes the contents of the work variable u for the secondary node address information a, and proceeds to step S48. Note that the processing in step S49 corresponds to the processing when reading the morpheme table for the first time.
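The registration loop of steps S41 to S50 amounts to summing the keyword importances for each group of identical secondary node addresses and then either updating or inserting a row. The following Python sketch shows that effect, not the flowchart itself: it uses `itertools.groupby` in place of the row-by-row loop with the variables i, a, u, and k, and plain dictionaries in place of the content database and the keyword database. The function name `register`, treating unknown keywords as importance 0, and treating the update of step S45 as adding h are assumptions, since Equations 3 to 6 are not reproduced in this text.

```python
from itertools import groupby


def register(table, importance, priorities, parent):
    """Apply one morpheme table to the priorities (sketch).

    table:      rows (address, keyword), sorted so equal addresses are adjacent
    importance: keyword -> importance (stand-in for the keyword database)
    priorities: address -> priority (stand-in for the content database)
    parent:     address information b of the page that contained the links
    """
    w = priorities[parent]  # priority of the upper layer
    for address, rows in groupby(table, key=lambda row: row[0]):
        # Variation value h: sum of the importances of the nearby keywords.
        h = sum(importance.get(keyword, 0) for _, keyword in rows)
        if address in priorities:
            priorities[address] += h      # existing row: update (step S45)
        else:
            priorities[address] = w + h   # new row: inherit w, add h (step S46)
    return priorities
```

Because a new address starts from the priority w of the page that linked to it, pages reached from a high-priority page are examined early, and a negative importance (a minus keyword) pulls the priority back down.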
[0089]
[Third Embodiment]
In the content unauthorized use search device 2b according to the second embodiment, the acquisition list generated during the search process is obtained by sorting all the address information registered in the content database 8 (see the process of step S21 described above). With an acquisition list built in this way, however, the time required to process one acquisition list grows as the number of sites to be searched increases, and it may take a long time before the continually changing priorities in the content database 11 are reflected in the actual acquisition of content.
[0090]
Therefore, the content unauthorized use search device 2b according to the third embodiment operates as follows so that the acquisition list is rebuilt every predetermined upper-limit number q of entries. Hereinafter, the configuration and operation of the content unauthorized use search device 2b according to the third embodiment of the present invention will be described.
[0091]
[Configuration of content unauthorized use search device]
In the content unauthorized use search device 2b according to the third embodiment, the data format of the table data stored in the content database 8 is different from that of the content unauthorized use search device 2b according to the second embodiment.
[0092]
As shown in FIG. 13, the table data stored in the content database 8 holds the update date and time in addition to the address information and the priority. Here, the "update date and time" is date-type data with a numerical value that can be compared. For simplicity, the date-type data is defined here as accumulated seconds (the number of seconds elapsed, with January 1, 1970 taken as 0).
[0093]
The other configuration of the content unauthorized use search device 2b according to the third embodiment is the same as that of the content unauthorized use search device 2b according to the second embodiment, and a description thereof will be omitted.
[0094]
[Processing operation of content abuse search device]
As shown in FIG. 14, in the process of step S60, the user inputs the address information of the origin node via the input I / F unit 4.
[0095]
In the process of step S61, the traveling unit 41 generates a content acquisition list from the content database 8 according to the following four conditions. Note that the acquisition list generation processing by the traveling unit 41 will be described later in detail.
[0096]
(1) The number s of pieces of address information in the acquisition list does not exceed the upper-limit number q.
[0097]
(2) Address information is added to the acquisition list in descending order of priority.
(3) Address information is added to the acquisition list starting from the least recently updated.
(4) Address information whose content was recently acquired is not added.
In the process of step S62, the content acquisition unit 42 sequentially extracts address information from the acquisition list generated by the traveling unit 41, accesses the Internet 7 via the communication control unit 5, and acquires the content indicated by the address information. In addition, the content acquisition unit 42 acquires the current date and time now when the content was acquired as date type data with reference to the system clock, and updates the update date and time corresponding to the address information in the content database 8. The update processing of the update date and time can be expressed as follows in the database language SQL. Note that b indicates an identifier of the address information from which the content was acquired.
[0098]
(Equation 7)

The processing from step S63 onward is the same as the corresponding processing in the second embodiment, except that when a new row is added to the content database 8 (corresponding to the process of step S46), the content registration unit 45 sets the update date and time to the oldest value of the date type (0), for example by a statement in the database language SQL such as "INSERT INTO content database VALUES (a, w + h, 0);". Its description is therefore omitted here.
[0099]
[Acquisition list generation processing]
Next, with reference to the flowchart shown in FIG. 15, the processing operation of the content unauthorized use search device 2b when executing the acquisition list generation processing in step S61 will be described.
[0100]
The process shown in the flowchart of FIG. 15 starts when processing moves from step S60 to step S61.
[0101]
In the process of step S81, the traveling unit 41 searches the content database 8, and sets the priority of the content whose priority is equal to or more than the maximum value MAX to the maximum value MAX and sets the priority of the content whose priority is equal to or less than the minimum value MIN to the minimum value MIN. Set (clipping processing). The processing by the traveling unit 41 can be expressed as follows in the database language SQL.
[0102]
(Equation 8)

The maximum value MAX and the minimum value MIN may be predetermined constants, or may be variables calculated statistically each time. For example, using the mean of all priorities contained in the content database 11 and the standard deviation obtained from their variance, and assuming that the priorities follow a normal distribution, the priority values corresponding to deviation value 25 and deviation value 75 may be taken as the minimum value MIN and the maximum value MAX, respectively.
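The statistical choice of the two bounds and the clipping of step S81 can be sketched in Python as follows. The sketch assumes that "deviation value" means the usual standard score T = 50 + 10(x − μ)/σ, so that a deviation value t corresponds to the priority μ + σ(t − 50)/10, and it treats 25 as the lower bound and 75 as the upper bound. The function names and the use of the population standard deviation are likewise assumptions.

```python
from statistics import mean, pstdev


def bounds_from_deviation_values(priorities, low=25, high=75):
    """Return (MIN, MAX) as the priorities at two deviation values (sketch)."""
    mu = mean(priorities)
    sigma = pstdev(priorities)  # population standard deviation

    def to_priority(t):
        # Invert T = 50 + 10 * (x - mu) / sigma.
        return mu + sigma * (t - 50) / 10

    return to_priority(low), to_priority(high)


def clip(priorities, lo, hi):
    """Step S81 (sketch): force every priority into the range [lo, hi]."""
    return [min(max(p, lo), hi) for p in priorities]
```

With priorities 0 and 10, the mean is 5 and the standard deviation is 5, so the bounds come out as −7.5 and 17.5; any priority outside that range would be clipped to the nearer bound.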
[0103]
In the process of step S82, the traveling unit 41 initializes each value by setting the value of the loop variable i to d and setting the value of the acquisition list st to φ.
[0104]
In the process of step S83, the traveling unit 41 calculates an acquisition section start point sa (= sd × i) and an acquisition section end point sb (= sd × (i + 1)). Here, the variable sd is an interval variable and is calculated by the following equation. Note that the variable d is a division constant.
[0105]
(Equation 9)
sd = (maximum value MAX−minimum value MIN) / d
In the process of step S84, the traveling unit 41 acquires the current date and time as date-type data by referring to the system clock, then refers to the content database 11 using, for example, the database language SQL shown below, and creates a candidate list by sorting, in ascending order of update date and time, the address information whose priority satisfies "section end point > priority ≧ section start point". Note that cut denotes a cut-off date and time constant and is a predefined value.
[0106]
(Equation 10)

According to such processing, it is possible to generate a candidate list it arranged in chronological order from the address information having the priority included in a certain section.
[0107]
In the process of step S85, the traveling unit 41 calculates the number sm of pieces of address information in the acquisition list st and the number im of pieces of address information in the candidate list it. It then extracts from the candidate list it "im" pieces of address information if (q − sm) > im, or "q − sm" pieces if (q − sm) ≦ im (where the variable q is the maximum traveling number).
[0108]
Then, the traveling unit 41 adds the extracted address information to the acquisition list st.
[0109]
In the process of step S86, the traveling unit 41 determines whether the number sm of address information in the acquisition list st is equal to the maximum traveling number q. If the result of the determination is that they are equal, the process proceeds to step S89. On the other hand, if the result of the determination is not equal, the process proceeds to step S87.
[0110]
In the process of step S87, the traveling unit 41 decrements the value of the loop variable i by one.
[0111]
In the process of step S88, the traveling unit 41 determines whether the value of the loop variable i is negative. If the result of the determination is not negative, the process returns to step S83. On the other hand, if it is negative, the process proceeds to step S89.
[0112]
In the process of step S89, the traveling unit 41 outputs the acquisition list st. After the process of step S89 is completed, the CPU 1 monitors whether or not the acquisition of the content is completed, and upon completion of the acquisition of the content, the CPU 1 executes the generation process again.
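Steps S82 to S89 can be condensed into the following Python sketch, which works on an in-memory list of rows rather than on the content database. Equation 10 is not reproduced in this text, so the exclusion test `updated <= now - cut` is an assumption about how the cut-off constant is applied. The section boundaries are taken literally as sd × i and sd × (i + 1), which covers the clipped range when MIN is 0, and the function name is hypothetical.

```python
def generate_acquisition_list(rows, q, d, max_p, min_p, now, cut):
    """Build the acquisition list st from (address, priority, updated) rows.

    updated, now, and cut are in accumulated seconds; q is the maximum
    traveling number and d is the division constant.
    """
    sd = (max_p - min_p) / d  # interval variable (Equation 9)
    st = []                   # acquisition list
    i = d                     # start from the highest-priority section
    while i >= 0 and len(st) < q:
        sa, sb = sd * i, sd * (i + 1)  # section start and end points
        # Candidate list: priorities in [sa, sb), not acquired recently,
        # oldest update date and time first.
        candidates = sorted(
            (r for r in rows if sa <= r[1] < sb and r[2] <= now - cut),
            key=lambda r: r[2],
        )
        # Take at most the number of free slots left in the list.
        st.extend(r[0] for r in candidates[: q - len(st)])
        i -= 1
    return st
```

Sections are visited from high priority to low, and within a section the least recently updated addresses come first, so conditions (1) to (4) of step S61 are all met by construction.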
[0113]
With this processing, the priorities in the content database 8 change as content is acquired and analyzed, and the CPU 1 refers to the changed priorities when it selects, within the maximum traveling number q, the high-priority sites to be acquired next. The priorities are therefore quickly reflected in the content search, and a highly strategic traversal can be performed.
[0114]
As is apparent from the above description, the content unauthorized use search device 2b according to the first embodiment does not search all content on the Internet 7. Instead, it assigns a search priority to each piece of link information based on the input keyword and searches for content according to that priority. The content of interest can therefore be searched efficiently, and the time required for collecting content and for the digital watermark detection check can be reduced.
[0115]
Further, according to the content unauthorized use search device 2b according to the second or third embodiment, when a malicious person who manages an Internet site makes unauthorized use of content, the hypertext containing the link information to the illegally used content retains a high priority. Even if the unauthorized content is moved to another hypertext within the same site or to another Internet site, link information with anchor text such as "Click here for the new location" must be placed in the original hypertext in order to guide an unspecified number of viewers. Since the page containing that link information has already been given a high priority, it is checked quickly and the relocation destination is easily found. Furthermore, since the relocation destination inherits the priority and therefore has a high priority from the start, it too can be checked quickly. That is, the content unauthorized use search device 2b according to the second and third embodiments of the present invention can easily cope with replacement of the link structure or content by a malicious person.
[0116]
As described above, the first to third embodiments have been described in detail, but the present invention can be implemented in other various forms without departing from the spirit or main features. Therefore, the above-described embodiment is merely an example in every aspect, and should not be interpreted in a limited manner. The scope of the present invention is defined by the appended claims, and is not limited by the specification. Further, all modifications and changes belonging to the equivalent scope of the claims are within the scope of the present invention.
[0117]
[Effect of the Invention]
According to the present invention, the time required for collecting content and for the digital watermark detection check can be shortened, and replacement of the link structure or content by a malicious user can be handled easily.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a functional configuration of a content unauthorized use search device according to a first embodiment;
FIG. 2 is a block diagram illustrating a hardware configuration of a content unauthorized use search device shown in FIG. 1;
FIG. 3 is a flowchart illustrating a flow of a content unauthorized use search process performed by the content unauthorized use search device illustrated in FIG. 1;
FIG. 4 is a diagram exemplifying an HTML text analyzed by a text analysis unit in the content unauthorized use search processing shown in FIG. 3;
FIG. 5 is a flowchart showing an application example of the content unauthorized use search processing shown in FIG. 3;
FIG. 6 is a block diagram illustrating a functional configuration of a content unauthorized use search device according to a second embodiment;
FIG. 7 is a block diagram illustrating a hardware configuration of the content abuse search device shown in FIG. 6;
FIG. 8 is a diagram illustrating a data format of a content database of the content abuse search device shown in FIG. 6;
FIG. 9 is a diagram illustrating a data format of a keyword database of the content abuse search device shown in FIG. 6;
FIG. 10 is a flowchart illustrating a flow of a content unauthorized use search process performed by the content unauthorized use search device illustrated in FIG. 6;
FIG. 11 is a diagram illustrating a morpheme table used in the content unauthorized use search processing shown in FIG. 10;
FIG. 12 is a flowchart illustrating a flow of the content registration process in the content unauthorized use search process shown in FIG. 10;
FIG. 13 is a diagram exemplifying a data format of a content database in the content unauthorized use search device according to the third embodiment.
FIG. 14 is a flowchart illustrating a content unauthorized use search processing flow by the content unauthorized use search device according to the third embodiment;
FIG. 15 is a flowchart illustrating a flow of an acquisition list generation process in the content unauthorized use search process shown in FIG. 14;
[Explanation of symbols]
1 ... CPU
2 ... RAM
2a ... Content illegal use search device
2b ... Content illegal use search device
3 ... ROM
3a ... Content illegal use search program
3b ... Content illegal use search program
4 ... Input I/F section
5 ... Communication control unit
6 ... ID data / URL database
7 ... Internet
8 ... Content database
9 ... Keyword database
11 ... Content database
12 ... Keyword database
31 ... Search target collection unit
32 ... Text analysis unit
33 ... Priority setting section
34 ... Search target content check unit
35 ... Unauthorized use determination unit
36 ... Warning mail sending unit
41 ... Circulation section
42 ... Content acquisition unit
43 ... Hypertext determination unit
44 ... Text analysis unit
45 ... Content registration section

Claims

A content abuse search device that searches for unauthorized use of content in an electronic network,
An input unit for inputting a keyword and address information of the highest hierarchy of the content to be searched;
A search target collection unit that collects text information described in a hierarchy specified by the address information input from the input unit, and collects content linked in the collected text information,
A search target content check unit that detects an identifier embedded by an electronic watermark in the content collected by the search target collection unit and determines whether the content is illegally used based on the detected identifier;
A text analysis unit that analyzes link information for searching from a hierarchy of the text information to a lower hierarchy, which is included in the text information collected by the search target collection unit,
Based on the keyword input from the input unit, and a priority setting unit that sets a priority to the link information analyzed by the text analysis unit,
The priority setting unit sets the priority of the link information for searching to the lower hierarchy based on one or both of a plus keyword, included in the keyword, for raising the priority and a minus keyword, included in the keyword, for lowering the priority,
The search target collection unit collects, according to the priority set by the priority setting unit, text information described in the hierarchy specified by the link information, and collects the content linked in the collected text information, and the search target content check unit detects an identifier embedded by a digital watermark in the content collected by the search target collection unit and determines, based on the detected identifier, whether the content is illegally used, the content unauthorized use search device being characterized thereby.

A content unauthorized use search program for causing a content unauthorized use search device to search for unauthorized use of content in an electronic network,
Inputting the keyword and the address information of the highest hierarchy of the content to be searched from the input unit;
Collecting, by a search target collection unit, the text information described in the hierarchy specified by the address information input from the input unit, and collecting, by the search target collection unit, the content linked in the collected text information;
Detecting an identifier embedded by a digital watermark in the content collected by the search target collection unit, and determining whether the content is illegally used by the detected identifier by a search target content check unit;
Included in the text information collected by the search target collection unit, link information for searching from the hierarchy of the text information to a lower hierarchy, analyzing by a text analysis unit,
Based on the keyword input from the input unit, the link information analyzed by the text analysis unit, setting a priority by a priority setting unit,
According to the priority order set by the priority order setting unit, text information described in a hierarchy specified by the link information is collected by the search target collection unit, and is linked in the collected text information. Collecting content by the search target collection unit;
Detecting, by the search target content check unit, an identifier embedded by a digital watermark in the content collected by the search target collection unit, and determining, based on the detected identifier, whether the content is illegally used; the program causing the content unauthorized use search device to execute the above steps,
In the step of setting the priority, the program causes the priority setting unit to set the priority of the link information for searching to the lower hierarchy based on one or both of a plus keyword, included in the keyword, for raising the priority and a minus keyword, included in the keyword, for lowering the priority.

A content unauthorized use search method by a content unauthorized use search device that searches for unauthorized use of content in an electronic network,
Inputting the keyword and the address information of the highest hierarchy of the content to be searched from the input unit;
Collecting, by a search target collection unit, the text information described in the hierarchy specified by the address information input from the input unit, and collecting, by the search target collection unit, the content linked in the collected text information;
Detecting an identifier embedded by a digital watermark in the content collected by the search target collection unit, and determining whether the content is illegally used by the detected identifier by a search target content check unit;
Included in the text information collected by the search target collection unit, link information for searching from the hierarchy of the text information to a lower hierarchy, analyzing by a text analysis unit,
Based on the keyword input from the input unit, the link information analyzed by the text analysis unit, setting a priority by a priority setting unit,
According to the priority order set by the priority order setting unit, text information described in a hierarchy specified by the link information is collected by the search target collection unit, and is linked in the collected text information. Collecting content by the search target collection unit;
Detecting, by the search target content check unit, an identifier embedded by a digital watermark in the content collected by the search target collection unit, and determining, based on the detected identifier, whether the content is illegally used; the method comprising the above steps,
In the step of setting the priority, the priority setting unit sets the priority of the link information for searching to the lower hierarchy based on one or both of a plus keyword, included in the keyword, for raising the priority and a minus keyword, included in the keyword, for lowering the priority.

A content abuse search device that searches for unauthorized use of content in an electronic network,
A content database that manages a set of address information capable of specifying content, the priority of the address information, and the date and time when the content was obtained,
A keyword database that manages the keywords in combination with the importance of the keywords,
An input unit for inputting start address information for specifying content to be searched;
A traveling unit that extracts the address information managed by the content database in order of the priority and outputs the acquired information as an acquisition list;
A content acquisition unit that acquires the search target content based on the address information included in the acquisition list,
A hypertext determination unit that determines whether the content acquired by the content acquisition unit is hypertext including link information,
A text analysis unit that, when the content acquired by the content acquisition unit is hypertext including link information, analyzes the link information included in the hypertext and outputs, as a morpheme table, sets of secondary node address information and nearby keywords within a predetermined range from the link information,
A content registration unit that searches the keyword database for the nearby keywords included in the morpheme table, generates a variation value for changing the priority in the content database according to the search result, and registers the generated variation value in the content database together with the address information,
An unauthorized use determining unit that determines whether the content obtained by the content obtaining unit is used illegally,
The traveling unit, when the priority is within a predetermined range, extracts the address information from the content database for each of a plurality of ranges generated by dividing the predetermined range by a predetermined constant, in order from the maximum value of the predetermined range, sorts the extracted address information in order from the oldest acquisition date and time, adds the address information to the acquisition list starting from the oldest, and outputs the acquisition list when the number of pieces of address information added to the acquisition list reaches a predetermined maximum traveling number.

A content unauthorized use search program for causing a content unauthorized use search device to search for unauthorized use of content in an electronic network,
A step in which start address information for specifying the content to be searched is input from the input unit;
A step in which the traveling unit extracts, in descending order of the priority, the address information managed by a content database that manages the address information capable of specifying the content, the priority of the address information, and the acquisition date and time at which the content was acquired, and outputs the extracted address information as an acquisition list;
A step in which the content acquisition unit acquires the content to be searched based on the address information included in the acquisition list;
A step in which the hypertext determination unit determines whether the content acquired by the content acquisition unit is hypertext including link information;
A step in which, when the content acquired by the content acquisition unit is hypertext including link information, the text analysis unit analyzes the link information included in the hypertext and outputs, as a morpheme table, sets each consisting of secondary node address information and the neighboring keywords located within a predetermined range of the link information;
A step in which the content registration unit searches a keyword database, which manages keywords in combination with the importance of each keyword, for the neighboring keywords included in the morpheme table, generates, according to the search result, a variation value for changing the priority in the content database, and registers the generated variation value in the content database together with the address information;
A step in which the unauthorized use determining unit determines whether the content acquired by the content acquisition unit has been used without authorization, the program causing the content unauthorized use search device to execute the above steps,
Wherein the program has a function by which, in the step of outputting the acquisition list, when the priority is within a predetermined range, the traveling unit extracts the address information from the content database for each of a plurality of ranges generated by dividing the predetermined range by a predetermined constant, in order from the maximum value of the predetermined range, sorts the extracted address information in order from the oldest acquisition date and time, adds the address information to the acquisition list preferentially from the oldest, and outputs the acquisition list when the number of pieces of address information added to the acquisition list reaches a predetermined maximum number of tours.
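
The text analysis and content registration steps can be sketched in Python as follows. This is only an illustration under several assumptions: `build_morpheme_table`, `variation_value`, the window size `n`, and the regular expressions are hypothetical; simple word splitting stands in for real morphological analysis; and the keyword database is modeled as a dictionary of signed importance values, positive for keywords that raise the priority and negative for keywords that lower it.

```python
import re

# Simplified patterns for links and tags (assumption: well-formed
# double-quoted href attributes; not a full HTML parser).
LINK_RE = re.compile(r'<a\s+href="([^"]+)"[^>]*>(.*?)</a>',
                     re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def words(fragment):
    """Strip tags and split into lowercase words (stand-in for morphemes)."""
    return re.findall(r"\w+", TAG_RE.sub(" ", fragment).lower())


def build_morpheme_table(html, n=3):
    """Pair each link address with the anchor text and the n words on
    either side of the link."""
    table = []
    for m in LINK_RE.finditer(html):
        before = words(html[:m.start()])
        after = words(html[m.end():])
        nearby = before[max(0, len(before) - n):] + words(m.group(2)) + after[:n]
        table.append((m.group(1), nearby))
    return table


def variation_value(nearby, keyword_db):
    """Sum the signed importance of every neighboring keyword that is
    found in the keyword database; unknown words contribute nothing."""
    return sum(keyword_db.get(w, 0) for w in nearby)
```

Each row of the table corresponds to one set of secondary node address information and neighboring keywords, and the value returned by `variation_value` would be registered in the content database together with that address.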

A content unauthorized use search method by a content unauthorized use search device that searches for unauthorized use of content in an electronic network,
A step in which start address information for specifying the content to be searched is input from the input unit;
A step in which the traveling unit extracts, in descending order of the priority, the address information managed by a content database that manages the address information capable of specifying the content, the priority of the address information, and the acquisition date and time at which the content was acquired, and outputs the extracted address information as an acquisition list;
A step in which the content acquisition unit acquires the content to be searched based on the address information included in the acquisition list;
A step in which the hypertext determination unit determines whether the content acquired by the content acquisition unit is hypertext including link information;
A step in which, when the content acquired by the content acquisition unit is hypertext including link information, the text analysis unit analyzes the link information included in the hypertext and outputs, as a morpheme table, sets each consisting of secondary node address information and the neighboring keywords located within a predetermined range of the link information;
A step in which the content registration unit searches a keyword database, which manages keywords in combination with the importance of each keyword, for the neighboring keywords included in the morpheme table, generates, according to the search result, a variation value for changing the priority in the content database, and registers the generated variation value in the content database together with the address information;
A step in which the unauthorized use determining unit determines whether the content acquired by the content acquisition unit has been used without authorization,
Wherein, in the step of outputting the acquisition list, when the priority is within a predetermined range, the traveling unit extracts the address information from the content database for each of a plurality of ranges generated by dividing the predetermined range by a predetermined constant, in order from the maximum value of the predetermined range, sorts the extracted address information in order from the oldest acquisition date and time, adds the address information to the acquisition list preferentially from the oldest, and outputs the acquisition list when the number of pieces of address information added to the acquisition list reaches a predetermined maximum number of tours.
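
How the steps of the method chain together can be sketched as a single loop. This is a simplified illustration, not the claimed implementation: `search`, `fetch`, `is_unauthorized`, and `variation_for` are hypothetical callables supplied by the caller, the watermark check is left entirely to `is_unauthorized`, each address is acquired at most once, and a new link is assumed to start from its parent's priority plus the variation value.

```python
import re
from datetime import datetime, timezone

# Simplified link pattern (assumption: double-quoted href attributes).
LINK_RE = re.compile(r'<a\s+href="([^"]+)"', re.IGNORECASE)


def search(start_address, fetch, is_unauthorized, variation_for, max_tours=100):
    """Return (visited addresses in order, addresses flagged as unauthorized)."""
    # Content database: address -> priority and acquisition date/time.
    db = {start_address: {"priority": 0, "acquired_at": None}}
    visited, flagged = [], []
    while len(visited) < max_tours:
        pending = [a for a, rec in db.items() if rec["acquired_at"] is None]
        if not pending:
            break
        # Traveling step: highest priority first.
        address = max(pending, key=lambda a: db[a]["priority"])
        # Acquisition step.
        content = fetch(address)
        db[address]["acquired_at"] = datetime.now(timezone.utc)
        visited.append(address)
        # Hypertext determination step.
        if isinstance(content, str) and LINK_RE.search(content):
            # Text analysis and registration steps for each new link.
            for link in LINK_RE.findall(content):
                if link not in db:
                    priority = db[address]["priority"] + variation_for(content, link)
                    db[link] = {"priority": priority, "acquired_at": None}
        elif content is not None and is_unauthorized(content):
            # Unauthorized use determination step.
            flagged.append(address)
    return visited, flagged
```

With an in-memory dictionary standing in for the network, `fetch` can simply be the dictionary's `get` method, which makes the loop easy to exercise without any network access.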