JP5051237B2

JP5051237B2 - Inappropriate content detection method and apparatus, computer program thereof, and content publishing system

Info

Publication number: JP5051237B2
Application number: JP2009537912A
Authority: JP
Inventors: 恭二平田; 芹沢　　昌宏
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2007-10-19
Filing date: 2008-10-14
Publication date: 2012-10-17
Anticipated expiration: 2028-10-14
Also published as: JPWO2009050877A1; WO2009050877A1

Description

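[Technical field]
[0001]
The present invention relates to an inappropriate content detection method, an inappropriate content detection apparatus, a computer program therefor, and a content publishing system for posting sites.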
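[Background art]
[0002]
Bulletin board services and video or still-image posting services are among the services offered over the Internet. In these services, an unspecified number of users can upload data (content) such as images, audio, and text, and can freely view content uploaded by others.
[0003]
If posting to such bulletin boards is left entirely to the users, there is a risk that large amounts of so-called illicit or illegal content, copied or imitated from other people's copyrighted works, will circulate. At present, illicit content is handled either by the service provider monitoring the site periodically or by the copyright holder complaining to the service provider, which then deletes the content. When the volume of posted content becomes large, however, checking all of it manually is difficult.
[0004]
Patent Document 1 describes an example of a system that detects inappropriate data among posted content. That system stores in advance samples of content that serve as a base for finding inappropriate content (images, audio, text, and so on). Each time new content (images, audio, text, and so on) is posted, the system evaluates its similarity to the stored base samples and extracts the content as inappropriate when it is evaluated as similar.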
[Patent Document 1]
JP 2006-293455 A
[Patent Document 2]
JP-A-8-180176
[Patent Document 3]
JP 2000-259832 A
[Patent Document 4]
JP 2000-339474 A
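[Disclosure of the invention]
[0005]
However, the method described in Patent Document 1 is based on similarity to sample content that serves as a base for finding inappropriate content. It leaves room for improvement in that such base content must be prepared as samples before inappropriate content can be extracted.
[0006]
In particular, to detect so-called illicit or illegal content, copied or imitated from other people's copyrighted works, among the data (content) such as images, audio, and text posted by an unspecified number of users on services such as bulletin boards, it has been difficult to prepare illicit image data as samples in advance and keep up with all the illicit image data uploaded to bulletin boards and other Internet sites every day.
[0007]
With a wide variety of content being generated and distributed every day, it is extremely difficult to prepare in advance, as base samples of inappropriate content, all the content that should be protected by copyright. The operator of a bulletin board or similar service rarely creates content itself, so obtaining the content to be protected in advance is difficult. In addition, the range of content posted to bulletin boards is extremely broad, so preparing base samples that cover every post without omission is very difficult.
[0008]
An object of the present invention is to provide an inappropriate content detection method, an inappropriate content detection apparatus, a computer program therefor, and a content publishing system that solve the problem described above, namely the difficulty of preparing illicit content samples or dictionary data in advance.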
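[0009]
In a first inappropriate content detection method of the present invention, an inappropriate content detection apparatus:
receives content posted by individual users;
calculates, using a plurality of the received posted contents, the mutual similarity of the plurality of posted contents; and
determines, based on the mutual similarity, whether or not the posted content is appropriate in terms of copyright.
[0010]
In a second inappropriate content detection method of the present invention, an inappropriate content detection apparatus:
receives content posted by individual users; and
determines, using a plurality of the received posted contents and based on the mutual similarity calculated among them, whether or not the plurality of posted contents are similar to each other, and, when they are similar, detects the group of mutually similar contents as content that is inappropriate in terms of copyright.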
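[0011]
An inappropriate content detection apparatus of the present invention includes:
content receiving means for receiving input of content posted by individual users;
similar content detection means for calculating, using a plurality of the received posted contents, the mutual similarity of the plurality of posted contents and detecting, based on the similarity, a group of mutually similar contents; and
fraud determination means for determining, based on the detected group of similar contents, content that is inappropriate in terms of copyright.
[0012]
A computer program of the present invention causes a computer to realize an inappropriate content detection apparatus that detects content that is inappropriate in terms of copyright among content posted by individual users. The program causes the computer to execute:
a procedure of receiving input of content posted by individual users;
a procedure of calculating, using a plurality of the received posted contents, the mutual similarity of the plurality of posted contents and detecting, based on the similarity, a group of mutually similar contents; and
a procedure of determining, based on the detected group of similar contents, content that is inappropriate in terms of copyright.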
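[0013]
A first content publishing system of the present invention publishes posted content so that users can view it, and includes:
presenting means for presenting, to a system administrator, the inappropriate content detected by the above inappropriate content detection apparatus;
receiving means for receiving a deletion instruction from the system administrator after the system administrator has confirmed the content; and
deletion means for deleting the inappropriate content in accordance with the deletion instruction.
[0014]
A second content publishing system of the present invention publishes posted content so that users can view it, and includes:
determination means for determining whether or not the number of contents determined to be similar to each other by the above inappropriate content detection apparatus is greater than a predetermined number; and
control means for automatically stopping user access to the content when the number of contents determined to be similar to each other is greater than the predetermined number.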
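[0015]
Any combination of the above constituent elements, and any conversion of the expression of the present invention among a method, an apparatus, a system, a recording medium, a computer program, and the like, is also effective as an aspect of the present invention.
[0016]
The various constituent elements of the present invention need not be individually independent. A plurality of constituent elements may be formed as a single member, one constituent element may be formed of a plurality of members, one constituent element may be part of another, part of one constituent element may overlap part of another, and so on.
[0017]
Although the inappropriate content detection method and the computer program of the present invention describe a plurality of procedures in order, the order of description does not limit the order in which the procedures are executed. When the method and the computer program are implemented, the order of the procedures can therefore be changed to the extent that the change causes no problem in substance.
[0018]
Furthermore, the procedures of the inappropriate content detection method and the computer program of the present invention are not limited to being executed at mutually different times. Another procedure may occur while one procedure is being executed, or the execution timing of one procedure may partly or wholly overlap that of another.
[0019]
According to the present invention, there are provided an inappropriate content detection method, an inappropriate content detection apparatus, a computer program therefor, and a content publishing system that can efficiently detect inappropriate content among posted content without preparing illicit content samples or dictionary data in advance.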
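[Brief description of the drawings]
[0020]
The above object and other objects, features, and advantages will become more apparent from the preferred embodiments described below and the accompanying drawings.
[0021]
FIG. 1 is a block diagram showing the configuration of an inappropriate content detection apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart showing an example of an inappropriate content detection procedure in an embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of an inappropriate content detection apparatus according to an embodiment of the present invention.
FIG. 4 is a flowchart showing an example of an inappropriate content detection procedure in an embodiment of the present invention.
FIG. 5 is a block diagram showing the configuration of an inappropriate content detection apparatus according to an embodiment of the present invention.
FIG. 6 is a flowchart showing an example of an inappropriate content detection procedure in an embodiment of the present invention.
FIG. 7 is a block diagram showing the configuration of a confirmation server according to an embodiment of the present invention.
FIG. 8 is a diagram showing the configuration of a content publishing system according to an embodiment of the present invention.
FIG. 9 is a diagram schematically showing an example of the inappropriate content detection method of the present invention.
FIG. 10 is a diagram schematically showing the similarity relationships among the contents posted in FIG. 9.
FIG. 11 is a diagram schematically showing examples of similarity between videos.
FIG. 12 is a diagram schematically showing examples of similarity between still images.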
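[Best mode for carrying out the invention]
[0022]
Embodiments of the present invention are described below with reference to the drawings. In all the drawings, like constituent elements are given like reference numerals, and their description is omitted where appropriate.
[0023]
(First embodiment)
FIG. 1 is a functional block diagram showing the configuration of an inappropriate content detection apparatus 100 according to an embodiment of the present invention. The inappropriate content detection apparatus 100 of this embodiment calculates the mutual similarity of a plurality of posted contents and determines, based on the mutual similarity, whether or not the posted content is appropriate in terms of copyright. In this embodiment, the inappropriate content detection apparatus 100 also determines, based on the mutual similarity of the plurality of posted contents, whether or not they are similar to each other and, when they are, detects the group of mutually similar contents as content that is inappropriate in terms of copyright.
[0024]
Here, the similarity is a measure of whether at least two contents are similar to each other. It can be calculated by known techniques such as those described in paragraphs 0010 and 0011 of Patent Document 1 or in Patent Document 4.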
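[0025]
The inappropriate content detection apparatus 100 of this embodiment includes a content input receiving unit 110 that receives input of posted content, a similar content detection unit 134 that calculates the mutual similarity of a plurality of posted contents and detects a group of mutually similar contents based on the similarity, and a fraud determination unit 136 that determines content that is inappropriate in terms of copyright based on the detected group of similar contents.
[0026]
The inappropriate content detection apparatus 100 of this embodiment further includes a feature quantity extraction unit 132 that extracts a feature quantity from each posted content. The similar content detection unit 134 collates the feature quantities of the plurality of posted contents with one another, calculates the mutual similarity of the feature quantities, and detects a group of mutually similar contents.
[0027]
More specifically, the inappropriate content detection apparatus 100 includes the content input receiving unit 110, a content storage unit 120, the feature quantity extraction unit 132, the similar content detection unit 134, the fraud determination unit 136, and an inappropriate content output unit 140.
[0028]
In the following drawings, configurations not related to the essence of the present invention are omitted.
Each constituent element of the inappropriate content detection apparatus 100 is realized by an arbitrary combination of hardware and software, centered on the CPU of an arbitrary computer, a memory, a program loaded into the memory that realizes the constituent elements in the drawing, a storage unit such as a hard disk that stores the program, and a network connection interface. Those skilled in the art will understand that there are many variations of the realization method and the apparatus. The drawings described below show blocks in functional units, not configurations in hardware units.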
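[0029]
The content input receiving unit 110 receives input of the content posted one after another by individual users. As an example, the content input receiving unit 110 is a CPU (Central Processing Unit) running a program that captures messages posted by users via the Internet and cuts out the content portion. The input content is, for example, an image (video or still image), audio (sound or music), or text data posted to a bulletin board or the like, and is content that should be protected by copyright.
[0030]
The content may be input by capturing users' posts directly, or it may first be stored on an external server (not shown) or the like and then input sequentially or in a batch. Posts may be uploaded using a WWW browser or sent as attachments to e-mail or the like. They may also be registered on a specific FTP site.
[0031]
The content storage unit 120 stores the content received by the content input receiving unit 110. As an example, the content storage unit 120 is a storage device such as a hard disk or flash memory, and may be a dedicated storage device or one shared with other uses.
[0032]
The feature quantity extraction unit 132 extracts, from each content received by the content input receiving unit 110, a feature quantity for detecting matches between contents or for calculating similarity effectively. As an example, the feature quantity extraction unit 132 is a CPU running a program that operates according to predetermined rules. For example, when the content is video information, it may extract a histogram of the color information observed in each frame of the video; when the content is music information, it may extract the frequency components at each point in time.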
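As an illustration of the per-frame color histogram mentioned above, the following is a minimal sketch only. The patent does not fix a histogram format or a comparison measure; the bin count, the function names `color_histogram` and `histogram_intersection`, and the use of histogram intersection as the similarity are all assumptions made here for illustration.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Quantize 8-bit RGB pixels into a normalized color histogram (assumed format)."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {color_bin: n / total for color_bin, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the sum of per-bin minima of two normalized histograms."""
    return sum(min(value, h2.get(color_bin, 0.0)) for color_bin, value in h1.items())

# Three toy "frames", each given as a flat list of RGB pixels.
red_frame = [(250, 10, 10)] * 90 + [(10, 10, 250)] * 10
reddish_frame = [(240, 20, 5)] * 80 + [(10, 10, 250)] * 20
green_frame = [(10, 250, 10)] * 100

h_red = color_histogram(red_frame)
h_reddish = color_histogram(reddish_frame)
h_green = color_histogram(green_frame)
```

With these toy frames, the two mostly red frames share most of their histogram mass, while the green frame shares none with either. A real system would likely use a more robust descriptor, such as those in Patent Documents 2 and 3.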
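[0033]
As an example, the feature quantity extraction unit 132 extracts feature quantities by a procedure such as the image index generation described in Patent Document 2 or the representative layout feature quantity extraction described in Patent Document 3. The feature quantities need not be those described in Patent Documents 2 and 3, provided that the mutual similarity or degree of matching of contents such as images, video, or audio information can be calculated.
The feature quantities extracted by the feature quantity extraction unit 132 may be stored in a feature quantity storage unit (not shown), or may be extracted from the content stored in the content storage unit 120 each time similarity is calculated.
[0034]
To judge similarity or detect matches between contents, the similar content detection unit 134 calculates the mutual similarity of the feature quantities extracted by the feature quantity extraction unit 132 and detects a group of contents whose similarity is at or above a predetermined threshold. The detection result is stored as similar content information in a temporary storage unit (not shown) and output to the fraud determination unit 136 described later. As an example, the similar content detection unit 134 is a CPU running a program that operates according to predetermined rules. It may be a dedicated CPU or one shared with other uses.
[0035]
As a method of detecting similar contents, the similar content detection based on representative layout feature quantities described in Patent Document 3 is used, for example. Methods other than that of Patent Document 3 may be used, provided that they can detect a match or similarity of some or all frames of a video, of part or all of a still image, or of part or all of an audio signal.
[0036]
That is, in the inappropriate content detection apparatus 100 of this embodiment, when the posted contents are videos, the similar content detection unit 134 can regard the similarity between the contents as higher than the threshold when some or all of the frames contained in the videos are similar to each other. When the posted contents are still images, it can regard the similarity as higher than the threshold when part or all of the still images are similar. When the posted contents contain sound or music, it can regard the similarity as higher than the threshold when part or all of the phrases of the sound or music are similar.
[0037]
The fraud determination unit 136 determines whether content is illicit based on the similar content information detected by the similar content detection unit 134. As an example, the fraud determination unit 136 is a CPU running a program that operates according to predetermined rules. It may be a dedicated CPU or one shared with other uses.
[0038]
The inappropriate content output unit 140 receives the determination result of the fraud determination unit 136 and outputs the inappropriate content. As an example, the inappropriate content output unit 140 is a CPU running a program that operates according to predetermined rules. It may be a dedicated CPU or one shared with other uses. For example, the inappropriate content output unit 140 may include a list creation unit (not shown) that creates and outputs an inappropriate content list of the contents determined to be inappropriate by the fraud determination unit 136.
[0039]
The computer program for realizing the units of the inappropriate content detection apparatus 100 (the content input receiving unit 110 through the inappropriate content output unit 140) as functions is stored in a memory (not shown) used by the above CPU and is executed by the CPU.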
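[0040]
The computer program of this embodiment causes a computer to realize the inappropriate content detection apparatus 100, which detects content that is inappropriate in terms of copyright among posted content. It is written so as to cause the computer to execute a procedure of receiving input of posted content, a procedure of extracting a feature quantity from each posted content, a procedure of collating the feature quantities of a plurality of posted contents with one another and calculating their mutual similarity, a procedure of detecting a group of mutually similar contents based on the calculated similarity, and a procedure of determining content that is inappropriate in terms of copyright based on the detected group of similar contents.
[0041]
The computer program is stored, for example, on a recording medium (memory) readable by a computer (CPU). Examples of the recording medium include a PROM (Programmable Read Only Memory), a hard disk, a DVD-ROM, a CD-ROM, and an FD.
[0042]
The operation of the inappropriate content detection apparatus 100 of this embodiment configured in this way is described below. FIG. 2 is a flowchart showing an example of the operation of the inappropriate content detection apparatus 100 of this embodiment. The description below refers to FIGS. 1 and 2.
[0043]
First, the content input receiving unit 110 receives input of a plurality of contents posted by users (S11). The contents may be input one at a time or several at once. The received contents are stored in the content storage unit 120 (S13), and the feature quantity extraction unit 132 extracts a feature quantity for similarity calculation from each content received by the content input receiving unit 110 (S15).
[0044]
The similar content detection unit 134 then selects content pairs in turn from the received contents (S17), calculates the similarity of the selected content pair (S19), and determines whether or not the similarity is at or above a predetermined threshold (S21). When the similarity is at or above the threshold (Yes in S21), the pair is detected as similar content, and the detected content pair is stored in the temporary storage unit as mutually similar content information (S23).
[0045]
Steps S17 through S23 are performed for all combinations of the input contents (S25), and the contents that are similar to each other are detected.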
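The all-pairs loop of steps S17 through S25 can be sketched roughly as follows. This is an illustrative sketch, not the patented implementation: the feature representation (a set of frame fingerprints), the Jaccard measure, and the names `detect_similar_pairs` and `jaccard` are assumptions chosen to keep the example self-contained.

```python
from itertools import combinations

def jaccard(a, b):
    """Stand-in similarity measure (assumption): overlap of two fingerprint sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def detect_similar_pairs(features, similarity, threshold):
    """Return every pair of content IDs whose similarity is at or above the threshold."""
    similar_pairs = []
    # S17: select content pairs in turn; S25: repeat until all combinations are done.
    for (id_a, feat_a), (id_b, feat_b) in combinations(features.items(), 2):
        score = similarity(feat_a, feat_b)      # S19: calculate the similarity
        if score >= threshold:                  # S21: compare with the threshold
            similar_pairs.append((id_a, id_b))  # S23: record the similar pair
    return similar_pairs

features = {
    "V1": {1, 2, 3, 4},
    "V2": {2, 3, 4, 5},
    "V3": {10, 11},
}
pairs = detect_similar_pairs(features, jaccard, threshold=0.5)
```

The loop is quadratic in the number of contents. That is acceptable for a sketch, and it is one reason the second embodiment restricts the set of contents being compared.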
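[0046]
When similar content detection has finished for all combinations (Yes in S25), the fraud determination unit 136 performs fraud determination based on the mutually similar content information stored in step S23 (S27). As an example, when a predetermined number or more of particular mutually similar contents are detected, that group of similar contents is determined to be illicit. For instance, when the predetermined number is two, all the contents stored in step S23 are determined to be inappropriate content. When the predetermined number is N, N or more interconnected contents are detected based on the connection relationships among the content pairs detected in step S21, and the detected contents are determined to be inappropriate content.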
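One way to read the "N or more interconnected contents" rule of step S27 is as a connected-component search over the similar pairs. The sketch below takes that reading; treating interconnection as graph connectivity is an assumption, and `judge_inappropriate` is a hypothetical name.

```python
from collections import defaultdict

def judge_inappropriate(similar_pairs, min_group_size):
    """Group contents linked by similar pairs; flag groups with at least min_group_size members."""
    neighbors = defaultdict(set)
    for a, b in similar_pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, flagged = set(), []
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first walk over the similarity links to collect one group.
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        if len(group) >= min_group_size:
            flagged.append(sorted(group))
    return flagged

pairs = [("V1", "V2"), ("V2", "V5"), ("V3", "V4")]
groups_n3 = judge_inappropriate(pairs, min_group_size=3)
groups_n2 = judge_inappropriate(pairs, min_group_size=2)
```

With N = 2, every stored pair is flagged, which matches the two-item case in the text. With N = 3, only the chain V1, V2, V5 qualifies.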
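[0047]
The inappropriate content output unit 140 then outputs the contents determined to be illicit (S29).
[0048]
As described above, the inappropriate content detection apparatus 100 of this embodiment achieves the object of the invention: content that is inappropriate in terms of copyright is detected efficiently and automatically without preparing sample content (dictionary data) as a base for finding inappropriate content.
It can also judge appropriateness in terms of copyright efficiently, and it makes it possible to prevent inappropriate content from being published on bulletin boards and the like.
[0049]
Inappropriate content that infringes rights has original content behind it, such as a television recording, a magazine, a CD, or a DVD. It tends to be duplicated heavily, because part or all of the original content is posted as it is, or content posted by someone else is downloaded and reused unchanged. By evaluating similarity among posted contents, detecting matches or duplication among posted videos, still images, and audio, and detecting heavily duplicated content as inappropriate in terms of copyright, the apparatus can judge copyright appropriateness and detect inappropriate content without preparing samples of inappropriate content in advance, and the object of the present invention is achieved.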
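[0050]
(Second embodiment)
FIG. 3 is a functional block diagram showing the configuration of an inappropriate content detection apparatus 200 according to an embodiment of the present invention. The inappropriate content detection apparatus 200 of this embodiment differs from the inappropriate content detection apparatus 100 of the above embodiment in that it selects contents under a predetermined condition and evaluates similarity among the selected contents.
[0051]
In the inappropriate content detection apparatus 200 of this embodiment, a similar content detection unit 234 detects, among the contents posted within a predetermined period, contents whose mutual similarity is higher than a threshold among a predetermined number or more of posted contents as a group of mutually similar contents, and a fraud determination unit 236 determines the detected group of similar contents to be inappropriate content.
[0052]
Also, in the inappropriate content detection apparatus 200 of this embodiment, the similar content detection unit 234 selects contents from the plurality of posted contents according to a predetermined criterion, then calculates the mutual similarity among the selected contents and detects a group of contents whose mutual similarity is higher than the threshold among a predetermined number or more of contents. The fraud determination unit 236 determines the detected content group to be inappropriate content. The inappropriate content detection apparatus 200 of this embodiment evaluates similarity, for example, among the contents posted in a predetermined period.
[0053]
Here, the predetermined criterion includes the date and time at which the content was posted, the nationality of the poster, the size of the posted content, and the like. This information is recorded in a content posting record or the like. Contents can be selected based on the content posting record, that is, the posting time, the poster's nationality, the content size, and so on; similarity is then calculated among the selected contents and inappropriate content is detected.
[0054]
The inappropriate content detection apparatus 200 of this embodiment includes a feature quantity storage unit 252 that stores the feature quantities extracted by a feature quantity extraction unit 232, and a stored feature quantity selection unit 254 that obtains from the feature quantity storage unit 252 the feature quantities of the contents selected according to the predetermined criterion. The similar content detection unit 234 calculates the mutual similarity of the feature quantities of the contents selected by the stored feature quantity selection unit 254 and detects mutually similar contents.
[0055]
Furthermore, in the inappropriate content detection apparatus 200 of this embodiment, the feature quantity storage unit 252 stores the feature quantity of the posted content at each posting. Based on the feature quantities of newly posted content (new content) and of the plurality of stored contents, the similar content detection unit 234 calculates, for each stored content, its similarity to the new content. When there is stored content whose similarity to the new content is higher than the threshold, it detects the content group consisting of the new content and that stored content. The fraud determination unit 236 determines the detected content group to be inappropriate content.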
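[0056]
Specifically, the inappropriate content detection apparatus 200 of this embodiment includes the same content input receiving unit 110, content storage unit 120, and inappropriate content output unit 140 as the inappropriate content detection apparatus 100 of the above embodiment, and in addition the feature quantity extraction unit 232, the similar content detection unit 234, the fraud determination unit 236, the feature quantity storage unit 252, and the stored feature quantity selection unit 254.
[0057]
The feature quantity extraction unit 232 extracts from the content a feature quantity for effectively calculating matches or similarity between contents. The extracted feature quantity is stored in the feature quantity storage unit 252. For example, when the content is video information, the feature quantity extraction unit 232 may extract a histogram of the color information observed in each frame of the video; when the content is music information, it may extract the frequency components at each point in time.
[0058]
As an example, the feature quantity extraction unit 232 extracts feature quantities by a procedure such as the image index generation described in Patent Document 2 or the representative layout feature quantity extraction described in Patent Document 3, both mentioned in the above embodiment. The feature quantities need not be those described in Patent Documents 2 and 3, provided that the mutual similarity or degree of matching of contents such as images, video, or audio information can be calculated.
[0059]
The feature quantity storage unit 252 stores the feature quantities extracted by the feature quantity extraction unit 232. Examples include storage devices such as a hard disk or flash memory; it may be a dedicated storage device or one shared with other uses.
[0060]
The stored feature quantity selection unit 254 selects from the feature quantity storage unit 252 the feature quantities for which similarity should be calculated. As an example, it is a CPU running a program that operates according to predetermined rules. It may be a dedicated CPU or one shared with other uses. As an example, the stored feature quantity selection unit 254 selects, from the feature quantities stored in the feature quantity storage unit 252, those of the contents posted in a predetermined period, based on the posting date and time in the posting record. The contents to be evaluated for similarity may also be selected according to the poster, the poster's nationality, the size of the posted content, or other items recorded in the posting record.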
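The period-based selection performed by the stored feature quantity selection unit 254 amounts to filtering the posting record by timestamp. A minimal sketch follows; the record layout (the `content_id`, `posted_at`, and `feature` keys) and the function name `select_features_by_period` are hypothetical, since the patent does not define a record format.

```python
from datetime import datetime, timedelta

def select_features_by_period(post_records, period_start, period_end):
    """Return {content_id: feature} for posts whose timestamp falls within the period."""
    return {
        record["content_id"]: record["feature"]
        for record in post_records
        if period_start <= record["posted_at"] <= period_end
    }

# Hypothetical posting record entries.
records = [
    {"content_id": "V1", "posted_at": datetime(2008, 10, 1, 9, 0), "feature": {1, 2, 3}},
    {"content_id": "V2", "posted_at": datetime(2008, 10, 2, 21, 30), "feature": {2, 3, 4}},
    {"content_id": "V3", "posted_at": datetime(2008, 8, 15, 12, 0), "feature": {7, 8}},
]

period_end = datetime(2008, 10, 3)
selected = select_features_by_period(records, period_end - timedelta(days=7), period_end)
```

The same pattern would extend to the other criteria the text mentions, such as poster or content size, by adding further conditions to the filter.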
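[0061]
The similar content detection unit 234 calculates mutual similarity by collating the feature quantities of the received contents or of the stored contents with one another, and detects similar contents based on the similarity. As an example, it is a CPU running a program that operates according to predetermined rules. It may be a dedicated CPU or one shared with other uses. To detect a similar content group, it may use the exhaustive method of the above embodiment, which obtains the similarity of every pair of feature quantities (steps S17 to S25 in FIG. 2). Alternatively, it may first calculate the similarity between the feature quantity of the newly posted input content and the stored feature quantities, and calculate mutual similarity only when similar content is detected.
[0062]
That is, in the inappropriate content detection apparatus 200 of this embodiment, the feature quantity storage unit 252 may store the feature quantities of the contents posted so far, and the similar content detection unit 234 may calculate, for each stored content, its similarity to newly posted content based on their feature quantities. When there is stored content whose similarity to the new content is higher than the threshold, the unit may add the new content to the stored contents, calculate the mutual similarity, and detect a group of contents whose mutual similarity is higher than the threshold among a predetermined number or more of contents.
In other words, when there is no stored content whose similarity to the new content is higher than the threshold, no content group detection needs to be performed.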
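The two-stage check described above, which compares the new content against the stored features first and proceeds to group detection only if something matches, might look like the following. This is a simplified sketch under stated assumptions: the group is taken to be the new content plus its direct matches, rather than recomputing all mutual similarities, and `check_new_content` and `jaccard` are illustrative names, not part of the source.

```python
def jaccard(a, b):
    """Stand-in similarity measure (assumption): overlap of two fingerprint sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def check_new_content(new_id, new_feature, stored, similarity, threshold, min_group_size):
    """Return the flagged group for a new post, or [] when no group is detected."""
    # Stage 1: compare the new content with each stored content.
    matches = [
        content_id
        for content_id, feature in stored.items()
        if similarity(new_feature, feature) > threshold
    ]
    if not matches:
        return []  # nothing similar, so group detection is skipped entirely
    # Stage 2 (simplified): the group is the new content plus its direct matches.
    group = [new_id] + matches
    return group if len(group) >= min_group_size else []

stored = {"V1": {1, 2, 3, 4}, "V2": {2, 3, 4, 5}, "V3": {10, 11}}
hit = check_new_content("V5", {1, 2, 3, 4, 5}, stored, jaccard, 0.5, 3)
miss = check_new_content("V6", {20, 21}, stored, jaccard, 0.5, 3)
```

The early return is the point of the design: most new posts match nothing, so the more expensive grouping step runs only rarely.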
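[0063]
In this case, the similar content detection unit 234 may use only the contents posted in a predetermined period as the stored contents to be compared with the new content.
[0064]
The fraud determination unit 236 determines whether content is illicit based on the similar content information detected by the similar content detection unit 234. In this embodiment, when the similar content group detected by the similar content detection unit 234 contains a predetermined number or more of contents, the fraud determination unit 236 determines the detected similar content group to be inappropriate content. As an example, the fraud determination unit 236 is a CPU running a program that operates according to predetermined rules. It may be a dedicated CPU or one shared with other uses.
[0065]
The computer program for realizing the units of the inappropriate content detection apparatus 200 (the content input receiving unit 110 through the inappropriate content output unit 140, and the feature quantity extraction unit 232 through the stored feature quantity selection unit 254) as functions is stored in a memory (not shown) used by the above CPU and is executed by the CPU.
[0066]
In addition to the procedures of the computer program of the above embodiment, the computer program of this embodiment is written so as to cause the computer to further execute a procedure of detecting, among the contents posted within a predetermined period, contents whose mutual similarity is higher than the threshold among a predetermined number or more of posted contents as a group of mutually similar contents.
[0067]
The operation of the inappropriate content detection apparatus 200 of this embodiment configured in this way is described below. FIG. 4 is a flowchart showing an example of the operation of the inappropriate content detection apparatus 200 of this embodiment. The description below refers to FIGS. 3 and 4.
[0068]
In the operation of the inappropriate content detection apparatus 200 of this embodiment, steps S11, S13, and S29 are the same as in the flowchart of FIG. 2, so their description is omitted.
[0069]
The feature quantity extraction unit 232 extracts a feature quantity for similarity calculation from each content input in step S11 (S31). The extracted feature quantities are stored in the feature quantity storage unit 252. The stored feature quantity selection unit 254 then selects, from the feature quantities already stored in the feature quantity storage unit 252, those of the reference contents to be used for mutual similarity evaluation (S33). For example, as described above, it selects the feature quantities of the contents posted in a predetermined period, based on the posting record.
[0070]
The similar content detection unit 234 then calculates mutual similarity among the feature quantities selected in step S33 together with the feature quantity of the newly input content, and detects a similar content group (S35).
[0071]
When the similar content group detected in step S35 contains a predetermined number or more of contents, the fraud determination unit 236 determines the detected similar content group to be illicit content (S37), and the inappropriate content output unit 140 outputs the contents determined to be illicit (S29).
[0072]
As described above, the inappropriate content detection apparatus 200 of this embodiment can detect posts that are likely copies of existing original content and that have been posted many times within a specific period. It thereby achieves the object of the invention: content that is inappropriate in terms of copyright is detected automatically without preparing sample content (dictionary data) as a base for finding inappropriate content.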
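[0073]
(Third embodiment)
FIG. 5 is a functional block diagram showing the configuration of an inappropriate content detection apparatus 300 according to an embodiment of the present invention. The inappropriate content detection apparatus 300 of this embodiment differs from the inappropriate content detection apparatuses 100 and 200 of the above embodiments in that it registers detected inappropriate content as a dictionary and detects inappropriate content by collation with the dictionary, and in that it presents an alarm to an administrator or the like when inappropriate content is detected.
[0074]
The inappropriate content detection apparatus 300 of this embodiment further includes an inappropriate content feature quantity registration unit 314, which stores the content determined to be inappropriate by the fraud determination unit 236, or the feature quantity of that content, in an inappropriate content feature quantity storage unit 312 as inappropriate content dictionary data, and a similar content detection unit 334, which detects inappropriate content by collating newly posted content with the inappropriate content dictionary data.
[0075]
Specifically, the inappropriate content detection apparatus 300 of this embodiment includes the same content input receiving unit 110, content storage unit 120, and inappropriate content output unit 140 as the inappropriate content detection apparatus 100, and the same feature quantity extraction unit 232, fraud determination unit 236, and feature quantity storage unit 252 as the inappropriate content detection apparatus 200, and in addition the similar content detection unit 334, the inappropriate content feature quantity storage unit 312, the inappropriate content feature quantity registration unit 314, and an alarm presentation unit 320.
[0076]
The inappropriate content feature quantity storage unit 312 stores the feature quantities of contents already determined to be inappropriate. Examples include storage devices such as a hard disk or flash memory; it may be a dedicated storage device or one shared with other uses.
[0077]
The inappropriate content feature quantity registration unit 314 registers the feature quantities of contents determined to be inappropriate in the inappropriate content feature quantity storage unit 312. As an example, it is a CPU running a program that operates according to predetermined rules. For example, it moves the feature quantity of content determined to be illicit from the feature quantity storage unit 252 to the inappropriate content feature quantity storage unit 312. It may also newly register the feature quantities of copyrighted content provided by a broadcasting station or the like.
[0078]
The similar content detection unit 334 calculates the mutual similarity of the feature quantities of the input content or of the stored contents, and detects similar contents. As an example, it is a CPU running a program that operates according to predetermined rules. It may be a dedicated CPU or one shared with other uses. As a method of calculating the similarity between feature quantities, the similar content detection based on representative layout feature quantities described in Patent Document 3, mentioned in the above embodiment, is used, for example. Methods other than that of Patent Document 3 may be used, provided that they can detect a match or similarity of some or all frames of a video, of part or all of a still image, or of part or all of an audio signal.
[0079]
The alarm presentation unit 320 presents an alarm to a content administrator or the like when inappropriate content is detected. Examples include a monitor for presenting alarm text or video and a speaker for outputting sound.
[0080]
The computer program for realizing the units of the inappropriate content detection apparatus 300 (the content input receiving unit 110 through the inappropriate content output unit 140, the feature quantity extraction unit 232 through the feature quantity storage unit 252, and the inappropriate content feature quantity storage unit 312 through the similar content detection unit 334) as functions is stored in a memory (not shown) used by the above CPU and is executed by the CPU.
[0081]
In addition to the procedures of the computer program of the above embodiment, the computer program of this embodiment is written so as to cause the computer to further execute a procedure of storing the content determined to be inappropriate in the determination procedure, or the feature quantity of that content, in the inappropriate content feature quantity storage unit 312 as inappropriate content dictionary data, and a procedure of detecting inappropriate content by collating newly posted content with the inappropriate content dictionary data.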
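[0082]
The operation of the inappropriate content detection apparatus 300 of this embodiment configured in this way is described below. FIG. 6 is a flowchart showing an example of the operation of the inappropriate content detection apparatus 300 of this embodiment. The description below refers to FIGS. 5 and 6.
[0083]
In the operation of the inappropriate content detection apparatus 300 of this embodiment, steps S11 and S13 are the same as in the flowchart of FIG. 2, so their description is omitted.
[0084]
The feature quantity extraction unit 232 extracts a feature quantity for similarity calculation from each content input in step S11 (S41). The extracted feature quantities are stored in the feature quantity storage unit 252. The similar content detection unit 334 then calculates the similarity between the feature quantity of the input content and the feature quantity of each content stored in the inappropriate content feature quantity storage unit 312 (S43).
[0085]
When a similarity calculated in step S43 is at or above a predetermined threshold (Yes in S45), the fraud determination unit 236 determines the input content whose feature quantity has that similarity to be inappropriate content (S51), and the inappropriate content output unit 140 outputs the content determined to be illicit (S53). The alarm presentation unit 320 raises an inappropriate content detection alarm to the information administrator (S55). The inappropriate content feature quantity registration unit 314 then registers the feature quantity of the content newly determined to be inappropriate in the inappropriate content feature quantity storage unit 312 (S57).
[0086]
When the similarity calculated in step S43 is below the predetermined threshold for every content stored in the inappropriate content feature quantity storage unit 312 (No in S45), the similar content detection unit 334 selects a feature quantity pair from the feature quantity of the input content and the feature quantities stored in the feature quantity storage unit 252 (S61), and calculates the similarity between the selected feature quantities (S63). The similar content detection unit 334 then determines whether or not the similarity is at or above a predetermined threshold (S65). When it is (Yes in S65), the similar content detection unit 334 detects the pair as a similar feature quantity pair and stores it in the temporary storage unit as similarity information (S67). This is done for all combinations of the feature quantities, that is, those from the input content together with those stored in the feature quantity storage unit 252 (S69), and the feature quantities of mutually similar contents are extracted.
[0087]
When the similarity judgment has finished for all combinations (Yes in S69), the fraud determination unit 236 performs fraud determination from the similarity information among the feature quantities (S71). As an example, when the addition of the input content's feature quantity brings the number of particular mutually similar feature quantities to a predetermined number or more, the input content and the contents in the content storage unit 120 that correspond to the feature quantities in the feature quantity storage unit 252 that are mutually similar to the input content are determined to be illicit. For instance, when the predetermined number is two, all contents newly detected in step S65, including the input content, are determined to be inappropriate content. When the predetermined number is N, N or more interconnected feature quantities are detected based on the connection relationships among the feature quantity pairs detected in step S65, and the contents corresponding to the detected feature quantities are determined to be inappropriate content.
[0088]
The inappropriate content output unit 140 then outputs the contents determined to be illicit (S73). The alarm presentation unit 320 raises an inappropriate content detection alarm to the information administrator (S75). The inappropriate content feature quantity registration unit 314 then deletes from the feature quantity storage unit 252 the feature quantities of the contents newly determined to be inappropriate, and registers the feature quantity of the input content and the feature quantities of the contents newly determined to be inappropriate in the inappropriate content feature quantity storage unit 312 (S77).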
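The overall flow of FIG. 6 (dictionary check first, mutual comparison as a fallback, then registration) can be outlined as below. It is a rough sketch only: plain dictionaries stand in for the storage units 252 and 312, the grouping is simplified to the new content plus its direct matches, alarm presentation is omitted, and `process_post` and `jaccard` are hypothetical names.

```python
def jaccard(a, b):
    """Stand-in similarity measure (assumption): overlap of two fingerprint sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def process_post(new_id, feature, dictionary, stored, similarity, threshold, min_group_size=2):
    """Return the IDs judged inappropriate for this post; update both stores in place."""
    # S43/S45: collate the new feature with the inappropriate-content dictionary.
    if any(similarity(feature, known) >= threshold for known in dictionary.values()):
        dictionary[new_id] = feature  # S57: register the newly detected content
        return [new_id]               # S51/S53: judged inappropriate and output
    # S61-S69: otherwise compare with the ordinary stored features.
    matches = [cid for cid, f in stored.items() if similarity(feature, f) >= threshold]
    if matches and len(matches) + 1 >= min_group_size:  # S71: enough similar contents
        for cid in matches:
            dictionary[cid] = stored.pop(cid)  # S77: move features into the dictionary
        dictionary[new_id] = feature
        return [new_id] + matches
    stored[new_id] = feature  # no group detected; keep the feature for later comparisons
    return []

dictionary, stored = {}, {}
r1 = process_post("A", {1, 2, 3}, dictionary, stored, jaccard, 0.5)
r2 = process_post("B", {1, 2, 3, 4}, dictionary, stored, jaccard, 0.5)
r3 = process_post("C", {2, 3, 4}, dictionary, stored, jaccard, 0.5)
r4 = process_post("D", {9}, dictionary, stored, jaccard, 0.5)
```

Here the first post is merely stored. The second matches it, so both are flagged and moved into the dictionary. The third is caught by the dictionary check alone, which is the shortcut this embodiment adds.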
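[0089]
The inappropriate content detection apparatus 300 may further include a generation unit (the similar content detection unit 334 and the fraud determination unit 236) that generates the inappropriate content dictionary data by calculating mutual similarity for a predetermined period, predetermined users, or predetermined content. This makes it possible to generate dictionary data for each specific group, such as a predetermined period or predetermined users.
The generation unit may also perform inappropriate content detection by mutual similarity calculation periodically and generate or update the inappropriate content dictionary data.
[0090]
By limiting the feature quantities stored in the feature quantity storage unit 252 to those of the contents posted in a specific period, the mutual similarity of the contents posted within that period can be calculated. The feature quantities of copyright-protected content that is already a detection target as inappropriate content may also be registered in the inappropriate content feature quantity storage unit 312.
[0091]
Instead of automatically deleting feature quantities from the feature quantity storage unit 252 and registering them in the inappropriate content feature quantity storage unit 312, the inappropriate content feature quantity registration unit 314 may be changed so that registration takes place after a content administrator or the like has confirmed it.
[0092]
That is, in the inappropriate content detection apparatus 300, the inappropriate content dictionary data generated by the inappropriate content feature quantity registration unit 314 may be presented to the information administrator, who checks it and indicates which dictionary data to keep; only the indicated inappropriate content dictionary data is then stored in the inappropriate content feature quantity storage unit 312.
[0093]
A mechanism may be provided by which the similarity threshold used to judge mutual similarity, the number of items required for a judgment of similar content, or the feature quantities used are changed adaptively according to circumstances such as genre, time of year, or poster, or are adjusted manually by the information administrator.
[0094]
The inappropriate content detection apparatus 300 may also include a registration unit (not shown) that registers predetermined users or predetermined content in advance. In that case, the fraud determination unit 236 need not determine content posted by a predetermined user, or the predetermined content, to be inappropriate content.
[0095]
As described above, the inappropriate content detection apparatus 300 of this embodiment treats a content group as inappropriate content when a predetermined number or more of highly similar contents are detected among the contents posted within a predetermined period. It can therefore detect posts that are likely copies of existing original content and that have been posted many times within a specific period. It thereby achieves the object of the invention: content that is inappropriate in terms of copyright is detected automatically and efficiently without preparing sample content (dictionary data) as a base for finding inappropriate content.
[0096]
Inappropriate content that infringes rights has original content behind it, such as a television recording, a magazine, a CD, or a DVD. It tends to be duplicated heavily within a short period, because part or all of the original content is posted as it is, content posted by someone else is downloaded and reused unchanged, and such posts are made in quick succession. According to this embodiment, similarity is evaluated among the contents posted in a predetermined specific period, matches or duplication among posted videos, still images, and audio are detected, and content duplicated a predetermined number of times or more is detected as inappropriate in terms of copyright. Inappropriate content can thus be detected without preparing samples of inappropriate content in advance, and the object of the present invention is achieved.
[0097]
In this embodiment, only the feature quantities of inappropriate content are stored in the inappropriate content feature quantity storage unit 312 and prepared as dictionary data, so the required capacity is far smaller than when samples of illicit content are prepared as dictionary data.
[0098]
Furthermore, when inappropriate content is detected, the inappropriate content detection apparatus 300 of this embodiment can prompt the information administrator to check it by raising an alarm, which prevents inappropriate content from being published.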
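[0099]
(Fourth embodiment)
FIG. 7 is a functional block diagram showing the configuration of a confirmation server 400 according to an embodiment of the present invention.
The confirmation server 400 of this embodiment includes a communication unit 410, an alarm presentation unit 420, an administrator confirmation and registration unit 430, a storage unit 440, and a control unit 450.
[0100]
The communication unit 410 exchanges content and messages with terminals (not shown) and other management apparatuses (not shown) through a network 402. An example of the communication unit 410 is a dedicated board that performs communication via the network 402.
[0101]
The alarm presentation unit 420 presents alarm information on inappropriate content detection to the content administrator. Examples of the alarm presentation unit 420 include a monitor for presenting alarm text or video and a speaker for outputting sound information. It may also be a printer or the like that creates and prints a list of the alarm information.
[0102]
The administrator confirmation and registration unit 430 allows the administrator to check the inappropriate content detection results and to register new inappropriate content. An example is a combination of a monitor that displays the results and input devices such as a keyboard and a touch panel. It may also be a printer or the like that creates and prints a list of the inappropriate content detection results.
[0103]
The storage unit 440 stores content, extracted feature quantities, feature quantities of inappropriate content, and the like. Examples include storage devices such as a hard disk or flash memory; it may be a dedicated storage device or one shared with other uses.
[0104]
The control unit 450 controls each element of the confirmation server 400 and the apparatus as a whole, and extracts inappropriate content through operations such as content feature quantity extraction and collation. As an example, the control unit 450 is a CPU running a program.
[0105]
The communication unit 410, the alarm presentation unit 420, the administrator confirmation and registration unit 430, the storage unit 440, and the control unit 450 can together be built on a computer that has communication, storage, monitor, and input functions.
[0106]
The control unit 450 includes a content input receiving unit 452 that receives and inputs content from the communication unit 410, an inappropriate content output unit 454 that outputs inappropriate content detection results, an inappropriate content feature quantity registration unit 456 that registers the feature quantities of contents determined to be inappropriate in an inappropriate content feature quantity storage unit 446, a storage management unit 460 that manages the content, feature quantities, and inappropriate content feature quantities stored in the storage unit 440, a feature quantity extraction unit 462 that extracts feature quantities from content, a similar content detection unit 464 that extracts similar contents by calculating the similarity between feature quantities, and a fraud determination unit 466 that determines inappropriate content based on the detection results of the similar content detection unit 464.
[0107]
The storage unit 440 includes a content storage unit 442 that stores input content, a feature quantity storage unit 444 that stores extracted content feature quantities, and the inappropriate content feature quantity storage unit 446, which stores the feature quantities of contents determined to be inappropriate.
[0108]
The operation of the confirmation server 400 of this embodiment configured in this way is the same as that of the inappropriate content detection apparatus 300 of the above embodiment, so its description is omitted.
[0109]
The confirmation server 400 of this embodiment provides the same effects as the inappropriate content detection apparatus 300 of the above embodiment.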
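[0110]
(Fifth embodiment)
FIG. 8 is a diagram showing the configuration of a content publishing system 500 according to an embodiment of the present invention.
The content publishing system 500 of this embodiment provides a video publishing service. It includes a network 502, a plurality of terminal devices 510 connected to the network 502, a confirmation server 520 that detects and confirms inappropriate content, a public data storage unit 530 that stores public data posted by users, a public server 540 for publishing the public data, and a public data deletion unit 550 that deletes public data from the public data storage unit 530 when the confirmation server 520 detects inappropriate content.
[0111]
The terminal device 510 is a terminal device with a function for accessing the network 502 and can be realized by a general-purpose computer.
The network 502 is a communication network, such as the Internet, over which data can be sent and received.
[0112]
The confirmation server 520 is a server computer that receives posted data sent from the terminal devices 510 via the network 502 and stores it in the public data storage unit 530, and that detects and deletes inappropriate items among the posted data stored in the public data storage unit 530. The confirmation server 520 can be realized by the confirmation server 400 of FIG. 7 described in the above embodiment. An inappropriate content output unit 522 can be realized by the same configuration as the inappropriate content output unit 454 of the confirmation server 400 in FIG. 7.
[0113]
The public server 540 is a server computer that publishes the data stored in the public data storage unit 530 through the network 502, and can be realized by a computer that functions as an ordinary WWW server.
[0114]
The public data deletion unit 550 receives the detection result of the confirmation server 520 and deletes public data in the public data storage unit 530. As an example, it is a CPU running a program that operates according to predetermined rules. It may be a dedicated CPU or one shared with other uses.
[0115]
As described above, the content publishing system 500 of this embodiment provides a mechanism that deletes public data when inappropriate content is detected. Viewers who access the system via the network 502 can therefore obtain only the data stored in the public data storage unit 530 after inappropriate data has been deleted, so inappropriate data is not made available to the public. The content publishing system 500 of this embodiment can thus detect posts that are likely copies of existing original content and that have been posted many times within a specific period, and achieves the object of the invention: content that is inappropriate in terms of copyright is detected automatically without preparing sample content (dictionary data) as a base for finding inappropriate content.
[0116]
In addition, when inappropriate content is detected, the content publishing system 500 of this embodiment deletes the uploaded content, which prevents inappropriate content from being published.
[0117]
Embodiments of the present invention have been described above with reference to the drawings, but they are examples of the present invention, and various configurations other than the above can also be adopted.
[0118]
For example, the content publishing system 500 of the above embodiment, which publishes posted content so that users can view it, may include a presentation unit (not shown) that presents the inappropriate content detected by the inappropriate content detection apparatus (100 to 300) of the above embodiments to a system administrator, a receiving unit (not shown) that receives a deletion instruction from the system administrator after the system administrator has confirmed the content, and a deletion unit (not shown) that deletes the inappropriate content in accordance with the deletion instruction.
[0119]
The content publishing system 500 of the above embodiment may also include a determination unit (not shown) that determines whether or not the number of contents determined to be similar to each other by the inappropriate content detection apparatus (100 to 300) of the above embodiments is greater than a predetermined number, and a control unit (not shown) that automatically stops user access to the content when the number of contents determined to be similar to each other is greater than the predetermined number.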
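[Example 1]
[0120]
Next, an example of the operation of the best mode for carrying out the present invention is described using a concrete example.
FIG. 9 is a diagram schematically showing a rough picture of how the inappropriate content detection apparatus of the present invention is used. In FIG. 9, a broadcasting station 610 broadcasts program 1. Users A, B, C, D, and E each receive the television broadcast on a terminal device 620 and watch program 1 (steps S101, S111, S121, S131, S141). Suppose that users A, B, and E record the program, cut out a portion, and upload videos V1, V2, and V5 respectively to a posting site (steps S103, S113, S143). Suppose that users C and D each post their own content (videos V3 and V4), shot with their own camcorders or the like, to the posting site (steps S123, S133). A confirmation server 630 receives each posted video via a network 602. Fraud determination is performed on each video posted to the confirmation server 630, and only appropriate content is published by a public server 640.
[0121]
In FIG. 9, videos V1, V2, and V5 are portions cut out of the same original content, and when they contain the same frames they can be detected as similar videos. Videos V3 and V4, by contrast, are entirely original content and are not similar to any other video. (In the figure, the non-similarity between video V3 and the other videos is shown by dotted arrows; the non-similarity relationships of video V4 are not shown.) The result is as shown in FIG. 10: videos V1, V2, and V5, whose mutual similarity has been confirmed (shown by solid arrows in the figure), are judged to be illicit content candidates.
[0122]
FIG. 11 shows several typical patterns of similarity.
In FIG. 11, specific portions of an original video Vo are cut out. The horizontal axis in FIG. 11 is the time axis, and videos Va, Vb, Vc, and Vd are clips made by cutting out partial frame ranges of the original content.
[0123]
For example, segment Va1 of video Va and segment Vb1 of video Vb contain the same frames, segment Vb2 of video Vb and segment Vc1 of video Vc contain the same frames, and segment Vc2 of video Vc and segment Vd1 of video Vd contain the same frames. Because they contain the same frames, videos Va and Vb, Vb and Vc, and Vc and Vd can be said to be similar to each other. Videos Va and Vd contain no common frames and cannot be said to be similar to each other directly, but from the connection relationships among videos Va, Vb, Vc, and Vd, it is possible to judge that Va, Vb, Vc, and Vd are mutually similar.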
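The chain relationship in FIG. 11 can be made concrete with a toy model in which each clip is a frame range of the same original video. The frame numbers below are invented for illustration, and `share_frames` and `linked` are hypothetical helper names. A real system would compare frame features rather than known ranges.

```python
def share_frames(clip_a, clip_b):
    """Clips are (start, end) frame ranges of one original, end exclusive; True if they overlap."""
    return max(clip_a[0], clip_b[0]) < min(clip_a[1], clip_b[1])

def linked(clips, a, b):
    """True if clips a and b are connected through a chain of frame-sharing clips."""
    reached, frontier = {a}, [a]
    while frontier:
        current = frontier.pop()
        for other in clips:
            if other not in reached and share_frames(clips[current], clips[other]):
                reached.add(other)
                frontier.append(other)
    return b in reached

# Invented frame ranges: each clip overlaps only its neighbor in the chain.
clips = {"Va": (0, 100), "Vb": (80, 200), "Vc": (180, 300), "Vd": (280, 400)}

direct = share_frames(clips["Va"], clips["Vd"])
via_chain = linked(clips, "Va", "Vd")
```

In this model Va and Vd share no frames directly, yet they are still linked through Vb and Vc, which is the judgment the paragraph above describes.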
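[0124]
FIG. 12 is an example for still images. For example, when an original book 702 or the like is captured with a scanner and then posted (step S202), many contents (images 710, 720, 730) that differ slightly but are similar to one another are posted, as in FIG. 12. A considerable number of identical images are also included. For example, image 710 contains image 712 of the original book 702 at a tilt. Image 720 contains image 722 of the original book 702 with a margin 724 around it. Image 730 contains image 732, in which part of the image of the original book 702 has been cut off.
[0125]
Images such as these can also be detected as mutually similar content by similarity collation engines such as those of Patent Documents 2 and 3, so such illicit content can be deleted. The same applies to music data and other content.
[0126]
This application claims priority based on Japanese Patent Application No. 2007-272968 filed on October 19, 2007, the entire disclosure of which is incorporated herein.
[0127]
The present invention has been described above with reference to the embodiments and the example, but the present invention is not limited to them. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.
[Technical field]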
[0001]
  The present invention relates to an inappropriate content detection method, an inappropriate content detection apparatus, a computer program thereof, and a content publication system in a posting site.
[Background]
[0002]
  One of the services deployed on the Internet is a bulletin board service or a video or still image posting service. In these services such as a bulletin board, an unspecified number of users can upload data (contents) such as images, sounds, and texts, and can freely view content uploaded by others.
[0003]
  If posting to such bulletin boards is left entirely to the users, there is a risk that large amounts of so-called illicit or illegal content, copied or imitated from other people's copyrighted works, will circulate. At present, illicit content is handled either by the service provider monitoring the site periodically or by the copyright holder complaining to the service provider, which then deletes the content. When the volume of posted content becomes large, however, it is difficult to check all of it manually.
[0004]
  An example of a system for detecting inappropriate data among posted content is described in Patent Document 1. The system described in that document stores in advance samples of content that serve as a base for finding inappropriate content (images, audio, text, etc.). Each time new content (images, audio, text, etc.) is posted, it evaluates the similarity between that content and the stored base samples, and extracts the content as inappropriate when it is evaluated as similar.
[Patent Document 1]
JP 2006-293455 A
[Patent Document 2]
JP-A-8-180176
[Patent Document 3]
JP 2000-259832 A
[Patent Document 4]
JP 2000-339474 A
[Disclosure of the invention]
[0005]
  However, the method described in Patent Document 1 is based on similarity to sample content that serves as a base for finding inappropriate content. It leaves room for improvement in that such base content must be prepared as samples before inappropriate content can be extracted.
[0006]
  In particular, to detect so-called illicit or illegal content, copied or imitated from another person's copyrighted work, among the data (content) such as images, sounds, and texts posted by an unspecified number of users on services such as bulletin boards, it has been difficult to prepare illicit image data as samples in advance and keep up with all the illicit image data uploaded to bulletin boards and other Internet sites every day.
[0007]
  With a wide variety of content being generated and distributed every day, it is extremely difficult to prepare in advance, as base samples of inappropriate content, all the content that should be protected by copyright. The operator of a bulletin board or similar service rarely creates content itself, so it is difficult to obtain the content to be protected in advance. In addition, the range of content posted on bulletin boards is very diverse, so it is very difficult to prepare base samples that cover every post without omission.
[0008]
  An object of the present invention is to provide an inappropriate content detection method, an inappropriate content detection device, a computer program therefor, and a content publication system that solve the above-mentioned difficulty of preparing illicit content samples and dictionary data in advance.
[0009]
  In the first inappropriate content detection method of the present invention, an inappropriate content detection device:
  accepts content posted by individual users;
  calculates, using a plurality of the accepted posted contents, the mutual similarity of the plurality of posted contents; and
  determines, based on the mutual similarity, whether or not the posted content is appropriate in terms of copyright.
[0010]
  In the second inappropriate content detection method of the present invention, an inappropriate content detection device:
  accepts content posted by individual users; and
  determines, using a plurality of the accepted posted contents and based on the mutual similarity calculated among them, whether or not the plurality of posted contents are similar to each other, and, if they are similar, detects the group of mutually similar contents as content that is inappropriate in terms of copyright.
[0011]
  The inappropriate content detection apparatus of the present invention includes:
  content receiving means for receiving input of content posted by individual users;
  similar content detection means for calculating, using a plurality of the received posted contents, the mutual similarity of the plurality of posted contents, and detecting a group of mutually similar contents based on the similarity; and
  fraud determination means for determining content that is inappropriate in terms of copyright based on the detected group of similar contents.
[0012]
  The computer program of the present invention causes a computer to realize an inappropriate content detection apparatus that detects content that is inappropriate in terms of copyright among content posted by individual users. The program causes the computer to execute:
  a procedure of receiving input of content posted by individual users;
  a procedure of calculating, using a plurality of the received posted contents, the mutual similarity of the plurality of posted contents, and detecting a group of mutually similar contents based on the similarity; and
  a procedure of determining content that is inappropriate in terms of copyright based on the detected group of similar contents.
[0013]
  A first content publishing system of the present invention publishes posted content so that users can view it, and includes:
  presenting means for presenting the inappropriate content detected by the inappropriate content detection apparatus to a system administrator;
  receiving means for receiving a deletion instruction from the system administrator after the system administrator has confirmed the content; and
  deletion means for deleting the inappropriate content in accordance with the deletion instruction.
[0014]
  A second content publishing system of the present invention publishes posted content so that users can view it, and includes:
  determination means for determining whether or not the number of contents determined to be similar to each other by the inappropriate content detection apparatus is greater than a predetermined number; and
  control means for automatically stopping user access to the content when the number of contents determined to be similar to each other is greater than the predetermined number.
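The determination and control means of the second content publishing system reduce to a count comparison followed by an access change. The sketch below is one possible reading: the bookkeeping (a mapping from content ID to the number of contents judged similar to it) and the name `update_suspended` are assumptions, and how access is actually blocked is left out.

```python
def update_suspended(similar_counts, predetermined_number, suspended=None):
    """Return the set of content IDs whose access should be stopped."""
    result = set() if suspended is None else set(suspended)
    for content_id, count in similar_counts.items():
        # "Greater than a predetermined number" is read here as a strict comparison.
        if count > predetermined_number:
            result.add(content_id)
    return result

# Hypothetical counts of contents judged similar to each posted item.
similar_counts = {"V1": 3, "V2": 3, "V3": 0, "V4": 2}
suspended = update_suspended(similar_counts, predetermined_number=2)
```

Because the comparison is strict, an item with exactly the predetermined number of similar contents (V4 here) stays accessible.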
[0015]
  It should be noted that any combination of the above-described constituent elements and a conversion of the expression of the present invention between a method, an apparatus, a system, a recording medium, a computer program, etc. are also effective as an aspect of the present invention.
[0016]
  The various components of the present invention do not necessarily have to be independent of each other. A plurality of components may be formed as a single member, a single component may be formed of a plurality of members, a certain component may be a part of another component, a part of a certain component may overlap with a part of another component, and so on.
[0017]
  Moreover, although the plurality of procedures are described in order in the inappropriate content detection method and the computer program of the present invention, the order of description does not limit the order in which the plurality of procedures are executed. For this reason, when the inappropriate content detection method and computer program of the present invention are implemented, the order of the plurality of procedures can be changed within a range that does not hinder the contents.
[0018]
  Further, the plurality of procedures of the inappropriate content detection method and the computer program of the present invention are not limited to being executed at different timings. For this reason, another procedure may occur during the execution of a certain procedure, or some or all of the execution timing of a certain procedure and the execution timing of another procedure may overlap.
[0019]
  According to the present invention, there are provided an inappropriate content detection method, an inappropriate content detection device, a computer program therefor, and a content publishing system that can efficiently detect inappropriate content from posted content without preparing illegal content samples or dictionary data in advance.
[Brief description of the drawings]
[0020]
  The above-described object and other objects, features, and advantages will become more apparent from the preferred embodiments described below and the accompanying drawings.
[0021]
FIG. 1 is a block diagram showing a configuration of an inappropriate content detection apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart showing an example of a procedure for detecting inappropriate content in the embodiment of the present invention.
FIG. 3 is a block diagram showing a configuration of an inappropriate content detection apparatus according to an embodiment of the present invention.
FIG. 4 is a flowchart showing an example of a procedure for detecting inappropriate content according to the embodiment of the present invention.
FIG. 5 is a block diagram showing a configuration of an inappropriate content detection apparatus according to an embodiment of the present invention.
FIG. 6 is a flowchart showing an example of a procedure for detecting inappropriate content according to the embodiment of the present invention.
FIG. 7 is a block diagram showing a configuration of a confirmation server according to the embodiment of the present invention.
FIG. 8 is a diagram showing a configuration of a content publishing system according to an embodiment of the present invention.
FIG. 9 is a diagram schematically showing an example of the inappropriate content detection method of the present invention.
FIG. 10 is a diagram schematically showing the similarity relationships among the contents posted in FIG. 9.
FIG. 11 is a diagram schematically showing a similar example of a moving image.
FIG. 12 is a diagram schematically illustrating a similar example of a still image.
[Best mode for carrying out the invention]
[0022]
  Hereinafter, embodiments of the present invention will be described with reference to the drawings. In all the drawings, the same reference numerals are given to the same components, and the description will be omitted as appropriate.
[0023]
(First embodiment)
  FIG. 1 is a functional block diagram showing the configuration of an inappropriate content detection apparatus 100 according to an embodiment of the present invention. The inappropriate content detection apparatus 100 according to the present embodiment calculates the degree of similarity between a plurality of posted contents and determines, based on that mutual similarity, whether the posted contents are appropriate in terms of copyright. Further, in the present embodiment, the inappropriate content detection apparatus 100 determines whether the plurality of posted contents are similar to each other based on the similarity between them and, when they are, detects the group of mutually similar contents as inappropriate in terms of copyright.
[0024]
  Here, the similarity is a measure indicating whether at least two pieces of content are similar to each other. It can be calculated by a known technique, for example as described in paragraphs 0010 and 0011 of Patent Document 1, or in Patent Document 4.
[0025]
  The inappropriate content detection apparatus 100 according to the present embodiment includes a content input reception unit 110 that receives input of posted content, a similar content detection unit 134 that calculates the mutual similarity of a plurality of posted contents and detects a group of mutually similar contents based on the similarity, and a fraud determination unit 136 that determines content that is inappropriate in terms of copyright based on the detected similar content group.
[0026]
  The inappropriate content detection apparatus 100 according to the present embodiment further includes a feature amount extraction unit 132 that extracts a feature amount from posted content. The similar content detection unit 134 compares the feature amounts of a plurality of posted contents with each other, calculates the mutual similarity of the feature amounts, and detects content groups that are similar to each other.
[0027]
  More specifically, the inappropriate content detection apparatus 100 includes a content input reception unit 110, a content storage unit 120, a feature amount extraction unit 132, a similar content detection unit 134, a fraud determination unit 136, and an inappropriate content output unit 140.
[0028]
  In the following figures, the configuration of parts not related to the essence of the present invention is omitted.
  Each component of the inappropriate content detection apparatus 100 is realized by an arbitrary combination of hardware and software, centered on the CPU and memory of an arbitrary computer, a program loaded into the memory that realizes the components shown in the figure, a storage unit such as a hard disk that stores the program, and a network connection interface. It will be understood by those skilled in the art that there are various modifications to the implementation method and apparatus. Each drawing described below shows blocks in functional units, not configurations in hardware units.
[0029]
  The content input reception unit 110 receives input of content posted one after another by individual users. As an example, the content input reception unit 110 is a CPU (Central Processing Unit) equipped with a program that captures a message posted by a user via the Internet and extracts the content part. The input content is, for example, an image (moving image or still image), sound (voice or music), text data, or the like posted on a bulletin board or the like, and is content whose copyright should be protected.
[0030]
  The content may be input by directly capturing posts from users, or it may first be stored in an external server (not shown) or the like and then input sequentially or collectively. The form of posting may be uploading with a WWW browser, or attaching the content to an e-mail or the like and sending it. The content may also be registered on a specific FTP site.
[0031]
  The content storage unit 120 stores the content received by the content input reception unit 110. As an example, the content storage unit 120 is a storage device such as a hard disk or a flash memory, and may be a dedicated storage device or may be shared with other storage devices.
[0032]
  The feature amount extraction unit 132 extracts, from each content received by the content input reception unit 110, a feature amount for detecting matches between contents or for effectively calculating their similarity. As an example, the feature amount extraction unit 132 is a CPU equipped with a program that operates according to a predetermined rule. For example, if the content is video information, it may operate to extract a histogram of the color information observed in each frame of the video; if the content is music information, it may operate to extract the frequency components at each time.
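  As an illustration of the color-histogram feature mentioned above, the following is a minimal sketch and not the procedure of Patent Document 2 or 3. The function name `color_histogram`, the parameter `bins_per_channel`, and the representation of a frame as a list of (R, G, B) tuples are assumptions introduced here for illustration.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # hypothetical frame format: (R, G, B), each 0..255


def color_histogram(frame: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Return a normalized color histogram for one frame.

    Each channel is quantized into `bins_per_channel` levels, so the
    histogram has bins_per_channel ** 3 entries that sum to 1.0.
    """
    if not frame:
        raise ValueError("frame must contain at least one pixel")
    step = 256 / bins_per_channel
    counts = [0] * (bins_per_channel ** 3)
    for r, g, b in frame:
        # Quantize each channel; clamp so that 255 falls in the last bin.
        qr = min(int(r / step), bins_per_channel - 1)
        qg = min(int(g / step), bins_per_channel - 1)
        qb = min(int(b / step), bins_per_channel - 1)
        counts[(qr * bins_per_channel + qg) * bins_per_channel + qb] += 1
    total = len(frame)
    return [count / total for count in counts]
```

  Normalizing by the number of pixels makes histograms of frames with different resolutions comparable, which is one plausible reason to prefer it for similarity calculation.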
[0033]
  As an example, the feature amount extraction unit 132 performs feature amount extraction by a procedure such as the image index generation described in Patent Document 2 or the representative layout feature amount extraction described in Patent Document 3. Feature amounts other than those described in Patent Document 2 or Patent Document 3 may also be used, as long as the degree of similarity or coincidence between contents such as images, video, or audio information can be calculated.
  The feature amount extracted by the feature amount extraction unit 132 may be stored in a feature amount storage unit (not shown), or may be extracted from the content stored in the content storage unit 120 each time a similarity is calculated.
[0034]
  The similar content detection unit 134 calculates the mutual similarity of the feature amounts extracted by the feature amount extraction unit 132 in order to perform similarity determination or match detection between contents, and detects content groups whose similarity is equal to or greater than a predetermined threshold. The detection result is stored as similar content information in a temporary storage unit (not shown) and is output to the fraud determination unit 136 described later. As an example, the similar content detection unit 134 is a CPU equipped with a program that operates according to a predetermined rule. It may be a dedicated CPU or may be shared with other CPUs.
[0035]
  As a similar content detection method, for example, similar content detection based on the representative layout feature amount described in Patent Document 3 is used. Methods other than that of Patent Document 3 may also be used, as long as they can detect, when contents are compared, a match or similarity of some or all of the frames in a video, of part or all of a still image, or of part or all of a sound.
[0036]
  That is, in the inappropriate content detection apparatus 100 according to the present embodiment, when the posted content is a video, the similar content detection unit 134 can determine that the similarity between contents is higher than a threshold when some or all of the frame groups included in the videos are similar to each other. When the posted content is a still image, the similar content detection unit 134 can determine that the similarity between contents is higher than a threshold when part or all of the still images are similar. Furthermore, when the posted content includes voice or music, the similar content detection unit 134 can determine that the similarity between contents is higher than the threshold when part or all of the phrases of the voice or music are similar.
[0037]
  The fraud determination unit 136 determines the content fraud based on the similar content information detected by the similar content detection unit 134. As an example, the fraud determination unit 136 is a CPU equipped with a program that operates according to a predetermined rule. It may be a dedicated CPU or may be shared with other CPUs.
[0038]
  The inappropriate content output unit 140 receives the fraud determination result from the fraud determination unit 136 and outputs the inappropriate content. As an example, the inappropriate content output unit 140 is a CPU equipped with a program that operates according to a predetermined rule. It may be a dedicated CPU or may be shared with other CPUs. As an example, the inappropriate content output unit 140 may also include a list creation unit (not shown) that creates and outputs an inappropriate content list enumerating the content determined to be inappropriate by the fraud determination unit 136.
[0039]
  A computer program for realizing the various units (content input reception unit 110 to inappropriate content output unit 140) of the inappropriate content detection apparatus 100 as various functions is stored in a memory (not shown) used by the CPU and executed by the CPU.
[0040]
  The computer program according to the present embodiment is a computer program for causing a computer to realize the inappropriate content detection apparatus 100 that detects content inappropriate in terms of copyright from posted content. It is written so as to cause the computer to execute: a procedure for accepting input of posted content; a procedure for extracting feature amounts from the posted content; a procedure for calculating the mutual similarity of the feature amounts by comparing the feature amounts of a plurality of posted contents with each other; a procedure for detecting content groups similar to each other based on the calculated similarity; and a procedure for determining content inappropriate in terms of copyright based on the detected similar content group.
[0041]
  The computer program is stored in, for example, a recording medium (memory) that can be read by a computer (CPU). As an example, the recording medium is a PROM (Programmable Read Only Memory), a hard disk, a DVD-ROM, a CD-ROM, an FD, or the like.
[0042]
  The operation of the inappropriate content detection apparatus 100 configured as described above according to this embodiment will be described below. FIG. 2 is a flowchart showing an example of the operation of the inappropriate content detection apparatus 100 of the present embodiment. Hereinafter, description will be made with reference to FIGS. 1 and 2.
[0043]
  First, input of a plurality of contents posted by the user is received by the content input receiving unit 110 (S11). Content may be input sequentially or a plurality of contents may be input together. The received content is stored in the content storage unit 120 (S13), and the feature amount extraction unit 132 extracts a feature amount for similarity calculation from each content received by the content input reception unit 110 (S15).
[0044]
  Then, the similar content detection unit 134 sequentially selects content pairs from the received content (S17), calculates the similarity of the selected content pair (S19), and determines whether the similarity is equal to or greater than a predetermined threshold (S21). If the similarity is equal to or greater than the predetermined threshold (Yes in S21), the pair is detected as similar content, and the detected content pair is stored in the temporary storage unit as mutual similar content information (S23).
[0045]
  Steps S17 to S23 are performed for all combinations of input contents (S25), and contents similar to each other are detected.
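  The loop of steps S17 to S25 can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of Patent Document 3: histogram intersection is swapped in as the similarity measure, and the names `histogram_intersection` and `detect_similar_pairs`, as well as the dictionary that maps a content ID to its feature amount, are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity of two normalized histograms, in the range 0.0 to 1.0 (assumed measure)."""
    return sum(min(x, y) for x, y in zip(a, b))


def detect_similar_pairs(
    features: Dict[str, Sequence[float]], threshold: float
) -> List[Tuple[str, str]]:
    """Return every content pair whose similarity is at or above the threshold."""
    similar_pairs = []
    # S17 and S25: visit every combination of two contents exactly once.
    for id_a, id_b in combinations(sorted(features), 2):
        similarity = histogram_intersection(features[id_a], features[id_b])  # S19
        if similarity >= threshold:  # S21
            similar_pairs.append((id_a, id_b))  # S23: keep as similar content information
    return similar_pairs
```

  Because every pair is compared, the cost grows quadratically with the number of contents, which is consistent with the later embodiments restricting the comparison to selected contents.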
[0046]
  When detection of similar content is completed for all combinations (Yes in S25), the fraud determination unit 136 performs fraud determination based on the mutual similar content information stored in step S23 (S27). As an example, when a predetermined number or more of mutually similar contents are detected, these similar contents are determined to be fraudulent. If the predetermined number is two, all the contents stored in step S23 are determined to be inappropriate content. When the predetermined number is N, groups of N or more interconnected contents are detected based on the connection relationships of the content pairs detected in step S21, and the detected contents are determined to be inappropriate content.
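  One way to read the determination for a predetermined number N is as a search for connected groups in the graph formed by the similar content pairs. The sketch below assumes that interpretation; the function name `find_inappropriate_groups` and the parameter `min_group_size` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def find_inappropriate_groups(
    similar_pairs: Iterable[Tuple[str, str]], min_group_size: int
) -> List[Set[str]]:
    """Group contents linked by similar pairs; keep groups of min_group_size or more."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for a, b in similar_pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)

    visited: Set[str] = set()
    groups: List[Set[str]] = []
    for start in sorted(adjacency):
        if start in visited:
            continue
        # Depth-first search collects every content reachable from `start`.
        component: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        visited |= component
        if len(component) >= min_group_size:
            groups.append(component)
    return groups
```

  With `min_group_size` set to two, every content that appears in any stored pair is flagged, which matches the case described for a predetermined number of two.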
[0047]
  Then, the inappropriate content output unit 140 outputs the content determined to be illegal (S29).
[0048]
  As described above, the inappropriate content detection apparatus 100 of this embodiment achieves the object of the invention: content that is inappropriate in terms of copyright is detected efficiently and automatically, without preparing sample content (dictionary data) as a base for finding inappropriate content.
  Further, copyright appropriateness can be determined efficiently, and inappropriate content can be prevented from being published on a bulletin board or the like.
[0049]
  For inappropriate content that infringes copyright, original content exists, such as TV recordings, magazines, CDs, or DVDs. Because posters either post part of the original content as it is, or download content posted by others and repost it as it is, such content characteristically contains many duplicates. By evaluating the similarity between posted contents, detecting matches or duplication among posted videos, still images, and sounds, and detecting content with many duplicates as content that is inappropriate in terms of copyright, it becomes possible to determine copyright appropriateness and detect inappropriate content without preparing content samples in advance, thereby achieving the object of the present invention.
[0050]
(Second embodiment)
  FIG. 3 is a functional block diagram showing the configuration of the inappropriate content detection apparatus 200 according to the embodiment of the present invention. The inappropriate content detection apparatus 200 according to the present embodiment is different from the inappropriate content detection apparatus 100 according to the above-described embodiment in that the content is selected under a predetermined condition and the similarity is evaluated for the selected content.
[0051]
  In the inappropriate content detection apparatus 200 of the present embodiment, the similar content detection unit 234 detects, from among the content posted within a predetermined period, contents whose mutual similarity is higher than a threshold across a predetermined number or more of posted contents as a content group similar to each other, and the fraud determination unit 236 determines that the detected similar content group is inappropriate content.
[0052]
  Further, in the inappropriate content detection apparatus 200 of the present embodiment, the similar content detection unit 234 selects contents from the plurality of posted contents based on a predetermined criterion, calculates the similarity between the selected contents, and detects a content group in which a predetermined number or more of contents have a mutual similarity higher than a threshold. The fraud determination unit 236 determines that the detected content group is inappropriate content. For example, the inappropriate content detection apparatus 200 according to the present embodiment performs similarity evaluation on content posted in a predetermined period.
[0053]
  Here, the predetermined criterion includes the date and time when the content was posted, the nationality of the poster, the size of the posted content, and the like. These pieces of information are recorded in a content posting record or the like. Content is selected based on the posting record, that is, on the posting time, the nationality of the poster, and the size of the content, and the similarity between the selected contents is calculated to detect inappropriate content.
[0054]
  The inappropriate content detection apparatus 200 according to this embodiment includes a feature amount storage unit 252 that stores the feature amounts extracted by the feature amount extraction unit 232, and an accumulated feature amount selection unit 254 that acquires, from the feature amount storage unit 252, the feature amounts of the contents selected based on a predetermined criterion. The similar content detection unit 234 calculates the mutual similarity of the feature amounts of the contents selected by the accumulated feature amount selection unit 254 and detects content groups similar to each other.
[0055]
  Furthermore, in the inappropriate content detection apparatus 200 of the present embodiment, the feature amount storage unit 252 stores the feature amount of posted content each time a post is made. The similar content detection unit 234 calculates, based on the feature amount of newly posted new content and the feature amounts of the plurality of accumulated contents, the similarity between the new content and each of the accumulated contents. When there is content whose similarity to the new content is higher than a threshold, it detects a content group consisting of the new content and the content whose similarity to the new content is higher than the threshold, and the fraud determination unit 236 determines the detected content group to be inappropriate content.
[0056]
  Specifically, the inappropriate content detection apparatus 200 of the present embodiment includes the same content input reception unit 110, content storage unit 120, and inappropriate content output unit 140 as the inappropriate content detection apparatus 100 of the above embodiment. In addition, a feature amount extraction unit 232, a similar content detection unit 234, a fraud determination unit 236, a feature amount storage unit 252, and an accumulated feature amount selection unit 254 are provided.
[0057]
  The feature amount extraction unit 232 extracts from the content a feature amount for effectively calculating a match or similarity between contents. The extracted feature amount is stored in the feature amount storage unit 252. For example, when the content is video information, the feature amount extraction unit 232 may operate to extract a histogram of the color information observed in each frame of the video; when the content is music information, it may operate to extract the frequency components at each time.
[0058]
  As an example, the feature amount extraction unit 232 extracts feature amounts by a procedure such as the image index generation described in Patent Document 2 or the representative layout feature amount extraction described in Patent Document 3, as described in the above embodiment. Feature amounts other than those described in Patent Document 2 or Patent Document 3 may also be used, as long as the degree of similarity or coincidence between contents such as images, video, or audio information can be calculated.
[0059]
  The feature amount storage unit 252 stores the feature amounts extracted by the feature amount extraction unit 232. As an example, it is a storage device such as a hard disk or a flash memory; it may be a dedicated storage device or may be shared with other storage devices.
[0060]
  The accumulated feature amount selection unit 254 selects, from the feature amount storage unit 252, the feature amounts whose similarity is to be calculated. As an example, it is a CPU equipped with a program that operates according to a predetermined rule. It may be a dedicated CPU or may be shared with other CPUs. As an example, the accumulated feature amount selection unit 254 selects, from the feature amounts accumulated in the feature amount storage unit 252, the feature amounts of the content posted during a predetermined period, based on the posting date and time in the posting record. Content to be evaluated for similarity may also be selected according to the poster recorded in the posting record, the nationality of the poster, the size of the posted content, and the like.
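  The selection by posting period can be sketched as a filter over the posting record. The `PostingRecord` fields, the function name `select_by_period`, and the optional size criterion are assumptions made for illustration; the source does not specify the layout of the posting record.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class PostingRecord:
    """Hypothetical posting record entry for one content."""
    content_id: str
    posted_at: datetime
    size_bytes: int


def select_by_period(
    records: List[PostingRecord],
    now: datetime,
    period: timedelta,
    max_size_bytes: Optional[int] = None,
) -> List[PostingRecord]:
    """Select records posted within `period` before `now`.

    If `max_size_bytes` is given, larger contents are also excluded,
    as one example of an additional selection criterion.
    """
    earliest = now - period
    selected = []
    for record in records:
        if not (earliest <= record.posted_at <= now):
            continue
        if max_size_bytes is not None and record.size_bytes > max_size_bytes:
            continue
        selected.append(record)
    return selected
```

  Only the feature amounts of the selected records would then be passed to the similarity calculation, which keeps the number of compared pairs bounded.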
[0061]
  The similar content detection unit 234 compares the feature amounts of the received content or the feature amounts of the accumulated content with each other to calculate the mutual similarity, and detects similar content based on the similarity. As an example, it is a CPU equipped with a program that operates according to a predetermined rule. It may be a dedicated CPU or may be shared with other CPUs. When detecting similar content groups, the method described in the above embodiment of obtaining the round-robin similarity of all feature amount pairs (steps S17 to S25 in FIG. 2) may be used. Alternatively, a method may be used in which the similarity between the feature amount of the newly posted input content and the accumulated feature amounts is calculated first, and the mutual similarity is calculated only when similar content is detected.
[0062]
  That is, in the inappropriate content detection apparatus 200 of the present embodiment, the feature amount storage unit 252 stores the feature amounts of the contents posted so far, and the similar content detection unit 234 calculates, based on the feature amount of newly posted new content and the feature amounts of the plurality of accumulated contents, the similarity between the new content and each of the accumulated contents. When there is content whose similarity to the new content is higher than a threshold, the new content may be added to the accumulated contents, the mutual similarity may be calculated, and a content group in which a predetermined number or more of contents have a similarity higher than the threshold may be detected.
  In other words, when there is no content whose similarity to the new content is higher than the threshold, detection of the content group need not be performed.
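  The incremental check for a newly posted content can be sketched as follows. This is a simplified illustration: cosine similarity is an assumed measure, the group is formed only from contents directly similar to the new content, and the names `cosine_similarity`, `check_new_content`, and `min_group_size` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def check_new_content(
    new_id: str,
    new_features: Sequence[float],
    accumulated: Dict[str, Sequence[float]],
    threshold: float,
    min_group_size: int,
) -> List[str]:
    """Return the detected content group, or an empty list if none is detected."""
    matches = [
        content_id
        for content_id, features in accumulated.items()
        if cosine_similarity(new_features, features) > threshold
    ]
    if not matches:
        # No similar accumulated content: group detection is skipped entirely.
        return []
    group = [new_id] + sorted(matches)
    return group if len(group) >= min_group_size else []
```

  Comparing only the new content against the accumulated contents costs one pass per post, instead of recomputing all pairs each time.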
[0063]
  At this time, the similar content detection unit 234 may use only the content posted in a predetermined period as the accumulated content to be compared with the new content.
[0064]
  The fraud determination unit 236 determines whether the content is illegal based on the similar content information detected by the similar content detection unit 234. In the present embodiment, the fraud determination unit 236 determines that the detected similar content group is inappropriate content when the number of similar content groups detected by the similar content detection unit 234 is equal to or greater than a predetermined number. As an example, the fraud determination unit 236 is a CPU equipped with a program that operates according to a predetermined rule. It may be a dedicated CPU or may be shared with other CPUs.
[0065]
  A computer program for realizing the various units (content input reception unit 110 to inappropriate content output unit 140, feature amount extraction unit 232 to accumulated feature amount selection unit 254) of the inappropriate content detection apparatus 200 as various functions is, as in the above embodiment, stored in a memory (not shown) used by the CPU and executed by the CPU.
[0066]
  The computer program of this embodiment is written so as to cause the computer to further execute, in addition to the procedures of the computer program of the above embodiment, a procedure for detecting, from among the content posted within a predetermined period, contents whose mutual similarity is higher than a threshold across a predetermined number or more of posted contents as a content group similar to each other.
[0067]
  The operation of the inappropriate content detection apparatus 200 of the present embodiment configured as described above will be described below. FIG. 4 is a flowchart showing an example of the operation of the inappropriate content detection apparatus 200 of this embodiment. Hereinafter, a description will be given with reference to FIGS. 3 and 4.
[0068]
  In the operation of the inappropriate content detection apparatus 200 of this embodiment, steps S11, S13, and S29 are the same as in the flowchart of FIG. 2, and their description is omitted.
[0069]
  The feature amount extraction unit 232 extracts feature amounts for calculating similarity from each content input in step S11 (S31). The extracted feature amount is stored in the feature amount storage unit 252. Then, the accumulated feature amount selection unit 254 selects the feature amount of the reference content to be used for the mutual similarity evaluation from the feature amounts stored in advance in the feature amount accumulation unit 252 (S33). For example, as described above, the feature amount of the content posted in a predetermined period is selected based on the posting record.
[0070]
  Then, the similar content detection unit 234 calculates the similarity among the feature amounts selected in step S33 together with the feature amount of the newly input content, and detects similar content groups (S35).
[0071]
  If the number of similar content groups detected in step S35 is equal to or greater than a predetermined number, the fraud determination unit 236 determines that the detected similar content group is fraudulent content (S37), and the inappropriate content output unit 140 The content determined to be illegal is output (S29).
[0072]
  As described above, according to the inappropriate content detection apparatus 200 of the present embodiment, it is possible to detect copied and publicly posted content for which original content exists and which is posted many times within a specific period. The object of the invention, to automatically detect content inappropriate in terms of copyright without preparing sample content (dictionary data) as a base for finding inappropriate content, can thus be achieved.
[0073]
(Third embodiment)
  FIG. 5 is a functional block diagram showing the configuration of the inappropriate content detection apparatus 300 according to the embodiment of the present invention. The inappropriate content detection apparatus 300 according to the present embodiment differs from the inappropriate content detection apparatus 100 and the inappropriate content detection apparatus 200 of the above embodiments in that detected inappropriate content is registered as a dictionary, inappropriate content is detected by collating with the dictionary, and, when inappropriate content is detected, an alarm is presented to an administrator or the like.
[0074]
  The inappropriate content detection apparatus 300 according to the present embodiment further includes an inappropriate content feature amount registration unit 314 that stores, in an inappropriate content feature amount storage unit 312, the content determined to be inappropriate by the fraud determination unit 236, or the feature amount of that content, as inappropriate content dictionary data, and a similar content detection unit 334 that detects inappropriate content by matching newly posted content against the inappropriate content dictionary data.
[0075]
  Specifically, the inappropriate content detection device 300 of the present embodiment includes the same content input reception unit 110, content storage unit 120, inappropriate content output unit 140 as the inappropriate content detection device 100 of the above embodiment, and In addition to the same feature amount extraction unit 232, fraud determination unit 236, and feature amount storage unit 252 as the inappropriate content detection device 200 of the above embodiment, the similar content detection unit 334, the inappropriate content feature amount storage unit 312, an inappropriate content feature amount registration unit 314, and an alarm presentation unit 320.
[0076]
  The inappropriate content feature amount storage unit 312 stores the feature amounts of content that has already been determined to be inappropriate content. As an example, it is a storage device such as a hard disk or a flash memory; it may be a dedicated storage device or may be shared with other storage devices.
[0077]
  The inappropriate content feature amount registration unit 314 registers the feature amount of content determined to be inappropriate content in the inappropriate content feature amount storage unit 312. As an example, it is a CPU equipped with a program that operates according to a predetermined rule. For example, it moves the feature amount of content determined to be fraudulent content from the feature amount storage unit 252 to the inappropriate content feature amount storage unit 312. It may also operate to newly register the feature amount of copyrighted content provided by a broadcasting station or the like.
[0078]
  The similar content detection unit 334 detects similar content by calculating the mutual similarity of the feature amounts of the input content or of the accumulated content. As an example, it is a CPU equipped with a program that operates according to a predetermined rule. It may be a dedicated CPU or may be shared with other CPUs. As a method for calculating the similarity between feature amounts, for example, similar content detection based on the representative layout feature amount described in Patent Document 3, as described in the above embodiment, is used. Methods other than that of Patent Document 3 may also be used, as long as they can detect, when contents are compared, a match or similarity of some or all of the frames in a video, of part or all of a still image, or of part or all of a sound.
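  The interplay of the storage unit 312, the registration unit 314, and the detection unit 334 can be sketched as a small dictionary class. This is an illustration under stated assumptions: histogram intersection stands in for the method of Patent Document 3, and the class name `InappropriateContentDictionary` and its methods are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity of two normalized histograms (assumed measure)."""
    return sum(min(x, y) for x, y in zip(a, b))


class InappropriateContentDictionary:
    """Holds feature amounts of content already judged inappropriate."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[float]] = {}

    def register_from(self, feature_store: Dict[str, List[float]], content_id: str) -> None:
        """Move a feature amount from the general store into the dictionary."""
        self._entries[content_id] = feature_store.pop(content_id)

    def best_match(self, features: Sequence[float]) -> Optional[Tuple[str, float]]:
        """Return (content_id, similarity) of the closest entry, or None if empty."""
        best: Optional[Tuple[str, float]] = None
        for content_id, stored in self._entries.items():
            score = histogram_intersection(features, stored)
            if best is None or score > best[1]:
                best = (content_id, score)
        return best
```

  A caller would compare the returned similarity with a threshold of its own choosing to decide whether the new post matches the dictionary.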
[0079]
  The alarm presenting unit 320 presents an alarm to a content manager or the like when inappropriate content is detected. As an example, a monitor for presenting alarm text or video, a speaker for outputting sound, and the like can be given.
[0080]
  The functions of the units of the inappropriate content detection apparatus 300 (content input reception unit 110 to inappropriate content output unit 140, feature amount extraction unit 232 to feature amount storage unit 252, and inappropriate content feature amount storage unit 312 to similar content detection unit 334) are stored as programs in a memory (not shown) used by the CPU and are executed by the CPU.
[0081]
  The computer program according to the present embodiment causes a computer to further execute, in addition to the procedures of the computer program according to the above-described embodiment, a procedure for storing the posted content determined to be inappropriate, or its feature amount, in the inappropriate content feature amount storage unit 312 as inappropriate content dictionary data, and a procedure for detecting inappropriate content by matching newly posted content against the inappropriate content dictionary data.
[0082]
  The operation of the inappropriate content detection apparatus 300 configured as described above according to this embodiment will be described below. FIG. 6 is a flowchart illustrating an example of the operation of the inappropriate content detection apparatus 300 according to this embodiment. Hereinafter, a description will be given with reference to FIGS. 5 and 6.
[0083]
  In the operation of the inappropriate content detection apparatus 300 of this embodiment, steps S11 and S13 are the same as in the flowchart of FIG. 2, so their description is omitted.
[0084]
  The feature amount extraction unit 232 extracts a feature amount for calculating the similarity from each content input in step S11 (S41). The extracted feature amount is stored in the feature amount storage unit 252. Then, the similarity between the feature amount of the input content and the feature amount of each content stored in the inappropriate content feature amount storage unit 312 is calculated by the similar content detection unit 334 (S43).
[0085]
  When the similarity calculated in step S43 is equal to or greater than a predetermined threshold (Yes in S45), the fraud determination unit 236 determines that the input content whose feature amount has a similarity equal to or greater than the threshold is inappropriate content (S51). The inappropriate content output unit 140 outputs the content determined to be inappropriate (S53). Then, the alarm presentation unit 320 raises an alarm of inappropriate content detection to the information manager (S55). The inappropriate content feature amount registration unit 314 registers the feature amount of the content newly determined to be inappropriate content in the inappropriate content feature amount storage unit 312 (S57).
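  As a non-authoritative illustration, steps S43 to S57 might be sketched in Python as follows. The cosine similarity, the 0.9 threshold, and the names `cosine_similarity`, `matches_dictionary`, and `process_post` are assumptions made for this sketch and do not appear in the specification, which instead refers to the representative layout feature amounts of Patent Document 3.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Assumed similarity measure between two numeric feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def matches_dictionary(feature, dictionary, threshold=0.9):
    """S43/S45: is the input similar to any registered inappropriate feature?"""
    return any(cosine_similarity(feature, entry) >= threshold
               for entry in dictionary)


def process_post(feature, dictionary, threshold=0.9):
    """Return True when the post is judged inappropriate (S51).

    On a match, the new feature is also registered in the dictionary (S57).
    Output (S53) and the alarm (S55) are left to the caller in this sketch.
    """
    if matches_dictionary(feature, dictionary, threshold):
        dictionary.append(feature)
        return True
    return False
```

  In this sketch the dictionary is a plain list standing in for the inappropriate content feature amount storage unit 312; a post that matches nothing would proceed to the mutual similarity check of step S61 onward.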
[0086]
  When the similarity calculated in step S43 is less than the predetermined threshold for all the content stored in the inappropriate content feature amount storage unit 312 (No in S45), the similar content detection unit 334 selects a feature amount pair from the feature amount of the input content and the feature amounts stored in the feature amount storage unit 252 (S61), and calculates the similarity between the selected feature amounts (S63). Then, the similar content detection unit 334 determines whether or not the similarity is equal to or greater than a predetermined threshold (S65). When the similarity is equal to or greater than the predetermined threshold (Yes in S65), the similar content detection unit 334 detects the pair as a similar feature amount pair and stores it as similarity information in the temporary storage unit (S67). This is repeated for all combinations of feature amounts drawn from the feature amount of the input content and the feature amounts accumulated in the feature amount storage unit 252 (S69), so that the feature amounts of mutually similar content are extracted.
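  The pairwise loop of steps S61 to S69 can be illustrated with the following minimal Python sketch. The Jaccard similarity over sets and the names `jaccard` and `detect_similar_pairs` are hypothetical choices for illustration only; the specification does not prescribe a particular similarity measure here.

```python
from itertools import combinations


def jaccard(a, b):
    """Assumed similarity: overlap ratio between two sets of feature tokens."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def detect_similar_pairs(features, similarity, threshold):
    """S61 to S69: test every pair of feature amounts and keep the similar ones.

    `features` maps a content ID to its feature amount. The returned list
    plays the role of the similarity information kept in temporary storage (S67).
    """
    similar_pairs = []
    for (id_a, feat_a), (id_b, feat_b) in combinations(features.items(), 2):
        if similarity(feat_a, feat_b) >= threshold:  # S63, S65
            similar_pairs.append((id_a, id_b))       # S67
    return similar_pairs
```

  The cost grows quadratically with the number of stored feature amounts, which is one reason restricting the comparison to a specific posting period, as described later, may be attractive.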
[0087]
  When the similarity determination is completed for all the combinations (Yes in S69), the fraud determination unit 236 performs the fraud determination based on the similarity information between the feature amounts (S71). As an example, when, with the feature amount of the input content added, a predetermined number or more of feature amounts of mutually similar content are detected, the input content and the content stored in the content storage unit 120 that corresponds to the feature amounts in the feature amount storage unit 252 similar to the input content are determined to be illegal. For example, when the predetermined number is two, all content, including the input content, newly detected in step S65 is determined to be inappropriate content. When the predetermined number is N, N or more mutually connected items are detected based on the connection relation of the feature amount pairs detected in step S65, and the content corresponding to the detected feature amounts is determined to be inappropriate content.
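  One possible reading of the determination in step S71 is that the similar feature amount pairs form a graph and that any connected group containing N or more items is treated as inappropriate. The following Python sketch follows that reading; the function name `find_inappropriate_groups` and the graph interpretation itself are assumptions, not something the specification states in these terms.

```python
from collections import defaultdict


def find_inappropriate_groups(similar_pairs, min_group_size):
    """Group content IDs linked by similar pairs; keep groups of size >= N."""
    adjacency = defaultdict(set)
    for a, b in similar_pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)

    visited = set()
    groups = []
    for start in adjacency:
        if start in visited:
            continue
        # Depth-first traversal collects one connected group.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        visited |= component
        if len(component) >= min_group_size:
            groups.append(component)
    return groups
```

  With a predetermined number of two, every detected pair already qualifies, which matches the two-item case described above.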
[0088]
  Then, the inappropriate content output unit 140 outputs the content determined to be illegal (S73). Further, the alarm presenting unit 320 raises an alarm of inappropriate content detection to the information manager (S75). Then, the inappropriate content feature amount registration unit 314 deletes the feature amounts of the content newly determined to be inappropriate content from the feature amount storage unit 252, and registers the feature amount of the input content and the feature amounts of the newly determined inappropriate content in the inappropriate content feature amount storage unit 312 (S77).
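  The bookkeeping of step S77 might look like the following Python sketch, in which two dictionaries stand in for the feature amount storage unit 252 and the inappropriate content feature amount storage unit 312. The function name `register_inappropriate` and the dictionary-based stores are illustrative assumptions.

```python
def register_inappropriate(content_ids, feature_store, inappropriate_store):
    """S77: move feature amounts of newly detected content into the dictionary.

    Both stores map a content ID to its feature amount. IDs that are not
    present in `feature_store` are skipped.
    """
    for content_id in content_ids:
        feature = feature_store.pop(content_id, None)  # delete from unit 252
        if feature is not None:
            inappropriate_store[content_id] = feature  # register in unit 312
```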
[0089]
  In addition, the inappropriate content detection apparatus 300 may further include a generation unit (similar content detection unit 334, fraud determination unit 236) that generates inappropriate content dictionary data by calculating the mutual similarity for a predetermined period, a predetermined user, or predetermined content. Thereby, dictionary data can be generated for each specific group, such as a predetermined period or a predetermined user.
  The generation unit may also periodically detect inappropriate content by calculating the mutual similarity, and generate or update the inappropriate content dictionary data.
[0090]
  Further, by limiting the feature amounts stored in the feature amount storage unit 252 to those of content posted in a specific period, the mutual similarity can be calculated only for content posted in that period. Also, the feature amounts of copyright-protected content that has already been detected as inappropriate content may be registered together in the inappropriate content feature amount storage unit 312.
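  Restricting the comparison to a specific posting period could be sketched as below. The record layout (a dictionary with a `posted_at` timestamp) and the function name `features_in_period` are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def features_in_period(records, now, window):
    """Keep only records posted within `window` before `now` (inclusive)."""
    earliest = now - window
    return [r for r in records if earliest <= r["posted_at"] <= now]
```

  For example, calling `features_in_period(records, datetime.now(), timedelta(days=7))` would limit the mutual similarity calculation to posts from the last seven days.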
[0091]
  In addition, the inappropriate content feature amount registration unit 314 may be modified so that, instead of automatically deleting from the feature amount storage unit 252 and registering in the inappropriate content feature amount storage unit 312, it performs the registration after the content manager or the like has confirmed it.
[0092]
  That is, the inappropriate content detection apparatus 300 may present the inappropriate content dictionary data generated by the inappropriate content feature amount registration unit 314 to the information manager, have the information manager confirm it, receive an instruction specifying which inappropriate content dictionary data is to be retained, and store only the specified inappropriate content dictionary data in the inappropriate content feature amount storage unit 312.
[0093]
  The similarity threshold for judging mutual similarity, the number of items required when judging similar content, and the feature amounts to be used may be changed adaptively according to circumstances such as genre, time, or poster, or a mechanism may be provided that allows the information manager to adjust them manually.
[0094]
  Further, the inappropriate content detection apparatus 300 may include a registration unit (not shown) in which a predetermined user or predetermined content is registered in advance. In this case, the fraud determination unit 236 need not determine content posted by the predetermined user, or the predetermined content, to be inappropriate content.
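  The exemption provided by the registration unit might be applied as a simple filter over the detected candidates, as in this hedged Python sketch. The names `apply_exemptions`, `registered_users`, and `registered_content_ids`, and the candidate record layout, are hypothetical.

```python
def apply_exemptions(candidates, registered_users, registered_content_ids):
    """Drop candidates posted by a registered user or registered as content.

    Each candidate is a dict with "content_id" and "user" keys.
    """
    return [
        c for c in candidates
        if c["user"] not in registered_users
        and c["content_id"] not in registered_content_ids
    ]
```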
[0095]
  As described above, according to the inappropriate content detection apparatus 300 of the present embodiment, when a predetermined number or more of items with high mutual similarity are detected among the content posted within a predetermined period, that content group is regarded as inappropriate content. This makes it possible to detect copies and imitations of original content that have been posted in duplicate many times within a specific period, and to achieve the object of the invention of automatically and efficiently detecting copyright-inappropriate content without preparing sample content (dictionary data) to serve as a base for finding inappropriate content.
[0096]
  Copyright-infringing inappropriate content has original content behind it, such as a TV recording, magazine, CD, or DVD. Because part of the original content is posted as is, content posted by others is downloaded and reposted as is, or content is posted repeatedly within a short period, many duplicates tend to appear in a short period. According to the present embodiment, similarity evaluation is performed between contents posted in a predetermined period, matches or overlaps between posted videos, still images, and sounds are detected, and content for which a predetermined number or more of duplicates are detected is treated as copyright-inappropriate content. This makes it possible to detect inappropriate content without preparing samples of inappropriate content in advance, and the object of the present invention can be achieved.
[0097]
  In the present embodiment, only the feature amounts of the inappropriate content are stored in the inappropriate content feature amount storage unit 312 and prepared as dictionary data. Therefore, compared to the case of preparing samples of illegal content as dictionary data, the required capacity is very small.
[0098]
  Furthermore, according to the inappropriate content detection apparatus 300 of the present embodiment, when inappropriate content is detected, an alarm can be raised to prompt the information manager to confirm it, thereby preventing the inappropriate content from being published.
[0099]
(Fourth embodiment)
  FIG. 7 is a functional block diagram showing the configuration of the confirmation server 400 according to the embodiment of the present invention.
  The confirmation server 400 of this embodiment includes a communication unit 410, an alarm presenting unit 420, an administrator confirmation and registration unit 430, a storage unit 440, and a control unit 450.
[0100]
  The communication unit 410 exchanges contents and transmission information with a terminal (not shown) and other management devices (not shown) through the network 402. An example of the communication unit 410 is a dedicated board that performs communication via the network 402.
[0101]
  The alarm presenting unit 420 presents alarm information for detecting inappropriate content to the content manager. Examples of the alarm presenting unit 420 include a monitor for presenting alarm text or video, and a speaker for outputting sound information. Alternatively, a printer that creates a list in which alarm information is listed and prints it out may be used.
[0102]
  The administrator confirmation and registration unit 430 is for the administrator to confirm the inappropriate content detection result or newly register inappropriate content. As an example, the administrator confirmation and registration unit 430 may be a combination of a monitor that displays a result and an input device such as a keyboard and a touch panel. Alternatively, a printer that creates a list of inappropriate content detection results and prints them out may be used.
[0103]
  The storage unit 440 stores content, extracted feature amounts, feature amounts of inappropriate content, and the like. As an example, the storage unit 440 is a storage device such as a hard disk or a flash memory, and may be a dedicated storage device or may be shared with another storage device.
[0104]
  The control unit 450 controls each element of the confirmation server 400 and the apparatus as a whole, and performs processing for extracting inappropriate content, such as content feature amount extraction and collation. As an example, the control unit 450 is a CPU loaded with a program.
[0105]
  The communication unit 410, alarm presenting unit 420, administrator confirmation and registration unit 430, storage unit 440, and control unit 450 can be combined into a single computer having a communication function, a storage function, a monitor function, and an input function.
[0106]
  The control unit 450 includes a content input reception unit that receives content from the communication unit 410 and inputs it, an inappropriate content output unit 454 that outputs an inappropriate content detection result, an inappropriate content feature amount registration unit that stores the feature amounts of content determined to be inappropriate content in the inappropriate content feature amount storage unit 446, a content management unit 460 that manages the content, the feature amounts, and the inappropriate content feature amounts stored in the storage unit 440, a feature amount extraction unit 462 that extracts feature amounts from the content, a similar content detection unit 464 that extracts similar content by calculating the similarity between the feature amounts, and a fraud determination unit 466 that performs the fraud determination based on the detection result of the similar content detection unit 464.
[0107]
  The storage unit 440 includes a content storage unit 442 that stores the input content, a feature amount storage unit 444 that stores the extracted content feature amounts, and an inappropriate content feature amount storage unit 446 that stores the feature amounts of content determined to be inappropriate content.
[0108]
  Since the operation of the confirmation server 400 of the present embodiment configured as described above is the same as that of the inappropriate content detection apparatus 300 of the above-described embodiment, the description thereof is omitted.
[0109]
  According to the confirmation server 400 of the present embodiment, the same effects as those of the inappropriate content detection apparatus 300 of the above-described embodiment can be obtained.
[0110]
(Fifth embodiment)
  FIG. 8 is a diagram showing a configuration of a content publishing system 500 according to the embodiment of the present invention.
  A content publishing system 500 according to this embodiment is a system that provides a video publishing service. It includes a network 502, a plurality of terminal devices 510 connected to the network 502, a confirmation server 520 that detects and confirms inappropriate content, a public data storage unit 530 that stores public data posted by users, a public server 540 that publishes the public data, and a public data deletion unit 550 that deletes public data from the public data storage unit 530 when the confirmation server 520 detects inappropriate content.
[0111]
  The terminal device 510 is a terminal device having a function of accessing the network 502 and can be realized by a general-purpose computer.
  The network 502 is a communication network such as the Internet that can transmit and receive data.
[0112]
  The confirmation server 520 is a server computer having a function of accepting post data transmitted from the terminal device 510 via the network 502 and storing it in the public data storage unit 530, and a function of detecting and deleting inappropriate items among the post data stored in the public data storage unit 530. The confirmation server 520 can be realized by the confirmation server 400 described in the above embodiment. The inappropriate content output unit 522 can be realized by the same configuration as the inappropriate content output unit 454 of the confirmation server 400.
[0113]
  The public server 540 is a server computer having a function of publishing data stored in the public data storage unit 530 through the network 502, and can be realized by a computer having a function as a normal WWW server.
[0114]
  The public data deletion unit 550 receives the detection result of the confirmation server 520 and deletes the corresponding public data from the public data storage unit 530. As an example, it is a CPU loaded with a program that operates according to a predetermined rule. It may be a dedicated CPU or may be shared with other units.
[0115]
  As described above, according to the content publishing system 500 of the present embodiment, a mechanism for deleting public data when inappropriate content is detected can be realized. Since users can obtain only the data remaining in the public data storage unit 530 after inappropriate data has been deleted, the inappropriate data is not disclosed to the public. Therefore, according to the content publishing system 500 of the present embodiment, it is possible to detect copies and imitations of original content that have been posted in duplicate many times within a specific period, and to achieve the object of the invention of automatically detecting copyright-inappropriate content without preparing sample content (dictionary data) to serve as a base for finding inappropriate content.
[0116]
  Further, according to the content publishing system 500 of this embodiment, when inappropriate content is detected, it is possible to prevent the inappropriate content from being published by deleting the uploaded content.
[0117]
  Although embodiments of the present invention have been described above with reference to the drawings, these are illustrations of the present invention, and various configurations other than the above may also be adopted.
[0118]
  For example, the content publishing system 500 of the above-described embodiment is a content publishing system that publishes posted content so that users can view it, and may include a presentation unit (not shown) that presents inappropriate content detected by the inappropriate content detection device (100 to 300) of the above-described embodiment to the system administrator, a reception unit (not shown) that receives a deletion instruction from the system administrator after the system administrator's confirmation, and a deletion unit (not shown) that deletes the inappropriate content in accordance with the deletion instruction.
[0119]
  The content publishing system 500 of the above embodiment is a content publishing system that publishes posted content so that users can view it, and may also include a determination unit (not shown) that uses the inappropriate content detection device (100 to 300) to determine whether the number of contents determined to be similar to each other is greater than a predetermined number, and a control unit (not shown) that automatically stops user access to the content when that number is greater than the predetermined number.
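  The automatic access stop described above could be sketched as follows, under the assumption that the detection device returns groups of mutually similar content IDs. The function name `contents_to_block` and the parameter `max_allowed` are hypothetical names for the determination step and the predetermined number.

```python
def contents_to_block(similar_groups, max_allowed):
    """Return IDs whose group of mutually similar content exceeds max_allowed."""
    blocked = set()
    for group in similar_groups:
        if len(group) > max_allowed:  # "greater than a predetermined number"
            blocked |= set(group)
    return blocked
```

  A control unit could then refuse to serve any content whose ID appears in the returned set until an administrator has reviewed it.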
[Example 1]
[0120]
  Next, an example of the operation of the best mode for carrying out the present invention will be described using specific examples.
  FIG. 9 is a diagram schematically showing how the inappropriate content detection apparatus of the present invention is used. In FIG. 9, a broadcast station 610 broadcasts program 1. Each of the users A, B, C, D, and E receives the television broadcast at a terminal device 620 and views program 1 (steps S101, S111, S121, S131, and S141). Assume that users A, B, and E record the program, cut out a part of it, and upload videos V1, V2, and V5 to the posting site, respectively (steps S103, S113, and S143). Assume also that users C and D post their own content (moving images V3 and V4), shot with their own handycam or the like, to the posting site (steps S123 and S133). Each posted video is received by the confirmation server 630 via the network 602. Each moving image posted to the confirmation server 630 is subjected to fraud determination, and only appropriate content is published on the publication server 640.
[0121]
  In FIG. 9, the moving images V1, V2, and V5 are obtained by cutting out parts of the same original content, and when they include the same frames, they can be detected as similar. On the other hand, since the videos V3 and V4 are entirely original content, they are not similar to any other video (in the figure, the dissimilarity between video V3 and the other videos is indicated by dotted arrows, and the dissimilarity relationships of video V4 are not shown). As a result, as shown in FIG. 10, the moving images V1, V2, and V5, whose mutual similarity has been confirmed (the similarity relationships are indicated by solid arrows in the figure), are determined to be illegal content candidates.
[0122]
  FIG. 11 shows some typical similar patterns.
  In FIG. 11, specific parts of the original moving image Vo are cut out. The horizontal axis is the time axis, and the moving images Va, Vb, Vc, and Vd are clips cut out from partial frame ranges of the original content.
[0123]
  For example, Va1 of moving image Va and Vb1 of moving image Vb include the same frames, Vb2 of moving image Vb and Vc1 of moving image Vc include the same frames, and Vc2 of moving image Vc and Vd1 of moving image Vd include the same frames. Since the same frames are included in this way, the moving images Va and Vb, Vb and Vc, and Vc and Vd can each be said to be similar to each other. The videos Va and Vd do not include the same frames and are not directly similar to each other, but because of the chain of connections among Va, Vb, Vc, and Vd, all four videos can be determined to be mutually similar.
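  The chained similarity among Va, Vb, Vc, and Vd can be illustrated with the following Python sketch, which merges clips that share frames using a union-find structure. Representing each clip as a range of frame indices of the original, the specific frame ranges, and the names `share_frames` and `group_clips` are all assumptions for illustration; in practice frames would be matched through feature amounts rather than exact indices.

```python
def share_frames(a, b):
    """True when two clips contain at least one common frame index."""
    return not set(a).isdisjoint(b)


def group_clips(clips):
    """Merge clips that are connected, directly or through a chain of overlaps."""
    parent = {name: name for name in clips}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = list(clips)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if share_frames(clips[a], clips[b]):
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical frame ranges: neighbouring clips overlap, Va and Vd do not.
clips = {
    "Va": range(0, 40),
    "Vb": range(30, 70),
    "Vc": range(60, 100),
    "Vd": range(90, 130),
    "Ve": range(200, 230),  # unrelated clip
}
groups = group_clips(clips)
```

  Here Va and Vd share no frames, yet they end up in the same group because Vb and Vc link them, while the unrelated clip Ve stays in a group of its own.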
[0124]
  FIG. 12 shows an example for still images. For example, when an original book 702 or the like is captured with a scanner or the like and then posted (step S202), many items of content (images 710, 720, and 730) that differ slightly but are similar to one another are posted, as shown in FIG. 12. A considerable number of exactly matching images are also included. For example, the image 710 includes an image 712 of the original book 702 that is tilted. The image 720 includes a margin 724 around an image 722 of the original book 702. Further, the image 730 includes an image 732 in which a part of the image of the original book 702 has been cut out.
[0125]
  Images such as these can also be detected as mutually similar content by similarity matching engines such as those of Patent Document 2 and Patent Document 3, so such illegal content can be deleted. The same applies to music data and other content.
[0126]
  This application claims priority based on Japanese Patent Application No. 2007-272968, filed on October 19, 2007, the entire disclosure of which is incorporated herein.
[0127]
  While the present invention has been described with reference to the embodiments and examples, the present invention is not limited to the above embodiments and examples. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.

Claims

An inappropriate content detection method in which an inappropriate content detection device
accepts content posted by individual users,
calculates, using the plurality of accepted posted contents, the mutual similarity of the plurality of posted contents, and
determines, based on the mutual similarity, whether the posted content is appropriate in terms of copyright.

An inappropriate content detection method in which an inappropriate content detection device
accepts content posted by individual users,
determines, using the plurality of accepted posted contents, whether the plurality of posted contents are similar to each other based on a mutual similarity calculated between them, and, when they are similar, detects the group of mutually similar contents as copyright-inappropriate content.

The inappropriate content detection method according to claim 1 or 2,
wherein the inappropriate content detection device
detects, as the inappropriate content, a content group in which the similarity is determined to be higher than a threshold among a plurality of the posted contents, from among the content posted within a predetermined period.

The inappropriate content detection method according to any one of claims 1 to 3,
wherein the inappropriate content detection device
selects content from the plurality of posted contents according to a predetermined criterion, calculates the mutual similarity between the selected contents, and detects, as the inappropriate content, a content group in which the similarity among a predetermined number or more of contents is determined to be higher than a threshold.

The inappropriate content detection method according to any one of claims 1 to 4 ,
The inappropriate content detection device is
Extract features from the posted content,
An inappropriate content detection method for calculating a similarity between contents by comparing the feature quantities of the posted contents with each other.

The inappropriate content detection method according to any one of claims 1 to 5 ,
The inappropriate content detection device is
An inappropriate content detection method in which, when the posted content is a video, the similarity between the contents is determined to be higher than a threshold when some or all of the frame groups included in the videos are similar to each other.

The inappropriate content detection method according to any one of claims 1 to 6 ,
The inappropriate content detection device is
An inappropriate content detection method in which, when the posted content is a still image, the similarity between the contents is higher than a threshold when some or all of the still images are similar.

The inappropriate content detection method according to any one of claims 1 to 7 ,
The inappropriate content detection device is
An inappropriate content detection method in which, when the posted content includes sound or music, the similarity between the contents is higher than a threshold when a part or the whole phrase of the sound or music is similar.

The inappropriate content detection method according to any one of claims 1 to 8 ,
The inappropriate content detection apparatus includes a feature amount storage unit,
The inappropriate content detection device is
An inappropriate content detection method in which a feature amount of the posted content is extracted for each post and accumulated in the feature amount storage unit; based on the feature amount of newly posted new content and the feature amounts of the plurality of accumulated contents, a similarity with the new content is calculated for each accumulated content; and, when there is content whose similarity with the new content is higher than a threshold, the content group consisting of the new content and the content determined to have a similarity higher than the threshold with the new content is determined to be the inappropriate content.

The inappropriate content detection method according to any one of claims 1 to 8 ,
The inappropriate content detection apparatus includes a feature amount storage unit,
The inappropriate content detection device is
An inappropriate content detection method in which the feature amounts of the content posted so far are extracted and accumulated in the feature amount storage unit; based on the feature amount of newly posted new content and the feature amounts of the plurality of accumulated contents, a similarity with the new content is calculated for each accumulated content; when there is content whose similarity with the new content is higher than a threshold, the mutual similarity is calculated among the accumulated content with the new content added; and a content group in which the similarity is determined to be higher than a threshold among a predetermined number or more of contents is detected as the inappropriate content.

The inappropriate content detection method according to claim 10 ,
The inappropriate content detection device is
An inappropriate content detection method using only content posted in a predetermined period as the accumulated content to be compared with the new content.

The inappropriate content detection method according to any one of claims 1 to 11 ,
The inappropriate content detection device includes a dictionary storage unit,
The inappropriate content detection device is
An inappropriate content detection method in which the posted content determined to be inappropriate, or the feature amount of that content, is stored in the dictionary storage unit as inappropriate content dictionary data, and the inappropriate content is detected by collating newly posted content with the stored inappropriate content dictionary data.

The inappropriate content detection method according to claim 12 ,
The inappropriate content detection device is
An inappropriate content detection method for generating the inappropriate content dictionary data by calculating the mutual similarity for a predetermined period, a predetermined user, or predetermined content.

The inappropriate content detection method according to claim 12 or 13 ,
The inappropriate content detection device is
An inappropriate content detection method that periodically detects inappropriate content by calculating the mutual similarity and generates or updates the inappropriate content dictionary data.

The inappropriate content detection method according to any one of claims 12 to 14 ,
The inappropriate content detection device is
An inappropriate content detection method in which an information manager confirms the inappropriate content dictionary data, and the data is corrected so that only the inappropriate content designated by the information manager is retained.

The inappropriate content detection method according to any one of claims 1 to 15 ,
The inappropriate content detection device includes a registration unit,
The inappropriate content detection device is
An inappropriate content detection method in which a predetermined user or predetermined content is registered in the registration unit in advance, and the content posted by the predetermined user or the predetermined content is not detected as the inappropriate content.

An inappropriate content detection apparatus comprising:
content acceptance means for accepting input of content posted by individual users;
similar content detection means for calculating, using the plurality of accepted posted contents, the mutual similarity of the plurality of posted contents, and detecting a group of mutually similar contents based on the similarity; and
fraud determination means for determining copyright-inappropriate content based on the detected similar content group.

The inappropriate content detection apparatus according to claim 17,
wherein the similar content detection means detects, as a group of mutually similar contents, content whose similarity among a plurality of the posted contents is higher than a threshold, from among the content posted within a predetermined period, and
the fraud determination means determines the detected similar content group to be the inappropriate content.

The inappropriate content detection apparatus according to claim 17 or 18,
wherein the similar content detection means selects content from the plurality of posted contents based on a predetermined criterion, calculates the mutual similarity between the selected contents, and detects a content group whose similarity is higher than a threshold, and
the fraud determination means determines the detected content group to be the inappropriate content.

The inappropriate content detection apparatus according to any one of claims 17 to 19, further comprising
feature amount extraction means for extracting a feature amount from the posted content, wherein
the similar content detection means collates the feature amounts of the plurality of posted contents with each other, calculates the mutual similarity of the feature amounts, and detects a group of mutually similar contents based on the similarity.
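
As an illustration of collating feature amounts, the sketch below extracts a simple feature vector and compares two vectors by cosine similarity. Both choices are assumptions made for the example: the claim does not specify the feature amount or the similarity measure, and character-bigram counts over text are used here only because they are easy to follow. `extract_features` and `cosine` are hypothetical names.

```python
import math
from collections import Counter


def extract_features(text: str) -> Counter:
    """Hypothetical feature amount: counts of adjacent character pairs."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


same = cosine(extract_features("abcd"), extract_features("abcd"))
different = cosine(extract_features("abcd"), extract_features("wxyz"))
```

Identical inputs give a similarity of approximately 1 and inputs with no shared bigrams give 0; an embodiment for images or audio would substitute a feature amount suited to that medium.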

The inappropriate content detection apparatus according to claim 20, further comprising:
feature amount storage means for storing the feature amount extracted by the feature amount extraction means; and
accumulated feature amount selection means for acquiring, from the feature amount storage means, the feature amount of content selected based on a predetermined criterion, wherein
the similar content detection means calculates the mutual similarity between the feature amounts of the contents selected by the accumulated feature amount selection means and detects contents similar to each other.

The inappropriate content detection apparatus according to any one of claims 17 to 21, wherein,
when the posted content is a video, the similar content detection means regards the similarity between the contents as higher than a threshold value if some or all of the frame groups included in the videos are similar to each other.
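
One way to read "some or all of the frame groups are similar" is that two videos share a sufficiently long run of matching frames. The sketch below assumes each video has already been reduced to a sequence of per-frame hash values (a hypothetical preprocessing step) and that equal hashes stand in for similar frames; `longest_common_run`, `videos_similar`, and `min_run` are illustrative names, not terms from the claim.

```python
def longest_common_run(a: list, b: list) -> int:
    """Length of the longest contiguous run of frames shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the run that ended at the previous frame of both videos.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def videos_similar(a: list, b: list, min_run: int = 3) -> bool:
    """Treat two videos as similar when they share at least min_run frames in a row."""
    return longest_common_run(a, b) >= min_run


original = [1, 2, 3, 4, 5]
excerpt = [9, 3, 4, 5, 8]     # contains frames 3, 4, 5 of the original
unrelated = [9, 3, 8, 4, 7]   # shares frames, but never two in a row

is_copy = videos_similar(original, excerpt)
is_unrelated_copy = videos_similar(original, unrelated)
```

Matching on a partial run rather than the whole sequence lets an excerpt or a re-edited clip still be detected as similar.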

The inappropriate content detection apparatus according to any one of claims 17 to 22, wherein,
when the posted content is a still image, the similar content detection means regards the similarity between the contents as higher than a threshold value if a part or the whole of the still images is similar.

The inappropriate content detection apparatus according to any one of claims 17 to 23, wherein,
when the posted content includes sound or music, the similar content detection means regards the similarity between the contents as higher than a threshold value if a part or the whole of a phrase of the sound or music is similar.

The inappropriate content detection apparatus according to claim 21, wherein
the feature amount storage means stores the feature amount of the posted content each time content is posted,
the similar content detection means calculates, based on the feature amounts of newly posted new content and of the plurality of accumulated contents, the similarity between the new content and each of the accumulated contents and, when there is content whose similarity to the new content is higher than a threshold value, detects a content group consisting of the new content and the content whose similarity to the new content is higher than the threshold value, and
the fraud determination means determines the detected content group to be the inappropriate content.
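
The per-posting flow in this claim can be sketched as follows. This is an illustration under assumptions: the feature store is a plain dictionary, features are sets, and `overlap` is a stand-in similarity; `check_new_post` is a hypothetical name.

```python
def overlap(a: set, b: set) -> float:
    """Stand-in similarity (assumption): shared fraction of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def check_new_post(new_id, new_feat, store, similarity, threshold) -> set:
    """Compare a new post with every accumulated post, then accumulate it.

    Returns the detected content group (the new post plus its matches),
    or an empty set when nothing exceeds the threshold.
    """
    matches = {
        pid for pid, feat in store.items()
        if similarity(new_feat, feat) > threshold
    }
    store[new_id] = new_feat  # feature amount is stored for each posting
    return matches | {new_id} if matches else set()


store = {}
r1 = check_new_post("p1", {"a", "b", "c"}, store, overlap, 0.5)
r2 = check_new_post("p2", {"a", "b", "c", "d"}, store, overlap, 0.5)
r3 = check_new_post("p3", {"x"}, store, overlap, 0.5)
```

Here only the second posting yields a group, because it is the first one that resembles something already accumulated.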

The inappropriate content detection apparatus according to claim 21, wherein
the feature amount storage means stores the feature amounts of the contents posted so far,
the similar content detection means calculates, based on the feature amounts of newly posted new content and of the plurality of accumulated contents, the similarity between the new content and each of the accumulated contents and, when there is content whose similarity to the new content is higher than a threshold value, adds the new content to the accumulated contents, calculates the mutual similarity, and detects a content group whose similarity is higher than a threshold value, and
the fraud determination means determines the detected content group to be the inappropriate content.

The inappropriate content detection apparatus according to claim 26, wherein
the similar content detection means uses only content posted within a predetermined period as the accumulated content to be compared with the new content.

The inappropriate content detection apparatus according to any one of claims 17 to 27, further comprising:
dictionary storage means for storing, as inappropriate content dictionary data, the inappropriate content determined to be inappropriate by the fraud determination means or the feature amount of the inappropriate content; and
inappropriate content detection means for detecting the inappropriate content by collating newly posted content with the inappropriate content dictionary data.
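
A dictionary of this kind might look like the sketch below, in which features of content already determined to be inappropriate are stored and later posts are collated against them. The class name `InappropriateDictionary`, its methods, and the set-based `overlap` similarity are all assumptions made for illustration.

```python
def overlap(a: set, b: set) -> float:
    """Stand-in similarity (assumption): shared fraction of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class InappropriateDictionary:
    """Holds feature amounts of content already determined to be inappropriate."""

    def __init__(self, similarity, threshold: float):
        self._similarity = similarity
        self._threshold = threshold
        self._entries = []

    def register(self, feature) -> None:
        """Store the feature amount of content flagged by the fraud determination."""
        self._entries.append(feature)

    def is_inappropriate(self, feature) -> bool:
        """Collate a newly posted item against every dictionary entry."""
        return any(
            self._similarity(feature, entry) > self._threshold
            for entry in self._entries
        )


dictionary = InappropriateDictionary(overlap, threshold=0.5)
dictionary.register({"a", "b", "c"})

repost_hit = dictionary.is_inappropriate({"a", "b", "c", "d"})
unrelated_hit = dictionary.is_inappropriate({"x", "y"})
```

The dictionary is populated from the apparatus's own determinations, so it grows out of posted content rather than from samples prepared in advance.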

The inappropriate content detection apparatus according to claim 28, further comprising
generation means for generating the inappropriate content dictionary data by calculating the mutual similarity for a predetermined period, a predetermined user, or predetermined content.

The inappropriate content detection apparatus according to claim 29, wherein
the generation means periodically detects inappropriate content by calculating the mutual similarity, and generates or updates the inappropriate content dictionary data.

The inappropriate content detection apparatus according to any one of claims 28 to 30, wherein
the inappropriate content dictionary data is presented to an information manager, an instruction designating the inappropriate content dictionary data to be retained is accepted after the information manager has confirmed the inappropriate content dictionary data, and the designated data is stored in the dictionary storage means.

The inappropriate content detection apparatus according to any one of claims 17 to 31, further comprising
a registration unit for registering a predetermined user or predetermined content in advance, wherein
the fraud determination means does not determine content posted by the predetermined user, or the predetermined content, to be the inappropriate content.
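
The effect of the registration unit can be sketched as a filter applied to a detected group before the determination. The `Post` record, the `exclude_registered` function, and the sample identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: str
    user_id: str


def exclude_registered(group, registered_users, registered_posts):
    """Drop posts by registered users, and registered posts, from a detected group."""
    return [
        p for p in group
        if p.user_id not in registered_users
        and p.post_id not in registered_posts
    ]


group = [Post("p1", "official"), Post("p2", "u7"), Post("p3", "u9")]
remaining = exclude_registered(
    group,
    registered_users={"official"},  # e.g. a rights holder's own account
    registered_posts={"p3"},        # e.g. content cleared in advance
)
```

Only `p2` remains subject to the determination in this example; the other two posts are exempt by registration.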

A computer program for causing a computer to implement an inappropriate content detection device that detects content inappropriate in terms of copyright from content posted by individual users,
the computer program causing the computer to execute:
a procedure for accepting input of content posted by individual users;
a procedure for calculating mutual similarities among the plurality of accepted posted contents and detecting a group of mutually similar contents based on the similarities; and
a procedure for determining content that is inappropriate in terms of copyright based on the detected similar content group.

The computer program according to claim 33,
further causing the computer to execute
a procedure for detecting, as a group of mutually similar contents, those contents among the plurality of contents posted within a predetermined period whose mutual similarity is higher than a threshold value.

The computer program according to claim 33 or 34,
further causing the computer to execute:
a procedure for extracting a feature amount from the posted content;
a procedure for collating the feature amounts of the plurality of posted contents with each other and calculating the mutual similarity of the feature amounts; and
a procedure for detecting the group of mutually similar contents based on the calculated similarity.

The computer program according to any one of claims 33 to 35, wherein
the computer includes a dictionary storage device for storing inappropriate content dictionary data,
the computer program further causing the computer to execute:
a procedure for storing, in the dictionary storage device as the inappropriate content dictionary data, the inappropriate content determined to be inappropriate in the procedure for determining the inappropriate content, or a feature amount of the inappropriate content; and
a procedure for detecting the inappropriate content by collating newly posted content with the inappropriate content dictionary data.

A content publishing system that publishes posted content so that users can view it, comprising:
presenting means for presenting the inappropriate content detected by the inappropriate content detection apparatus according to any one of claims 17 to 32 to a system administrator;
receiving means for receiving a deletion instruction from the system administrator after confirmation by the system administrator; and
deletion means for deleting the inappropriate content in accordance with the deletion instruction.

A content publishing system that publishes posted content so that users can view it, comprising:
determination means for determining whether the number of contents determined to be similar to each other by the inappropriate content detection apparatus according to any one of claims 17 to 32 is greater than a predetermined number; and
a control unit that automatically stops user access to the contents when the number of contents determined to be similar to each other is greater than the predetermined number.
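
The determination means and control unit of this system reduce to a size check on each similar group. The sketch below assumes publication state is a dictionary mapping post IDs to a viewable flag; `update_access` and `max_similar` are hypothetical names.

```python
def update_access(similar_groups, published: dict, max_similar: int) -> dict:
    """Stop access to every post in a similar group larger than max_similar."""
    for group in similar_groups:
        if len(group) > max_similar:
            for pid in group:
                published[pid] = False  # no longer viewable by users
    return published


published = {"a": True, "b": True, "c": True, "d": True, "e": True}
update_access([{"a", "b", "c"}, {"d", "e"}], published, max_similar=2)
```

With a limit of 2, the three-member group is taken offline automatically while the two-member group stays viewable, for instance pending review by the system administrator.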

The inappropriate content detection method according to any one of claims 1 to 16, wherein
the contents for which the inappropriate content detection device
calculates the degree of similarity are a plurality of posted contents whose copyright is not appropriate.

The inappropriate content detection method according to any one of claims 12 to 15, wherein
the inappropriate content detection device
stores copyrighted content in the dictionary storage unit as the inappropriate content dictionary data.

The inappropriate content detection method according to any one of claims 1 to 16, 39, and 40, wherein
the inappropriate content detection device
sequentially selects content pairs from the plurality of accepted posted contents,
calculates the similarity of each selected content pair,
obtains the number of contents whose calculated similarity is equal to or greater than a predetermined threshold value, and
determines whether the content is appropriate in terms of copyright based on that number.

The inappropriate content detection method according to claim 41, wherein
the inappropriate content detection device,
when more than a predetermined number of contents whose calculated similarity is equal to or greater than the predetermined threshold value are detected, detects the detected content pairs as similar contents and determines the similar content group to be content that is inappropriate in terms of copyright.
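
The pair-counting steps of claims 41 and 42 can be sketched as below. This follows one possible reading, in which the count is kept per content item (how many other posts it resembles) and an item is flagged when that count exceeds a predetermined number. The set-based `overlap` similarity and the names `count_similar_partners`, `flag_by_count`, and `min_count` are assumptions.

```python
from collections import Counter
from itertools import combinations


def overlap(a: set, b: set) -> float:
    """Stand-in similarity (assumption): shared fraction of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def count_similar_partners(posts: dict, similarity, threshold: float) -> Counter:
    """For each post, count the other posts whose similarity is >= threshold."""
    counts = Counter()
    for p, q in combinations(posts, 2):  # select content pairs sequentially
        if similarity(posts[p], posts[q]) >= threshold:
            counts[p] += 1
            counts[q] += 1
    return counts


def flag_by_count(counts: Counter, min_count: int) -> set:
    """Flag posts that resemble more than min_count other posts."""
    return {pid for pid, n in counts.items() if n > min_count}


posts = {
    "a": {1, 2, 3},
    "b": {1, 2, 3},
    "c": {1, 2, 3},
    "d": {9},
}
counts = count_similar_partners(posts, overlap, threshold=0.8)
flagged = flag_by_count(counts, min_count=1)
```

Requiring more than one similar partner keeps a single coincidental match from triggering a determination; content that many users post independently is what accumulates a high count.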

The inappropriate content detection apparatus according to any one of claims 17 to 32, wherein
the contents for which the similar content detection means calculates the mutual similarity are a plurality of posted contents whose copyrights are not appropriate.

The inappropriate content detection apparatus according to any one of claims 28 to 31, wherein
the dictionary storage means stores copyrighted content as the inappropriate content dictionary data.

The inappropriate content detection apparatus according to any one of claims 17 to 32, 43, and 44, wherein
the similar content detection means
sequentially selects content pairs from the plurality of accepted posted contents,
calculates the similarity of each selected content pair, and
obtains the number of contents whose calculated similarity is equal to or greater than a predetermined threshold value, and
the fraud determination means
determines whether the content is appropriate in terms of copyright based on that number.

The inappropriate content detection apparatus according to claim 45, wherein
the similar content detection means detects the detected content pairs as similar contents when more than a predetermined number of contents whose calculated similarity is equal to or greater than the predetermined threshold value are detected, and
the fraud determination means determines the similar content group to be content that is inappropriate in terms of copyright.

The computer program according to any one of claims 33 to 36, wherein
the contents whose mutual similarity is calculated in the procedure for detecting the group of mutually similar contents are a plurality of posted contents for which it is unconfirmed whether or not they are appropriate in terms of copyright.

The computer program according to claim 36,
further causing the computer to execute
a procedure for storing copyrighted content in the dictionary storage device as the inappropriate content dictionary data.

The computer program according to any one of claims 33 to 36, 47, and 48,
further causing the computer to execute:
a procedure for sequentially selecting content pairs from the plurality of accepted posted contents;
a procedure for calculating the similarity of each selected content pair;
a procedure for obtaining the number of contents whose calculated similarity is equal to or greater than a predetermined threshold value; and
a procedure for determining whether the content is appropriate in terms of copyright based on that number.

The computer program according to claim 49,
further causing the computer to execute:
a procedure for detecting the detected content pairs as similar contents when the calculated similarity is equal to or greater than the predetermined threshold value; and
a procedure for determining the similar content group to be content that is inappropriate in terms of copyright.