JP2008192102A

JP2008192102A - Metadata generation device and metadata generation method

Info

Publication number: JP2008192102A
Application number: JP2007028864A
Authority: JP
Inventors: Makoto Akaha (赤羽誠); Satoru Sasa (佐々哲); Hirotoshi Maekawa (前川博俊)
Original assignee: Sony Computer Entertainment Inc
Current assignee: Sony Interactive Entertainment Inc
Priority date: 2007-02-08
Filing date: 2007-02-08
Publication date: 2008-08-21

Abstract

PROBLEM TO BE SOLVED: To provide a technique for generating metadata effectively.

SOLUTION: In this metadata generation device, a low-level metadata extraction part 120 extracts low-level metadata from content data, and a medium-level metadata extraction part 140 extracts medium-level metadata from the low-level metadata. A beat feature analysis part 122 and a time-pitch analysis part 124 each extract low-level metadata from the same music data 112a. A beat information extraction part 142 and a medium-level music feature extraction part 144 each extract medium-level metadata from the low-level metadata extracted by the beat feature analysis part 122 and the time-pitch analysis part 124.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a technique for generating content metadata.

Metadata is information related to content, and often describes information accompanying the content, such as its creation date and time, creator, data format, and title. For example, an MP3 (MPEG-1 Audio Layer-3) file embeds bibliographic information, such as the song title and artist name written as text data, according to a standard called the ID3 tag. Recently, the amount of data that can be written to an ID3 tag has been expanded, the number of items that can be entered has increased, and image data can now be included as well.

In recent years, attempts have also been made to analyze music data and extract its features. Extracting musical features such as rhythm as metadata makes it possible, for example, to classify music by genre. In this way, many types of metadata can be extracted from a single piece of content, and as the number of metadata types grows, a wider range of applications that use metadata becomes feasible.

To use metadata in applications, it is preferable to extract it with high accuracy. For example, from an MP3 file one can extract both metadata derived from musical features and metadata written in the ID3 tag, but these exist independently of each other, and no attempt has been made to integrate and process them together. The present inventors have developed a technique that generates metadata effectively by processing such independently existing content data and metadata in association with one another.

To solve the above problem, a metadata generation device according to one aspect of the present invention comprises a first metadata extraction unit that extracts first-level metadata from content data and a second metadata extraction unit that extracts second-level metadata from the first-level metadata, and generates metadata related to the content hierarchically. In this device, the first metadata extraction unit extracts a plurality of types of first-level metadata from first data included in the content data, and the second metadata extraction unit extracts second-level metadata from the plurality of types of first-level metadata.

A metadata generation method according to another aspect of the present invention generates metadata related to content hierarchically and comprises a step of extracting first-level metadata from content data and a step of extracting second-level metadata from the first-level metadata. In this method, the step of extracting first-level metadata extracts a plurality of types of first-level metadata from one type of data included in the content data, and the step of extracting second-level metadata extracts one type of second-level metadata from the plurality of types of first-level metadata.

Any combination of the above constituent elements, and any conversion of the expression of the present invention among a method, a device, a system, a recording medium, a computer program, and the like, are also effective as aspects of the present invention.

The present invention provides a technique for generating metadata effectively.

FIG. 1 shows an environment in which an information processing system according to an embodiment of the present invention is used. The information processing system 1 comprises an information processing apparatus 10 that executes applications and a display device 12 that outputs the processing results of the information processing apparatus 10. The display device 12 may be a television having a display unit that outputs images and an audio output unit that outputs sound. The display device 12 may be connected to the information processing apparatus 10 by a wired cable or wirelessly, for example via a wireless LAN (Local Area Network). In the information processing system 1, the information processing apparatus 10 connects to an external network such as the Internet via a cable 14; the connection to the external network may also be made by wireless communication. The information processing apparatus 10 may be a game device that, when loaded with a medium on which game data is recorded, executes the game data and generates image and audio signals representing the processing results of the game application.

In this embodiment, the information processing apparatus 10 has a function of receiving content data, such as music files and video files, from a content providing server on the external network and playing it back. The content data may be in file format or in streaming format, and may include, for example, a critic's review file for a music album. The content data may also be supplied from a recording medium such as an optical disc, a magneto-optical disc, or a Blu-ray disc rather than over a network. The information processing apparatus 10 functions as a metadata generation device that hierarchically generates content-related metadata from the acquired content data, and it can also extract new kinds of metadata that have not been used before. The apparatus first stores the content data in a large-capacity auxiliary storage device; after generating metadata from that content data, it stores a metadata file in the auxiliary storage device separately from the content data. The generated metadata is used, for example, for searching content data.

FIG. 2 shows a content distribution system according to an embodiment of the present invention. In the content distribution system 2, the information processing apparatus 10, which is a user terminal, and content providing servers 18a, 18b, and 18c (hereinafter collectively referred to as "content providing server 18"), which are terminals owned by content providers, are communicably connected via a network 16 such as the Internet. A content provider stores content data on the content providing server 18 so that the information processing apparatus 10 can download it from there.

In response to user operation, the information processing apparatus 10 accesses the content providing server 18 that holds the desired content data and downloads that content data. The content providing server 18 holds, for example, digital content of music albums each containing multiple songs, and a user can obtain a desired album file by paying a fee. The information processing apparatus 10 stores the content data downloaded from the content providing server 18 in a predetermined area of the auxiliary storage device.

FIG. 3 is a functional block diagram of the information processing apparatus 10. The information processing apparatus 10 comprises a power button 20, an LED 22, a system controller 24, a device controller 30, a media drive 32, a hard disk drive 34, a switch 36, a wireless interface 38, a main controller 100, a main memory 102, and an output processing unit 200.

The power button 20 is an input unit operated by the user to turn the power supply to the information processing apparatus 10 on or off. It may be a push button that switches the power on or off when pressed, or it may have another structure, such as a touch sensor, that lets the user turn the power on and off. The LED 22 lights to indicate whether the power is on or off. The system controller 24 monitors whether the power button 20 is pressed; when it detects a transition to the pressed state while the power is off, it starts the main controller 100 and turns on the LED 22. While the power cable is plugged into the information processing apparatus 10, the system controller 24 stays in standby mode even when the power is off and monitors for presses of the power button 20.

The device controller 30 is configured as an LSI (Large-Scale Integrated circuit) that, like a south bridge, passes information between devices. As illustrated, devices such as the system controller 24, the media drive 32, the hard disk drive 34, the switch 36, and the main controller 100 are connected to the device controller 30. The device controller 30 absorbs differences in the electrical characteristics and data transfer speeds of the individual devices and controls the timing of data transfer.

The media drive 32 is a drive device that accepts and drives a medium 50 on which application data is recorded, and reads the application data from the medium 50. The medium 50 may be a read-only recording medium such as an optical disc, a magneto-optical disc, or a Blu-ray disc.

The hard disk drive 34 is an auxiliary storage device that drives a built-in hard disk and writes and reads data with a magnetic head. Content data downloaded from the content providing server 18 and content data supplied from the medium 50 are stored on the hard disk drive 34. The switch 36 is an Ethernet switch (Ethernet is a registered trademark), a device that connects to external equipment by wire or wirelessly to send and receive information. In this embodiment, the cable 14 is plugged into the switch 36, which is thereby communicably connected to the network 16. The switch 36 is also connected to the wireless interface 38, which in turn connects to a wireless controller 40 that communicates wirelessly using a protocol such as Bluetooth (registered trademark) or IEEE 802.11. The wireless controller 40 functions as an input unit through which the user enters operations.

The main controller 100 comprises a multi-core CPU that contains one general-purpose processor core and a plurality of simple processor cores in a single CPU. The general-purpose processor core is called the PPU (Power Processing Unit), and the remaining processor cores are called SPUs (Synergistic-Processing Units).

The main controller 100 includes a memory controller connected to the main memory 102, which is the main storage device. The PPU has registers and a main processor as its execution unit, and efficiently assigns tasks, the basic units of processing in each application, to the SPUs; the PPU may also execute tasks itself. Each SPU has registers, a sub-processor as its execution unit, and a local memory (dedicated RAM) as a local storage area. Each SPU also has a dedicated DMA (Direct Memory Access) controller as a control unit. By transferring data between the main memory 102 and its local memory, an SPU can stream-process data at high speed, and it can also transfer data at high speed between its local memory and the frame memory built into the output processing unit 200.

The output processing unit 200 is connected to the display device 12 and outputs video and audio signals that are the processing results of an application. It includes a GPU (Graphics Processing Unit) that implements image processing functions. The GPU supports HDMI (High-Definition Multimedia Interface) and can output video signals digitally, without any analog stage.

FIG. 4 shows the internal configuration of the main controller 100, which executes the metadata generation process. The main controller 100 comprises a low-level metadata extraction unit 120, a medium-level metadata extraction unit 140, and a high-level metadata extraction unit 160. Each element shown in FIG. 4 as a functional block performing various processes can be implemented in hardware by a CPU (Central Processing Unit), memory, and other LSIs, and in software by, for example, a program loaded into memory. As described above, the main controller 100 has one PPU and a plurality of SPUs, and the PPU and SPUs can form each functional block individually or in cooperation. It will therefore be understood by those skilled in the art that these functional blocks can be realized in various forms, by hardware alone, by software alone, or by a combination of the two, and are not limited to any one of these.

The low-level metadata extraction unit 120, the medium-level metadata extraction unit 140, and the high-level metadata extraction unit 160 extract hierarchical metadata from content data 110a and 110b (hereinafter collectively referred to as "content data 110"). The low-level metadata extraction unit 120 extracts low-level (first-level) metadata by directly using data contained in the content data 110. The medium-level metadata extraction unit 140 extracts medium-level (second-level) metadata using the low-level metadata extracted by the low-level metadata extraction unit 120. The high-level metadata extraction unit 160 extracts high-level (third-level) metadata using the medium-level metadata extracted by the medium-level metadata extraction unit 140. In this way, the main controller 100 generates metadata in stages (hierarchically) and produces, with high accuracy, metadata that is useful for tasks such as searching content data. From a signal-processing standpoint, the hierarchy in this embodiment is built by executing analysis, recognition, and understanding processing in that order, so that the metadata within each level is of a uniform level. The analysis and recognition processing may also be combined into a single level, the first level; in that case, metadata is extracted in two levels.
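The staged flow described above (analysis, then recognition, then understanding) can be pictured as a chain of extractor functions in which each level consumes only the output of the level before it. The Python sketch below is illustrative only: the class `HierarchicalMetadataGenerator`, the helper `_apply_all`, and the toy text extractors are hypothetical and are not taken from the patent.

```python
from typing import Any, Callable, Dict, List

# Hypothetical type: an extractor maps one metadata dictionary to new entries.
Extractor = Callable[[Dict[str, Any]], Dict[str, Any]]


def _apply_all(extractors: List[Extractor], source: Dict[str, Any]) -> Dict[str, Any]:
    """Run every extractor of one level on the same input and merge the results."""
    merged: Dict[str, Any] = {}
    for extract in extractors:
        merged.update(extract(source))
    return merged


class HierarchicalMetadataGenerator:
    """Illustrative three-level pipeline: content -> low -> medium -> high."""

    def __init__(self, low: List[Extractor], mid: List[Extractor], high: List[Extractor]):
        self.low, self.mid, self.high = low, mid, high

    def run(self, content: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
        low_md = _apply_all(self.low, content)   # analysis: raw data -> physical quantities
        mid_md = _apply_all(self.mid, low_md)    # recognition: sees only low-level metadata
        high_md = _apply_all(self.high, mid_md)  # understanding: sees only medium-level metadata
        return {"low": low_md, "mid": mid_md, "high": high_md}


# Toy extractors (hypothetical) operating on a text field.
def word_count(c):
    return {"word_count": len(c["text"].split())}


def char_count(c):
    return {"char_count": sum(len(w) for w in c["text"].split())}


def mean_word_length(low):
    # Combines two different kinds of low-level metadata into one medium-level value.
    return {"mean_word_length": low["char_count"] / low["word_count"]}


def style_label(mid):
    return {"style": "short_words" if mid["mean_word_length"] < 4 else "long_words"}


generator = HierarchicalMetadataGenerator(
    low=[word_count, char_count], mid=[mean_word_length], high=[style_label]
)
result = generator.run({"text": "a bb ccc dddd"})
```

For the input `"a bb ccc dddd"`, the medium-level value is 2.5 and the high-level label is `short_words`. Keeping each level's input restricted to the previous level's output mirrors the layered structure, at the cost of having to carry forward any lower-level value that a later stage still needs.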

First, the low-level metadata extraction unit 120 performs analysis processing and extracts low-level metadata obtained by analyzing the content data as physical quantities. Through signal processing, the low-level metadata extraction unit 120 may extract low-level metadata from multiple pieces of content data 110a and 110b. Among the content data 110, one piece, content data 110a, may be the target for which metadata is created, while the others, such as content data 110b, serve as auxiliary data for creating metadata for content data 110a. For example, when content data 110a is an MP3 file containing compressed music data and content data 110b is a text file containing a review of that music, content data 110b may be used to create metadata for the music data in content data 110a. Conversely, content data 110a can also be used to create metadata for content data 110b.

In addition to compressed music data, an MP3 file contains text data such as the song title and artist name under the ID3 tag standard, and may also contain image data of the album's jacket photo. In the information processing apparatus 10 of this embodiment, the low-level metadata extraction unit 120 acquires three types of data from the MP3 file (music data, text data, and image data) and extracts low-level metadata from each of them. The low-level metadata extraction unit 120 may also extract multiple types of low-level metadata from a single type of data in the MP3 file; for example, it may extract several types of low-level metadata from the compressed music data, or several types from the text data.
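As an illustration of separating the text and image data that an ID3 tag can carry, the sketch below walks the frames of an ID3v2.3 or ID3v2.4 tag at the start of an MP3 file. It follows the publicly documented ID3v2 layout (a 10-byte header whose tag size is a syncsafe integer, followed by frames with a 4-character ID, a 4-byte size, and 2 flag bytes); text frames have IDs beginning with `T` (for example `TIT2` for the title and `TPE1` for the artist), and embedded pictures such as cover art are stored in `APIC` frames. The function names are hypothetical, and the sketch deliberately ignores extended headers, unsynchronisation, and compressed or encrypted frames.

```python
import struct
from typing import Dict, List, Tuple

# Text encodings signalled by the first byte of an ID3v2 text frame body.
_TEXT_ENCODINGS = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}


def syncsafe_to_int(raw: bytes) -> int:
    """Decode a syncsafe integer: each byte contributes only its low 7 bits."""
    value = 0
    for byte in raw:
        value = (value << 7) | (byte & 0x7F)
    return value


def list_id3v2_frames(data: bytes) -> List[Tuple[str, bytes]]:
    """Return (frame_id, frame_body) pairs from an ID3v2.3/2.4 tag at the start of data."""
    if len(data) < 10 or data[:3] != b"ID3":
        return []
    major_version, flags = data[3], data[5]
    if flags & 0x40:
        raise NotImplementedError("extended headers are not handled in this sketch")
    end = 10 + syncsafe_to_int(data[6:10])  # the tag size excludes the 10-byte header
    pos, frames = 10, []
    while pos + 10 <= end:
        frame_id = data[pos:pos + 4]
        if frame_id[0] == 0:  # zero bytes mark the start of padding
            break
        size_bytes = data[pos + 4:pos + 8]
        # v2.4 frame sizes are syncsafe; v2.3 uses a plain big-endian 32-bit integer.
        if major_version == 4:
            size = syncsafe_to_int(size_bytes)
        else:
            size = struct.unpack(">I", size_bytes)[0]
        frames.append((frame_id.decode("latin-1"), data[pos + 10:pos + 10 + size]))
        pos += 10 + size
    return frames


def decode_text_frame(body: bytes) -> str:
    """Decode a text frame body: one encoding byte followed by the text."""
    encoding = _TEXT_ENCODINGS.get(body[0], "latin-1")
    return body[1:].decode(encoding).rstrip("\x00")


def split_id3_payloads(data: bytes) -> Tuple[Dict[str, str], List[bytes]]:
    """Separate text frames (bibliographic data) from APIC frames (embedded images)."""
    texts: Dict[str, str] = {}
    images: List[bytes] = []
    for frame_id, body in list_id3v2_frames(data):
        # TXXX holds a description plus a value, so it is skipped here.
        if frame_id.startswith("T") and frame_id != "TXXX" and body:
            texts[frame_id] = decode_text_frame(body)
        elif frame_id == "APIC":
            images.append(body)  # still contains MIME type, picture type and description
    return texts, images
```

The returned text dictionary would feed text analysis, and the raw `APIC` bodies would still need their MIME type, picture type, and description fields stripped before the image itself can be decoded.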

The low-level metadata extraction unit 120 can perform time-pitch analysis by applying signal processing such as the fast Fourier transform to the music data, and can extract the resulting pitch analysis, a physical quantity, as low-level metadata. It can likewise analyze the beat features of the music data, which locate the positions where sounds change within the music; these beat features are also extracted as low-level metadata. In addition, the low-level metadata extraction unit 120 can extract low-level metadata by morphological analysis of text data and by image analysis of image data.

Next, the medium-level metadata extraction unit 140 performs recognition processing and extracts medium-level metadata by converting the low-level metadata into a unique form. For example, by dividing the pitch analysis result into segments of a given unit and performing recognition on each segment, musical features such as the pitch and loudness of the main instrument or voice are extracted as medium-level metadata. Likewise, more accurate beat information is extracted from the beat features as medium-level metadata. When both a pitch analysis result and beat features have been extracted as low-level metadata from the same music data in the analysis processing, using them together makes it possible to extract more accurate medium-level musical features and/or beat information. In recognition processing, keywords and sentence syntax can also be extracted as medium-level metadata from the results of the morphological analysis performed by the low-level metadata extraction unit 120, and image features such as the number of people in an image and its brightness (mood) can be extracted as medium-level metadata from the results of the image analysis performed by the low-level metadata extraction unit 120.
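One plausible way to use pitch analysis results and beat information together, as described above, is to take the interval between consecutive beats as the segment unit and average the frame-level pitch features within each interval (commonly called beat-synchronous features). The patent does not specify this procedure; the NumPy sketch below, including the function names `beat_synchronous_features` and `dominant_band_per_beat` and their parameters, is a hypothetical illustration.

```python
import numpy as np


def beat_synchronous_features(features: np.ndarray, hop_s: float, beat_times_s) -> np.ndarray:
    """Average frame-level features over each inter-beat interval.

    features: shape (n_frames, n_bands), e.g. a time-pitch analysis result
    hop_s: time between consecutive feature frames, in seconds
    beat_times_s: increasing beat times in seconds, e.g. from beat analysis
    Returns shape (n_segments, n_bands), one row per non-empty beat interval.
    """
    # Convert beat times to frame indices and keep them inside the feature array.
    boundaries = np.round(np.asarray(beat_times_s, dtype=float) / hop_s).astype(int)
    boundaries = np.clip(boundaries, 0, features.shape[0])
    rows = [
        features[start:stop].mean(axis=0)
        for start, stop in zip(boundaries[:-1], boundaries[1:])
        if stop > start  # skip empty intervals (beats closer together than one frame)
    ]
    return np.vstack(rows) if rows else np.empty((0, features.shape[1]))


def dominant_band_per_beat(features: np.ndarray, hop_s: float, beat_times_s) -> np.ndarray:
    """Index of the strongest band in each beat segment (a crude 'main pitch' per beat)."""
    return beat_synchronous_features(features, hop_s, beat_times_s).argmax(axis=1)
```

Segmenting on beats rather than on fixed-length windows keeps each row aligned with a musical unit, so a note held for one beat is not split across two segments.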

The high-level metadata extraction unit 160 performs understanding processing and extracts, from the medium-level metadata, high-level metadata organized into classes, in other words, symbolized high-level metadata. For example, the genre of a piece of music and the presence or absence of particular instruments are extracted as high-level metadata from musical features and beat information. Relationships between songs and keywords can also be extracted as high-level metadata from the extracted keywords and syntax, and items such as a person's face identified from medium-level image features can be extracted as high-level metadata.

In the main controller 100, multiple types of low-level metadata are extracted from the multiple types of data contained in content data 110a and 110b, and multiple types of medium-level metadata are generated from them. The high-level metadata extraction unit 160 can also extract, for example, a high-level musical feature from medium-level metadata of different types. By integrating medium-level metadata originating from different types of data, such as music data, text data, and/or image data, and extracting one type of high-level metadata from it, the accuracy of high-level metadata generation can be improved.
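The patent does not state how medium-level metadata of different origins is combined. As one hedged illustration, a weighted late fusion of per-genre scores, each derived from a different source (for example audio features and review keywords), can be written as follows; the function name, the score dictionaries, and the weights are all hypothetical.

```python
from typing import Dict, List, Tuple


def fuse_genre_scores(
    score_sets: List[Dict[str, float]], weights: List[float]
) -> Tuple[str, Dict[str, float]]:
    """Weighted average of per-genre scores from several sources.

    Returns (best genre, fused scores). A genre missing from one source is
    treated as scoring 0.0 in that source.
    """
    if len(score_sets) != len(weights) or sum(weights) <= 0:
        raise ValueError("need one weight per score set, with a positive total")
    genres = set().union(*score_sets)  # union of the genre keys of all sources
    total = sum(weights)
    fused = {
        g: sum(w * scores.get(g, 0.0) for scores, w in zip(score_sets, weights)) / total
        for g in genres
    }
    best = max(fused, key=fused.get)
    return best, fused


audio_scores = {"rock": 0.6, "jazz": 0.4}   # e.g. from musical features and beat information
review_scores = {"jazz": 0.9, "rock": 0.1}  # e.g. from keywords in a music review
best, fused = fuse_genre_scores([audio_scores, review_scores], [1.0, 1.0])
```

With equal weights the strong review evidence outweighs the weaker audio preference and the result is `jazz`; setting the review weight to zero falls back to the audio-only answer, `rock`.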

FIG. 5 shows the main controller 100 of FIG. 4 in detail. In FIG. 5, the main controller 100 generates metadata hierarchically from content data 110a, which is an MP3 file, and content data 110b, which is the text data of a music review. The MP3 file contains compressed music data 112a, jacket photo image data 112b recorded in JPEG format, and music bibliographic data 112c written as text data. The jacket photo image data 112b and/or the music bibliographic data 112c may instead be files downloaded from the content providing server 18 on the network 16 separately from the MP3 file.

The low-level metadata extraction unit 120 comprises a beat feature analysis unit 122, a time-pitch analysis unit 124, an image analysis unit 126, a morphological analysis unit 128, and a morphological analysis unit 130, and extracts low-level metadata obtained by analyzing the content data as physical quantities. The beat feature analysis unit 122 and the time-pitch analysis unit 124 extract low-level metadata from a music signal obtained by decoding the compressed music data 112a and converting it to mono. The image analysis unit 126 extracts low-level metadata from the jacket photo image data 112b; similarly, the morphological analysis units 128 and 130 extract low-level metadata from the music bibliographic data 112c and the music review data 112d, respectively. Because the beat feature analysis unit 122 and the time-pitch analysis unit 124 each extract metadata from the same music data 112a, multiple kinds of metadata are obtained from the music data 112a in content data 110a. This yields a multifaceted set of low-level metadata and improves the accuracy with which the subsequent medium-level and high-level metadata are extracted. Likewise, extracting multiple kinds of low-level metadata from data of different types, namely the music data 112a, the jacket photo image data 112b, the music bibliographic data 112c, and the music review data 112d, also improves the accuracy of the subsequent medium-level and high-level metadata extraction.

The beat feature analysis unit 122 divides the music signal, obtained by decoding the compressed music data 112a and converting it to mono, into frequency bands corresponding to the characteristics of musical instruments. It then selects and smooths the bands corresponding to the beat signal to find candidate positions for the attack and release times of sounds, and determines the fundamental period from the autocorrelation function of each band. In this way, the beat feature analysis unit 122 extracts beat features.
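The beat analysis steps above (band splitting, smoothing, attack candidates, autocorrelation) can be sketched for a single band with NumPy. This is a simplified illustration under stated assumptions, not the patent's implementation: the band is isolated by crude FFT masking, the envelope is a frame RMS smoothed with a short Hann window, attack candidates are the positive first differences of that envelope, and only one band is analyzed. The function name `estimate_beat_period` and all parameter defaults are hypothetical.

```python
import numpy as np


def estimate_beat_period(signal, sr, band=(60.0, 250.0), hop=512, min_bpm=60.0, max_bpm=180.0):
    """Estimate the beat period (seconds) of a mono signal from one frequency band."""
    # 1. Keep only the selected band by zeroing FFT bins outside it (crude band-pass).
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    spectrum[(freqs < band[0]) | (freqs > band[1])] = 0.0
    band_signal = np.fft.irfft(spectrum, n=len(signal))

    # 2. Frame-wise RMS envelope, smoothed with a short normalized Hann window.
    n_frames = len(band_signal) // hop
    frames = band_signal[: n_frames * hop].reshape(n_frames, hop)
    envelope = np.sqrt(np.mean(frames ** 2, axis=1))
    kernel = np.hanning(5)
    envelope = np.convolve(envelope, kernel / kernel.sum(), mode="same")

    # 3. Attack candidates: frames where the envelope rises (positive first difference).
    onset = np.maximum(np.diff(envelope, prepend=envelope[0]), 0.0)
    onset = onset - onset.mean()

    # 4. Autocorrelation of the onset curve; pick the strongest lag in the tempo range.
    ac = np.correlate(onset, onset, mode="full")[len(onset) - 1 :]
    frame_rate = sr / hop
    min_lag = max(1, int(frame_rate * 60.0 / max_bpm))
    max_lag = min(len(ac) - 1, int(frame_rate * 60.0 / min_bpm))
    lag = min_lag + int(np.argmax(ac[min_lag : max_lag + 1]))
    return lag / frame_rate
```

Restricting the lag search to a plausible tempo range is what keeps the autocorrelation from picking a multiple of the true period; a fuller version would repeat steps 1 to 4 per band and combine the per-band results.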

The time-pitch analysis unit 124 performs frequency analysis at short time intervals on the music signal, obtained by decoding the compressed music data 112a and converting it to mono, to obtain time-frequency features. This frequency analysis may use, for example, a fast Fourier transform (FFT) or a filter bank, which is a set of band-pass filters. The time-pitch analysis unit 124 divides the digital signal, sampled at 44.1 kHz, into bands corresponding to musical pitches, then samples it at intervals of 1 to 20 ms to extract time-frequency (pitch) features.
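A short-time FFT followed by summing power into semitone-wide bands is one simple way to obtain time-pitch features of this kind. The NumPy sketch below assumes a 44.1 kHz mono signal, a 10 ms hop (inside the 1 to 20 ms range given above), a 4096-sample Hann window, and equal-tempered bands centred on MIDI note numbers with A4 = 440 Hz = MIDI 69; the function name `pitch_band_features` and these choices are hypothetical, and a filter bank could be used instead, as the text notes.

```python
import numpy as np


def pitch_band_features(signal, sr=44100, hop_ms=10.0, frame_len=4096, midi_range=(48, 96)):
    """Return (features, notes): per-frame power in semitone bands centred on MIDI notes."""
    hop = int(round(sr * hop_ms / 1000.0))
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    # MIDI pitch of every FFT bin (A4 = 440 Hz = MIDI 69); the DC bin maps to -inf.
    with np.errstate(divide="ignore"):
        bin_midi = 69.0 + 12.0 * np.log2(freqs / 440.0)
    notes = np.arange(midi_range[0], midi_range[1])
    # A bin belongs to note n if its pitch lies in [n - 0.5, n + 0.5).
    masks = [(bin_midi >= n - 0.5) & (bin_midi < n + 0.5) for n in notes]

    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    features = np.zeros((n_frames, len(notes)))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        if len(frame) < frame_len:  # zero-pad a signal shorter than one frame
            frame = np.pad(frame, (0, frame_len - len(frame)))
        power = np.abs(np.fft.rfft(frame * window)) ** 2
        features[i] = [power[m].sum() for m in masks]
    return features, notes
```

With a 4096-sample frame the FFT bins are about 10.8 Hz apart, so the lowest semitone bands can contain no bins at all; a longer frame or a constant-Q filter bank would be needed for good resolution in the bass range.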

The image analysis unit 126 performs object segmentation on the jacket photo image data 112b and analyzes the objects in the jacket photo. The image analysis unit 126 may also perform color analysis to obtain a hue histogram. Image analysis results of jacket photo image data 112b have not conventionally been used as metadata. In the information processing system 1 of this embodiment, however, treating them as a new kind of low-level metadata increases the variety of metadata and widens the options available to medium-level and later metadata extraction.
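The hue histogram mentioned above can be computed with Python's standard `colorsys` module. This sketch assumes the jacket photo is already decoded into 8-bit RGB tuples; the bin count and the saturation threshold used to skip greys are illustrative choices, not values from the text.

```python
import colorsys
from typing import Iterable, List, Tuple


def hue_histogram(pixels: Iterable[Tuple[int, int, int]], bins: int = 12,
                  min_saturation: float = 0.1) -> List[int]:
    """Count 8-bit RGB pixels per hue bin; near-grey and black pixels are skipped."""
    hist = [0] * bins
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if s < min_saturation or v == 0:
            continue  # hue is undefined or unstable for greys and black
        hist[min(int(h * bins), bins - 1)] += 1
    return hist
```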

The morphological analysis unit 128 extracts bibliographic data from the music bibliographic data 112c. The bibliographic data may include, for example, the album title, artist name, song title, lyricist, composer, label name, and genre. The morphological analysis unit 130 performs morphological analysis on the music review data 112d and classifies the morphemes by part of speech. Here, the music review data 112d is assumed to be text in which a music critic reviews and critiques the music album contained in the MP3 file. Several morphological analysis units 130 may be provided to process the music review data 112d. In that case, each unit may use a different dictionary or similar resource so that the analysis results differ. This allows a variety of morphological analyses, and the downstream keyword extraction unit 150 can then extract a richer variety of keywords from the differing results. Likewise, several morphological analysis units 128 may be provided to process the music bibliographic data 112c.
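Real morphological analysis of Japanese review text would use a dedicated analyzer with a full dictionary. Purely to show why different dictionaries give different results, the sketch below substitutes a toy greedy longest-match segmenter; the function name, the dictionaries, and the part-of-speech labels are hypothetical.

```python
from typing import Dict, List, Tuple

Morpheme = Tuple[str, str]  # (surface form, part of speech)


def longest_match_analyze(text: str, dictionary: Dict[str, str]) -> List[Morpheme]:
    """Toy stand-in for morphological analysis: greedy longest dictionary match.

    Characters not covered by the dictionary are emitted one at a time as "unknown".
    """
    max_len = max((len(w) for w in dictionary), default=1)
    result: List[Morpheme] = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in dictionary:
                result.append((piece, dictionary[piece]))
                i += length
                break
        else:  # no dictionary entry starts here
            result.append((text[i], "unknown"))
            i += 1
    return result


# Two hypothetical dictionaries that split the same phrase differently.
DICT_A = {"聴きやすい": "adjective", "バラード": "noun"}
DICT_B = {"聴き": "verb", "やすい": "suffix", "バラード": "noun"}
```

Running both analyses over the same review would give the keyword extraction unit 150 two different token sequences to draw keywords from.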

In this way, the low-level metadata extraction unit 120 extracts multiple kinds of low-level metadata by analyzing various content data as physical quantities. This widens the range of options for metadata extraction in the downstream medium-level metadata extraction unit 140 and high-level metadata extraction unit 160, and also allows the combination of low-level metadata to be changed dynamically.

The medium-level metadata extraction unit 140 includes a beat information extraction unit 142, a medium-level music feature amount extraction unit 144, an image feature amount extraction unit 146, a keyword extraction unit 148, and a keyword extraction unit 150, and generates medium-level metadata by uniquely transforming the low-level metadata. The beat information extraction unit 142 extracts medium-level metadata from the beat feature amount and the time-pitch feature amount extracted from the music data 112a. The medium-level music feature amount extraction unit 144 likewise extracts medium-level metadata from the beat feature amount and the time-pitch feature amount extracted from the music data 112a. Thus the beat information extraction unit 142 and the medium-level music feature amount extraction unit 144 each generate medium-level metadata from different types of low-level metadata extracted from a single piece of music data 112a. Using several types of low-level metadata improves the accuracy of the generated medium-level metadata. The image feature amount extraction unit 146 extracts medium-level metadata from the image analysis data produced by the image analysis unit 126. The keyword extraction unit 148 extracts medium-level metadata from the morphological analysis data produced by the morphological analysis unit 128, and the keyword extraction unit 150 likewise extracts medium-level metadata from the morphological analysis data produced by the morphological analysis unit 130.
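The hierarchy described above, in which several low-level analyses of the same data feed one medium-level extractor, can be modeled as ordered levels of extractors that each declare their inputs. The following is a hypothetical sketch: the `Extractor` class, the `run_levels` function, and the key names are illustrative, and the stand-in lambdas only show the data flow, not real analyses.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Extractor:
    output: str               # key the result is stored under
    inputs: Tuple[str, ...]   # keys of the content data or metadata it consumes
    fn: Callable[..., Any]    # the analysis itself


def run_levels(content: Dict[str, Any], levels: List[List[Extractor]]) -> Dict[str, Any]:
    """Run each level in order; a level may read anything produced by earlier levels."""
    store = dict(content)
    for level in levels:
        results = {ex.output: ex.fn(*(store[k] for k in ex.inputs)) for ex in level}
        store.update(results)
    return store


# Illustration with stand-in functions: two low-level analyses of the same
# music data feed one medium-level extractor (cf. units 122 and 124 -> 142).
levels = [
    [Extractor("beat_feature", ("music",), lambda m: max(m)),
     Extractor("time_pitch", ("music",), lambda m: len(m))],
    [Extractor("beat_info", ("beat_feature", "time_pitch"), lambda b, t: (b, t))],
]
result = run_levels({"music": [0.2, 0.9, 0.4]}, levels)
```

Because each extractor names its inputs, changing which low-level metadata a medium-level extractor combines only means editing its `inputs` tuple.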

The beat information extraction unit 142 extracts the beat positions of the music signal with high accuracy from the beat feature amount extracted by the beat feature amount analysis unit 122 and the time-pitch feature amount extracted by the time-pitch analysis unit 124. Because the fundamental period of the music is extracted as the beat feature amount and the pitch amount at each point in time is extracted as the time-pitch feature amount, the beat information extraction unit 142 can obtain accurate beat positions by, for example, correcting the extracted fundamental period using the per-time pitch amounts. The beat positions extracted by the beat information extraction unit 142 are supplied to the downstream high-level music feature amount extraction unit 162. The extracted beat positions may also be used for remix processing that joins different pieces of music; accurate beat positions enable smooth remixing.
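The text does not specify how the fundamental period is corrected with the per-time pitch amounts. One plausible approach, sketched here under that assumption, derives an onset strength from the time-pitch rows (spectral flux), chooses the beat-grid phase that best matches it, and then snaps each grid point to the strongest nearby frame. The function names and the one-frame tolerance are hypothetical.

```python
from typing import List, Sequence


def spectral_flux(rows: Sequence[Sequence[float]]) -> List[float]:
    """Per-frame sum of positive energy changes across pitch bands (frame 0 gets 0.0)."""
    flux = [0.0]
    for prev, cur in zip(rows, rows[1:]):
        flux.append(sum(max(0.0, c - p) for p, c in zip(prev, cur)))
    return flux


def refine_beats(period: int, strength: Sequence[float], tolerance: int = 1) -> List[int]:
    """Place a beat grid with the given period (frames), then snap beats locally."""
    # Choose the grid phase that collects the most onset strength.
    phase = max(range(period), key=lambda ph: sum(strength[ph::period]))
    beats = []
    for grid in range(phase, len(strength), period):
        lo, hi = max(0, grid - tolerance), min(len(strength), grid + tolerance + 1)
        # Snap the grid point to the strongest frame within +/- tolerance.
        beats.append(max(range(lo, hi), key=lambda i: strength[i]))
    return beats
```

In use, `refine_beats(period, spectral_flux(rows))` would combine a period from the beat analysis with rows from the time-pitch analysis.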

The medium-level music feature amount extraction unit 144 extracts medium-level music feature amounts, such as note information, the loudness and tempo of each bar, and the song structure, from the beat feature amount extracted by the beat feature amount analysis unit 122 and the time-pitch feature amount extracted by the time-pitch analysis unit 124. Conventionally, such medium-level music feature amounts were derived only from the time-pitch feature amount extracted by the time-pitch analysis unit 124. The medium-level music feature amount extraction unit 144 additionally uses the fundamental period of the music extracted as the beat feature amount, which improves the accuracy with which the song structure, tempo, and similar features are extracted.
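Once a fundamental period and beat positions are available, tempo and per-bar loudness reduce to simple arithmetic. A minimal sketch, assuming periods are counted in analysis frames of `hop_ms` milliseconds and that each bar spans four beats (both assumptions; the function names are hypothetical):

```python
from typing import List, Sequence


def tempo_bpm(period_frames: int, hop_ms: float) -> float:
    """Beats per minute from a beat period measured in analysis frames."""
    return 60_000.0 / (period_frames * hop_ms)


def bar_loudness(frame_energy: Sequence[float], beats: Sequence[int],
                 beats_per_bar: int = 4) -> List[float]:
    """Mean frame energy in each complete bar (beats_per_bar beat intervals)."""
    bars = []
    for k in range(0, len(beats) - beats_per_bar, beats_per_bar):
        start, end = beats[k], beats[k + beats_per_bar]
        if end > start:  # skip degenerate bars
            segment = frame_energy[start:end]
            bars.append(sum(segment) / len(segment))
    return bars
```

For example, a period of 50 frames at a 10 ms hop corresponds to 120 BPM.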

The image feature amount extraction unit 146 extracts image feature amounts, such as the position, size, and hue of each object, from the image analysis data produced by the image analysis unit 126.
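Given a segmentation mask for one object, its position, size, and hue can be summarized as below. This sketch assumes the image analysis unit 126 supplies a boolean mask and the RGB pixels as nested lists; the function name and output keys are hypothetical. Hue is averaged as an angle because it wraps around at red.

```python
import colorsys
import math
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def object_features(mask: List[List[bool]], image: List[List[RGB]]) -> Dict[str, object]:
    """Position, size and hue of the pixels marked True in a segmentation mask."""
    coords = [(y, x) for y, row in enumerate(mask) for x, on in enumerate(row) if on]
    if not coords:
        return {}
    height, width = len(mask), len(mask[0])
    xs = [x for _, x in coords]
    ys = [y for y, _ in coords]
    hues = [colorsys.rgb_to_hsv(*(c / 255 for c in image[y][x]))[0] for y, x in coords]
    # Hue is circular (0.0 and 1.0 are both red), so take the mean angle.
    angle = math.atan2(sum(math.sin(2 * math.pi * h) for h in hues),
                       sum(math.cos(2 * math.pi * h) for h in hues))
    return {
        "centroid": (sum(xs) / len(xs) / width, sum(ys) / len(ys) / height),  # normalized
        "bbox": (min(xs), min(ys), max(xs), max(ys)),
        "area_ratio": len(coords) / (width * height),
        "mean_hue": (angle / (2 * math.pi)) % 1.0,
    }
```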

The keyword extraction unit 148 extracts keywords from the morphological analysis data produced by the morphological analysis unit 128. The keyword extraction unit 150 likewise extracts keywords from the morphological analysis data produced by the morphological analysis unit 130. The keyword extraction unit 150 holds, for example, a table of terms commonly used in music reviews. Such terms may be keywords such as “ballad”, “easy listening”, and “calming”, and the keyword extraction unit 150 extracts the terms contained in the table from the morphological analysis data produced by the morphological analysis unit 130.
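Table-driven keyword extraction of the kind described above amounts to matching analyzed tokens against a fixed term list. The sketch assumes English tokens, joins up to two adjacent tokens so that multi-word terms such as "easy listening" can match, and uses the three example terms from the text as a hypothetical table.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical term table built from the examples in the text.
REVIEW_TERMS: Set[str] = {"ballad", "easy listening", "calming"}


def extract_keywords(morphemes: Iterable[Tuple[str, str]],
                     table: Set[str] = REVIEW_TERMS, max_ngram: int = 2) -> List[str]:
    """Return table terms found in the token sequence, longest n-grams first."""
    tokens = [surface.lower() for surface, _pos in morphemes]
    found: List[str] = []
    for n in range(max_ngram, 0, -1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in table and phrase not in found:
                found.append(phrase)
    return found
```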

The high-level metadata extraction unit 160 includes a high-level music feature amount extraction unit 162, a face extraction unit 164, and an evaluation extraction unit 166. The high-level metadata extraction unit 160 extracts high-level metadata from the medium-level metadata extracted by the beat information extraction unit 142, the medium-level music feature amount extraction unit 144, the image feature amount extraction unit 146, and the keyword extraction unit 148. The face extraction unit 164 extracts high-level metadata from the medium-level metadata extracted by the image feature amount extraction unit 146. The evaluation extraction unit 166 extracts high-level metadata from the medium-level metadata extracted by the keyword extraction units 148 and 150.

As high-level music feature amounts, the high-level metadata extraction unit 160 extracts features that describe the music as a whole, such as genre, instrument sounds, mood, sound quality, speed, and how good the music sounds. It can also extract, as high-level feature amounts, the ratio of rhythm-instrument energy to total energy, the number of notes per unit time, and the like. Because it receives the medium-level metadata extracted by the beat information extraction unit 142, the medium-level music feature amount extraction unit 144, the image feature amount extraction unit 146, and the keyword extraction unit 148, the high-level metadata extraction unit 160 can extract high-level music feature amounts with high accuracy.
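The two numeric high-level features named above are straightforward ratios. This sketch assumes per-band energies keyed by hypothetical band names and note onset times in seconds; the function names are illustrative.

```python
from typing import Dict, Iterable, Sequence


def rhythm_energy_ratio(band_energy: Dict[str, float], rhythm_bands: Iterable[str]) -> float:
    """Share of total energy carried by the bands assumed to hold rhythm instruments."""
    total = sum(band_energy.values())
    rhythm = sum(band_energy.get(b, 0.0) for b in rhythm_bands)
    return rhythm / total if total > 0 else 0.0


def notes_per_second(onset_times_s: Sequence[float], duration_s: float) -> float:
    """Note density: number of detected note onsets per second of audio."""
    return len(onset_times_s) / duration_s if duration_s > 0 else 0.0
```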

For example, from the accurate beat information supplied by the beat information extraction unit 142 and the accurate medium-level music feature amounts supplied by the medium-level music feature amount extraction unit 144, the high-level music feature amount extraction unit 162 can determine the song structure, tempo, and so on with high accuracy, and can therefore accurately estimate and extract the speed and mood of the music. The high-level music feature amount extraction unit 162 can also estimate and extract the genre of the music from this medium-level metadata. Regarding genre, the keywords supplied by the keyword extraction unit 148 also include a genre, so a new, more accurate genre may be determined from both the estimated genre and the genre contained in the keywords. For example, even if “ballad” has been extracted as the genre keyword, if the song structure, tempo, and so on suggest fairly intense rock-style music, the high-level music feature amount may be extracted with rock as the genre. Alternatively, a genre between ballad and rock, such as pop, may be extracted. Matching the bibliographic data against the results of analyzing and recognizing the actual music in this way makes it possible to extract highly accurate metadata. In addition, when the image feature amounts supplied by the image feature amount extraction unit 146 show, for example, that bright colors are used in the jacket photo, the high-level music feature amount extraction unit 162 may infer that the music is likely to be cheerful and, taking other factors such as the tempo and the instruments used into account, extract a high-level music feature amount indicating an upbeat tone. Conventionally, no attempt had been made to use the color tone of a jacket photo, or the feature amounts of the objects it contains, as metadata. In this embodiment, by contrast, the high-level music feature amount extraction unit 162 generates high-level metadata that also takes into account the image feature amounts supplied by the image feature amount extraction unit 146, which both yields new kinds of metadata and improves metadata extraction accuracy.
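The genre reconciliation and the jacket-color hint above could be expressed as simple rules. The following is a hypothetical sketch: the calm-to-intense genre scale, the `policy` switch, the function names, and the brightness and tempo thresholds are all assumptions chosen for illustration, not values from the text.

```python
from typing import Optional

# Hypothetical calm-to-intense ordering used to find an "in-between" genre.
GENRE_SCALE = ["ballad", "pop", "rock"]


def reconcile_genre(keyword_genre: Optional[str], estimated_genre: str,
                    policy: str = "audio") -> str:
    """Combine the genre named in bibliographic keywords with the audio-based estimate."""
    if keyword_genre is None or keyword_genre == estimated_genre:
        return estimated_genre
    if policy == "between" and keyword_genre in GENRE_SCALE and estimated_genre in GENRE_SCALE:
        i, j = GENRE_SCALE.index(keyword_genre), GENRE_SCALE.index(estimated_genre)
        # Integer midpoint; for adjacent genres this rounds toward the calmer one.
        return GENRE_SCALE[(i + j) // 2]
    return estimated_genre  # default: trust the analysis of the actual audio


def mood_hint(jacket_brightness: float, tempo_bpm: float) -> Optional[str]:
    """Bright jacket art suggests cheerful music; require a brisk tempo to confirm."""
    if jacket_brightness >= 0.6 and tempo_bpm >= 110:
        return "upbeat"
    return None
```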

The face extraction unit 164 extracts faces from the image feature amounts. The extraction result may be used, for example, by an application that identifies whose face was extracted through face recognition processing or the like.

The evaluation extraction unit 166 associates the keywords extracted by the keyword extraction unit 148 with those extracted by the keyword extraction unit 150. For example, when the music review data 112d contains a list of albums by the same artist, the bibliographic data in the music bibliographic data 112c may be linked to that album list.
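Linking an album list found in a review to bibliographic records can be sketched as a normalized title lookup restricted to the same artist. The record layout (`artist` and `album` keys), the normalization rule, and the function names are assumptions for illustration.

```python
from typing import Dict, List, Optional

Record = Dict[str, str]  # hypothetical layout: {"artist": ..., "album": ...}


def _norm(text: str) -> str:
    """Case-fold and collapse whitespace so minor formatting differences still match."""
    return " ".join(text.casefold().split())


def link_album_list(album_list: List[str], records: List[Record],
                    artist: str) -> Dict[str, Optional[Record]]:
    """Map each album title from a review to the matching record by the same artist."""
    by_title = {_norm(r["album"]): r for r in records if _norm(r["artist"]) == _norm(artist)}
    return {title: by_title.get(_norm(title)) for title in album_list}
```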

The present invention has been described above on the basis of an embodiment. This embodiment is illustrative, and those skilled in the art will understand that various modifications to the combination of components and processes are possible and that such modifications also fall within the scope of the present invention.

In the embodiment, metadata was generated for an MP3 music file. However, the content data is not limited to this; it may be a plain audio file or video data, and more generally may include anything for which metadata needs to be created.

In the embodiment, the information processing apparatus 10 functions as the metadata generation device. However, the content providing server 18, for example, may instead function as the metadata generation device. In that case, the content providing server 18 may distribute the metadata generated from the content data to the information processing apparatus 10 together with the content data.

FIG. 1 is a diagram showing the usage environment of the information processing system according to the embodiment of the present invention.
FIG. 2 is a diagram showing the content distribution system according to the embodiment of the present invention.
FIG. 3 is a functional block diagram of the information processing apparatus.
FIG. 4 is a diagram showing the internal configuration of the main controller that executes metadata generation processing.
FIG. 5 is a diagram showing details of the main controller shown in FIG. 4.

Explanation of symbols

1 ... Information processing system
10 ... Information processing apparatus
12 ... Display device
100 ... Main controller
110 ... Content data
112a ... Music data
112b ... Jacket photo image data
112c ... Music bibliographic data
112d ... Music review data
120 ... Low-level metadata extraction unit
122 ... Beat feature amount analysis unit
124 ... Time-pitch analysis unit
126 ... Image analysis unit
128 ... Morphological analysis unit
130 ... Morphological analysis unit
140 ... Medium-level metadata extraction unit
142 ... Beat information extraction unit
144 ... Medium-level music feature amount extraction unit
146 ... Image feature amount extraction unit
148 ... Keyword extraction unit
150 ... Keyword extraction unit
160 ... High-level metadata extraction unit
162 ... High-level music feature amount extraction unit
164 ... Face extraction unit
166 ... Evaluation extraction unit

Claims

1. A metadata generation device that hierarchically generates metadata related to content, comprising:
a first metadata extraction unit that extracts first level metadata from content data; and
a second metadata extraction unit that extracts second level metadata from the first level metadata, wherein
the first metadata extraction unit extracts a plurality of types of first level metadata from first data included in the content data, and
the second metadata extraction unit extracts second level metadata from the plurality of types of first level metadata.

2. The metadata generation device according to claim 1, further comprising a third metadata extraction unit that extracts third level metadata from the second level metadata.

3. The metadata generation device according to claim 1, wherein
the first metadata extraction unit extracts a plurality of types of first level metadata from first data included in the content data and from second data of a different type from the first data, and
the second metadata extraction unit extracts second level metadata from the plurality of types of metadata that the first metadata extraction unit extracted from the first data and the second data of different types included in the content data.

4. The metadata generation device according to claim 2, wherein
the first metadata extraction unit extracts a plurality of types of first level metadata from first data included in the content data and from second data of a different type from the first data,
the second metadata extraction unit extracts a plurality of types of second level metadata from the plurality of types of metadata that the first metadata extraction unit extracted from the first data and the second data of different types included in the content data, and
the third metadata extraction unit extracts third level metadata from the plurality of types of second level metadata extracted by the second metadata extraction unit.

5. A metadata generation method for hierarchically generating metadata related to content, comprising:
extracting first level metadata from content data; and
extracting second level metadata from the first level metadata, wherein
the step of extracting the first level metadata extracts a plurality of types of first level metadata from one type of data included in the content data, and
the step of extracting the second level metadata extracts one type of second level metadata from the plurality of types of first level metadata.

6. A program for hierarchically generating metadata related to content, the program causing a computer to implement:
a function of extracting a plurality of types of first level metadata from one type of data included in the content data; and
a function of extracting one type of second level metadata from the plurality of types of first level metadata.