JP2009157278A

JP2009157278A - Device for detecting audio signal feature and method for detecting feature

Info

Publication number: JP2009157278A
Application number: JP2007338206A
Authority: JP
Inventors: Hirokazu Takeuchi; 広和竹内
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2007-12-27
Filing date: 2007-12-27
Publication date: 2009-07-16

Abstract

PROBLEM TO BE SOLVED: To provide a feature detection method and device, suited to content that includes audio signals such as speech and music, that operate with a low processing load.

SOLUTION: The device includes: an information analysis section 440 that obtains classification information for a stream from the input stream; an analysis control section 450 that specifies, from the classification information obtained by the information analysis section, a method for analyzing the input stream; a feature parameter conversion section 460 that analyzes the input stream by the method specified by the analysis control section and converts it into the feature parameter to be specified as a detection item; a detection processing section 470 that detects whether the detection item, which is the feature parameter specified by the feature parameter conversion section, is included in the input stream; and an indexing section 480 that holds the detection result of the detection item detected by the detection processing section so that it can be retrieved when the input stream is played back.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a method and a detection apparatus for performing, with a low processing load, feature detection suited to content that includes video signals and audio signals such as speech and music. Such content consists mainly of moving-image information and audio signals and is delivered by public broadcasts supplied as terrestrial waves or as radio waves from broadcasting satellites, or by cable networks and the like.

Television receivers that receive and play back moving-image information (video signals) and audio signals, recording and playback apparatuses (video recorders) that can record and store received video and audio, and video imaging apparatuses (video cameras) have spread and developed remarkably. Many personal computers also already provide, as a standard feature, a function for receiving television broadcasts and playing back their video and audio.

In such an environment, where a wide variety of content can be acquired easily, Patent Document 1 describes a demand for music detection, for example a wish to separate the music portions contained in a music program or other content from the remaining portions.

Patent Document 1 proposes that, when the data subject to music detection is an encoded stream, a power ratio be calculated from the sum and difference components of the frequency-domain L-channel and R-channel signals obtained in the course of decoding, and that this ratio be compared against a threshold.

Patent Document 1: JP 2006-301134 A

The method of Patent Document 1 is specialized for music detection and suits content such as music programs, but as a detection process it is meaningless for other content such as news programs and variety programs.

Moreover, if this detection method is used as the basis for detecting, for example, CMs (commercial messages: segments in commercial broadcasts that are independent of the main content), there is no need to analyze signals for two channels (ch), so wasted processing results.

Likewise, when it is evident that the content consists mainly of (relatively narrow-band) speech signals, as in a news program, components across the full band need not be analyzed, so wasted processing results here as well.

In the case of a stream containing a multichannel signal, the processing load is high because all channel data is decoded and then downmixed to obtain a 2ch signal. In general, most of the main information in a multichannel signal is carried in the front center channel or the front LR channels.

Even when only music detection is considered, if the AAC (Advanced Audio Coding) standard, an audio coding scheme in general use for broadcasting and storage media, is assumed, several optional processes exist for the frequency-domain signal, and how they should be handled is not made clear.

An object of the present invention is to provide a method and a detection apparatus that perform feature detection suited to the content with a low processing load, for content that includes audio signals such as speech and music, by specifying an appropriate detection process according to the classification information of the content and executing an analysis method that matches the specified detection process.

The present invention has been made in view of the above problems and provides an audio signal feature detection apparatus comprising: an information analysis unit that acquires, from an input stream, classification information for that stream; an analysis control unit that specifies, from the classification information acquired by the information analysis unit, how the input stream is to be analyzed with respect to at least one of an analysis level, an analysis band, and an analysis channel; a feature parameter conversion unit that analyzes the input stream by the analysis method specified by the analysis control unit and converts it into a feature parameter to be specified as a detection item; a detection processing unit that detects whether the detection item, which is the feature parameter specified by the feature parameter conversion unit, is included in the input stream; and a recording unit that records the detection result of the detection item detected by the detection processing unit so that it can be searched when the input stream is played back.

According to the embodiment of the present invention, an appropriate detection process is determined according to the meta information of the encoded data to be processed, and the analysis level, the analysis signal band, or the channels to be analyzed are controlled so that only the signal needed for that detection process is obtained. Detection can therefore be performed at high speed with a minimal processing load. As a result, even a relatively slow, inexpensive processor can perform the processing, which reduces cost.

In particular, when the analysis target is multichannel, an unnecessary increase in processing load can be curbed, and detection that is faster and flexibly matched to the processing capability of the system can be realized.

In general, most of the main information in a multichannel signal is carried in the front center channel or the front LR channels. Taking this property into account, an appropriate detection item (music detection, CM detection, and so on) is determined according to the content, only the corresponding channel signal (the front center channel or the front LR channels) is extracted, and it is analyzed at a suitable analysis level and band. High-speed feature detection can thus be achieved without unnecessary decoding.

An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 shows an example of a video recording and playback apparatus (video recorder) to which the embodiment of the present invention is applied. The embodiment described below is not limited to video recording and playback apparatuses; it is also applicable to, for example, personal computers, video cameras, and portable terminal devices capable of playing back video programs, video content, and the like. It can also be sold as a program handled so that it can be supplied to portable terminal devices, personal computers, or video game terminal devices.

The video recorder (video recording and playback apparatus) 1 shown in FIG. 1 has a tuner unit (TV tuner) 10 with a function for receiving, for example, satellite digital TV broadcasts provided via broadcasting satellites or communication satellites, terrestrial digital broadcasts and analog TV broadcasts provided by terrestrial waves (space waves), and video content supplied via a cable network. The output of the tuner unit 10 is input to a video analog-to-digital converter (Video ADC) 14 and an audio (speech/music) analog-to-digital converter (Audio ADC) 16. An input signal from an external input terminal (Aux) 12 is likewise input to the video analog-to-digital converter 14 and the audio analog-to-digital converter 16.

The video stream digitized by the video analog-to-digital converter 14 and the audio stream digitized by the audio (speech/music) analog-to-digital converter 16 are input to an MPEG encoder (MPEG Encoder) 20. A digital stream (such as MPEG2-TS, where TS stands for Transport Stream) from an external digital input terminal 18 is input to the MPEG encoder 20 via an interface (I/F) 19 such as IEEE 1394 (or HDMI).

When the TV broadcast signal supplied to the tuner unit 10 is a digital signal such as MPEG2-TS, the digital stream from the tuner unit 10 is input to the MPEG encoder 20 as is.

Except when it outputs an input MPEG2-TS unchanged (pass-through), the MPEG encoder 20 encodes the input stream into MPEG2-PS or MPEG4-AVC. The present invention assumes that, when a digitally encoded stream is recorded, feature detection and indexing are performed based on information about the content; the processing is applied to the encoded stream or the passed-through stream described above.

The stream data processed by the MPEG encoder 20 is temporarily buffered in high-speed memory such as an SDRAM (Synchronous Dynamic Random Access Memory) 22.

The stream data buffered in the SDRAM 22 and subjected to predetermined processing is transferred, at a predetermined timing and according to its content, to an HDD 104, a disk drive unit (Disk Drive Unit) 24, or a memory slot 26.

The HDD 104 is a hard disk drive and contains an HD (hard disk) with a capacity of, for example, 1 TB (1000 GB).

The disk drive unit 24 can record data (streams) on disc-shaped recording media, namely an optical disc 100 of the HD (High Definition) DVD standard (15 GB read-only, 20 GB recordable) and an optical disc 102 of the DVD standard (4.5 GB), and can play back data (streams) already recorded on them.

A card memory 106 with a capacity of, for example, about 2 GB is inserted into the memory slot 26 for use.

Stream data played back from the optical disc 100 or 102, the HD (in the HDD 104), or the card memory 106 via the disk drive unit 24, the HDD 104, or the memory slot 26 is transferred to an MPEG decoder (MPEG Decoder) 30 by way of the SDRAM 22.

The MPEG decoder 30 can decode MPEG2-TS, MPEG2-PS, or MPEG4-AVC according to the stream transferred to it.

Video data (MPEG2-TS or MPEG2-PS) decoded by the MPEG decoder 30 is converted by a video digital-to-analog converter (Video DAC) 32 into an analog video signal of standard or high-definition picture quality and supplied to a video output (Video Out) terminal 36. Connecting the video output terminal 36 to a display device (monitor device/display unit) 52 causes the video to be displayed on the display device 52.

Audio data decoded by the MPEG decoder 30, meanwhile, is converted into an analog audio signal by an audio (speech/music) digital-to-analog converter (Audio DAC) 34 and supplied to an audio output (Audio Out) terminal 38. Connecting the audio output terminal 38 to a speaker (which may be built into the display device 52 or separate from it) causes the speech or music to be played back.

When the data supplied to the MPEG decoder 30 is MPEG2-TS, it is supplied unchanged to a digital output (Digital Out) terminal 39 via an interface 37 such as IEEE 1394 (or HDMI).

The recording and playback apparatus (HD DVD recorder) shown in FIG. 1 is controlled by a main control block 40. The main control block 40 functions as a stream parser and, although not illustrated, includes an MPU (microprocessor) or CPU (central processor); attached to it are an EEPROM 42 that stores firmware and various control parameters, a work RAM 44, a timer 46, and so on. The main control block 40 also manages streams to and from the SDRAM 22 and the MPEG decoder 30 and is used for recording and playing back video.

An audio signal feature detection unit 410 is also built into or connected to the main control block 40.

The audio signal feature detection unit 410, described in detail later with reference to FIG. 2, includes a stream separation unit 420, a stream analysis unit 430, a meta information analysis unit 440, an adaptive analysis control unit 450, a feature parameter conversion unit 460, a detection processing unit 470, an indexing (Indexing) unit 480, and so on.

For an encoded stream, the stream separation unit 420 first separates out the meta information, and the detection items, described in detail later with reference to FIG. 6, are determined. As described later with reference to FIG. 7, each determined detection item is associated with an analysis level and an analysis band for the encoded stream. In the present invention, the optimum stream analysis method is thereby set based on the content, and a function for detecting features (of the audio signal) at high speed can be realized.

More specifically, the stream separation unit 420 shown in FIG. 2 separates the received or input encoded stream into encoded audio data and its meta information.

The meta information separated by the stream separation unit 420 is input to the meta information analysis unit 440, and the encoded audio data is supplied to the stream analysis unit 430. An example of the meta information is the electronic program information (EPG) broadcast together with content that includes audio data.

The meta information analysis unit 440 analyzes the electronic program information to obtain the program classification information corresponding to the received stream and outputs it to the adaptive analysis control unit 450.

The adaptive analysis control unit 450 determines the stream analysis method from the program classification information, outputs the result to the stream analysis unit 430 as control information, and outputs the detection items to the detection processing unit 470.

The stream analysis unit 430 analyzes the encoded audio stream input from the stream separation unit 420 and outputs the information obtained to the feature parameter conversion unit 460. As described later with reference to FIG. 7, it performs this analysis according to the control information from the adaptive analysis control unit 450, which indicates up to what level the stream is to be analyzed, how far the signal band is to be analyzed, and which channels are to be analyzed.

The feature parameter conversion unit 460 converts the information obtained as the stream analysis result from the stream analysis unit 430 (mainly the spectrum) into feature parameters suited to the detection item, that is, those to be specified (extracted) as the detection item, and outputs them to the detection processing unit 470. The feature parameters include, for example, power information for each channel.
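As an illustration of converting a spectrum into per-channel power information, the following is a minimal sketch. The function names, the dB scale, and the use of mean squared coefficients are assumptions made for this example; the source does not specify how the power is computed.

```python
import math


def band_power_db(spectrum, eps=1e-12):
    """Mean power of the spectral coefficients of one frame, in dB.

    eps (an assumed value) avoids log10(0) for an all-zero frame.
    """
    if not spectrum:
        raise ValueError("spectrum must not be empty")
    mean_power = sum(c * c for c in spectrum) / len(spectrum)
    return 10.0 * math.log10(mean_power + eps)


def channel_powers(spectra):
    """Map each channel name to the power of its spectrum.

    spectra is a dict such as {"L": [...], "R": [...]}; the channel
    names are hypothetical labels chosen for this sketch.
    """
    return {name: band_power_db(coeffs) for name, coeffs in spectra.items()}
```

A frame whose coefficients all have magnitude 10 yields about 20 dB, and an all-zero frame yields the floor set by `eps`.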

Based on the feature parameters obtained from the stream analysis result of the stream analysis unit 430, the detection processing unit 470 determines whether the detection item indicated by the adaptive analysis control unit 450 has been detected and outputs the detection result to the indexing unit 480.

Examples of the detection performed by the detection processing unit 470, shown in FIG. 6, include CM (commercial message: a segment in a commercial broadcast independent of the main content) detection, music detection, cheer detection, and corner-sound detection.

In CM detection, the silence at or below a certain level that is inserted before and after a CM is identified from the power information obtained as a feature parameter, and whether the interval between silences conforms to the general rule for CM length (15 seconds, 30 seconds, and so on) is judged in order to detect the start and end of the CM.
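The CM rule described here can be sketched as two steps: locate silent runs in the per-frame power, then keep pairs of consecutive silences whose spacing matches a typical CM length. The threshold of -60 dB, the tolerance of 0.5 s, and all function names are assumptions for illustration only; the source gives only the 15 s and 30 s lengths.

```python
def find_silences(frame_power_db, frame_sec, silence_db=-60.0):
    """Return the center time (seconds) of each run of silent frames."""
    centers, start = [], None
    # A sentinel of +inf closes a silent run that reaches the last frame.
    for i, power in enumerate(list(frame_power_db) + [float("inf")]):
        if power <= silence_db:
            if start is None:
                start = i
        elif start is not None:
            centers.append((start + i) / 2.0 * frame_sec)
            start = None
    return centers


def detect_cms(silence_times, lengths=(15.0, 30.0), tol=0.5):
    """Return (start, end) pairs of consecutive silences spaced like a CM."""
    cms = []
    for a, b in zip(silence_times, silence_times[1:]):
        if any(abs((b - a) - n) <= tol for n in lengths):
            cms.append((a, b))
    return cms
```

With one-second frames and silences centered at 0.5 s, 15.5 s, and 25.5 s, only the first pair (15 s apart) is reported as a CM.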

In music detection, the start and end of a music section are detected from the received two-channel signal, the L channel (ch) and R channel (ch), by judging the degree of difference between the Lch and Rch signals (for example, the variation of L-R), which is characteristic of musical signals, together with its duration.
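One way to picture this rule is a per-frame ratio of difference-signal power to sum-signal power, followed by a minimum-duration check. This is a sketch under assumptions: the ratio definition, the threshold 0.05, the 5 s minimum, and the function names are all hypothetical and are not taken from the source.

```python
def side_to_mid_ratio(left, right, eps=1e-12):
    """Power of (L - R) divided by power of (L + R) for one frame."""
    mid = sum((l + r) ** 2 for l, r in zip(left, right))
    side = sum((l - r) ** 2 for l, r in zip(left, right))
    return side / (mid + eps)


def music_sections(ratios, frame_sec, ratio_min=0.05, min_sec=5.0):
    """Return (start, end) times of runs where the ratio stays high.

    A run counts as music only if it lasts at least min_sec seconds.
    """
    sections, start = [], None
    # A sentinel of -inf closes a run that reaches the last frame.
    for i, ratio in enumerate(list(ratios) + [float("-inf")]):
        if ratio >= ratio_min:
            if start is None:
                start = i
        elif start is not None:
            if (i - start) * frame_sec >= min_sec:
                sections.append((start * frame_sec, i * frame_sec))
            start = None
    return sections
```

Identical L and R signals (as in centered speech) give a ratio of 0, so they never open a music section; short bursts of stereo difference are discarded by the duration check.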

In cheer detection and corner-sound detection, the relevant section is detected based on the shape of the spectrum obtained as a feature parameter.

Based on the detection results from the detection processing unit 470 and the corresponding time information, the indexing unit 480 stores which event occurred at which time as index information that can be used for search processing. An event corresponds to a detected feature, for example the start or end of a CM or the start or end of music. Depending on the recording medium, the indexing unit 480 also records this indexing information in a predetermined recording area specific to that medium, such as a lead-in area, a header information recording area, or a TOC (table of contents).
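The index information described here is essentially a time-ordered list of (time, event) records that playback can search. The sketch below keeps the list in memory; the class names, event labels, and methods are hypothetical, and writing the index to a lead-in area or TOC is outside its scope.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class IndexEntry:
    time_sec: float   # position in the stream, in seconds
    event: str        # e.g. "cm_start", "cm_end", "music_start"


class EventIndex:
    """Time-ordered index of detected events."""

    def __init__(self):
        self._entries = []

    def add(self, time_sec, event):
        entry = IndexEntry(time_sec, event)
        # Insert at the sorted position so entries stay ordered by time.
        self._entries.insert(bisect_left(self._entries, entry), entry)

    def find(self, event):
        """All times at which the given event occurred."""
        return [e.time_sec for e in self._entries if e.event == event]

    def next_after(self, time_sec, event):
        """First occurrence of the event later than time_sec, or None."""
        for e in self._entries:
            if e.time_sec > time_sec and e.event == event:
                return e.time_sec
        return None
```

A player could, for example, call `next_after(current_position, "cm_end")` to skip to the end of a commercial.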

FIG. 3 is a schematic block diagram illustrating an example of the configuration of the adaptive analysis control unit described with reference to FIG. 2.

Based on the program classification information from the meta information analysis unit 440, the search method determination unit 452 determines the detection items by consulting a preset table, shown in FIG. 6, that associates program classifications with the detection items suited to them.

For example, a music section is assigned as the detection item for a music program, a cheer section in which the audience becomes excited is assigned for a sports program, and a CM (commercial) section is assigned in addition. As FIG. 6 shows, CM sections are detected across a wide range of programs, for example music programs, sports programs, variety programs, and movies.
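The table lookup can be sketched as a plain dictionary. FIG. 6 itself is not reproduced in this text, so the table below contains only the pairings stated in the paragraph above; the key names, the item labels, and the empty default for unlisted categories are assumptions for illustration.

```python
# Program classification -> detection items, limited to what the text states.
# The real FIG. 6 table may contain more rows and items.
DETECTION_ITEMS = {
    "music": ("music", "cm"),
    "sports": ("cheer", "cm"),
    "variety": ("cm",),
    "movie": ("cm",),
}


def detection_items_for(category):
    """Return the detection items for a program classification.

    Unknown categories return an empty tuple (an assumed fallback).
    """
    return DETECTION_ITEMS.get(category, ())
```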

The analysis level determination unit 454 then determines, based on the detection items shown in FIG. 6, how far the stream should be analyzed.

The analysis band determination unit 456 determines the band over which the stream is to be analyzed.

Furthermore, the analysis channel determination unit 458 determines the channels to be analyzed when the encoded audio stream is a multichannel signal with more than two channels. These determinations are needed because the required analysis level, band, and analysis channels differ by detection item; they can be made by consulting a preset table, as shown in FIG. 7, that associates each detection item with the analysis processing (level, band, or channel) suited to it.
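A table of this kind might be represented as follows. FIG. 7 is not reproduced in this text, so the entries are a sketch: the levels for CM, music, cheer, and corner-sound detection and the center-only channel for CM follow statements elsewhere in the description, while the band values, the channel choice for cheer and corner-sound detection, and the rule of taking the highest level when several items apply are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AnalysisSetting:
    level: int                  # analysis level 2, 3 or 4
    band_hz: Optional[float]    # None = full band of the stream (assumed)
    channels: Tuple[str, ...]   # channels to analyze for multichannel input


ANALYSIS_TABLE = {
    "cm": AnalysisSetting(2, None, ("C",)),
    "music": AnalysisSetting(3, None, ("L", "R")),
    "cheer": AnalysisSetting(4, None, ("L", "R")),         # channels assumed
    "corner_sound": AnalysisSetting(4, None, ("L", "R")),  # channels assumed
}


def required_level(items):
    """Highest analysis level needed by any of the detection items.

    Level 1 (syntax analysis) is always performed, so it is the floor.
    """
    return max([ANALYSIS_TABLE[i].level for i in items], default=1)
```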

FIG. 4 shows an internal block diagram of the stream analysis unit.

The stream analysis unit 430 is based on the MPEG-2 AAC (Advanced Audio Coding) standard and has at least a syntax (Syntax) analysis unit 432, an inverse quantization unit 434, a joint stereo (JS) unit 436, and a TNS (Temporal Noise Shaping) unit 438. An analysis channel determination unit 433, which determines the analysis channels described above for a multichannel stream, is placed between the syntax analysis unit 432 and the inverse quantization unit 434.

In accordance with the AAC standard, the syntax analysis unit 432 extracts decoding parameters from the encoded audio stream by Huffman decoding and the like, including the quantized spectrum, scale factors (spectral scaling information), and/or inter-channel correlation information, and outputs them to the inverse quantization unit 434. The analysis up to this point corresponds to analysis level 1 (not shown in FIG. 7, but always performed for every stream) and yields basic coding parameters such as the number of channels of the speech or music and the sampling frequency.

When the input encoded stream is a multichannel signal, only the decoding parameters of the front center channel or the front LR channels are output to the inverse quantization unit 434, according to the analysis channels designated by the adaptive analysis control. Subsequent processing such as inverse quantization then covers only the limited number of channels, without a large loss of the features of the audio information. The processing load is therefore greatly reduced.
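The channel selection step can be sketched as a filter over the parsed per-channel parameters. The channel labels ("C", "L", "R", "Ls", "Rs", "LFE"), the dictionary layout, and the function name are hypothetical; streams with two channels or fewer are passed through unchanged, as the description states for stereo signals.

```python
def select_channels(decoded_params, wanted):
    """Keep only the designated channels of a multichannel stream.

    decoded_params maps a channel label to its decoding parameters
    (quantized spectrum, scale factors, and so on).
    """
    if len(decoded_params) <= 2:
        # Mono or stereo: nothing to drop.
        return dict(decoded_params)
    missing = [ch for ch in wanted if ch not in decoded_params]
    if missing:
        raise KeyError(f"channels not present in stream: {missing}")
    return {ch: decoded_params[ch] for ch in wanted}
```

For a 5.1 stream and CM detection, `select_channels(params, ("C",))` leaves a single channel for inverse quantization instead of six.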

For stereo signals of two channels or fewer, decoding parameters such as the quantized spectrum are output unchanged to the inverse quantization unit 434.

The inverse quantization unit 434 inverse-quantizes the quantized spectrum (analyzed by the syntax analysis unit 432) based on the scale factors to obtain the original linear-scale spectrum. The analysis up to this point corresponds to analysis level 2 and yields the inverse-quantized spectrum. Because the subsequent joint stereo processing has not yet been applied, however, this spectrum is one in which, depending on the band, the sum signal of Lch and Rch and the Lch signal alone are mixed.
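For reference, AAC inverse quantization is commonly written as a 4/3 power law on the quantized value multiplied by a gain derived from the scale factor. The sketch below uses that commonly cited form with a scale factor offset of 100; the source does not give the formula, so the exact constants should be checked against the AAC specification, and the function names are hypothetical.

```python
import math

SF_OFFSET = 100  # scale factor offset as commonly cited for AAC


def dequantize(q, scale_factor):
    """Inverse-quantize one coefficient: sign(q) * |q|^(4/3) * gain."""
    gain = 2.0 ** (0.25 * (scale_factor - SF_OFFSET))
    return math.copysign(abs(q) ** (4.0 / 3.0), q) * gain


def dequantize_band(quantized, scale_factor):
    """Apply one scale factor to every coefficient of a scale factor band."""
    return [dequantize(q, scale_factor) for q in quantized]
```

Each step of 4 in the scale factor doubles the gain, which is why the scale factor acts as the "spectral scaling information" mentioned above.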

The joint stereo (JS) unit 436 applies MS (Mid/Side) stereo or IS (intensity) stereo processing to the spectrum inverse-quantized by the inverse quantization unit 434 to obtain the spectrum separated into the original Lch and Rch signals. The analysis up to this point corresponds to analysis level 3 and yields the spectrum of each channel.
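The MS part of this step can be sketched as follows: in bands flagged as MS-coded, the two transmitted channels carry mid and side signals, and L and R are recovered as their sum and difference. The argument names and the band layout are hypothetical, and intensity stereo is deliberately left out of this sketch.

```python
def ms_to_lr(ch1, ch2, ms_used, band_offsets):
    """Recover L/R spectra from a channel pair with per-band MS flags.

    ch1, ch2     : inverse-quantized spectra of the channel pair
    ms_used      : one boolean per band, True where the band is MS-coded
    band_offsets : band start indices plus a final end index
    In bands without MS, ch1 and ch2 already hold L and R.
    """
    left, right = list(ch1), list(ch2)
    for band, used in enumerate(ms_used):
        if not used:
            continue
        for k in range(band_offsets[band], band_offsets[band + 1]):
            left[k] = ch1[k] + ch2[k]    # L = M + S
            right[k] = ch1[k] - ch2[k]   # R = M - S
    return left, right
```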

TNS (Temporal Noise Shaping), handled by the TNS unit 438, is an optional process that, viewed in terms of the time-domain signal at encoding, concentrates (shapes) noise components into sections where the signal level is high so that the noise is harder to perceive. From the standpoint of stream analysis, the TNS unit 438 applies a synthesis filter that restores the original spectrum (the one immediately after time-to-frequency conversion, as seen from the encoder) from the spectrum before TNS processing, thereby correcting the rough shape of the spectrum.

A decoder would go on to obtain a PCM signal from the resulting spectrum by IMDCT (Inverse Modified DCT) processing, which is the frequency-to-time conversion, but stream analysis handles only frequency-domain signals and does not need this step. The analysis up to the end of TNS processing corresponds to analysis level 4 and yields a high-accuracy spectrum, the one immediately preceding frequency-to-time conversion.

How far the stream analysis described above proceeds (up to which analysis level) is controlled by the analysis level derived from the detection item indicated by the adaptive analysis control unit 450. For CM detection, for example, it suffices to judge whether the signal is silent; per-channel signals are unnecessary, the analysis level 2 spectrum is sufficient, and no subsequent processing is needed.

For music detection, the power ratio between the sum component and the difference component of the Lch and Rch signals is important, so a spectrum of analysis level 3 or higher is required. On the other hand, TNS processing, which tends to be applied to pulse-like signals such as speech, is used infrequently, and even when it is applied the band it covers is limited. A level-4 spectrum is therefore not necessary.
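As a rough illustration of the quantity mentioned above, the following sketch computes the power ratio of the sum (L+R) and difference (L-R) components from per-bin spectra of the two channels. The function name and the small `eps` guard are assumptions, and the description does not say how the ratio is thresholded to decide that music is present.

```python
def sum_diff_power_ratio(left, right, eps=1e-12):
    """Return P(L+R) / P(L-R) for one frame of left/right spectra."""
    if len(left) != len(right):
        raise ValueError("left and right spectra must have the same length")
    p_sum = sum((l + r) ** 2 for l, r in zip(left, right))
    p_diff = sum((l - r) ** 2 for l, r in zip(left, right))
    # eps avoids division by zero when both channels are identical
    return p_sum / (p_diff + eps)


# Fully decorrelated toy frame: sum and difference carry equal power.
print(sum_diff_power_ratio([1.0, 0.0], [0.0, 1.0]))
```

A signal that is nearly identical in both channels (typical of centered speech) gives a very large ratio, while a wide stereo mix gives a ratio closer to 1.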

For cheer detection or corner detection, by contrast, although this depends on the required detection accuracy, characteristics including the shape of the detected spectrum are needed, so an analysis level 4 signal, that is, a high-accuracy spectrum, is required.

In this way, computing the spectrum only up to the minimum analysis level required by the detection item reduces the processing load and increases the processing speed.
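The relationship between detection items and analysis levels described above can be sketched as a lookup table. The entries mirror only the examples given in the text (CM detection: level 2, music detection: level 3, cheer and corner detection: level 4). The key names, and the rule of taking the highest level when several items are requested at once, are assumptions made for illustration.

```python
# Minimum analysis level per detection item, taken from the examples in the text.
MIN_ANALYSIS_LEVEL = {
    "cm": 2,      # silence decision only
    "music": 3,   # needs separated L/R spectra
    "cheer": 4,   # needs the TNS-corrected spectral shape
    "corner": 4,
}


def required_level(detection_items):
    """Return the lowest level that satisfies every requested item (assumed rule)."""
    return max(MIN_ANALYSIS_LEVEL[item] for item in detection_items)


print(required_level(["cm"]))
print(required_level(["cm", "music"]))
```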

The same applies to the analysis band. For a program consisting mainly of speech, such as a news program, an analysis band of up to about 7 kHz is sufficient, as shown in FIG. 7, and there is no need to analyze up to the full 24 kHz band of the encoded stream (for 48 kHz sampling). For wideband signals that include music, on the other hand, it is preferable to analyze the entire signal band contained in the encoded stream.
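To give a feel for the saving, the sketch below converts an analysis band into a number of spectral coefficients to process. It assumes 1024 coefficients per frame spread evenly from 0 Hz to half the sampling rate, which is the usual long-block layout in AAC but is not stated in this description; the function name is hypothetical.

```python
import math


def bins_for_band(band_hz, sample_rate_hz=48000, bins_per_frame=1024):
    """Number of coefficients needed to cover 0..band_hz (assumed uniform bins)."""
    nyquist_hz = sample_rate_hz / 2
    fraction = min(band_hz / nyquist_hz, 1.0)
    return math.ceil(fraction * bins_per_frame)


print(bins_for_band(7000))    # speech-oriented band
print(bins_for_band(24000))   # full band of a 48 kHz stream
```

Under these assumptions a 7 kHz band needs 299 of 1024 coefficients, so per-bin work such as dequantization shrinks to roughly 29% of the full-band cost.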

The analysis channel setting controls which channels are analyzed when the stream is multichannel. For CM detection, for example, information for one channel is sufficient because detection is based on silence determination as described above. The load can be reduced by analyzing only the (front) center channel, which imposes the smallest processing load among the front channel signals.

For music detection, in contrast, detection often relies on signal characteristics between channels (such as the power ratio) as described above, so the front L and R channels must be analyzed.

For cheer detection, the announcer's live commentary is often carried in the center channel, so analyzing the front L and R channels is an effective way to pick up cheers.
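The channel choices described above can be sketched as a small selection helper. The channel labels ("C", "L", "R", and so on), the dictionary layout, and the fallback of analyzing every channel for an unlisted item are assumptions for illustration only.

```python
# Channels to analyze per detection item, following the examples in the text.
ANALYSIS_CHANNELS = {
    "cm": ("C",),         # silence decision: center channel only
    "music": ("L", "R"),  # inter-channel power ratio
    "cheer": ("L", "R"),  # avoid commentary concentrated in the center channel
}


def select_channels(stream_channels, detection_item):
    """Return only the channels that need analysis for the given item."""
    wanted = ANALYSIS_CHANNELS.get(detection_item)
    if wanted is None:
        return dict(stream_channels)  # no designation: analyze everything
    return {name: stream_channels[name] for name in wanted if name in stream_channels}


channels_5_1 = {"L": [0.1], "R": [0.2], "C": [0.3], "LFE": [0.0], "Ls": [0.0], "Rs": [0.0]}
print(sorted(select_channels(channels_5_1, "cm")))
print(sorted(select_channels(channels_5_1, "music")))
```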

FIG. 5 is a flowchart of the series of audio signal detection steps described above when they are implemented as software.

First, in the stream separation step, the received or input encoded stream is separated into encoded audio data and its meta information (S1).

In the meta information analysis step, information such as electronic program information (EPG) is extracted from the meta information separated in step S1 (S2).

When EPG (electronic program information) or the like has been extracted in the meta information analysis step (S2), the detection items are specified (determined) from the program attributes that can be read from the EPG, that is, meta information such as news, movie, or music, based on a table such as that shown in FIG. 6 that associates program classification information with detection items (S3). If there is no meta information, or the required attribute information is not available, CM detection, a detection item common to all content, is set.
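Step S3 amounts to a table lookup with a default. The sketch below shows that shape only: the contents of FIG. 6 are not reproduced in this text, so the table entries here are hypothetical placeholders, as are the function and key names. The one behavior taken from the description is the fallback to CM detection when no usable attribute is available.

```python
# Placeholder table: NOT the actual contents of FIG. 6.
DETECTION_ITEMS_BY_CATEGORY = {
    "news": ["cm", "corner"],
    "movie": ["cm"],
    "music": ["cm", "music"],
}


def detection_items_for(category):
    """Return detection items for a program category, defaulting to CM detection."""
    if category is None:
        return ["cm"]  # no meta information at all
    return DETECTION_ITEMS_BY_CATEGORY.get(category, ["cm"])


print(detection_items_for("music"))
print(detection_items_for(None))
```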

Once the detection item has been specified, the analysis level determination step specifies (determines) the analysis level to use during stream analysis, based on a table such as that shown in FIG. 7 that associates each detection item with an analysis level, which is an index of how detailed the analysis should be (S4).

As with the analysis level determination, the analysis band determination step specifies (determines) the band to be analyzed from the detection item, based on a table such as that shown in FIG. 7 (S5).

The encoded stream is then analyzed based on these determination results.

First, in accordance with the AAC standard, decoding parameters such as the quantized spectrum, scale factors (spectrum scaling information), and/or inter-channel correlation information are extracted from the encoded audio stream by Huffman decoding and the like (S6), and it is determined whether the analysis may end at "level 1" (S7).

If it is determined in step S7 that the analysis may end at "level 1" (S7-YES), feature parameter conversion is executed to convert the information obtained as the stream analysis result into feature parameters suited to the detection item (the parameters to be specified (extracted) for that detection item) (S15). Detection processing is then executed based on the feature parameters obtained from the stream analysis result (S16), and the detection result, which indicates whether the detection item was detected, is indexed (S17).

If it is determined in step S7 that "level 1" is insufficient (the analysis cannot end at "level 1") (S7-NO), whether an analysis channel has been designated is checked (S8). If none is designated (S8-NO), step S10 and the subsequent steps are executed for all channels.

If an analysis channel is designated (S8-YES), "inverse quantization" is applied as the stream analysis method only to the designated analysis channel: the quantized spectrum is dequantized based on the scale factors to obtain the original linear-scale spectrum (S10). It is then determined whether the analysis may end at "level 2" (S11).
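Steps S8 and S10 can be sketched as follows. The description only says that the quantized spectrum is dequantized based on the scale factor; the concrete formula used here, sign(q)·|q|^(4/3)·2^(0.25·(sf − 100)), is the inverse quantization rule of the AAC specification as commonly documented and is supplied as an assumption. As a further simplification, one scale factor is used per channel, whereas AAC carries one per scale factor band. All function and parameter names are hypothetical.

```python
import math

SF_OFFSET = 100  # scale factor offset assumed from the AAC specification


def dequantize(quantized, scale_factor):
    """Map quantized integer coefficients back to a linear-scale spectrum."""
    gain = 2.0 ** (0.25 * (scale_factor - SF_OFFSET))
    return [math.copysign(abs(q) ** (4.0 / 3.0), q) * gain for q in quantized]


def dequantize_channels(channels, scale_factors, designated=None):
    """S8/S10: process only the designated channels, or all if none are designated."""
    names = designated if designated is not None else list(channels)
    return {name: dequantize(channels[name], scale_factors[name]) for name in names}


quantized = {"L": [1, -1], "R": [1, 0]}
scale = {"L": 104, "R": 100}
print(dequantize_channels(quantized, scale, designated=["L"]))
print(dequantize_channels(quantized, scale))
```

Skipping undesignated channels removes the per-coefficient power computation for those channels, which is where the load reduction for multichannel streams comes from.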

If it is determined in step S11 that the analysis may end at "level 2" (S11-YES), steps S15 to S17 described above are then executed.

If it is determined in step S11 that "level 2" is insufficient (the analysis cannot end at "level 2") (S11-NO), "joint stereo (JS)" processing is applied as the stream analysis method: MS (Mid/Side) stereo processing or IS (intensity) stereo processing is applied to the dequantized spectrum to obtain spectra separated into the original Lch and Rch signals (S12). It is then determined whether the analysis may end at "level 3" (S13).
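For the MS case of step S12, the reconstruction is a per-coefficient sum and difference. The sketch below assumes the convention L = M + S, R = M − S, which is how M/S stereo is commonly described for AAC. It ignores the per-band flags that select where M/S is active and does not cover intensity stereo; the function name is hypothetical.

```python
def ms_to_lr(mid, side):
    """Recover left/right spectra from mid/side spectra (assumed L=M+S, R=M-S)."""
    if len(mid) != len(side):
        raise ValueError("mid and side spectra must have the same length")
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left, right = ms_to_lr([1.0, 0.5], [0.25, -0.5])
print(left, right)
```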

If it is determined in step S13 that the analysis may end at "level 3" (S13-YES), steps S15 to S17 described above are executed in the same way.

If it is determined in step S13 that even "level 3" is insufficient (the analysis cannot end at "level 3") (S13-NO), "TNS" processing is applied as the stream analysis method, the analysis level is set (specified) to "level 4", and characteristics including the shape (rough shape) of the detected spectrum are analyzed (S14).

Thereafter, steps S15 to S17 described above are executed.
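The level checks in steps S7, S11, and S13 form an early-exit pipeline: each stage runs in order and analysis stops as soon as the required level has been reached. The sketch below shows only that control flow. The stage functions are placeholders that tag a string, not real AAC processing, and all names are hypothetical.

```python
def run_stream_analysis(required_level, stages):
    """Run stages 1..required_level in order and report which ones executed."""
    executed = []
    data = None
    for level, (name, stage) in enumerate(stages, start=1):
        data = stage(data)
        executed.append(name)
        if level >= required_level:
            break  # S7 / S11 / S13: this level is sufficient
    return data, executed


# Placeholder stages standing in for S6, S10, S12, and S14.
STAGES = [
    ("syntax", lambda _: "params"),
    ("dequantize", lambda d: d + "+deq"),
    ("joint_stereo", lambda d: d + "+js"),
    ("tns", lambda d: d + "+tns"),
]

print(run_stream_analysis(2, STAGES))  # e.g. CM detection
print(run_stream_analysis(4, STAGES))  # e.g. cheer detection
```

In either case the result would then be handed to feature parameter conversion, detection, and indexing (S15 to S17), which are the same regardless of the level at which analysis stopped.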

In this way, by determining the detection method at the time of stream analysis and controlling the analysis channel in addition to the analysis level and the analysis signal band, detection can be performed at high speed with an optimal processing load even for multichannel streams. That is, the meta information of the encoded data to be processed is used to determine the appropriate (required) detection process, and the load of obtaining the signal needed for that detection process can be reduced.

In other words, stream analysis stages are skipped according to the analysis level determined in the analysis level determination step (stream analysis ends as soon as a level at which it may end is reached), so that only the analysis suited to the encoded stream is performed.

In addition, the analysis band and analysis channels used in each stream analysis stage are restricted according to the result of the analysis band determination step and the analysis channel setting, which reduces the processing load and enables high-speed processing.

The series of processes described with reference to FIG. 5 may be implemented as firmware of an MPU or CPU (not shown) of the main control block (stream parser) 40, or may be provided as a program written in advance to the EEPROM 42.

As described above, according to this embodiment of the invention, the appropriate detection process is determined from the meta information of the encoded data to be processed, and the analysis level and analysis signal band are controlled so as to obtain the signal that detection process requires, so that detection can be performed at high speed with an optimal processing load. Music detection can then be performed with relatively simple calculations, which lowers cost.

Even when the allowable processing load is limited, controlling the analysis level according to the detection accuracy makes it possible to implement flexible detection processing that matches the processing load.

The present invention is not limited to the embodiment described here and can of course take various other forms without departing from its spirit. Although the description used a video recording/playback apparatus (video disc recorder) as an example, the invention naturally also covers television sets and personal computers (PCs) with a built-in video recorder, as well as unitized audio (content) playback devices that can be attached externally. The embodiments may be combined as appropriate wherever possible, or implemented with parts omitted, in which case the various effects resulting from that combination or omission are obtained.

FIG. 1 is a schematic diagram showing an example of a video recording/playback apparatus (video recorder) to which an embodiment of the present invention can be applied.
FIG. 2 is a schematic block diagram showing an example of the configuration of an audio signal feature detection unit usable as an embodiment of the present invention.
FIG. 3 is a schematic block diagram showing an example of the configuration of the adaptive analysis control unit used in the audio signal feature detection unit according to an embodiment of the present invention.
FIG. 4 is a schematic block diagram showing an example of the configuration of the stream analysis unit used in the audio signal feature detection unit according to an embodiment of the present invention.
FIG. 5 is a flowchart explaining an example in which the audio signal feature detection unit usable as an embodiment of the present invention is implemented as software.
FIG. 6 is a schematic diagram showing an example of the correspondence between program classification information and detection items used in the search method determination unit shown in FIG. 3.
FIG. 7 is a schematic diagram showing an example of the correspondence, used in the analysis level determination unit shown in FIG. 3, between detection items and analysis levels, which are an index of how detailed the analysis should be.

Explanation of symbols

1 ... video recorder (video recording/playback apparatus), 10 ... tuner unit, 20 ... encoder, 24 ... disk drive unit, 26 ... memory slot, 30 ... decoder, 40 ... main control block (stream parser), 50 ... on-screen display (OSD) control unit, 52 ... display device, 100, 102 ... recording medium (DVD-standard optical disc), 104 ... HDD (hard disk drive), 106 ... recording medium (card memory), 410 ... audio signal feature detection unit, 420 ... decoder unit, 430 ... stream analysis unit, 432 ... syntax analysis unit, 434 ... inverse quantization unit, 436 ... joint stereo (JS) unit, 438 ... TNS unit, 440 ... meta information analysis unit, 450 ... adaptive analysis control unit, 452 ... unit, 454 ... unit, 456 ... unit, 460 ... feature parameter conversion unit, 470 ... detection processing unit, 480 ... indexing unit.

Claims

1. An audio signal feature detection apparatus comprising:
an information analysis unit that obtains classification information of a stream from an input stream;
an analysis control unit that specifies, from the classification information obtained by the information analysis unit, a method for analyzing the input stream in terms of at least one of an analysis level, an analysis band, and an analysis channel;
a feature parameter conversion unit that analyzes the input stream based on the analysis method specified by the analysis control unit and converts it into a feature parameter to be specified as a detection item;
a detection processing unit that detects whether the detection item, which is the feature parameter specified by the feature parameter conversion unit, is included in the input stream; and
a recording unit that records the detection result of the detection item detected by the detection processing unit so that it can be searched during playback of the input stream.

2. The audio signal feature detection apparatus according to claim 1, wherein the analysis levels executable by the analysis control unit include an analysis level that changes the analysis signal band of the encoded data.

3. The audio signal feature detection apparatus according to claim 1, wherein the analysis levels executable by the analysis control unit include an analysis level that changes the analysis signal band of the encoded data and the channel of the analysis signal.

4. An audio signal feature detection method comprising:
obtaining classification information of a stream from an input stream;
specifying, from the obtained classification information, a method for analyzing the input stream in terms of at least one of an analysis level, an analysis band, and an analysis channel;
analyzing the input stream based on the specified analysis method and converting it into a feature parameter to be specified as a detection item;
detecting whether the detection item, which is the specified feature parameter, is included in the input stream; and
holding the detected detection item so that it can be searched during playback of the input stream.

5. The audio signal feature detection method, wherein the analysis method is specified such that a plurality of analysis levels can be executed as analysis levels of the encoded data and an optimum processing load is set by changing the analysis level as necessary.

6. The audio signal feature detection method according to claim 5, wherein the analysis method is specified by changing an analysis level, an analysis signal band, and an analysis channel of encoded data.

7. An audio signal feature detection program including at least:
a stream separation step of separating an encoded stream into encoded audio data and its meta information;
an information analysis step of extracting, from the separated meta information, information on at least the audio signal of the content included in the stream;
a feature detection step of detecting a feature of the audio signal based on the extracted information by determining, with reference to at least the audio signal information of the content extracted in the information analysis step,
whether the detection item can be detected by first-level analysis,
in second-level analysis following the first level, whether analysis is designated for a channel of the audio signal and, if a channel is designated, whether the detection item can be detected by analyzing the designated channel,
whether the detection item can be detected by third-level analysis following the second level, and
whether detection of the detection item requires fourth-level analysis following the third level; and
a step of indexing the detected detection item so that it can be searched during playback of the input stream.

8. The audio signal feature detection program according to claim 7, wherein the first level in the feature detection step includes at least a step of obtaining an encoding parameter applied when the stream is encoded.

9. The audio signal feature detection program according to claim 7, wherein the second-level analysis in the feature detection step includes at least a step of dequantizing the quantized spectrum based on a scale factor to obtain a linear-scale spectrum.

10. The audio signal feature detection program according to claim 9, wherein the second-level analysis includes at least a step of checking whether an analysis channel is designated and, if an analysis channel is designated, dequantizing the quantized spectrum of the designated channel based on the scale factor to obtain a linear-scale spectrum.

11. The audio signal feature detection program according to claim 9, wherein the second-level analysis includes at least a step of checking whether an analysis channel is designated and, if no analysis channel is designated, dequantizing the quantized spectrum of all channels based on the scale factor to obtain a linear-scale spectrum.

12. The audio signal feature detection program according to claim 7, wherein the third-level analysis in the feature detection step includes at least a step of obtaining correlation information between channels.

13. The audio signal feature detection program according to claim 7, wherein the fourth-level analysis in the feature detection step includes at least a step of obtaining characteristics including the shape of the detected spectrum.