JP5028651B2 - Information processing apparatus and content analysis program

Info

Publication number: JP5028651B2
Application number: JP2008130478A
Authority: JP
Inventors: 千加志杉浦
Original assignee: Fujitsu Mobile Communications Ltd
Current assignee: Fujitsu Mobile Communications Ltd
Priority date: 2008-05-19
Filing date: 2008-05-19
Publication date: 2012-09-19
Anticipated expiration: 2028-05-19
Also published as: JP2009278582A

Description

本発明は情報処理装置およびコンテンツ解析プログラムに関する。 The present invention relates to an information processing apparatus and a content analysis program.

近年、記憶装置の記憶容量が大きくなり、様々なコンテンツを放送や配信などによって取得することができるようになった。これに伴い、コンテンツに対して、所定の区間を検出してインデキシングし、ユーザが所望する区間だけを再生することができる機器が求められている。例えば、放送番組を録画して、再生することを目的とするレコーダにおいて、放送局から送信される信号を受信して記録する際に、ＣＭ区間や見どころ区間を検出してインデキシングしておき、この情報を用いてユーザが見たいシーンであるとインデキシングされたシーンだけを再生することができる録画装置が知られている。 In recent years, the capacity of storage devices has grown, and a wide variety of content can now be obtained through broadcasting or distribution. As a result, there is demand for devices that detect and index predetermined sections of content so that only the sections the user wants are played back. For example, among recorders intended for recording and playing back broadcast programs, a recording apparatus is known that detects and indexes commercial (CM) sections and highlight sections while receiving and recording the signal transmitted from a broadcast station, and then uses this information to play back only the scenes indexed as ones the user wants to see.

このように映像音声データから所定の区間をインデキシングする装置は、音響信号や映像信号を解析して所定の特徴量を算出し、この特徴量を用いて所定の区間を検出する。例えば特許文献１には、ＰＣＭ方式によってデジタル化された音声データを周波数領域に直交変換した直交変換係数を算出し、この直交変換係数に基づく特徴量を用いて所定の区間を検出する楽曲区間検出装置が提案されている。 An apparatus that indexes predetermined sections of audiovisual data in this way analyzes the audio or video signal to calculate a predetermined feature amount and uses that feature amount to detect the predetermined sections. For example, Patent Document 1 proposes a music section detection device that calculates orthogonal transform coefficients by orthogonally transforming PCM-digitized audio data into the frequency domain and detects a predetermined section using a feature amount based on those coefficients.
特開２００７−１８０６６９号公報 JP 2007-180669 A

特許文献１に記載される発明のように、デジタル圧縮された音響信号を解析するためには、全てのフレームに対してデコード処理を行ったうえで特徴量の算出を行う必要がある。 As in the invention described in Patent Document 1, in order to analyze a digitally compressed acoustic signal, it is necessary to calculate a feature amount after performing decoding processing on all frames.

しかしながら、携帯端末のようにロースペックな機器でこのような解析を行うと、全てのフレームに対するデコード処理の負荷が高い。そのため、例えば、ワンセグ放送などから携帯端末が符号化されたコンテンツを受信して記憶しながらリアルタイムに解析を行うようなコンテンツ解析処理が困難である。リアルタイムにコンテンツ解析処理を行わない場合であっても、全てのフレームに対するデコード処理を行うため、電池の消費量が増大する。 However, when such analysis is performed on a low-spec device such as a portable terminal, decoding every frame imposes a heavy load. This makes it difficult, for example, for a portable terminal to analyze content in real time while receiving and storing encoded content from one-segment broadcasting or the like. Even when the content is not analyzed in real time, decoding every frame increases battery consumption.

また、携帯端末が通信によって取得したコンテンツを解析する場合、伝送エラーが生じやすく、精度良く区間を検出することが困難である。 Moreover, when analyzing the content acquired by the mobile terminal through communication, a transmission error is likely to occur, and it is difficult to accurately detect the section.

そこで本発明は、コンテンツの解析精度を維持したままで、省メモリ低演算でコンテンツを解析することができる情報処理装置を提供することを目的とする。 Accordingly, an object of the present invention is to provide an information processing apparatus that can analyze content with low memory use and low computation while maintaining the accuracy of the analysis.

上記目的を達成するために、本発明による情報処理装置は、エンコードされたコンテンツから所定の区間を分類する情報処理装置であって、前記コンテンツに含まれるフレームを所定の間引き率で間引いてデコードするデコード手段と、前記デコード手段によってデコードされたフレームの特徴量を算出するフレーム特徴量算出手段と、所定のフレーム群からなるセグメントの特徴量を離散化して算出するときに、前記フレーム特徴量算出手段によって算出されたフレームの特徴量を用いてセグメント単位の特徴量を算出し、算出したセグメント単位の特徴量と基準値との差が所定の値以上であれば、このセグメント単位の特徴量を用いてセグメントの特徴量を離散化し、算出したセグメント単位の特徴量と基準値との差が所定の値未満であれば、セグメント単位の特徴量と基準値との差が所定の値以上となるか全てのフレームの特徴量によってセグメント単位の特徴量を算出するまでフレームの間引き率を変更して前記デコード手段によるデコードを行って、最終的に得られたセグメント単位の特徴量を用いてセグメントの特徴量を離散化するセグメント特徴量離散化手段と、前記セグメント単位特徴量離散化手段によって取得された離散化されたセグメントの特徴量を用いてコンテンツの区間分類を行う区間分類手段とを有することを特徴としている。 To achieve the above object, an information processing apparatus according to the present invention classifies predetermined sections of encoded content and comprises: decoding means for decoding the frames included in the content while thinning them out at a predetermined thinning rate; frame feature amount calculating means for calculating a feature amount of each frame decoded by the decoding means; segment feature amount discretizing means which, when discretizing the feature amount of a segment consisting of a predetermined group of frames, calculates a segment-unit feature amount from the frame feature amounts calculated by the frame feature amount calculating means, discretizes the segment feature amount using that segment-unit feature amount if its difference from a reference value is equal to or greater than a predetermined value, and, if the difference is less than the predetermined value, has the decoding means decode again with a changed frame thinning rate until either the difference between the segment-unit feature amount and the reference value becomes equal to or greater than the predetermined value or the segment-unit feature amount has been calculated from the feature amounts of all frames, and then discretizes the segment feature amount using the segment-unit feature amount finally obtained; and section classifying means for classifying sections of the content using the discretized segment feature amounts obtained by the segment-unit feature amount discretizing means.

また、本発明によるコンテンツ解析プログラムは、エンコードされたコンテンツから所定の区間を分類するコンテンツ解析プログラムであって、コンテンツに含まれるフレームを所定の間引き率で間引いてデコードするデコード機能と、前記デコード機能によってデコードされたフレームの特徴量を算出するフレーム特徴量算出機能と、所定のフレーム群からなるセグメントの特徴量を離散化して算出するときに、前記フレーム特徴量算出機能によって算出されたフレームの特徴量を用いてセグメント単位の特徴量を算出し、算出したセグメント単位の特徴量と基準値との差が所定の値以上であれば、このセグメント単位の特徴量を用いてセグメントの特徴量を離散化し、算出したセグメント単位の特徴量と基準値との差が所定の値未満であれば、セグメント単位の特徴量と基準値との差が所定の値以上となるか全てのフレームの特徴量によってセグメント単位の特徴量を算出するまでフレームの間引き率を変更して前記デコード機能によるデコードを行って、最終的に得られたセグメント単位の特徴量を用いてセグメントの特徴量を離散化するセグメント特徴量離散化機能と、前記セグメント単位特徴量離散化機能によって取得された離散化されたセグメントの特徴量を用いてコンテンツの区間分類を行う区間分類機能とを有することを特徴としている。 A content analysis program according to the present invention classifies predetermined sections of encoded content and has: a decoding function of decoding the frames included in the content while thinning them out at a predetermined thinning rate; a frame feature amount calculating function of calculating a feature amount of each frame decoded by the decoding function; a segment feature amount discretizing function which, when discretizing the feature amount of a segment consisting of a predetermined group of frames, calculates a segment-unit feature amount from the frame feature amounts calculated by the frame feature amount calculating function, discretizes the segment feature amount using that segment-unit feature amount if its difference from a reference value is equal to or greater than a predetermined value, and, if the difference is less than the predetermined value, has the decoding function decode again with a changed frame thinning rate until either the difference between the segment-unit feature amount and the reference value becomes equal to or greater than the predetermined value or the segment-unit feature amount has been calculated from the feature amounts of all frames, and then discretizes the segment feature amount using the segment-unit feature amount finally obtained; and a section classifying function of classifying sections of the content using the discretized segment feature amounts obtained by the segment-unit feature amount discretizing function.

本発明によれば、コンテンツの解析精度を維持したままで、省メモリ低演算でコンテンツを解析することができる情報処理装置を提供することができる。 According to the present invention, it is possible to provide an information processing apparatus that can analyze content with low memory use and low computation while maintaining the accuracy of the analysis.

以下、本発明の実施形態について、ワンセグ放送を受信して記録可能な携帯電話機を例として図面を参照しながら説明する。ただし、本発明は携帯電話機に限定されず、マルチメディアコンテンツを再生可能な機器であれば良い。例えば、ＤＶＤレコーダなどのようにデジタル放送を受信して記録再生する装置でも良いし、インターネットを介して取得したコンテンツを再生する再生機器であっても良い。 Hereinafter, embodiments of the present invention will be described with reference to the drawings, taking as an example a mobile phone capable of receiving and recording one-segment broadcasting. However, the present invention is not limited to mobile phones and applies to any device capable of playing back multimedia content. For example, it may be an apparatus that receives, records, and plays back digital broadcasts, such as a DVD recorder, or a playback device that plays back content obtained via the Internet.

図１は、本実施形態に係る携帯電話機の構成を示すブロック図である。携帯電話機１全体の制御は、制御部５１によって行われ、制御部５１はＣＰＵ、ＲＯＭ、ＲＡＭから構成される。また、携帯電話機１は、無線通信用のアンテナ５２とワンセグ放送受信用のアンテナ５８を備えている。 FIG. 1 is a block diagram showing a configuration of a mobile phone according to the present embodiment. Control of the entire mobile phone 1 is performed by the control unit 51, and the control unit 51 includes a CPU, a ROM, and a RAM. The mobile phone 1 also includes an antenna 52 for wireless communication and an antenna 58 for receiving one-segment broadcasting.

アンテナ５２は、基地局から出力された電波を受信し、電気信号に変換する。ＲＦ部５３は、アンテナ５２から入力された電気信号を周波数変換して、ベースバンド信号を通信制御部５４に出力する。通信制御部５４は、このベースバンド信号を変調し、誤り訂正を行ったデータをメディアごとに分離する。例えば、通話音声を受信した場合には、受信データに含まれるオーディオデータを復号し、音声入出力部５５に出力する。また例えば、テレビ電話などのように受信データに動画像データが含まれていれば、この動画像データを復号し、表示部５６に出力する。 The antenna 52 receives the radio wave output from the base station and converts it into an electrical signal. The RF unit 53 converts the frequency of the electrical signal input from the antenna 52 and outputs a baseband signal to the communication control unit 54. The communication control unit 54 modulates the baseband signal and separates the error-corrected data for each medium. For example, when a call voice is received, the audio data included in the received data is decoded and output to the voice input / output unit 55. For example, if the received data includes moving image data such as a videophone, the moving image data is decoded and output to the display unit 56.

アンテナ５８はワンセグ放送の電波を受信し、選局された物理チャネルの信号を取得して電気信号に変換し、ＲＦ部５９に出力する。ＲＦ部５９では、受信信号をベースバンド信号に変換し、放送受信部６０に出力する。放送受信部６０では、入力された信号をＡＤ変換した後に、ＯＦＤＭ復調し、誤り訂正を行って、ＴＳ（Transport Stream）パケットを取得する。 The antenna 58 receives the radio wave of the one-segment broadcasting, acquires the selected physical channel signal, converts it to an electrical signal, and outputs it to the RF unit 59. The RF unit 59 converts the received signal into a baseband signal and outputs it to the broadcast receiving unit 60. The broadcast receiving unit 60 performs AD conversion on the input signal, performs OFDM demodulation, performs error correction, and obtains a TS (Transport Stream) packet.

この受信データを再生する場合には、ＴＳパケットからＡｕｄｉｏＥＳ、ＶｉｄｅｏＥＳ、ＴｅｘｔＥＳのＥＳ（ＥｌｅｍｅｎｔａｒｙＳｔｒｅａｍ）を分離し、それぞれのＥＳがデコードされる。デコードされたＡｕｄｉｏＥＳは、音声入出力部５５に出力する。また、ＶｉｄｅｏＥＳとＴｅｘｔＥＳは、表示部５６に出力する。 When reproducing this received data, the ES (Elementary Stream) of Audio ES, Video ES, and Text ES is separated from the TS packet, and each ES is decoded. The decoded Audio ES is output to the audio input / output unit 55. VideoES and TextES are output to the display unit 56.

それに対してＴＳパケットを保存する場合、この放送コンテンツから所定の区間をインデキシングしながら記憶部５７に保存することができる。コンテンツ処理部１００は、このコンテンツインデキシングの処理を行う。コンテンツ処理部１００で例えば、受信したワンセグ放送を録画するときに、ＣＭ区間を検出してインデキシングしながら記憶しておくことによって、視聴時にＣＭ区間を除いて視聴することやＣＭ区間だけを視聴することなどが可能となる。 When TS packets are to be saved instead, they can be stored in the storage unit 57 while predetermined sections of the broadcast content are indexed. The content processing unit 100 performs this content indexing. For example, when a received one-segment broadcast is recorded, the content processing unit 100 detects the CM sections and stores the content with those sections indexed, which makes it possible at viewing time to skip the CM sections or to watch only the CM sections.

図２は、コンテンツ処理部１００の構成を示すブロック図である。コンテンツ処理部１００は、Ｄｅｍｕｘ部１０１、伝送エラー検出フィルタ部１０２、デコード部１０３、コンテンツ解析部１０４、区間分類部１０５、ＴＳデータ一時記憶部１０６、解析結果統合部１０７を有する。 FIG. 2 is a block diagram illustrating a configuration of the content processing unit 100. The content processing unit 100 includes a demux unit 101, a transmission error detection filter unit 102, a decoding unit 103, a content analysis unit 104, a section classification unit 105, a TS data temporary storage unit 106, and an analysis result integration unit 107.

放送受信部６０によって取得されたＴＳパケットは、Ｄｅｍｕｘ部１０１およびＴＳデータ一時記憶部１０６に入力される。ＴＳデータ一時記憶部１０６では、解析結果統合部１０７によって対応するＴＳパケットの処理が終了するまで一時的にＴＳパケットを記憶する。なお、本実施形態では、ＴＳデータの一時記憶先としてＴＳデータ一時記憶部１０６を設けるとして説明するが、ＴＳデータ一時記憶部１０６を設けず、後述のマルチメディアコンテンツＤＢ部５７ａにＴＳデータを一時的に保存する構成にしても良い。 The TS packets acquired by the broadcast receiving unit 60 are input to the Demux unit 101 and the TS data temporary storage unit 106. The TS data temporary storage unit 106 holds each TS packet temporarily until the analysis result integration unit 107 has finished processing that packet. Although this embodiment is described as providing the TS data temporary storage unit 106 as the temporary storage destination for TS data, the unit may be omitted and the TS data stored temporarily in the multimedia content DB unit 57a described later.

Ｄｅｍｕｘ部１０１では、入力されたＴＳパケットからＡｕｄｉｏＥＳ、ＶｉｄｅｏＥＳ、ＴｅｘｔＥＳの少なくとも１つのＥＳ（ＥｌｅｍｅｎｔａｒｙＳｔｒｅａｍ）を分離抽出する。 The demux unit 101 separates and extracts at least one ES (Elementary Stream) of AudioES, VideoES, and TextES from the input TS packet.

分離抽出されたＥＳは、伝送エラー検出フィルタ部１０２による伝送エラーの検出およびデコード部１０３によるデコード処理が行われる。そして、コンテンツ解析部１０４は、デコード部１０３から入力されたデコードデータを用いてコンテンツの解析を行う。 The separated and extracted ES is subjected to transmission error detection by the transmission error detection filter unit 102 and decoding processing by the decoding unit 103. Then, the content analysis unit 104 analyzes the content using the decoded data input from the decoding unit 103.

区間分類部１０５では、コンテンツ解析部１０４によって解析された結果を用いてコンテンツから所定の区間を分類する処理を行う。 The section classification unit 105 performs processing for classifying a predetermined section from the content using the result analyzed by the content analysis unit 104.

解析結果統合部１０７では、区間分類部１０５によって解析された結果とＴＳデータ一時記憶部１０６に記憶されたデータとを関連付けて統合し、メタデータ付きマルチメディアコンテンツとして出力し、記憶部５７内のマルチメディアコンテンツＤＢ部５７ａにこのメタデータ付きマルチメディアコンテンツを記憶させる。 In the analysis result integration unit 107, the result analyzed by the section classification unit 105 and the data stored in the TS data temporary storage unit 106 are associated and integrated, output as multimedia content with metadata, and stored in the storage unit 57. This multimedia content with metadata is stored in the multimedia content DB unit 57a.

このような構成によって、コンテンツを解析して、解析結果をメタデータとしてコンテンツに付すことができる。以下では、コンテンツ解析部１０４、伝送エラー検出フィルタ部１０２、区間分類部１０５について詳しく説明する。 With such a configuration, it is possible to analyze the content and attach the analysis result to the content as metadata. Hereinafter, the content analysis unit 104, the transmission error detection filter unit 102, and the section classification unit 105 will be described in detail.

（コンテンツ解析部１０４） (Content analysis unit 104)
図３は、コンテンツに対する区間分類処理を行うための特徴量を抽出するコンテンツ解析部１０４のブロック図である。コンテンツ解析部１０４は、解析制御部１０４ａ、セグメント単位特徴量抽出部１０４ｂ、時間継続特徴量抽出部１０４ｃを含む。セグメント単位特徴量抽出部１０４ｂと時間継続特徴量抽出部１０４ｃは、コンテンツに対してどのような区間分類処理を行うかに応じて、いずれか一方もしくは両方が動作する。また、セグメント単位特徴量抽出部１０４ｂによって抽出される第１の特徴量を用いて時間継続特徴量抽出部１０４ｃを動作させても良い。 FIG. 3 is a block diagram of the content analysis unit 104 that extracts feature amounts for performing section classification processing on content. The content analysis unit 104 includes an analysis control unit 104a, a segment unit feature amount extraction unit 104b, and a time continuation feature amount extraction unit 104c. Either one or both of the segment unit feature amount extraction unit 104b and the time continuation feature amount extraction unit 104c operate depending on what kind of section classification processing is performed on the content. Further, the time continuation feature amount extraction unit 104c may be operated using the first feature amount extracted by the segment unit feature amount extraction unit 104b.

解析制御部１０４ａは、デコード部１０３に対してフレームを所定の間引き率で間引いてデコードするようデコード要求を出力する。そして、デコード部１０３からデコードデータを取得すると、区間分類部１０５において区間分類処理を行うために用いる特徴量を抽出するための処理を動作させるようセグメント単位特徴量抽出部１０４ｂまたは時間継続特徴量抽出部１０４ｃを制御する。解析制御部１０４ａがセグメント単位特徴量抽出部１０４ｂを動作させるよう制御する場合には、デコード部１０３から取得したデコードデータをセグメント単位特徴量抽出部１０４ｂに出力するとともに、セグメント単位特徴量抽出部１０４ｂでの特徴量抽出処理に用いる第１の解析要件を出力する。また、解析制御部１０４ａが時間継続特徴量抽出部１０４ｃを動作させるよう制御する場合には、デコード部１０３から取得したデコードデータを時間継続特徴量抽出部１０４ｃに出力するとともに、時間継続特徴量抽出部１０４ｃでの特徴量抽出処理に用いる第２の解析要件を出力する。 The analysis control unit 104a outputs a decode request to the decoding unit 103 so that frames are decoded while being thinned out at a predetermined thinning rate. On obtaining the decoded data from the decoding unit 103, it controls the segment unit feature amount extraction unit 104b or the time continuation feature amount extraction unit 104c so as to run the processing that extracts the feature amounts used by the section classification unit 105 for section classification. When the analysis control unit 104a causes the segment unit feature amount extraction unit 104b to operate, it outputs the decoded data obtained from the decoding unit 103 to the segment unit feature amount extraction unit 104b together with a first analysis requirement used in the feature amount extraction processing of that unit. Likewise, when the analysis control unit 104a causes the time continuation feature amount extraction unit 104c to operate, it outputs the decoded data obtained from the decoding unit 103 to the time continuation feature amount extraction unit 104c together with a second analysis requirement used in the feature amount extraction processing of that unit.

なお、コンテンツ解析部１０４において、特徴量として算出する値が波形情報ではなく周波数情報であれば、デコード部１０３でフルデコードする必要が無い。例えば、コンテンツ解析部１０４において解析するのがＡｕｄｉｏＥＳのとき、コンテンツ解析部１０４においてスペクトルやＭＦＣＣなどの周波数情報を算出するのであれば、デコード部１０３でフルデコードしてＰＣＭを生成するのではなく、ＭＤＣＴまでデコードすれば、ＭＤＣＴをＰＣＭに変換する処理、ならびに、ＰＣＭを周波数情報に変換する処理を削減することができる。 If the value calculated as the feature amount in the content analysis unit 104 is not waveform information but frequency information, the decoding unit 103 does not need to perform full decoding. For example, when the content analysis unit 104 analyzes AudioES, if the content analysis unit 104 calculates frequency information such as spectrum and MFCC, the decoding unit 103 does not perform full decoding to generate PCM. Decoding up to MDCT can reduce the process of converting MDCT to PCM and the process of converting PCM to frequency information.

これによって、例えば、ワンセグ放送を録画中に逐次コンテンツを解析し、録画終了とほぼ同時に解析結果とコンテンツとがリンクする場合、録画中にユーザは番組を視聴していないため、情報をフルデコードすることなく、ＭＤＣＴまでＡｕｄｉｏＥＳをデコードすることによって、処理を大幅に削減することができる。 As a result, for example, when content is analyzed sequentially while a one-segment broadcast is being recorded so that the analysis result is linked to the content almost as soon as recording ends, the user is not watching the program during recording; the processing can therefore be reduced greatly by decoding the AudioES only up to the MDCT stage rather than fully decoding it.

セグメント単位特徴量抽出部１０４ｂは、入力されたデコードデータからフレームごとの特徴量を算出し、このフレームごとの特徴量を用いて、複数のフレームから構成されるセグメント単位での特徴量を算出し、この値を基準値に基づいて離散化する。 The segment unit feature quantity extraction unit 104b calculates a feature quantity for each frame from the input decoded data, and uses the feature quantity for each frame to calculate a feature quantity for each segment composed of a plurality of frames. This value is discretized based on the reference value.

ここでは、受信したワンセグ放送から、Ｄｅｍｕｘ部１０１によってＡｕｄｉｏＥＳを分離抽出し、区間分類部１０５においてＡｕｄｉｏＥＳを音楽が流れている区間と音楽が流れていない区間とに分類する場合を例に、セグメント単位特徴量抽出部１０４ｂでの特徴量抽出処理について説明する。 Here, the feature amount extraction processing in the segment unit feature amount extraction unit 104b is described using an example in which the Demux unit 101 separates and extracts the AudioES from a received one-segment broadcast and the section classification unit 105 classifies the AudioES into sections in which music is playing and sections in which it is not.

ステレオ音源の場合、音楽が流れている区間はＬとＲの差分が大きく、音楽が流れていない区間はＬとＲの差分が小さい傾向がある。そこで、セグメント単位特徴量抽出部１０４ｂではＬとＲの差分の大きさをセグメント単位で抽出し、そのセグメント単位特徴量によってセグメントを離散化（ここでは２値化）して、区間分類部１０５において音楽区間と非音楽区間に分類する。 In the case of a stereo sound source, the difference between L and R tends to be large in a section where music is flowing, and the difference between L and R tends to be small in a section where music is not flowing. Therefore, the segment unit feature quantity extraction unit 104b extracts the magnitude of the difference between L and R in segment units, discretizes the segment by the segment unit feature quantity (in this case, binarization), and the section classification unit 105 Classify into music and non-music segments.
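As an illustrative sketch of the left/right difference feature described above (not the implementation of the embodiment), the following Python assumes that each decoded frame is available as two equal-length lists of PCM samples. The function names and the use of the mean absolute L-R difference as the "magnitude of the difference" are assumptions made only for this example.

```python
def lr_difference(left, right):
    """Mean absolute difference between the L and R samples of one frame.

    The metric is an assumption of this sketch; the description only
    states that the magnitude of the L/R difference is used.
    """
    if not left or len(left) != len(right):
        raise ValueError("left and right must be non-empty and equal in length")
    return sum(abs(l - r) for l, r in zip(left, right)) / len(left)


def segment_feature(frames):
    """Average the per-frame feature over the decoded frames of one segment.

    frames is a list of (left_samples, right_samples) pairs.
    """
    if not frames:
        raise ValueError("at least one decoded frame is required")
    return sum(lr_difference(l, r) for l, r in frames) / len(frames)


def binarize(feature, reference):
    """Return 1 for a music-like segment (large L/R difference), else 0."""
    return 1 if feature >= reference else 0
```

A segment whose averaged feature is at or above the reference value would be treated as a music section in this sketch; the reference value itself would have to be chosen empirically.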

図４は、セグメント単位特徴量抽出部１０４ｂで処理されるデコードデータの例を示したイメージ図である。図４では、Ｍ個のフレームから構成されるセグメントの特徴量を抽出している。セグメント単位特徴量抽出部１０４ｂには、解析制御部１０４ａからデコードデータが入力されるが、このデコードデータは、解析制御部１０４ａで設定された間引き率で間引いてデコードされたフレームのデータである。図４は、デコード部１０３において１／２の間引き率でフレームがデコードされている例であり、コンテンツのフレームのうちデコードされたフレームを斜線で示している。セグメント単位特徴量抽出部１０４ｂには、このデコードされたフレームのデータが入力される。 FIG. 4 is an image diagram showing an example of decoded data processed by the segment unit feature amount extraction unit 104b. In FIG. 4, the feature amount of a segment composed of M frames is extracted. The segment unit feature quantity extraction unit 104b receives decode data from the analysis control unit 104a. This decode data is frame data decoded by thinning out at a thinning rate set by the analysis control unit 104a. FIG. 4 shows an example in which frames are decoded at the decoding unit 103 at a decimation rate of 1/2, and the decoded frames among the content frames are indicated by hatching. The decoded frame data is input to the segment unit feature extraction unit 104b.

図５は、セグメント単位特徴量抽出部１０４ｂの処理を示すフローチャートである。図４および図５を用いて、セグメント単位特徴量抽出部１０４ｂの処理について説明する。 FIG. 5 is a flowchart showing the processing of the segment unit feature amount extraction unit 104b. The process of the segment unit feature quantity extraction unit 104b will be described with reference to FIGS.

まず、セグメント単位特徴量抽出部１０４ｂは、解析制御部１０４ａで設定された間引き率でフレームが間引かれてデコードされたデータを取得する（Ｓ１０１）。解析制御部１０４ａでの間引き率の設定については、後ほど詳述する。例えば図４では、間引き率１／２であり、フレーム（１）、フレーム（３）、・・・フレーム（Ｍ−１）という１つおきにデコードされたフレームを示している。 First, the segment unit feature amount extraction unit 104b acquires data decoded with frames thinned out at the thinning rate set by the analysis control unit 104a (S101). How the analysis control unit 104a sets the thinning rate is described in detail later. In FIG. 4, for example, the thinning rate is 1/2, and every other frame is decoded: frame (1), frame (3), ..., frame (M-1).

このようなデコードデータを取得すると、セグメント単位特徴量抽出部１０４ｂは、デコードされたフレームごとに特徴量を算出する（Ｓ１０２）。音楽区間と非音楽区間とを分類するために、ＬとＲの差分を算出する。なお、図４では、第ｎセグメントのｍ番目のフレームの特徴量をｘ_ｎ（ｍ）と表現している。 When such decoded data is acquired, the segment unit feature amount extraction unit 104b calculates a feature amount for each decoded frame (S102). To classify music sections and non-music sections, the difference between L and R is calculated. In FIG. 4, the feature amount of the m-th frame of the n-th segment is written x_n(m).

そして、デコードされたフレームの特徴量を用いて、セグメント単位の特徴量を概算する（Ｓ１０３）。セグメント単位の特徴量の概算値は、例えば、図４に示したようにデコードされたフレームの特徴量の平均ベクトルとして求めることができる。 Then, the feature amount of the segment unit is approximated using the decoded frame feature amount (S103). The approximate value of the segment-based feature value can be obtained, for example, as an average vector of the feature values of the decoded frames as shown in FIG.

そして、このセグメント単位の特徴量の概算値とセグメントを２値化する基準値との差が所定の閾値以上か未満かを判定する（Ｓ１０４）。この２値化のための基準値と所定の閾値は、解析制御部１０４ａから第１の解析要件としてセグメント単位特徴量抽出部１０４ｂに入力された値である。 Then, it is determined whether or not the difference between the approximate value of the segment feature amount and the reference value for binarizing the segment is equal to or greater than a predetermined threshold (S104). The reference value and the predetermined threshold value for binarization are values input from the analysis control unit 104a to the segment unit feature quantity extraction unit 104b as the first analysis requirement.

セグメント単位の特徴量の概算値が基準値を大幅に上回っている場合や、下回っている場合には、同じセグメント内の全てのフレームの特徴量を用いてセグメント単位の特徴量を算出したとしても、２値化の判定が覆る可能性は低い。そこで、特徴量として算出する値や、間引き率などに応じて、どの程度基準値から離れていると２値化の判定が覆る可能性が低いのかを予め統計的に調べておき、解析制御部１０４ａがそれぞれの条件に対応した閾値の情報として管理する。そして、解析制御部１０４ａがセグメント単位特徴量抽出部１０４ｂを制御するときに、間引き率や基準値などに応じた閾値と基準値を第１の解析要件として出力している。 When the approximate segment-unit feature amount is far above or far below the reference value, the binarization decision is unlikely to be overturned even if the segment-unit feature amount were calculated from the feature amounts of all frames in the segment. Therefore, how far from the reference value the estimate must be for the decision to be unlikely to be overturned is examined statistically in advance, according to the value calculated as the feature amount, the thinning rate, and so on, and the analysis control unit 104a manages the results as threshold information for each condition. When the analysis control unit 104a controls the segment unit feature amount extraction unit 104b, it outputs the reference value and the threshold corresponding to the thinning rate, the reference value, and so on as the first analysis requirement.

つまりステップＳ１０４では、特徴量の概算値と閾値との比較によって、そのセグメント単位の特徴量の概算値によって２値化を行うか否かを判定している。そこで、ステップＳ１０４においてＹｅｓと判定された場合には、セグメント単位の特徴量の概算値によって、セグメントの特徴量を２値化する（Ｓ１０５〜Ｓ１０７）。 That is, in step S104, it is determined whether or not to perform binarization based on the approximate value of the feature value of the segment unit by comparing the approximate value of the feature value and the threshold value. Therefore, if it is determined Yes in step S104, the segment feature value is binarized based on the approximate value of the segment unit feature value (S105 to S107).

また、ステップＳ１０４においてＮｏと判定された場合には、解析制御部１０４ａから受信したデコードデータが、セグメント内の全てのフレームをデコードしたデータではないならば（Ｓ１０８のＮｏ）、解析制御部１０４ａに対してフレームの間引き率を変更するよう要求を出力する（Ｓ１０９）。 If the determination in step S104 is No and the decoded data received from the analysis control unit 104a is not data in which all frames of the segment have been decoded (No in S108), a request to change the frame thinning rate is output to the analysis control unit 104a (S109).

セグメント単位特徴量抽出部１０４ｂからフレームの間引き率の変更要求を取得した解析制御部１０４ａは、デコード部１０３に対してフレームの間引き率を指定して、デコード要求を出力する。変更後のフレームの間引き率は、変更前の間引き率よりも低くなるように設定し、以前デコードされなかったフレームの全部または一部を追加でデコードする。これによって、１セグメントを構成するフレームのうち、デコードされたフレームの割合が多くなり、当該セグメントの特徴量をより正確に概算することができる。 The analysis control unit 104a that has acquired the frame decimation rate change request from the segment unit feature amount extraction unit 104b specifies the frame decimation rate to the decoding unit 103 and outputs a decode request. The frame thinning rate after the change is set to be lower than the thinning rate before the change, and all or a part of the frame that has not been decoded before is additionally decoded. As a result, the proportion of decoded frames in the frames constituting one segment increases, and the feature amount of the segment can be estimated more accurately.

そして、セグメント単位特徴量抽出部１０４ｂは、セグメント単位の特徴量の概算値と基準値との差が所定の閾値以上の場合（Ｓ１０４のＹｅｓ）、もしくは、１セグメントに含まれる全てのフレームの特徴量を用いてセグメント単位の特徴量を算出した場合（Ｓ１０８のＮｏ）まで、間引き率を変更して特徴量を概算し、その概算値によって２値化できるか否かの判定を行う処理を繰り返す。 The segment unit feature amount extraction unit 104b repeats the process of changing the thinning rate, approximating the feature amount, and determining whether binarization can be performed with that approximation, until either the difference between the approximate segment-unit feature amount and the reference value is equal to or greater than the predetermined threshold (Yes in S104) or the segment-unit feature amount has been calculated from the feature amounts of all frames in the segment (No in S108).

このような処理に従ってセグメント単位特徴量抽出部１０４ｂが判定したセグメントごとの特徴量の２値化結果は、第１の特徴量として区間分類部１０５に出力する。 The binarization result of the feature quantity for each segment determined by the segment unit feature quantity extraction unit 104b according to such processing is output to the section classification unit 105 as the first feature quantity.
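The flow of steps S101 to S109 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: `decode_feature` is a hypothetical callable standing in for "decode frame i and compute its feature amount", the thinning rate is modeled as a step (every `step`-th frame is decoded), and halving the step on each retry is one possible way of lowering the thinning rate.

```python
def discretize_segment(decode_feature, num_frames, reference, threshold,
                       initial_step=2):
    """Binarize one segment, decoding additional frames only when needed.

    decode_feature(i) is a hypothetical stand-in for decoding frame i and
    computing its feature amount. Returns (label, estimate, frames_decoded).
    """
    if num_frames <= 0 or initial_step <= 0:
        raise ValueError("num_frames and initial_step must be positive")
    cache = {}                     # features of frames decoded so far
    step = initial_step
    while True:
        # S101-S102: decode the thinned-out frames and compute their features,
        # reusing frames that were already decoded in an earlier pass.
        for i in range(0, num_frames, step):
            if i not in cache:
                cache[i] = decode_feature(i)
        # S103: approximate the segment-unit feature as the mean.
        estimate = sum(cache.values()) / len(cache)
        # S104: is the estimate far enough from the reference value?
        confident = abs(estimate - reference) >= threshold
        # S108: with step == 1 every frame of the segment has been decoded.
        if confident or step == 1:
            label = 1 if estimate >= reference else 0   # S105-S107
            return label, estimate, len(cache)
        # S109: lower the thinning rate (assumption: halve the step).
        step = max(1, step // 2)
```

With a threshold of 0 the first estimate is always accepted, which corresponds to the lowest-cost, lowest-accuracy setting.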

以上のように、セグメント単位の特徴量を２値化するときに、最初から１セグメントに含まれる全てのフレームをデコードするのではなく、フレームが間引かれたデコードデータによってセグメント単位の特徴量を２値化できる場合があるため、デコードにかかる処理量を削減することができる。また、セグメント単位の特徴量の概算値が２値化の基準値に近い場合には、セグメント単位の特徴量をより精度よく算出するよう、デコードされなかった全てのフレームもしくは一部のフレームを追加でデコードさせることができるため、特徴量算出の精度劣化を抑えることができる。 As described above, when binarizing the feature value of the segment unit, not all the frames included in one segment are decoded from the beginning, but the feature value of the segment unit is determined by the decoded data obtained by thinning out the frames. Since there are cases where binarization is possible, the amount of processing required for decoding can be reduced. Also, if the approximate value of the segment-wise feature value is close to the binarization reference value, add all or some of the frames that were not decoded so that the segment-wise feature value can be calculated more accurately Therefore, it is possible to suppress degradation in accuracy of feature quantity calculation.

なお、ステップＳ１０４において、所定の閾値を０に設定することも可能である。この場合、ステップＳ１０８およびＳ１０９は実行されることが無く、最初に算出したセグメント単位の特徴量の概算値を用いて特徴量の２値化を行うため、特徴量算出の精度は劣化するが、処理量を更に削減することが可能である。 Note that the predetermined threshold in step S104 can also be set to 0. In this case, steps S108 and S109 are never executed and the feature amount is binarized using the first approximate segment-unit feature amount, so the accuracy of the feature amount calculation degrades, but the processing amount can be reduced further.

なお、セグメント単位の特徴量を概算するときに用いるデコードデータのフレーム間引き率は、既に特徴量の算出を行ったセグメントの特徴量に応じて設定しても良い。例えば、１つ前のセグメント単位の特徴量が基準値付近ならば、現在のセグメントもセグメント単位の特徴量が基準値付近になる可能性が高いため、最初から間引き率を低めに設定し、１つ前のセグメント単位の特徴量が基準値から大幅に離れている場合には、最初の間引き率を高めに設定しても良い。 It should be noted that the frame decimation rate of the decoded data used when approximating the segment unit feature amount may be set according to the segment feature amount for which the feature amount has already been calculated. For example, if the feature value of the previous segment unit is near the reference value, the current segment is likely to have the feature value of the segment unit near the reference value. If the feature amount of the previous segment unit is far from the reference value, the initial thinning rate may be set higher.

また例えば、過去の複数のデータを用いて線形予測やスプライン補間を行い、現在のセグメントの特徴量を予測し、予測結果が基準値付近ならば、間引き率を低めに設定し、基準値から大幅に離れている場合には、間引き率を高めに設定しても良い。 Alternatively, for example, the feature amount of the current segment may be predicted from multiple past values by linear prediction or spline interpolation; the thinning rate may then be set lower if the predicted value is near the reference value and higher if it is far from the reference value.
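One way to picture this choice of the initial thinning rate is the sketch below. It is only an illustration: the two-point linear extrapolation, the `margin` parameter, and the concrete step values are assumptions of this example, and a step of k means that every k-th frame is decoded (a smaller step is a lower thinning rate).

```python
def choose_initial_step(history, reference, margin,
                        dense_step=2, sparse_step=4):
    """Pick the initial frame step for the next segment.

    history holds the segment-unit feature amounts of earlier segments,
    oldest first. All parameter names and defaults are hypothetical.
    """
    if not history:
        return dense_step                  # no information: be conservative
    if len(history) == 1:
        predicted = history[-1]            # reuse the previous segment's value
    else:
        # Simple linear extrapolation from the last two segments.
        predicted = 2 * history[-1] - history[-2]
    near_reference = abs(predicted - reference) < margin
    return dense_step if near_reference else sparse_step
```

The returned step could then be passed as the starting thinning setting for the next segment, with the usual retry at a lower thinning rate still available if the first estimate turns out to be too close to the reference value.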

また、前述の例では、音楽性を判定するために特徴量としてＬとＲの差分を算出し、特徴量を２値化する場合について説明したが、特徴量として算出する値は、どのような区間を分類するかによって異なる。例えば、分類する区間によって、パワー値、スペクトル、メルフィルタバンクケプストラム係数（ＭＦＣＣ）、ＬＰＣケプストラム、分類する区間ごとに予め定められたパタンとの類似度などであっても良い。また、２値化する場合について説明したが、３値化、４値化などの離散化であっても、このような方法を用いることによって処理量を削減することができる。 In the example above, the difference between L and R was calculated as the feature amount for judging musicality, and the feature amount was binarized; the value calculated as the feature amount, however, depends on what kind of section is being classified. Depending on the section to be classified, it may be, for example, a power value, a spectrum, mel filter bank cepstrum coefficients (MFCC), an LPC cepstrum, or the similarity to a pattern predetermined for each section to be classified. Although binarization has been described, the same method also reduces the processing amount for other discretizations such as ternarization or quaternarization.

The time continuation feature amount extraction unit 104c calculates a feature amount for each frame from the input decoded data and, when frames whose feature amount satisfies a predetermined criterion continue for a predetermined number or more, extracts those frames as a specific section.

The following description of the time continuation feature amount extraction unit 104c uses the extraction of silent sections from content as an example. To extract silent sections, the power of each frame is calculated as the feature amount and binarized: a frame whose power is below a predetermined reference value is labeled "h", and a frame whose power is at or above the reference value is labeled "l". When "h" frames continue for a predetermined continuation count or more (3 in this description), that section is extracted as a silent section "H".

Decoded data is input to the time continuation feature amount extraction unit 104c from the analysis control unit 104a. FIG. 6 is an image diagram showing an example of the decoded data processed by the time continuation feature amount extraction unit 104c. First, every third frame, shown hatched in FIG. 6, is decoded by the decoding unit 103 and passed from the analysis control unit 104a to the time continuation feature amount extraction unit 104c. The decimation interval is determined from the number of consecutive "h" frames required to judge a silent section "H". Here, a silent section "H" is judged when "h" continues for three frames, so every third frame is decoded.

FIG. 7 is a flowchart showing the processing of the time continuation feature amount extraction unit 104c. The time continuation feature amount extraction unit 104c acquires decoded data from the analysis control unit 104a (S201). As shown in FIG. 6, this decoded data consists of every third frame. The unit calculates a feature amount for the acquired decoded data, compares it with a predetermined reference value, and binarizes it into "h" or "l" (S202).

A frame judged to be "h" may belong to a silent section (a section in which a predetermined number or more of "h" frames continue). The frames judged to be "h" are therefore extracted (S203), and it is determined whether a silent section containing each such frame exists. First, the analysis control unit 104a requests the decoding unit 103 to decode the frame immediately before the frame extracted in step S203 (S204). When the time continuation feature amount extraction unit 104c acquires the data decoded by the decoding unit 103, it calculates the feature amount of that frame and binarizes it into "h" or "l" according to whether the feature amount is at or above a predetermined threshold (S205). If this frame is "h" (Yes in S206), a decoding request for the next earlier frame is issued (S207), and the feature amount of the decoded frame is binarized. Steps S205 to S207 are repeated until a frame judged to be "l" appears. This reveals the frame from which the run of "h" frames begins.

Likewise, the analysis control unit 104a requests the decoding unit 103 to decode the frame immediately after the frame extracted in step S203 (S208). The feature amount of the decoded frame is calculated and binarized into "h" or "l" according to whether it is at or above the predetermined threshold (S209). If this frame is "h" (Yes in S210), a decoding request for the next later frame is issued (S211), and the feature amount of the decoded frame is binarized. Steps S209 to S211 are repeated until a frame judged to be "l" appears. This reveals the frame up to which the run of "h" frames continues.

With this information, a section in which three or more frames judged to be "h" continue can be extracted as a silent section. Information indicating, for each frame, whether it belongs to a silent section is then output as the second feature amount (S212).

Because detailed decoding is performed only around the frames judged to be "h" among the frames initially decoded at the predetermined interval, the amount of decoding is reduced without degrading accuracy.
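
The flow of steps S201 to S212 can be sketched as follows. This is a minimal sketch under stated assumptions: the list `powers` stands in for the encoded frames, reading an element of it stands in for "decode the frame and calculate its power", and the function name and the returned decode count are hypothetical additions for illustration.

```python
def find_silent_frames(powers, threshold, min_run=3):
    """Return (per-frame silent flags, number of frames actually decoded)."""
    n = len(powers)
    decoded = {}  # frame index -> True if the frame is "h" (power below threshold)

    def is_h(i):
        # Stand-in for a decode request followed by feature calculation.
        if i not in decoded:
            decoded[i] = powers[i] < threshold
        return decoded[i]

    silent = [False] * n
    # S201-S202: examine only every min_run-th frame first.
    for i in range(0, n, min_run):
        if not is_h(i):
            continue
        # S204-S207: walk backward until an "l" frame appears.
        start = i
        while start - 1 >= 0 and is_h(start - 1):
            start -= 1
        # S208-S211: walk forward until an "l" frame appears.
        end = i
        while end + 1 < n and is_h(end + 1):
            end += 1
        # S212: a run of at least min_run "h" frames is a silent section.
        if end - start + 1 >= min_run:
            for j in range(start, end + 1):
                silent[j] = True
    return silent, len(decoded)


powers = [5, 5, 0, 0, 0, 5, 5, 5, 0, 5, 5, 5]
flags, decode_count = find_silent_frames(powers, threshold=1)
```

Any run of three or more consecutive frames contains at least one frame whose index is a multiple of three, so the coarse pass cannot miss a silent section. In the example, the isolated quiet frame at index 8 is never decoded.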

(Transmission error detection filter unit 102)
Next, the transmission error detection filter unit 102 will be described. FIG. 8 is a block diagram showing the detailed configuration of the transmission error detection filter unit 102 and the decoding unit 103. The transmission error detection filter unit 102 consists of a header search unit 102a and a transmission error determination unit 102b.

The header search unit 102a searches for headers in the ES input from the Demux unit 101 and divides the ES into frame units. The ES divided into frame units is output as frame data to the transmission error determination unit 102b and is also stored in the frame data temporary storage unit 103a of the decoding unit 103.

A time stamp such as a PTS (Presentation Time Stamp) and the frame data are input to the transmission error determination unit 102b. Each time a time stamp is input, the transmission error determination unit 102b calculates, as a first time length, the difference between the time indicated by the previously input time stamp and the time indicated by the time stamp input this time. It also calculates, as a second time length, the duration from the ES frame at which the previous time stamp was input to the ES frame at which the current time stamp was input. If the first and second time lengths are equal, it determines that no transmission error occurred; if they are not equal, it determines that a transmission error occurred.
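
The comparison of the two time lengths can be sketched as follows. This is an illustrative sketch only: the function and parameter names are hypothetical, time is assumed to be expressed in integer clock ticks, every frame is assumed to have the same fixed duration, and the `tolerance` parameter is an addition not found in the specification (which simply tests for equality).

```python
def has_transmission_error(prev_pts, curr_pts, frames_between,
                           frame_duration, tolerance=0):
    """Compare the time stamp difference with the duration of the received frames.

    prev_pts, curr_pts -- previous and current time stamps, in clock ticks
    frames_between     -- number of ES frames received between the two time stamps
    frame_duration     -- duration of one frame, in the same ticks (assumed constant)
    """
    first_length = curr_pts - prev_pts                # from the time stamps
    second_length = frames_between * frame_duration   # from the frames received
    return abs(first_length - second_length) > tolerance
```

For example, with a frame duration of 1920 ticks, three received frames account for 5760 ticks; a time stamp difference of 7680 ticks then indicates that one frame's worth of data was lost.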

By using the time stamps in this way, the transmission error determination unit 102b can reliably detect whether an error due to packet loss has occurred, and can accurately determine in which time spans normal frame data exists and in which time spans frame data that may contain an abnormality exists. The transmission error information obtained by the transmission error determination unit 102b is output to the TS data temporary storage unit 106 and also to the section classification unit 105.

(Section classification unit 105)
Next, the section classification unit 105 will be described. The section classification unit 105 obtains the desired classified sections using the first feature amount output from the segment unit feature amount extraction unit 104b, the second feature amount output from the time continuation feature amount extraction unit 104c, or both.

For example, when information indicating the segment-unit difference between L and R is input from the segment unit feature amount extraction unit 104b, the section classification unit 105 can classify the content into sections in which music is playing and sections in which it is not. As another example, using the information indicating silent sections output from the time continuation feature amount extraction unit 104c, the section classification unit 105 judges the span between two silent sections to be a CM section if the interval between those silent sections is a predetermined time.
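
The CM judgment from silent sections can be sketched as follows. This is only a sketch: the specification says just that the interval must be "a predetermined time", so the candidate lengths of 15 and 30 seconds, the tolerance, and the function name are all assumptions chosen for illustration.

```python
def find_cm_sections(silences, cm_lengths=(15.0, 30.0), tolerance=0.5):
    """Return (start, end) spans between adjacent silent sections that look like CMs.

    silences -- list of (start, end) times of silent sections, in seconds, in order
    """
    sections = []
    for (_, prev_end), (next_start, _) in zip(silences, silences[1:]):
        gap = next_start - prev_end
        # The gap must match one of the predetermined lengths (assumed values).
        if any(abs(gap - length) <= tolerance for length in cm_lengths):
            sections.append((prev_end, next_start))
    return sections


silences = [(10.0, 10.5), (25.5, 26.0), (100.0, 100.4)]
cm_sections = find_cm_sections(silences)
```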

The section classification processing in the section classification unit 105 uses the transmission error determination result input from the transmission error determination unit 102b. The processing that classifies sections using this determination result is described with reference to FIG. 9.

In this description, it is assumed that the approximate segment-unit difference between L and R is input to the section classification unit 105 from the segment unit feature amount extraction unit 104b and that the section classification unit 105 detects music sections.

Based on the approximate segment-unit difference between L and R input from the segment unit feature amount extraction unit 104b, the section classification unit 105 discretizes the segments into those that can be judged to have high musicality (segments with a large difference between L and R) and those that can be judged to have low musicality (segments with a small difference between L and R).

In FIG. 9, a segment judged to have high musicality is shown as "H" and a segment judged to have low musicality is shown as "L". In this description, the section classification unit 105 judges that a music section has started when, for example, three H segments occur in succession, and judges that the music section has ended when, for example, three L segments occur in succession after the start.
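
The start and end rule for music sections can be sketched as follows. This is a sketch under assumptions: the specification does not say where exactly the boundary is placed, so placing the start at the first segment of the H run and the end at the first segment of the L run is a choice made here, as are the function name and the half-open `(start, end)` index convention.

```python
def music_sections(labels, run=3):
    """Return half-open (start, end) segment index ranges judged to be music.

    labels -- sequence of "H" / "L" segment labels
    run    -- number of consecutive labels required to start or end a section
    """
    sections = []
    in_music = False
    start = None
    streak_label, streak = None, 0
    for i, label in enumerate(labels):
        # Track the length of the current run of identical labels.
        if label == streak_label:
            streak += 1
        else:
            streak_label, streak = label, 1
        if not in_music and label == "H" and streak == run:
            in_music = True
            start = i - run + 1          # first segment of the H run (assumption)
        elif in_music and label == "L" and streak == run:
            in_music = False
            sections.append((start, i - run + 1))  # ends where the L run begins
    if in_music:
        sections.append((start, len(labels)))
    return sections
```

Requiring several consecutive segments before switching state keeps a single misjudged segment, such as the lone "L" inside a song, from splitting a music section.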

Among the segments input to the section classification unit 105, there are segments in which none of the constituent frames suffered a transmission error and segments in which some or all of the constituent frames suffered a transmission error. In FIG. 9, a segment containing a frame with a transmission error is shown as "×".

When none of the frames suffered a transmission error, the segment unit feature amount extraction unit 104b can binarize the segment into "H" or "L" based on the approximate difference between L and R of the segment. If a frame with a transmission error is included, however, the approximate difference between L and R of the segment may not have been obtained accurately. The transmission error information detected by the transmission error detection filter unit 102 is therefore input to the section classification unit 105, and a segment containing a frame with a transmission error is binarized probabilistically.

One method of probabilistic binarization is to learn in advance the patterns in which "H" and "L" appear over a predetermined number of consecutive segments (for example, five) and to compare the pattern of "H" and "L" centered on the location of the transmission error with the learned result, thereby estimating the value at the error location probabilistically. Specifically, suppose that the two segments on either side of the transmission error form the pattern "HH×HL" and that the patterns learned in advance contain 8 cases of "HHHHL" and 2 cases of "HHLHL". The probability of "HHHHL" appearing is then higher than that of "HHLHL", so the segment containing the frame with the transmission error is judged to be "H".
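
The pattern-based estimate can be sketched as follows. This is a sketch only: the function name, the use of the ASCII letter "x" in place of "×", the dictionary of learned counts, and the tie-breaking toward "H" are assumptions, and the window is assumed to contain exactly one error segment.

```python
def fill_error_segment(window, learned_counts):
    """Choose "H" or "L" for the single error position marked "x" in window.

    window         -- e.g. "HHxHL": the error segment with its neighbors
    learned_counts -- mapping from learned patterns to how often each was seen
    """
    # Count how often each completed pattern appeared during learning.
    candidates = {
        label: learned_counts.get(window.replace("x", label), 0)
        for label in "HL"
    }
    # Pick the more frequent completion; ties go to "H" (an assumption).
    return max(candidates, key=candidates.get)


estimate = fill_error_segment("HHxHL", {"HHHHL": 8, "HHLHL": 2})
```

With the counts from the example in the text, the completion "HHHHL" was seen 8 times against 2 for "HHLHL", so the error segment is filled with "H".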

Another method of probabilistic binarization is to judge the segment to be "L" if "L" predominates in the two or three segments before and after the location of the error, and to judge it to be "H" if "H" predominates in the adjacent segments. Classifying sections using feature amounts binarized probabilistically in this way allows the sections to be classified with high accuracy.
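
The neighbor-majority method can be sketched as follows. This is an illustrative sketch: the function name, the default radius of two segments on each side, and the choice of "L" on a tie are assumptions not stated in the specification.

```python
def fill_by_neighbors(labels, index, radius=2):
    """Estimate the label at an error position from its neighboring segments.

    labels -- string of "H" / "L" labels; the entry at index is the error segment
    index  -- position of the segment containing a transmission error
    radius -- how many segments on each side to consult (assumed default: 2)
    """
    before = labels[max(0, index - radius):index]
    after = labels[index + 1:index + 1 + radius]
    neighbors = before + after
    # Majority vote; a tie falls back to "L" (an assumption).
    if neighbors.count("H") > neighbors.count("L"):
        return "H"
    return "L"
```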

The description above covered the case in which sections are classified using the segment-unit feature amounts determined by the segment unit feature amount extraction unit 104b, but sections may also be classified using the feature amount of each frame.

The present invention is not limited to the above embodiment and may be modified as appropriate without departing from the gist of the invention.

FIG. 1 is a block diagram showing the configuration of a mobile phone according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the content processing unit of the mobile phone according to the embodiment of the present invention.
FIG. 3 is a block diagram of the content analysis unit of the mobile phone according to the embodiment of the present invention.
FIG. 4 is an image diagram of the processing in the segment unit feature amount extraction unit of the mobile phone according to the embodiment of the present invention.
FIG. 5 is a flowchart showing the processing of the segment unit feature amount extraction unit of the mobile phone according to the embodiment of the present invention.
FIG. 6 is an image diagram showing an example of the decoded data processed by the time continuation feature amount extraction unit of the mobile phone according to the embodiment of the present invention.
FIG. 7 is a flowchart showing the processing of the time continuation feature amount extraction unit of the mobile phone according to the embodiment of the present invention.
FIG. 8 is a block diagram showing the configuration of the transmission error detection filter unit and the decoding unit of the mobile phone according to the embodiment of the present invention.
FIG. 9 is an image diagram of the processing of the section classification unit of the mobile phone according to the embodiment of the present invention.

Explanation of symbols

1 mobile phone, 51 control unit, 52 antenna, 53 RF unit, 54 communication control unit, 55 voice input/output unit, 56 display unit, 57 storage unit, 57a multimedia content DB unit, 58 antenna, 59 RF unit, 60 broadcast receiving unit, 100 content processing unit, 101 Demux unit, 102 transmission error detection filter unit, 102a header search unit, 102b transmission error determination unit, 103 decoding unit, 103a frame data temporary storage unit, 103b decoding processing unit, 104 content analysis unit, 104a analysis control unit, 104b segment unit feature amount extraction unit, 104c time continuation feature amount extraction unit, 105 section classification unit, 106 TS data temporary storage unit, 107 analysis result integration unit

Claims

An information processing apparatus for classifying a predetermined section from encoded content, comprising:
decoding means for decoding frames included in the content while thinning them out at a predetermined thinning rate;
frame feature amount calculating means for calculating a feature amount of each frame decoded by the decoding means;
segment unit feature amount discretization means for, when calculating and discretizing the feature amount of a segment consisting of a predetermined group of frames, calculating a segment-unit feature amount using the frame feature amounts calculated by the frame feature amount calculating means; discretizing the segment feature amount using the segment-unit feature amount if the difference between the calculated segment-unit feature amount and a reference value is equal to or greater than a predetermined value; and, if the difference between the calculated segment-unit feature amount and the reference value is less than the predetermined value, changing the frame thinning rate and having the decoding means decode until the difference between the segment-unit feature amount and the reference value becomes equal to or greater than the predetermined value or the segment-unit feature amount has been calculated from the feature amounts of all frames, and discretizing the segment feature amount using the segment-unit feature amount thus obtained; and
section classification means for classifying sections of the content using the discretized segment feature amount obtained by the segment unit feature amount discretization means.

An information processing apparatus for classifying a predetermined section from encoded content, comprising:
decoding means for decoding frames included in the content while thinning them out at a predetermined thinning rate;
frame feature amount calculating means for calculating a feature amount of each frame decoded by the decoding means;
determination means for determining whether the frame feature amount calculated by the frame feature amount calculating means matches the condition of the predetermined section; and
section classification means for determining the continuation of frames that match the condition of the predetermined section by sequentially repeating, for frames that precede and follow a frame determined by the determination means to match the condition of the predetermined section and that have not been decoded by the decoding means, the decoding by the decoding means, the calculation of the feature amount by the frame feature amount calculating means, and the determination by the determination means of whether the feature amount matches the condition of the predetermined section, until a frame that does not match the condition of the predetermined section appears.

The information processing apparatus according to claim 1, further comprising:
communication means for transmitting and receiving data; and
transmission error detection means for detecting, among the frames of content received by the communication means, a frame that could not be received correctly because of a communication error,
wherein the section classification means probabilistically estimates, from the preceding and following segments, the feature amount of a segment containing a frame in which the transmission error detection means detected a transmission error.

A content analysis program for classifying a predetermined section from encoded content, comprising:
a decoding function of decoding frames included in the content while thinning them out at a predetermined thinning rate;
a frame feature amount calculation function of calculating a feature amount of each frame decoded by the decoding function;
a segment unit feature amount discretization function of, when calculating and discretizing the feature amount of a segment consisting of a predetermined group of frames, calculating a segment-unit feature amount using the frame feature amounts calculated by the frame feature amount calculation function; discretizing the segment feature amount using the segment-unit feature amount if the difference between the calculated segment-unit feature amount and a reference value is equal to or greater than a predetermined value; and, if the difference between the calculated segment-unit feature amount and the reference value is less than the predetermined value, changing the frame thinning rate and decoding with the decoding function until the difference between the segment-unit feature amount and the reference value becomes equal to or greater than the predetermined value or the segment-unit feature amount has been calculated from the feature amounts of all frames, and discretizing the segment feature amount using the segment-unit feature amount thus obtained; and
a section classification function of classifying sections of the content using the discretized segment feature amount obtained by the segment unit feature amount discretization function.

A content analysis program for classifying a predetermined section from encoded content, comprising:
a decoding function of decoding frames included in the content while thinning them out at a predetermined thinning rate;
a frame feature amount calculation function of calculating a feature amount of each frame decoded by the decoding function;
a determination function of determining whether the frame feature amount calculated by the frame feature amount calculation function matches the condition of the predetermined section; and
a section classification function of determining the continuation of frames that match the condition of the predetermined section by sequentially repeating, for frames that precede and follow a frame determined by the determination function to match the condition of the predetermined section and that have not been decoded by the decoding function, the decoding by the decoding function, the calculation of the feature amount by the frame feature amount calculation function, and the determination by the determination function of whether the feature amount matches the condition of the predetermined section, until a frame that does not match the condition of the predetermined section appears.