JP2004163672A

JP2004163672A - Audio processing method

Info

Publication number: JP2004163672A
Application number: JP2002329702A
Authority: JP
Inventors: Masanobu Funakoshi; 正伸船越
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2002-11-13
Filing date: 2002-11-13
Publication date: 2004-06-10

Abstract

PROBLEM TO BE SOLVED: To provide an audio processing method that can reproduce a digest portion regardless of the file format, and that suppresses noise when playback moves from one digest portion to the next.

SOLUTION: An encoding unit 3 encodes an audio signal input from a microphone 2 and holds position information for each frame. At the same time, a digest information generation unit 4 analyzes the audio signal input from the microphone 2 to determine the digest portions. A list of the start positions of the encoded data of the frames at the head and end of each determined portion is generated as digest information. A watermark generation unit 5 then constructs watermark data from the digest information, and a watermark insertion unit 6 embeds the watermark data in the encoded data and stores the encoded data of all frames, including the frames in which the watermark data is embedded, in a storage device 7.

COPYRIGHT: (C)2004,JPO

Description

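[0035]
In this state, the watermark insertion unit 6 embeds in the encoded data at byte 50 the byte count of the start position of the encoded data at byte 100, which is the last frame of the same digest portion, i.e., the value "100". In the encoded data at byte 100, it embeds "200", the byte count of the start position of the next digest portion. In this way, once the first frame carrying an embedded value is found by searching from the first encoded frame, the digest portion can be reproduced by playing from that frame up to the frame at the position represented by the value embedded in it. Furthermore, once the value embedded in the last frame of this digest portion is extracted, the next digest portion can be played back in succession. The position information of the head and end frames is not limited to a byte count from the beginning; other indices may be used.
[0036]
As the embedding method, the watermark data may be embedded, for example, by increasing or decreasing each frequency spectrum value in the highest-frequency subband of each piece of encoded data within a range of ±1, deliberately making it odd or even and thereby altering the encoded data. That is, if a bit of the data to be embedded is 0, the last frequency spectrum value is made even; if it is 1, it is made odd.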
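A minimal sketch of the odd/even (parity) embedding described above, assuming the quantized spectral values of the highest subband are already available as a list of integers and that one payload bit is carried per value, in order. Both are assumptions: the text does not say exactly how bits map onto spectral values. In a real AAC bitstream these values are Huffman-coded, so changing them also requires re-entropy-coding the frame (which can change its size in bytes); the sketch leaves that out. `embed_parity_bits` and `extract_parity_bits` are hypothetical names.

```python
def embed_parity_bits(coeffs: list[int], bits: list[int]) -> list[int]:
    """Return a copy of `coeffs` whose first len(bits) values have parity equal to the bits.

    Bit 0 -> even value, bit 1 -> odd value. Each value changes by at most +/-1.
    """
    if len(bits) > len(coeffs):
        raise ValueError("not enough spectral values to carry the payload")
    out = list(coeffs)
    for i, bit in enumerate(bits):
        if out[i] % 2 != bit:  # Python's % returns 0 or 1 even for negative values
            # Step by exactly 1: toward zero for nonzero values (magnitude does not grow), upward for 0.
            out[i] += -1 if out[i] > 0 else 1
    return out


def extract_parity_bits(coeffs: list[int], n_bits: int) -> list[int]:
    """Read n_bits payload bits back from the parity of the first n_bits values."""
    return [c % 2 for c in coeffs[:n_bits]]


marked = embed_parity_bits([4, 0, -3, 7], [1, 1, 0, 1])
print(marked)                          # [3, 1, -2, 7]
print(extract_parity_bits(marked, 4))  # [1, 1, 0, 1]
```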
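[0037]
Alternatively, an existing method such as the one described in JP-A-11-316599, "Digital watermark embedding device, audio encoding device, and recording medium", may be used. The encoded data in which the watermark data has been embedded in this way is stored at a predetermined location in the storage device 7 (external storage device 1209).
[0038]
FIG. 10 shows a flowchart of the audio processing in this embodiment described above. First, in step S201, the device is initialized. Next, in step S202, the encoding unit 3 encodes the audio signal input from the microphone 2 (1207) and holds the position information of each frame in memory (RAM 1202). In parallel, in step S203, the digest information generation unit 4 analyzes the audio signal input from the microphone 2 (1207) and determines the digest portions by the method described above. It then generates, as digest information, a list of the start position information of the encoded frames at the head and end of each determined portion.
[0039]
FIG. 7 shows an example structure of this list. Each digest portion is assigned an ID number, and each ID number is associated with the frame number and position information (number of bytes from the beginning) of the head frame of the digest portion, and with the frame number and position information (number of bytes from the beginning) of its end frame.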
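A sketch of one row of the FIG. 7 list as a Python dataclass. The field names, numbering IDs from 1, and 0-based frame numbers are assumptions made for illustration; the byte offsets in the example reuse the FIG. 6 positions (50, 100, 200) with a hypothetical second end frame at byte 260.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigestEntry:
    """One row of the digest list: ID, head frame, and end frame of a digest portion."""
    digest_id: int
    head_frame: int  # frame number of the head frame (0-based here)
    head_pos: int    # byte offset of the head frame's encoded data
    end_frame: int   # frame number of the end frame
    end_pos: int     # byte offset of the end frame's encoded data


def build_digest_list(runs: list[tuple[int, int]], frame_offsets: list[int]) -> list[DigestEntry]:
    """Turn (head_frame, end_frame) pairs into digest-list rows, numbering IDs from 1."""
    return [
        DigestEntry(i, head, frame_offsets[head], end, frame_offsets[end])
        for i, (head, end) in enumerate(runs, start=1)
    ]


for row in build_digest_list([(1, 2), (4, 5)], [0, 50, 100, 150, 200, 260]):
    print(row)
```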
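[0040]
Next, in step S205, the watermark generation unit 5 constructs watermark data from the digest information; in step S206, the watermark insertion unit 6 embeds the watermark data in the encoded data; and in step S207, the encoded data of all frames, including the frames in which the watermark data is embedded, is stored in the storage device 7 (external storage device 1209).
[0041]
FIG. 8 shows the structure of the RAM 1202 of the audio processing device in this embodiment. Loaded into the RAM 1202 are the OS, an encoding software program functioning as the encoding unit 3, a watermark embedding software program functioning as the watermark insertion unit 6, and a digest information generation software program functioning as the digest information generation unit 4. The RAM 1202 also provides an audio area for storing the input audio signal, a code area for storing the encoded data of each frame, and a working area used by the CPU 1201 for various processes.
[0042]
As described above, by embedding as a watermark the information on the portions to be decoded and reproduced during digest playback, the audio processing device of this embodiment enables digest playback on the decoding side.
[0043]
In addition, because the information indicating the digest portions is embedded in the encoded data itself, the decoding side can identify the digest portions independently of the format of the file containing the encoded data.
[0044]
In this embodiment, the encoded data of all frames, including the frames in which the watermark data is embedded, is stored in the storage device 7, but this is not limiting: it may be stored on a writable storage medium via the storage medium drive 1210, or transmitted via the network I/F 1211 to an external device connected to the network.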
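[0045]
[Second Embodiment]
This embodiment describes an audio processing device that can decode and reproduce the encoded data of all frames, including the frames with embedded watermark data, generated by the audio processing device of the first embodiment, and that can also reproduce the digest portions. To decode the encoded data produced by the audio processing device of the first embodiment, the audio processing device of this embodiment also adopts the MPEG2-AAC coding method, but it is not limited to it.
[0046]
FIG. 2 is a block diagram showing the functional configuration of the audio processing device in this embodiment. Reference numeral 11 denotes the main body of the audio processing device. Reference numeral 12 denotes a storage device in which the encoded data generated by the audio processing device of the first embodiment is recorded. Reference numeral 13 denotes a terminal with which a user (not shown) starts the audio processing device, sets various conditions, and issues playback instructions. Reference numeral 14 denotes a control unit that controls the storage device 12. Reference numeral 15 denotes a watermark extraction unit that extracts watermark data from the encoded data. Reference numeral 16 denotes a digest information reproduction unit that reconstructs digest information from the extracted watermark data. Reference numeral 17 denotes a control information generation unit that generates information for controlling the control unit 14 so that only the frames specified by the digest information are reproduced. Reference numeral 18 denotes a decoding unit that decodes the encoded audio data. Reference numeral 19 denotes a speaker that outputs the decoded and reproduced audio signal.
[0047]
Each of the control unit 14, the watermark extraction unit 15, the digest information reproduction unit 16, the control information generation unit 17, and the decoding unit 18 shown in FIG. 2 may be implemented in hardware or in software. In short, it suffices to have a configuration that functions as a watermark extraction unit that, given encoded data of a plurality of frames including the encoded data of a frame in which position information of a digest portion of an audio signal is embedded, extracts the position information from that frame, and as a decoding unit that decodes, from the encoded data of the plurality of frames, the frames indicated by the extracted position information and reproduces them.
[0048]
In the following description, each unit is assumed to be implemented in software.
[0049]
FIG. 12 is a block diagram showing the basic configuration of the audio processing device 11 in this embodiment. The audio processing device of this embodiment has the same configuration as that of the first embodiment, except that a speaker 1507 is connected to an I/F 1506 in place of the microphone 1207, and the I/F 1506 D/A-converts the decoded audio signal and outputs it to the speaker 1507.
[0050]
Since the control unit 14, the watermark extraction unit 15, the digest information reproduction unit 16, the control information generation unit 17, and the decoding unit 18 shown in FIG. 2 are each implemented in software in this embodiment, the audio processing device of this embodiment acquires the functional configuration shown in FIG. 2 when the CPU 1501 executes this software program.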
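[0051]
The following describes the audio processing performed when the CPU 1501 executes the software program consisting of the control unit 14, the watermark extraction unit 15, the digest information reproduction unit 16, the control information generation unit 17, and the decoding unit 18 shown in FIG. 2.
[0052]
First, the audio data to be played back is selected from the terminal 13. This terminal may be the audio processing device of the first embodiment, or a user command input device such as the keyboard 1504 or the mouse 1505. The control unit 14 sends a control signal to the storage device 12 (external storage device 1509, storage medium drive 1510) so that the beginning of the selected audio data can be read. In response to the control signal, the storage device 12 changes its read position and starts reading.
[0053]
The read encoded data is input frame by frame to the audio processing device 11 and passed to the watermark extraction unit 15. For normal playback instructed from the terminal 13, the watermark extraction unit 15 performs no processing; it bypasses its input and sends the encoded data to the decoding unit 18. For digest playback instructed from the terminal 13, the watermark extraction unit 15 extracts 32-bit watermark data from the encoded data by reversing the procedure performed by the watermark insertion unit 6 in FIG. 1 of the first embodiment, and inputs it to the digest information reproduction unit 16.
[0054]
The digest information reproduction unit 16 analyzes the watermark data, reconstructs as digest information the position information (number of bytes from the beginning) of the encoded data of the start frame and end frame of the digest portion, and notifies the control information generation unit 17. Based on the notified information, the control information generation unit 17 generates information for controlling the control unit 14 so that only the frames of the digest portion are reproduced.
[0055]
Encoded data from which the watermark data has been extracted is passed unchanged to the decoding unit 18, decoded back into an audio signal, and output as sound from the speaker 19 (1507).
[0056]
The processing performed by each unit shown in FIG. 2 during digest playback is described in more detail below. The watermark extraction unit 15 examines the encoded data of the frames stored in the storage device 12 in order, starting from the first frame, and identifies the encoded data of the first frame it determines to carry embedded watermark data (the first frame). The watermark extraction unit 15 then extracts the watermark data embedded in the encoded data of this first frame and treats the extracted value, a 32-bit integer, as the position information (number of bytes from the beginning) of the end frame (the second frame) of the digest portion that begins with the first frame; it sends this position information, together with the position information of the first frame, to the control information generation unit 17.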
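[0057]
The control information generation unit 17 generates control instruction information that causes the control unit 14 to send the encoded data from the first frame through the second frame from the storage device 12 to the decoding unit 18, and sends it to the control unit 14.
[0058]
On receiving this control instruction information, the control unit 14 reads the encoded data of the indicated frames (the first frame through the second frame) from the storage device 12 and sends it to the decoding unit 18. The decoding unit 18 decodes the received encoded data, D/A-converts it, and outputs it as sound to the speaker 19 (1507).
[0059]
After the second frame has been reproduced, the watermark extraction unit 15 extracts the watermark data embedded in the second frame and sends the extracted value, a 32-bit integer, to the digest information reproduction unit 16 as the position of the head frame of the next digest portion (which becomes the new first frame). It also sends the watermark data embedded in that head frame to the digest information reproduction unit 16 as the position information (number of bytes from the beginning) of the end frame of the next digest portion (which becomes the new second frame). The subsequent processing is the same.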
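The decoder-side traversal described in [0056] and [0059] can be sketched as follows, assuming the payloads already extracted from watermarked frames are available as a dictionary keyed by frame byte offset. That representation, the name `walk_digest_chain`, and the assumption that the final end frame carries no payload are illustrative, not taken from the text.

```python
def walk_digest_chain(frame_offsets: list[int], payloads: dict[int, int]) -> list[tuple[int, int]]:
    """Recover (head_pos, end_pos) pairs by following the embedded position chain.

    frame_offsets: byte offsets of all frames, in coding order.
    payloads: byte offset of each watermarked frame -> the value extracted from it.
    """
    # The first frame found to carry a watermark is the head of the first digest portion.
    head = next((pos for pos in frame_offsets if pos in payloads), None)
    parts: list[tuple[int, int]] = []
    visited: set[int] = set()
    while head is not None and head not in visited:  # visited guards against a corrupted, looping chain
        visited.add(head)
        end = payloads.get(head)  # head frame -> end frame of the same portion
        if end is None:
            break
        parts.append((head, end))
        head = payloads.get(end)  # end frame -> head of the next portion (assumed absent after the last)
    return parts


offsets = [0, 50, 100, 150, 200, 260, 300]
print(walk_digest_chain(offsets, {50: 100, 100: 200, 200: 260}))  # [(50, 100), (200, 260)]
```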
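[0060]
FIG. 11 shows a flowchart of the decoding and playback processing, i.e., the audio processing of this embodiment described above. First, in step S301, the audio processing device is initialized, and in step S302 it is determined whether to end the processing. If not, in step S303 the watermark extraction unit 15 reads the encoded data of the first frame of the audio data designated for playback from the terminal 13. Next, in step S304, the control unit 14 determines whether the instruction from the terminal 13 specifies the normal playback mode (the mode for the normal playback described above) or the digest playback mode (the mode for playing back the digest portions described above), and branches the subsequent processing accordingly.
[0061]
In the digest playback mode, processing proceeds to step S305; in the normal playback mode, it proceeds to step S308. In the normal playback mode, in step S308 the watermark extraction unit 15 passes one frame of encoded data unchanged to the decoding unit 18, which decodes it, D/A-converts the result, and outputs it to the speaker 19 (1507).
[0062]
For digest playback, in step S305 the watermark extraction unit 15 extracts, as position information, the watermark data embedded in the encoded data of the head frame of the digest portion, and sends the position information of the frame in which the watermark data was embedded, together with the position information represented by the watermark data, to the digest information reproduction unit 16. In step S306, the digest information reproduction unit 16 reconstructs digest information from the two pieces of position information received, and in step S307 the control information generation unit 17 generates information for controlling the control unit 14 based on the digest information. In step S309, the control unit 14 sends the encoded data of the frames indicated by the digest information from the storage device 12 to the decoding unit 18, and the decoding unit 18 decodes the encoded data one frame at a time and outputs it to the speaker 19 (1507).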
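A minimal sketch of the mode branch in steps S304, S308, and S309: in normal mode every frame goes to the decoder, and in digest mode only the frames from each head frame through its end frame. The name `frames_to_decode` and the (head_pos, end_pos) input format are assumptions for illustration.

```python
def frames_to_decode(frame_offsets: list[int], parts: list[tuple[int, int]], digest_mode: bool) -> list[int]:
    """Byte offsets of the frames sent to the decoder, in playback order."""
    if not digest_mode:
        # Normal playback (S308): every frame goes to the decoder; watermarks are ignored.
        return list(frame_offsets)
    selected = []
    for head, end in parts:
        # Digest playback (S309): only frames from the head frame through the end frame.
        selected.extend(pos for pos in frame_offsets if head <= pos <= end)
    return selected


offsets = [0, 50, 100, 150, 200, 260, 300]
print(frames_to_decode(offsets, [(50, 100), (200, 260)], digest_mode=True))   # [50, 100, 200, 260]
print(frames_to_decode(offsets, [(50, 100), (200, 260)], digest_mode=False))  # [0, 50, 100, 150, 200, 260, 300]
```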
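[0063]
If, in step S310, the frame being reproduced is the end frame, processing moves to step S312, where the watermark extraction unit 15 extracts the value represented by the watermark data embedded in the end frame as the position information of the head frame of the next digest portion, and extracts the watermark data embedded in that head frame as the position information of the end frame of the next digest portion. In other words, the next digest portion is identified. Then, in step S313, the control unit 14 is made to read the encoded data of the frames of the next digest portion from the storage device 12.
[0064]
FIG. 9 shows the structure of the RAM 1202 of the audio processing device in this embodiment. Loaded into the RAM 1202 are the OS, a decoding software program functioning as the decoding unit 18, a watermark extraction software program functioning as the watermark extraction unit 15, and a digest information software program functioning as the digest information reproduction unit 16. The RAM 1202 also provides an audio area for storing the decoded audio signal, a code area for storing the encoded data of each frame, and a working area used by the CPU 1201 for various processes.
[0065]
As described above, the audio processing device of this embodiment can reproduce the digest portions independently of the format of the file containing the encoded data.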
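[0066]
[Third Embodiment]
FIG. 3 is a block diagram showing the functional configuration of the audio processing device according to this embodiment. In the figure, parts identical to those in FIG. 1 are given the same reference numerals, and their description is omitted.
[0067]
In FIG. 3, reference numeral 101 denotes the main body of the audio processing device in this embodiment. Reference numeral 102 denotes a digest information generation unit which, unlike the digest information generation unit 4 of the first embodiment, has no function for analyzing the audio signal. Instead, it receives time information (time stamps) for the head and end of each digest portion sent from outside (from the moving image encoding system described below), converts this information into the positions of the head and end frames of the digest portion in the encoded data, and generates digest information holding these positions as a list. Reference numeral 103 denotes a moving image encoding system that outputs the time stamps to the audio processing device.
[0068]
The basic configuration of the audio processing device in this embodiment is the same as in the first embodiment, as shown in FIG. 4.
[0069]
The audio data processing performed by the audio processing device configured as above is described below. The audio signal collected by the microphone 2 is continuously input to the encoding unit 3, which encodes it frame by frame using the AAC coding method and holds the encoded data of each frame in memory (RAM 1202). Each time the encoding of one frame is completed, the position information of that frame is sent to the digest information generation unit 102.
[0070]
At the same time, the digest information generation unit 102 receives from the moving image encoding system 103 the time information for the head and end of each portion to be extracted as a digest, calculates the start positions of the encoded data of the corresponding frames from the position information sent by the encoding unit 3, and stores them one after another. In this embodiment, the digest information derived from the moving images may be generated automatically by the moving image encoding system through analysis of the moving images, or may be created intentionally by a user; the method by which it is generated does not matter.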
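A sketch of the conversion from time stamps to frame positions, assuming AAC-LC, where each frame carries 1024 samples per channel, and ignoring encoder delay (priming samples), which a real converter would have to account for. The sample rate, the per-frame offsets, and the function names are hypothetical.

```python
AAC_FRAME_SAMPLES = 1024  # samples per channel in one AAC-LC frame


def timestamp_to_frame(t_seconds: float, sample_rate: int, n_frames: int) -> int:
    """Index of the frame containing time t_seconds (encoder delay ignored)."""
    index = int(t_seconds * sample_rate) // AAC_FRAME_SAMPLES
    return min(max(index, 0), n_frames - 1)  # clamp to the frames that actually exist


def digest_span_to_positions(head_t: float, end_t: float, sample_rate: int,
                             frame_offsets: list[int]) -> tuple[int, int]:
    """Convert a (head, end) time-stamp pair into byte offsets of the head and end frames."""
    n = len(frame_offsets)
    return (frame_offsets[timestamp_to_frame(head_t, sample_rate, n)],
            frame_offsets[timestamp_to_frame(end_t, sample_rate, n)])


offsets = [10 * i for i in range(100)]  # hypothetical: 100 frames of 10 bytes each
print(digest_span_to_positions(1.0, 2.0, 48000, offsets))  # (460, 930)
```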
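[0071]
As in the first embodiment, the watermark generation unit 5 generates watermark data from the positions of the head frame and end frame of each digest portion. The generated watermark data and the embedding positions are input to the watermark insertion unit 6, which, as in the first embodiment, embeds the generated watermark data at the appropriate positions in the encoded data read from the encoding unit 3. The encoded data with the embedded watermark data is stored at a predetermined location in the storage device 7.
[0072]
Although MPEG2-AAC is used as the audio coding method in this embodiment, other coding methods such as MPEG1 Audio Layer I, II, and III, MPEG4, ATRAC3, and AC-3 may of course be used.
[0073]
The audio data encoded by the audio processing device of this embodiment can be decoded by the audio processing device of the second embodiment, which can likewise perform digest playback.
[0074]
[Fourth Embodiment]
The audio processing device of this embodiment decodes encoded audio data obtained by the audio processing device of the first or third embodiment and enables reproduction of the digest portions; in addition, when a plurality of digest portions are played back in succession, it suppresses the noise that tends to occur between digest portions.
[0075]
FIG. 13 shows the functional configuration of the audio processing device in this embodiment. In the figure, parts identical to those in FIG. 2 are given the same reference numerals, and their description is omitted. Reference numeral 1501 denotes the main body of the audio processing device in this embodiment. Reference numeral 1502 denotes a control unit that basically operates in the same way as the control unit 14 of the second embodiment but additionally controls a volume 21. The volume 21 scales the output (amplitude) of the audio signal decoded by the decoding unit 18 according to its set value, so the speaker 19 outputs the audio signal as controlled by the volume 21.
[0076]
The basic configuration of the audio processing device in this embodiment is as shown in FIG. 4. If the volume 21 is implemented in hardware, it must be added as hardware to the configuration of FIG. 4 and placed under the control of the CPU 1201. If the volume 21 is implemented in software, that software is stored in the external storage device 1209 and loaded into the RAM 1202 as needed; it may also be read from a storage medium by the storage medium drive 1210.
[0077]
The audio processing performed by the audio processing device of this embodiment having the above configuration is described below. First, each unit is initialized, and the value set in the volume 21 is initialized to 10. This value indicates the amplification applied to the audio signal, i.e., the loudness (volume) of the sound output from the speaker 19: the larger the value, the louder the sound output from the speaker 19. In this embodiment, the value ranges from 0 to 10.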
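[0078]
As in the second embodiment, the encoded data read from the storage device 12 is decoded by the decoding unit 18 to obtain an audio signal frame by frame, which is sent to the volume 21. Here, the control information generation unit 17 obtains from the digest information reproduction unit 16, as digest information, the position information of the encoded data of the start frame and end frame of the digest portion, and, at the timing when the decoding unit 18 decodes and reproduces the end frame of the digest portion, sends the control unit 1502 a control signal that decrements the value set in the volume 21 by 1 at a time.
[0079]
The rate of decrement is set, for example, so that the value set in the volume 21 reaches 0 within the playback time of the end frame. By playing the end frame while lowering the volume in this way, the last part of the digest portion can be faded out.
[0080]
When this value reaches 0, the control unit 1502 controls each unit so as to start playing the next digest portion. When playback of the next digest portion begins, the value set in the volume 21 is 0, so the control information generation unit 17 sends the control unit 1502 a signal that increments the value by 1 at a time until it reaches 10.
[0081]
The rate of increment is set, for example, so that the value set in the volume 21 reaches 10 within the playback time of the head frame. By playing the head frame while raising the volume in this way, the beginning of the digest portion can be faded in. Because this fade-in and fade-out lower the volume only at the transitions between digest portions, the noise that tends to occur there is suppressed in the output.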
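The fade-out over the end frame and the fade-in over the head frame can be sketched as integer volume values applied per sample. Splitting one frame into ten equal segments, one per volume step, is an assumption; the text only requires that the value reach 0 (or 10) within the frame. The sketch also assumes decoded samples are floats and that a frame has at least ten samples (an AAC frame has far more). The function names are hypothetical.

```python
MAX_VOLUME = 10  # the volume value ranges from 0 to 10, as in the text


def volume_levels(n_samples: int, fade_out: bool) -> list[int]:
    """Per-sample volume values for one frame, changing by 1 per segment.

    Fade-out (end frame): 9, 8, ..., 0, so the value reaches 0 within the frame.
    Fade-in (head frame): 1, 2, ..., 10, so the value reaches 10 within the frame.
    """
    if n_samples < MAX_VOLUME:
        raise ValueError("frame too short for ten volume steps")
    levels = []
    for i in range(n_samples):
        step = i * MAX_VOLUME // n_samples  # segment index 0..9
        levels.append(MAX_VOLUME - 1 - step if fade_out else step + 1)
    return levels


def apply_volume(samples: list[float], levels: list[int]) -> list[float]:
    """Scale each decoded sample by (volume value / MAX_VOLUME)."""
    return [s * level / MAX_VOLUME for s, level in zip(samples, levels)]


print(volume_levels(10, fade_out=True))   # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(volume_levels(10, fade_out=False))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```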
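[0082]
FIG. 14 shows a flowchart of the audio processing performed by the audio processing device of this embodiment. The processing in steps S1600 to S1609 is the same as that in steps S301 to S310 shown in FIG. 11 of the second embodiment, so its description is omitted; the following describes the processing from step S1609 onward, which is characteristic of this embodiment.
[0083]
If the frame decoded in step S1607 is the end frame of a digest portion, processing moves to step S1613, where it is determined whether the value set in the volume 21 (Volume) is 0. If it is 0, processing moves to step S1616, and the control unit 1502 controls each unit to read the encoded data of the frames of the next digest portion in the same way as in steps S312 and S313.
[0084]
If Volume is not 0, processing moves to step S1614, where the control information generation unit 17 sends a control signal to the control unit 1502, and in step S1615 the control unit 1502 decrements Volume by 1.
[0085]
If the frame decoded in step S1607 is not the end frame of a digest portion, processing moves from step S1609 to step S1610, where it is determined whether Volume is 10, i.e., the maximum. If Volume is 10, processing returns to step S1601; otherwise, processing moves to step S1611, where the control information generation unit 17 sends a control signal to the control unit 1502, and in step S1612 the control unit 1502 increments Volume by 1.
[0086]
As described above, the audio processing device of this embodiment can suppress the noise that tends to occur between digest portions.
[0087]
In this embodiment, Volume is lowered all the way to 0, but this is not limiting; it may instead be lowered to a nonzero level such as 2 or 3, as long as that level is hard for humans to perceive.
[0088]
Likewise, Volume is raised to 10 in this embodiment, but this is not limiting; it may be raised to a preset or otherwise predetermined level.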
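[0089]
[Fifth Embodiment]
In addition to the above embodiments, the output may of course be written in a new file format. In that case, the digest information may also be recorded in, for example, an ancillary data area.
[0090]
In the above embodiments, the watermark is embedded by placing the end-position information in the head frame of a digest portion and the head-position information of the next digest portion in its end frame, but other methods may be used. For example, the same predetermined watermark may be embedded in every frame belonging to a digest portion. The watermark may also be divided and embedded across several frames at the head or end of a digest portion.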
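Under the alternative in which every digest frame carries the same predetermined watermark, the decoder no longer follows a chain; it can simply group consecutive marked frames. A minimal sketch, assuming per-frame extraction yields either the mark or `None`; the mark value 0x5A and the name `digest_runs_from_marks` are hypothetical.

```python
from itertools import groupby


def digest_runs_from_marks(marks: list, mark) -> list[tuple[int, int]]:
    """Group consecutive frames carrying `mark` into (first_frame, last_frame) runs."""
    runs = []
    index = 0
    for is_digest, group in groupby(marks, key=lambda m: m == mark):
        length = sum(1 for _ in group)
        if is_digest:
            runs.append((index, index + length - 1))
        index += length
    return runs


# None = no watermark found in that frame; 0x5A = hypothetical predetermined mark.
print(digest_runs_from_marks([None, 0x5A, 0x5A, None, None, 0x5A], 0x5A))  # [(1, 2), (5, 5)]
```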
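[0091]
[Other Embodiments]
Furthermore, the present invention is not limited to apparatus and methods for realizing the above embodiments; its scope also includes the case where program code of software for realizing the above embodiments is supplied to a computer (CPU or MPU) in the above system or apparatus, and the computer of the system or apparatus operates the various devices in accordance with the program code to realize the above embodiments.
[0092]
In this case, the program code of the software itself realizes the functions of the above embodiments, and the program code itself and the means for supplying it to the computer, specifically a storage medium storing the program code, fall within the scope of the present invention.
[0093]
As a storage medium for storing such program code, for example, a floppy (registered trademark) disk, hard disk, optical disk, magneto-optical disk, CD-ROM, magnetic tape, nonvolatile memory card, or ROM can be used.
[0094]
Such program code also falls within the scope of the present invention not only when the functions of the above embodiments are realized by the computer controlling the various devices solely according to the supplied program code, but also when the above embodiments are realized by the program code in cooperation with the OS (operating system) running on the computer or with other application software.
[0095]
Furthermore, the scope of the present invention also includes the case where the supplied program code is stored in memory provided on a function expansion board of the computer or in a function expansion unit connected to the computer, after which a CPU or the like on the function expansion board or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the above embodiments are realized by that processing.
[0096]
[Effect of the Invention]
As described above, the present invention makes it possible to reproduce digest portions regardless of the file format. It also suppresses the occurrence of noise when playback moves between digest portions.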
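[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the functional configuration of the audio processing device according to the first embodiment of the present invention.
[FIG. 2] A block diagram showing the functional configuration of the audio processing device according to the second embodiment of the present invention.
[FIG. 3] A block diagram showing the functional configuration of the audio processing device according to the third embodiment of the present invention.
[FIG. 4] A block diagram showing the basic configuration of the audio processing device according to the first embodiment of the present invention.
[FIG. 5] A diagram explaining the calculated start positions.
[FIG. 6] A diagram showing an example of embedding for digest portions in the encoded data of each frame.
[FIG. 7] A diagram showing an example structure of the list serving as digest information.
[FIG. 8] The structure of the RAM 1202 of the audio processing device according to the first embodiment of the present invention.
[FIG. 9] The structure of the RAM 1202 of the audio processing device according to the second embodiment of the present invention.
[FIG. 10] A flowchart of the audio processing according to the first embodiment of the present invention.
[FIG. 11] A flowchart of the audio processing according to the second embodiment of the present invention.
[FIG. 12] A block diagram showing the basic configuration of the audio processing device 11 according to the second embodiment of the present invention.
[FIG. 13] A block diagram showing the functional configuration of the audio processing device according to the fourth embodiment of the present invention.
[FIG. 14] A flowchart of the audio processing according to the fourth embodiment of the present invention.
[0001]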
TECHNICAL FIELD OF THE INVENTION
The present invention relates to audio processing technology.
[0002]
[Prior art]
Transform coding systems such as Dolby Digital (AC-3), ATRAC-3, MPEG (Moving Picture Experts Group) 1 Layer II and Layer III (MP3), and MPEG2-AAC are widely used as high-quality audio coding systems.
[0003]
Some of these coding systems are internationally standardized by ISO (International Organization for Standardization).
[0004]
With the spread of the digital coding standards described above, file formats for handling them on computers and similar devices have been defined. For example, the MPEG-4 standard defines its own file format. In addition, many other file formats are in widespread use, depending on the computer OS and the network configuration.
[0005]
With the spread of these digital coding standards, the content industry has strongly raised the issue of copyright protection. In response, digital watermarking techniques have been developed for carrying security-related information and for encryption. Digital watermarking embeds a small amount of information in such a way that the data does not change on playback, or changes only at a level that cannot be perceived. Several techniques for embedding a digital watermark in audio data have been disclosed (see Patent Documents 1 and 2).
[0006]
Such a digital watermark is mainly used to protect the copyright of multimedia data by embedding the copyright information of the multimedia data, but can be applied to other purposes.
[0007]
[Patent Document 1]
JP-A-2001-202089
[Patent Document 2]
JP-A-11-316599
[0008]
[Problems to be solved by the invention]
These file formats use a wide variety of methods for frame synchronization and frame control. Consequently, when audio data stored in such files is to be played back as a digest, the position information of the frames to be played is stored separately from the encoded data, in a form specific to each file format, so digest information cannot be shared between formats.
[0009]
For example, even for files storing the same MPEG1-Audio Layer II data, the AVI file format and the QuickTime file format add time stamps for synchronization with moving images in different ways. Moreover, with variable-length coding the amount of data per frame is not constant. It is therefore common, for digest playback, to store in the hint track of the file format the position information and the coding mode of each frame in the stream that is to be played back, and to use this information during digest playback to control the read position.
[0010]
However, with such a method, when the file is converted to another format, this information is lost if the destination format has no corresponding area, and random access in the other file format can become extremely difficult. In addition, adding such information to the bitstream risks breaking compatibility with other data. Some codecs provide a data area into which arbitrary data can be written; for example, in MPEG1-Audio Layer III, additional information can be written in the ancillary data area of each frame. However, the content of this area is defined freely by each application, so compatibility cannot be ensured.
[0011]
The present invention has been made in view of the above problems, and an object of the present invention is to provide an audio processing technique that enables a digest portion to be reproduced regardless of a file format.
[0012]
In addition, when audio data is played back as a digest, simply playing across the transitions between digest portions as-is not only interrupts the sound but also causes frequent popping noise.
[0013]
Another object of the present invention is to provide an audio processing technique that suppresses noise when playback moves between digest portions.
[0014]
[Means for Solving the Problems]
In order to achieve the object of the present invention, for example, an audio processing method of the present invention has the following configuration.
[0015]
That is, the method is characterized by comprising:
an encoding step of encoding an input audio signal frame by frame to generate encoded data for each frame; and
an embedding step of embedding information that identifies the encoded data of a frame group containing a digest portion of the input audio signal in the encoded data of a predetermined frame among the frames of the input audio signal.
[0016]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, the present invention will be described in detail according to preferred embodiments with reference to the accompanying drawings. In short, the present invention encodes an input audio signal frame by frame to generate encoded data for each frame, and embeds information identifying the encoded data of a frame group containing a digest portion of the input audio signal in the encoded data of a predetermined frame among the frames of the input audio signal.
[0017]
For decoding, given encoded data of a plurality of frames that includes the encoded data of a frame in which position information of a digest portion of the audio signal is embedded, the position information is extracted from that frame, and the frames indicated by the extracted position information are decoded from the encoded data of the plurality of frames and reproduced. Specific examples of this configuration are described below as embodiments.
[0018]
[First Embodiment]
FIG. 1 is a block diagram showing the functional configuration of the audio processing device according to this embodiment. Reference numeral 1 denotes the main body of the audio processing device. Reference numeral 2 denotes a microphone for inputting an audio signal. Reference numeral 3 denotes an encoding unit that encodes the audio signal input via the microphone 2 frame by frame. Reference numeral 4 denotes a digest information generation unit that extracts characteristic portions based on the properties of the audio signal (sound pressure, frequency, and the like) and stores, as digest information, the start positions of the encoded data corresponding to the head and end of each extracted portion. Reference numeral 5 denotes a watermark generation unit that generates watermark data from the output of the digest information generation unit 4, and reference numeral 6 denotes a watermark insertion unit that embeds the watermark data in the encoded data. Reference numeral 7 denotes a storage device that records the generated encoded data.
[0019]
Each of the encoding unit 3, the digest information generation unit 4, the watermark generation unit 5, and the watermark insertion unit 6 shown in FIG. 1 may be implemented in hardware or in software. In short, it suffices to have a configuration that functions as an encoding unit that encodes an input audio signal frame by frame to generate encoded data for each frame, and as a watermark insertion unit that embeds information identifying the encoded data of a frame group containing a digest portion of the input audio signal in the encoded data of a predetermined frame among the frames of the input audio signal.
[0020]
In the following description, each unit is described as being configured by software.
[0021]
FIG. 4 is a block diagram showing the basic configuration of the audio processing device 1 according to this embodiment. The audio processing device of this embodiment comprises a CPU 1201, a RAM 1202, a ROM 1203, a keyboard 1204, a mouse 1205, an I/F 1206, a microphone 1207, a display unit 1208, an external storage device 1209, a storage medium drive 1210, a network I/F 1211, and a bus 1212. Because the encoding unit 3, the digest information generation unit 4, the watermark generation unit 5, and the watermark insertion unit 6 shown in FIG. 1 are implemented in software in this embodiment, the audio processing device acquires the functional configuration shown in FIG. 1 when the CPU 1201 executes that software program.
[0022]
The CPU 1201 controls the entire apparatus using programs and data stored in the RAM 1202 and the ROM 1203, and controls each unit in order to perform audio processing described later. The RAM 1202 includes an area for temporarily storing programs and data read from the external storage device 1209 and the storage medium drive 1210, data to be processed, and the like, and also includes a work area used when the CPU 1201 performs various processes. The ROM 1203 holds programs and data for controlling the entire apparatus.
[0023]
The keyboard 1204 and the mouse 1205 are user command input devices through which various instructions can be input to the CPU 1201. The microphone 1207 is used to input an audio signal and corresponds to the microphone 2 shown in FIG. 1. The audio signal input from the microphone 1207 is A/D-converted by the I/F 1206, and the result is written to the RAM 1202. The display unit 1208 consists of a CRT, a liquid crystal screen, or the like, and can display various kinds of image and text information.
[0024]
The external storage device 1209 is a large-capacity information storage device such as a hard disk drive, and stores the OS as well as programs and data for performing the audio processing described later. As noted above, the program for this audio processing consists of the encoding unit 3, the digest information generation unit 4, the watermark generation unit 5, and the watermark insertion unit 6 shown in FIG. 1. The storage device 7 shown in FIG. 1 corresponds to the external storage device 1209.
[0025]
The storage medium drive 1210 reads a program or data from a storage medium such as a CD-ROM or a DVD-ROM, and outputs the program or data to the RAM 1202 or the external storage device 1209. Note that a program and data for performing the audio processing described later may be stored in this storage medium.
[0026]
When the storage medium is writable, the storage medium drive 1210 can write various programs and data to it. The network I/F 1211 connects the audio processing device to a network such as the Internet or a LAN; programs and data are downloaded to the audio processing device from other devices on the network, or sent from the audio processing device to other devices on the network, via the network I/F 1211. The bus 1212 connects the units described above, which exchange data with one another over it.
[0027]
This embodiment describes the case where digest information is generated from the audio signal alone. For convenience of explanation, MPEG2-AAC is used as the coding method, but other transform coding methods such as MPEG1 Audio Layer I, II, and III, ATRAC3, and AC-3 can be handled in exactly the same way.
[0028]
The following describes the audio processing performed when the CPU 1201 executes the software program consisting of the encoding unit 3, the digest information generation unit 4, the watermark generation unit 5, and the watermark insertion unit 6 shown in FIG. 1.
[0029]
First, the audio signal collected by the microphone 2 (1207) is continuously input to the encoding unit 3. The encoding unit 3 encodes the input audio signal frame by frame with the AAC coding method and stores the encoded data of each frame in memory (RAM 1202). Each time it finishes encoding a frame, the encoding unit 3 also sends the position information of that frame's encoded data to the digest information generation unit 4. In this embodiment, the position information is the number of bytes from the beginning of the encoded data (the start position). Since this embodiment uses MPEG2-AAC, each frame is assumed to occupy a whole number of bytes; for a coding method whose frames are not byte-aligned, the number of bits from the beginning may be used as the position information instead.
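As a minimal sketch of how these start positions can be tracked, the function below assumes the encoder reports each finished frame's size in bytes and computes every frame's offset from the start of the encoded data as a running sum. The name `frame_start_offsets` and the list-of-sizes input are hypothetical, not part of the described device.

```python
from itertools import accumulate


def frame_start_offsets(frame_sizes: list[int]) -> list[int]:
    """Byte offset of each encoded frame, counted from the start of the encoded data."""
    if not frame_sizes:
        return []
    # Frame 0 starts at byte 0; every later frame starts where the previous one ended.
    return [0] + list(accumulate(frame_sizes))[:-1]


print(frame_start_offsets([50, 50, 100, 60]))  # [0, 50, 100, 200]
```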
[0030]
The digest information generation unit 4 analyzes the sound pressure and frequency of the audio signal input from the microphone 2 as appropriate, and determines the portion to be extracted as a digest (the digest part) based on changes in, or the continuity of, the sound pressure and frequency. It then calculates, from the position information sent by the encoding unit 3, the head positions of the encoded frames corresponding to the start and end of the digest part, and stores them in memory one after another.
[0031]
As shown in FIG. 5, the list of head positions of the encoded data of the frames at the beginning and end of each digest part is defined as the digest information. Such audio-based digest generation techniques are common and well known, so they are not described in detail here.
[0032]
The watermark generation unit 5 receives from the digest information generation unit 4 the head position of the first frame and the head position of the end frame of each digest part, treats each of them as, for example, a 32-bit positive integer, and uses them as watermark data. The generated watermark data and a preset embedding position are input to the watermark insertion unit 6.
[0033]
Based on the digest information held by the digest information generation unit 4, the watermark insertion unit 6 embeds the generated watermark data at a preset position in the encoded data read from the encoding unit 3. In the present embodiment, the position of the end frame is embedded in the first frame of the digest part, and the head position of the first frame of the next digest part is embedded in the end frame.
[0034]
This embedding is described with reference to FIG. 6, which shows an example of embedding for the digest parts in the encoded data of each frame. In the figure, the hatched portions are the encoded data of the frames that make up the digest parts: one digest part runs from the frame whose head position is byte 50 to the frame whose head position is byte 100, and the next runs from the frame whose head position is byte 200 to the frame whose head position is byte (200 + α), where α > 0.
[0035]
In this state, the watermark insertion unit 6 embeds in the frame starting at byte 50 the head position of the last frame of the same digest part, that is, the value "100". It also embeds "200", the head position of the next digest part, in the frame starting at byte 100. Consequently, once the first frame carrying an embedded value is found by searching forward from the first encoded frame, the digest part can be reproduced by taking as the reproduction target the frames from this frame up to the frame at the position represented by the embedded value. Likewise, if the value embedded in the last frame of the digest part can be extracted, the next digest part can then be reproduced. The position information of the first and last frames is not limited to the head byte count; other indices may be used.
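The pointer chain described above can be sketched in Python as a mapping from a frame's head position to the value embedded in that frame. This is an illustration under stated assumptions, not the method as claimed: it assumes each digest part spans at least two frames, and that nothing is embedded in the end frame of the last digest part, since the text does not say how the final link is marked. The function name `build_embed_map` is hypothetical, and α = 40 is an arbitrary value chosen for the example.

```python
from typing import Dict, List, Tuple


def build_embed_map(digests: List[Tuple[int, int]]) -> Dict[int, int]:
    """Map each watermarked frame's head position to the value embedded in it.

    digests: (head position of first frame, head position of end frame)
    for each digest part, in playback order. Assumes first != end.
    """
    embed: Dict[int, int] = {}
    for i, (first, end) in enumerate(digests):
        embed[first] = end                  # first frame -> its end frame
        if i + 1 < len(digests):
            embed[end] = digests[i + 1][0]  # end frame -> next digest's first frame
    return embed


# FIG. 6 example with a hypothetical alpha of 40 bytes.
print(build_embed_map([(50, 100), (200, 240)]))  # {50: 100, 100: 200, 200: 240}
```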
[0036]
As an embedding method, the watermark data may be embedded, for example, by increasing or decreasing each frequency spectrum value in the highest-frequency subband of each frame's encoded data by at most ±1, so that the value is intentionally made odd or even. That is, if a bit of the data to be embedded is 0, the resulting spectrum value is made even, and if it is 1, the value is made odd.
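The parity rule can be illustrated on integer (quantized) spectrum values. This minimal sketch assumes the 32 watermark bits are carried one per coefficient, most significant bit first, in the first 32 values of the highest-frequency subband; the exact layout is not specified above, and `embed_bits` is a hypothetical name. In a real AAC stream the modified values would also have to be re-entropy-coded into the bitstream, which this sketch does not attempt.

```python
from typing import List


def embed_bits(coeffs: List[int], value: int, nbits: int = 32) -> List[int]:
    """Embed `value` (MSB first) in the parity of the first `nbits`
    quantized coefficients: even encodes 0, odd encodes 1.
    Each coefficient changes by at most +/-1."""
    if len(coeffs) < nbits:
        raise ValueError("not enough coefficients to carry the watermark")
    if not 0 <= value < (1 << nbits):
        raise ValueError("value does not fit in nbits bits")
    out = list(coeffs)
    for i in range(nbits):
        bit = (value >> (nbits - 1 - i)) & 1
        if out[i] % 2 != bit:  # Python's % gives 0/1 for negative ints too
            # Step toward zero for nonzero values; 0 becomes +1.
            out[i] += -1 if out[i] > 0 else 1
    return out


marked = embed_bits([0] * 32, 100)
print(marked[-7:])  # last 7 parities spell 100 = 0b1100100
```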
[0037]
Alternatively, an existing method such as that described in JP-A-11-316599, "Digital watermark embedding device, audio encoding device and recording medium", may be used. The encoded data in which the watermark data has been embedded in this way is stored at a predetermined position in the storage device 7 (external storage device 1209).
[0038]
FIG. 10 is a flowchart of the audio processing according to the present embodiment described above. First, in step S201, the apparatus is initialized. Next, in step S202, the encoding unit 3 encodes the audio signal input from the microphone 2 (1207) and stores the position information of each frame in memory (RAM 1202). In parallel, in step S203, the digest information generation unit 4 analyzes the audio signal input from the microphone 2 (1207) and determines the digest part by the method described above. It then generates, as the digest information, a list of the head positions of the encoded frames at the beginning and end of the determined part.
[0039]
FIG. 7 shows a configuration example of this list. Each digest part is assigned an ID number, and each ID number is associated with the frame number and position information (number of bytes from the beginning) of the first frame of the digest part and the frame number and position information (number of bytes from the beginning) of its end frame.
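One row of the FIG. 7 list can be modeled as a small record. The sketch below is hypothetical: the field names, the choice to number IDs from 1, and the `build_digest_list` helper are assumptions for illustration. It converts (first frame, end frame) index pairs into rows using the per-frame head positions reported by the encoding unit.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DigestEntry:
    digest_id: int    # ID number of the digest part (numbered from 1 here)
    first_frame: int  # frame number of the first frame
    first_pos: int    # bytes from the beginning to the first frame
    end_frame: int    # frame number of the end frame
    end_pos: int      # bytes from the beginning to the end frame


def build_digest_list(ranges: List[Tuple[int, int]],
                      head_pos: List[int]) -> List[DigestEntry]:
    """ranges: (first frame number, end frame number) per digest part.
    head_pos: head position of every frame, indexed by frame number."""
    return [DigestEntry(i + 1, f, head_pos[f], e, head_pos[e])
            for i, (f, e) in enumerate(ranges)]


print(build_digest_list([(1, 3)], [0, 50, 80, 100]))
```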
[0040]
Next, in step S205, the watermark generation unit 5 forms watermark data from the digest information. In step S206, the watermark insertion unit 6 embeds the watermark data in the encoded data, and the encoded data of all frames, including the frames in which the watermark data is embedded, is stored in the storage device 7 (external storage device 1209).
[0041]
FIG. 8 shows the configuration of the RAM 1202 of the audio processing device according to the present embodiment. Loaded into the RAM 1202 are the OS, the encoding software program functioning as the encoding unit 3, the watermark embedding software program functioning as the watermark insertion unit 6, and the digest information generation software program functioning as the digest information generation unit 4. The RAM 1202 also has an audio area for storing the input audio signal, a code area for storing the encoded data of each frame, and a working area used by the CPU 1201 for various processes.
[0042]
As described above, the audio processing apparatus according to the present embodiment can perform digest reproduction on the decoding side by embedding information of a part to be decoded and reproduced at the time of digest reproduction as a watermark.
[0043]
Also, by embedding the information indicating the digest part in the encoded data, the digest part can be specified on the decoding side without depending on the format of the file including the encoded data.
[0044]
In the present embodiment, the encoded data of all the frames, including the frames in which the watermark data is embedded, is stored in the storage device 7. However, the present invention is not limited to this: the encoded data may be stored elsewhere, or may be transmitted via the network I/F 1211 to an external device connected to the network.
[0045]
[Second embodiment]
In this embodiment, an audio processing device is described that decodes and reproduces the encoded data of all the frames, including the frames in which watermark data is embedded, generated by the audio processing apparatus of the first embodiment, and that can reproduce the digest parts. The audio processing device of this embodiment also uses the MPEG2-AAC encoding method to decode the encoded data produced by the audio processing device of the first embodiment, but it is not limited to this method.
[0046]
FIG. 2 is a block diagram illustrating the functional configuration of the audio processing device according to the present embodiment. Reference numeral 11 denotes the main body of the audio processing device according to the present embodiment. Reference numeral 12 denotes a storage device in which the encoded data generated by the audio processing device according to the first embodiment is recorded. Reference numeral 13 denotes a terminal used by a user (not shown) to start the audio processing apparatus, set various conditions, and issue reproduction instructions. Reference numeral 14 denotes a control unit that controls the storage device 12. Reference numeral 15 denotes a watermark extracting unit that extracts watermark data from the encoded data. Reference numeral 16 denotes a digest information reproducing unit that reproduces digest information from the extracted watermark data. Reference numeral 17 denotes a control information generation unit that generates information for controlling the control unit 14 so that only the frames specified by the digest information are reproduced. Reference numeral 18 denotes a decoding unit that decodes the encoded audio data. Reference numeral 19 denotes a speaker that outputs the decoded and reproduced audio signal.
[0047]
Each of the control unit 14, the watermark extracting unit 15, the digest information reproducing unit 16, the control information generation unit 17, and the decoding unit 18 shown in FIG. 2 may be implemented in hardware or in software. In short, it suffices to have a configuration that functions as (1) a watermark extracting unit that extracts, from the encoded data of a plurality of frames of an audio signal, including the encoded data of a frame in which position information of a digest part of the audio signal is embedded, that position information, and (2) a decoding unit that decodes and reproduces, from the encoded data of the plurality of frames, the frames indicated by the extracted position information.
[0048]
In the following description, each unit is described as being configured by software.
[0049]
FIG. 12 is a block diagram illustrating the basic configuration of the audio processing device 11 according to the present embodiment. This configuration is the same as that of the audio processing device according to the first embodiment, except that a speaker 1507 is connected to the I/F 1506 in place of the microphone 1207, and the I/F 1506 D/A-converts the decoded audio signal and outputs it to the speaker 1507.
[0050]
In the present embodiment, the control unit 14, the watermark extracting unit 15, the digest information reproducing unit 16, the control information generation unit 17, and the decoding unit 18 shown in FIG. 2 are implemented as software, and when this software is executed by the CPU 1501, the audio processing device according to the present embodiment has the functional configuration illustrated in FIG. 2.
[0051]
The audio processing performed when the CPU 1501 executes a software program including the control unit 14, the watermark extracting unit 15, the digest information reproducing unit 16, the control information generation unit 17, and the decoding unit 18 illustrated in FIG. 2 is described below.
[0052]
First, the audio data to be reproduced is selected from the terminal 13. This terminal may be the audio processing device according to the first embodiment, or a user command input device such as the keyboard 1504 or the mouse 1505. The control unit 14 sends a control signal to the storage device 12 (external storage device 1509, storage medium drive 1510) so that reading can start from the head of the selected audio data. The storage device 12 changes its read position according to the control signal and starts reading.
[0053]
The read encoded data is input to the audio processing device 11 frame by frame and passed to the watermark extracting unit 15. When normal reproduction is instructed from the terminal 13, the watermark extracting unit 15 performs no processing and passes the encoded data straight through to the decoding unit 18. When digest reproduction is instructed from the terminal 13, on the other hand, the watermark extracting unit 15 extracts 32-bit watermark data from the encoded data by reversing the procedure performed by the watermark insertion unit 6 in FIG. 1 of the first embodiment, and inputs it to the digest information reproducing unit 16.
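Reversing the parity rule of the first embodiment amounts to reading the parity of each carrier coefficient back as a bit. This minimal sketch assumes the same hypothetical layout as on the embedding side (32 bits, one per quantized coefficient of the highest-frequency subband, most significant bit first) and that the coefficients have already been entropy-decoded; `extract_bits` is a hypothetical name.

```python
from typing import List


def extract_bits(coeffs: List[int], nbits: int = 32) -> int:
    """Read an nbits-wide unsigned integer (MSB first) from coefficient
    parities: even -> 0, odd -> 1."""
    if len(coeffs) < nbits:
        raise ValueError("not enough coefficients")
    value = 0
    for c in coeffs[:nbits]:
        value = (value << 1) | (c % 2)  # c % 2 is 0 or 1 even for negative c
    return value


print(extract_bits([0] * 29 + [1, 0, 0]))  # 4
```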
[0054]
The digest information reproducing unit 16 analyzes the watermark data, reproduces as digest information the position information (number of bytes from the beginning) of the encoded data of the first frame and of the end frame of the digest part, and notifies the control information generation unit 17. Based on this information, the control information generation unit 17 generates information for controlling the control unit 14 so that only the frames of the digest part are reproduced.
[0055]
After the watermark data has been extracted, the encoded data is input directly to the decoding unit 18, decoded back into an audio signal, and output as sound from the speaker 19 (1507).
[0056]
The processing performed by each unit illustrated in FIG. 2 during digest reproduction is now described in more detail. The watermark extracting unit 15 examines the encoded data of each frame stored in the storage device 12 in order from the first frame, and identifies the first frame in which watermark data is embedded (the first frame). It extracts the watermark data embedded in the encoded data of this first frame and interprets the extracted value, represented as a 32-bit integer, as the position information (number of bytes from the beginning) of the end frame of the digest part that begins at the first frame (the second frame). It then sends this position information, together with the position information of the first frame, to the control information generation unit 17.
[0057]
The control information generation unit 17 generates control instruction information that causes the control unit 14 to send the encoded data of the frames from the first frame to the second frame from the storage device 12 to the decoding unit 18, and sends this information to the control unit 14.
[0058]
The control unit 14 receives the control instruction information, reads the encoded data of the instructed frames (from the first frame to the second frame) from the storage device 12, and sends it to the decoding unit 18. The decoding unit 18 decodes the received encoded data, D/A-converts it, and outputs the sound to the speaker 19 (1507).
[0059]
After the second frame has been reproduced, the watermark extracting unit 15 extracts the watermark data embedded in the second frame and sends the extracted value, represented as a 32-bit integer, to the digest information reproducing unit 16 as the position information (number of bytes from the beginning) of the first frame of the next digest part (the new first frame). It then extracts the watermark data embedded in the new first frame and sends it to the digest information reproducing unit 16 as the position information of the end frame of the next digest part (the new second frame). The subsequent processing is the same.
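The decoder-side walk through the chain can be sketched as follows, assuming the values extracted from each watermarked frame are already available keyed by that frame's head position. The sketch stops when an end frame carries no watermark, which is an assumption (the text does not say how the last digest part is marked), and it guards against a malformed chain that loops back on itself. The function name `digest_schedule` is hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def digest_schedule(head_positions: List[int],
                    watermarks: Dict[int, int]) -> List[Tuple[int, int]]:
    """Return the (first frame, end frame) head positions in playback order.

    head_positions: head position of every frame, in stream order.
    watermarks: value extracted from each watermarked frame, keyed by
    that frame's head position.
    """
    # Scan forward from the first frame for the first watermarked frame.
    first: Optional[int] = next(
        (p for p in head_positions if p in watermarks), None)
    schedule: List[Tuple[int, int]] = []
    seen: Set[int] = set()
    while first is not None and first not in seen:
        seen.add(first)
        end = watermarks[first]        # first frame holds its end frame
        schedule.append((first, end))  # frames first..end are decoded
        first = watermarks.get(end)    # end frame holds the next first frame
    return schedule


marks = {50: 100, 100: 200, 200: 240}
print(digest_schedule([0, 50, 80, 100, 150, 200, 240, 300], marks))
# [(50, 100), (200, 240)]
```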
[0060]
FIG. 11 is a flowchart of the decoding and reproduction processing, which is the audio processing of the present embodiment described above. First, in step S301, the audio processing device is initialized, and in step S302, it is determined whether to end the processing. If not, in step S303 the watermark extracting unit 15 reads the encoded data of the first frame of the audio data to be reproduced that was specified from the terminal 13. Next, in step S304, the control unit 14 determines whether the instruction from the terminal 13 specifies the normal reproduction mode (the mode for ordinary reproduction) or the digest reproduction mode (the mode for reproducing the digest parts), and branches the subsequent processing accordingly.
[0061]
In the digest reproduction mode, the process proceeds to step S305; in the normal reproduction mode, it proceeds to step S308. In step S308, the watermark extracting unit 15 passes one frame of encoded data unchanged to the decoding unit 18, which decodes it, D/A-converts the result, and outputs it to the speaker 19 (1507).
[0062]
For digest reproduction, in step S305 the watermark extracting unit 15 treats the frame in which watermark data is embedded as the first frame of the digest part, extracts the watermark data from that frame's encoded data, and sends the position information of this frame and the position information represented by the watermark data to the digest information reproducing unit 16. In step S306, the digest information reproducing unit 16 reproduces the digest information from these two pieces of position information. In step S307, the control information generation unit 17 generates information for controlling the control unit 14 based on the digest information. Then, in step S309, the control unit 14 sends the encoded data of the frames specified by the digest information from the storage device 12 to the decoding unit 18, which decodes the encoded data one frame at a time and outputs it to the speaker 19 (1507).
[0063]
If it is determined in step S310 that the frame being reproduced is the end frame, the process proceeds to step S312, where the watermark extracting unit 15 extracts the value represented by the watermark data embedded in the end frame as the position information of the first frame of the next digest part, and extracts the watermark data embedded in that first frame as the position information of the end frame of the next digest part. In other words, the next digest part is identified. Then, in step S313, the control unit 14 causes the storage device 12 to read the encoded data of the frames of the next digest part.
[0064]
FIG. 9 shows the configuration of the RAM 1202 of the audio processing device according to the present embodiment. Loaded into the RAM 1202 are the OS, the decoding software program functioning as the decoding unit 18, the watermark extraction software program functioning as the watermark extracting unit 15, and the digest information reproducing software program functioning as the digest information reproducing unit 16. The RAM 1202 also has an audio area for storing the decoded audio signal, a code area for storing the encoded data of each frame, and a working area used by the CPU 1201 for various processes.
[0065]
As described above, the digest part can be reproduced by the audio processing device according to the present embodiment without depending on the format of the file including the encoded data.
[0066]
[Third Embodiment]
FIG. 3 is a block diagram showing a functional configuration of the audio processing device according to the present embodiment. In the figure, the same parts as those in FIG. 1 are denoted by the same reference numerals, and description thereof will be omitted.
[0067]
In FIG. 3, reference numeral 101 denotes the main body of the audio processing apparatus according to the present embodiment. Reference numeral 102 denotes a digest information generation unit which, unlike the digest information generation unit 4 of the first embodiment, has no function for analyzing the audio signal. Instead, it receives time information (time stamps) for the beginning and end of each digest part from an external source (the moving picture encoding system described later), converts them as appropriate into the positions of the first frame and the end frame of the digest part in the encoded data, and generates digest information holding these positions as a list. Reference numeral 103 denotes a moving picture encoding system that outputs the time stamps to the audio processing device.
[0068]
The basic configuration of the audio processing apparatus according to the present embodiment is the same as that of the first embodiment shown in FIG. 4.
[0069]
The processing of audio data in the audio processing device configured as described above is as follows. The audio signal collected by the microphone 2 is continuously input to the encoding unit 3, which encodes it frame by frame using the AAC encoding method and stores the encoded data of each frame in memory (RAM 1202). Each time the encoding of one frame is completed, the position information of that frame is sent to the digest information generation unit 102.
[0070]
At the same time, the digest information generation unit 102 receives from the moving picture encoding system 103 the time information of the beginning and end of each part to be extracted as a digest, calculates the head positions of the encoded data of the corresponding frames based on the position information sent from the encoding unit 3, and stores them one after another. In the present embodiment, the digest information based on the moving image may be generated automatically by analyzing the moving image in the moving picture encoding system, or may be specified deliberately by the user; the method used to generate it does not matter.
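The conversion from a time stamp to a frame position depends only on the sample rate and the number of samples per frame. The sketch below assumes time stamps in seconds measured from the start of the audio, AAC-LC frames of 1024 samples, and no encoder delay or offset between the video and audio clocks; all of these are assumptions, and the function names are hypothetical. `head_pos` stands for the per-frame head positions sent by the encoding unit 3.

```python
import math
from typing import List, Tuple

SAMPLES_PER_FRAME = 1024  # samples per AAC-LC frame (assumed profile)


def time_to_frame(t_sec: float, sample_rate: int,
                  samples_per_frame: int = SAMPLES_PER_FRAME) -> int:
    """Index of the frame whose samples contain time t_sec (t_sec >= 0)."""
    return math.floor(t_sec * sample_rate / samples_per_frame)


def digest_positions(start_sec: float, end_sec: float, sample_rate: int,
                     head_pos: List[int]) -> Tuple[int, int]:
    """Head positions of the first and end frames of one digest part,
    clamped to the last encoded frame."""
    last = len(head_pos) - 1
    first = min(time_to_frame(start_sec, sample_rate), last)
    end = min(time_to_frame(end_sec, sample_rate), last)
    return head_pos[first], head_pos[end]


# At 48 kHz, 1.0 s falls in frame 46 (48000 / 1024 = 46.875).
print(time_to_frame(1.0, 48000))  # 46
```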
[0071]
As in the first embodiment, the watermark generation unit 5 generates watermark data from the positions of the first frame and the end frame of each digest part, and the generated watermark data and the embedding position are input to the watermark insertion unit 6. The watermark insertion unit 6 embeds the generated watermark data at an appropriate position in the encoded data read from the encoding unit 3, as in the first embodiment. The encoded data in which the watermark data is embedded is stored at a predetermined position in the storage device 7.
[0072]
In this embodiment, the audio encoding method is MPEG2-AAC, but other encoding methods such as MPEG1 Audio Layer I, II, III, MPEG4, ATRAC3, and AC-3 may be used.
[0073]
Note that the audio data encoded by the audio processing device according to the present embodiment can be decoded by the audio processing device according to the second embodiment, and the digest reproduction can be performed similarly.
[0074]
[Fourth embodiment]
The audio processing device according to the present embodiment decodes the encoded data of an audio signal obtained by the audio processing device according to the first or third embodiment and can reproduce the digest parts. In addition, when a plurality of digest parts are reproduced in succession, it suppresses the noise that tends to occur between them.
[0075]
FIG. 13 shows the functional configuration of the audio processing device according to the present embodiment. Parts identical to those in FIG. 2 are denoted by the same reference numerals, and their description is omitted. Reference numeral 1501 denotes the main body of the audio processing apparatus according to the present embodiment. Reference numeral 1502 denotes a control unit that basically operates in the same way as the control unit 14 of the second embodiment but additionally controls the volume 21. The volume 21 adjusts the output (amplitude) of the audio signal decoded by the decoding unit 18 to a level corresponding to its set value, and the audio signal adjusted by the volume 21 is output from the speaker 19.
[0076]
The basic configuration of the audio processing apparatus according to the present embodiment is as shown in FIG. When the volume 21 is implemented in hardware, it must be added as hardware to the configuration illustrated in FIG. When it is implemented in software, the software is stored in the external storage device 1209 and read into the RAM 1202 as needed; it may also be read from a storage medium by the storage medium drive 1210.
[0077]
The audio processing performed by the audio processing apparatus configured as above is described below. First, each unit is initialized, and the value set in the volume 21 is set to 10. This value indicates the amplification applied to the audio signal, that is, the volume of the sound output from the speaker 19: the larger the value, the louder the sound. In this embodiment, the value can be set from 0 to 10.
[0078]
As in the second embodiment, the encoded data read from the storage device 12 is decoded by the decoding unit 18 into an audio signal frame by frame, and the resulting audio signal is sent to the volume 21. The control information generation unit 17 obtains from the digest information reproducing unit 16, as digest information, the position information of the encoded data of the first frame and of the end frame of the digest part. While the decoding unit 18 is decoding and reproducing the end frame of the digest part, the control information generation unit 17 sends to the control unit 1502 a control signal for decreasing the value set in the volume 21 by one.
[0079]
The value is decremented at a rate such that, for example, the value set in the volume 21 reaches 0 within the reproduction time of the end frame. Reproducing the end frame while lowering the volume in this way fades out the end of the digest part.
[0080]
When the value reaches 0, the control unit 1502 controls each unit to reproduce the next digest part. Because the value set in the volume 21 is 0 when reproduction of the next digest part starts, the control information generation unit 17 sends the control unit 1502 signals for increasing the value by one until it reaches 10.
[0081]
The value is incremented at a rate such that, for example, the value set in the volume 21 reaches 10 within the reproduction time of the first frame. Reproducing the first frame while raising the volume in this way fades in the head of the digest part. Because this fade-out and fade-in lower the volume only at the junctions between digest parts, the noise that tends to occur between digest parts can be kept from being output.
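One way to make the volume setting reach 0 (or 10) within a single frame is to spread the unit steps evenly across that frame's samples. The sketch below is an illustration under two assumptions not stated in the text: that a setting v maps to a linear gain of v / 10, and that the steps are applied per sample within one decoded frame, rather than once per pass through the flowchart. `fade_frame` is a hypothetical name.

```python
from typing import List

MAX_VOLUME = 10  # upper end of the 0-10 volume range


def fade_frame(samples: List[float], start_vol: int,
               end_vol: int) -> List[float]:
    """Scale one decoded frame while the volume setting moves one unit at a
    time from start_vol to end_vol, with the steps spread evenly over the
    frame. Setting v is assumed to mean a linear gain of v / MAX_VOLUME."""
    n = len(samples)
    steps = abs(end_vol - start_vol)
    direction = 1 if end_vol >= start_vol else -1
    out: List[float] = []
    for i, x in enumerate(samples):
        k = (i + 1) * steps // n  # unit steps taken by sample i (1-based)
        vol = start_vol + direction * k
        out.append(x * vol / MAX_VOLUME)
    return out


# Fade-out over the end frame, then fade-in over the next first frame.
print(fade_frame([1.0] * 5, 10, 0))  # [0.8, 0.6, 0.4, 0.2, 0.0]
```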
[0082]
FIG. 14 is a flowchart of the audio processing performed by the audio processing device according to the present embodiment. The processing in steps S1600 to S1609 is the same as that in steps S301 to S310 shown in FIG. 11 for the second embodiment, so its description is omitted. Only the processing from step S1609 onward, which is characteristic of the present embodiment, is described below.
[0083]
If the frame decoded in step S1607 is the end frame of the digest part, the process proceeds from step S1609 to step S1613, where it is determined whether the value (Volume) set in the volume 21 is 0. If it is 0, the process proceeds to step S1616, and the control unit 1502 controls each unit to read the encoded data of the frames of the next digest part in the same manner as in steps S312 and S313.
[0084]
If Volume is not 0, the process proceeds to step S1614, where the control information generation unit 17 sends a control signal to the control unit 1502, and in step S1615 the control unit 1502 decreases Volume by one.
[0085]
If the frame decoded in step S1607 is not the end frame of the digest part, the process proceeds from step S1609 to step S1610, where it is determined whether Volume is 10, that is, its maximum value. If it is 10, the process returns to step S1601. Otherwise, the process proceeds to step S1611, where the control information generation unit 17 sends a control signal to the control unit 1502, and in step S1612 the control unit 1502 increases Volume by one.
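The branch from step S1609 onward can be expressed as a pure function of two inputs: whether the current frame is an end frame, and the current Volume. This is a sketch of the decision logic only. The function name and its return convention, a pair of the new Volume and a flag meaning "read the next digest part", are hypothetical, and signalling between the control information generation unit 17 and the control unit 1502 is not modeled.

```python
from typing import Tuple

MAX_VOLUME = 10


def volume_step(is_end_frame: bool, volume: int,
                max_volume: int = MAX_VOLUME) -> Tuple[int, bool]:
    """One pass through the S1609-S1616 branch.

    Returns (new volume, whether to read the next digest part)."""
    if is_end_frame:
        if volume == 0:
            return volume, True       # S1613 -> S1616: move to next digest part
        return volume - 1, False      # S1614-S1615: fade out by one step
    if volume == max_volume:
        return volume, False          # S1610: already at maximum
    return volume + 1, False          # S1611-S1612: fade in by one step


# Passes spent on an end frame, starting from full volume:
v, passes, jump = MAX_VOLUME, 0, False
while not jump:
    v, jump = volume_step(True, v)
    passes += 1
print(passes)  # 11: ten decrements, then the jump
```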
[0086]
As described above, the audio processing apparatus according to the present embodiment can suppress the output of noise that is likely to occur between the digest portions.
[0087]
In the present embodiment, Volume is lowered to 0, but the present invention is not limited to this; it may instead be lowered to a level such as 3.
[0088]
Likewise, Volume is raised to 10 in the present embodiment, but the present invention is not limited to this; it may instead be raised to a preset level.
[0089]
[Fifth Embodiment]
In addition to the embodiments described above, the digest information may of course also be described in a new file format at output time. In that case, the digest information may be described in, for example, an ancillary data area.
[0090]
In the above embodiments, the watermark is embedded by placing the end position information in the first frame of each digest part and the start position of the next digest part in its end frame, but the embedding method is not limited to this. For example, the same predetermined watermark may be embedded in every frame that belongs to a digest, or the watermark may be divided and embedded across a plurality of frames at the beginning and end.
[0091]
[Other embodiments]
Furthermore, the present invention is not limited to apparatuses and methods that implement the above embodiments. It also covers the case in which software program code for implementing the above embodiments is supplied to a computer (CPU or MPU) in a system or apparatus, and the computer of the system or apparatus operates various devices according to the program code, thereby implementing the embodiments.
[0092]
In this case, the software program code itself implements the functions of the above embodiments, so the program code itself, and the means for supplying it to the computer, specifically a storage medium storing the program code, fall within the scope of the present invention.
[0093]
As a storage medium for storing such a program code, for example, a floppy (registered trademark) disk, hard disk, optical disk, magneto-optical disk, CD-ROM, magnetic tape, nonvolatile memory card, ROM, or the like can be used.
[0094]
The program code also falls within the scope of the present invention not only when the functions of the above embodiments are implemented by the computer controlling the various devices according to the supplied program code alone, but also when they are implemented by the program code in cooperation with the OS (Operating System) running on the computer or with other application software.
[0095]
The present invention further includes the case in which the supplied program code is stored in a memory on a function expansion board of the computer or in a function expansion unit connected to the computer, and a CPU or the like on the function expansion board or function expansion unit then performs part or all of the actual processing according to the instructions of the program code, thereby implementing the above embodiments.
[0096]
【The invention's effect】
As described above, according to the present invention, the digest parts can be reproduced regardless of the file format. Furthermore, noise can be suppressed when reproduction passes from one digest part to the next.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a functional configuration of an audio processing device according to a first embodiment of the present invention.
FIG. 2 is a block diagram illustrating a functional configuration of an audio processing device according to a second embodiment of the present invention.
FIG. 3 is a block diagram illustrating a functional configuration of an audio processing device according to a third embodiment of the present invention.
FIG. 4 is a block diagram illustrating a basic configuration of the audio processing device according to the first embodiment of the present invention.
FIG. 5 is a diagram illustrating a calculated head position.
FIG. 6 is a diagram showing an example of embedding in a digest part in encoded data of each frame.
FIG. 7 is a diagram illustrating a configuration example of a list as digest information.
FIG. 8 shows a configuration of a RAM 1202 of the audio processing device according to the first embodiment of the present invention.
FIG. 9 shows a configuration of a RAM 1202 of the audio processing device according to the second embodiment of the present invention.
FIG. 10 is a flowchart of audio processing according to the first embodiment of the present invention.
FIG. 11 is a flowchart of audio processing according to the second embodiment of the present invention.
FIG. 12 is a block diagram illustrating a basic configuration of an audio processing device 11 according to a second embodiment of the present invention.
FIG. 13 is a block diagram illustrating a functional configuration of an audio processing device according to a fourth embodiment of the present invention.
FIG. 14 is a flowchart of audio processing according to a fourth embodiment of the present invention.

Claims

An audio processing method comprising:
an encoding step of encoding an input audio signal frame by frame and generating encoded data for each frame; and
an embedding step of embedding, in the encoded data of a predetermined frame of the input audio signal, information for specifying the encoded data of a frame group including a digest part.