JP3960646B2

JP3960646B2 - Information storage device and information storage method

Info

Publication number: JP3960646B2
Application number: JP35561096A
Authority: JP
Inventors: 哲市村
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1996-12-24
Filing date: 1996-12-24
Publication date: 2007-08-15
Anticipated expiration: 2016-12-24
Also published as: JPH10191245A

Description

【０００１】
【発明の属する技術分野】
この発明は、例えば会議録記録システムや取材記録システムのように、会議や取材での会話音声や会議や取材風景の画像などの情報を記憶蓄積する情報蓄積装置および情報蓄積方法に関する。
【０００２】
【従来の技術】
会議や講演、取材、インタビュー、電話やテレビ電話を使用した会話、テレビ映像等の要点を記録する場合、通常、記録者が筆記で記録することが一般に行われていた。例えば、会議を記録する場合、会議出席者の一人が書記となって会議参加者全員の発言を逐一記録したり、重要な項目のみを選んで記録している。しかし、書き留めたメモから会議内容や話しの経緯を思い出せなかったり、誰の発言だったかを記録していなかったりするために、後で会議議事録を作成するのが困難になる状況がしばしば発生する。
【０００３】
そこで、従来から、会議や講演、取材、インタビュー、電話やテレビ電話を使用した会話、テレビ映像、監視カメラ映像等の記録を、デジタルディスク、デジタルスチルカメラ、ビデオテープ、半導体メモリなどに記憶蓄積し、再生する装置が提案されている。これらの情報蓄積装置を用いて情報の蓄積を行なえば、記録すべき情報の要点のみを記録者が筆記等して記録する方法に比べ、入力情報である音声や画像を漏らさず記録できるという利点がある。
【０００４】
これらの装置には、コンピュータネットワークを介して伝送されたデジタル信号を蓄積媒体に記録するものや、ビデオカメラやマイクロホンからのアナログ入力信号をそのまま蓄積媒体に記録するものや、符号化してデジタル信号に変換し、記録するものなどがある。
【０００５】
しかしながら、限られた記録容量の蓄積媒体中に、長時間の音声信号および／または画像信号を記録することは困難であるという問題があった。一般に、順次入力される音声信号または画像信号などの時系列データを長時間に渡って記録する場合に必要な記憶容量は膨大なものになるからである。
【０００６】
この問題点に対して、例えば、Video for Windows （”Microsoft Video for Windows 1.0 ユーザーズガイド”ｐｐ．５７−５９，ｐｐ．１０２−１０８）のように、音声信号や画像信号を常に圧縮しながら記憶媒体に記憶する方法が提案されている。しかし、その場合、入力されたすべての音声信号または画像信号は同じ圧縮率で記憶されるのが一般的である。このため、記録後に参照される可能性の少ない比較的重要でない情報を大量に記録してしまったり、重要な情報にもかかわらず記憶容量の関係で、品質の良い状態で記録できないという問題があった。
【０００７】
例えばインタビューの風景を、前記Video for Windows を用いて長時間記録しているような場合において、記憶容量を節約する目的で画像信号を５秒間に１フレームだけ記憶するように間引き圧縮率を設定していたとする。このとき、記録者が、記録時に重要だと感じた部分を後から再生したいと思ったとしても、５秒間に１フレームの画像信号しか再生できないため、話者が話しながら行なった動き（ジェスチャなど）や、話しぶりや、微妙なニュアンスを再現できないという問題がある。
【０００８】
しかし、逆に、入力される画像信号を、１秒間３０フレームですべて記憶しようとした場合、長時間のインタビューを記憶するのは、記憶容量として膨大なものが必要になり、実現が非常に困難である。
【０００９】
そこで、特開平５−６４１４４号公報と特開平５−１３４９０７号公報には、画像記憶媒体に記憶したデータ量が予め定めた量を超えた場合に、既に記憶されている画像情報の古いフレームから順に圧縮したり、フレームを間引いたりして、記憶容量を節約しようとする情報蓄積装置が記載されている。これらは、後に記憶された情報ほど重要な情報であると見なすことによって、先に記憶された情報を新しい入力情報によって上書きしたり、先に記憶された情報ほど圧縮率を高くしたりして、記憶容量を節約する装置である。
【００１０】
また、特開平２−３０５０５３号公報と特開平７−１５５１９号公報には、記憶媒体の空き容量が、ある量以下になったと認識された場合に、既に記憶されている音声情報を再圧縮することによって、記憶媒体の空き領域を確保する音声情報蓄積装置が述べられている。
【００１１】
また、特開平６−１４９９０２号公報記載の動画像記録装置は、自動シーンチェンジ検出を行ない、長いシーンほど重要なシーンであると見なすことによって、ダイジェストを生成する際には、ユーザが指定した時間長になるように、高い重要度を持ったシーンから順に抽出する装置である。
【００１２】
この公報記載の装置で生成されたダイジェストに含まれたシーンのみを残し、ダイジェストに含まれなかったシーンを削除するように構成すれば、重要情報を失うことなく記憶容量を節約することができる。
【００１３】
一方、特開平３−９０９６８号公報と特開平６−１４９９０２号公報には、ユーザが指定した時間長になるように映像のダイジェストを自動生成する装置が提案されている。特開平３−９０９６８号公報記載の装置は、シーン毎の重要度をユーザが予めエディタから入力しておき、ダイジェストを生成する際には、ユーザが指定した時間長になるように、高い重要度を持ったシーンから順に抽出する装置である。この装置の場合も、生成されたダイジェストに含まれたシーンのみを残すように構成すれば、重要情報を失うことなく記憶容量を節約できる。
【００１４】
【発明が解決しようとする課題】
しかしながら、特開平２−３０５０５３号公報と特開平７−１５５１９号公報に記載の装置は、記憶されていた音声情報を、全体に渡って同じ圧縮率で再圧縮する装置であり、記録している内容の重要箇所のみを部分的に圧縮率を低くして高音質で記録するというようなことはできないという問題がある。
【００１５】
また、会議、講演、取材、インタビュー等を記憶蓄積する情報蓄積装置において、特開平５−６４１４４号公報または特開平５−１３４９０７号公報に記載されているように、ただ単に、新しい記録を重要情報として残し、古い記録を不要情報として消去するように構成したとすると、重要な会議や重要な取材等の記録が、先に記録されたという理由だけで、新しい入力情報によって上書きされてしまうという問題がある。一般に、会議や取材が行なわれた日時だけに基づいて、その会議内容や取材内容の重要度を判定することはできないからである。
【００１６】
また、シーンの長さによってシーンの重要度を判定する特開平６−１４９９０２号公報記載の装置については、会議や講演を無人カメラで撮影しているような時には、カットチェンジやシーンチェンジによってシーンを切り分けることが非常に困難であり、シーンの長さを検出できないという問題があった。くわえて、会議や講演を撮影しているような場合には、短いシーンの中にでも重要な発言が含まれることがあるため、シーンの長さだけに基づいて、その会議内容や取材内容の重要度を判定することはできないという問題がある。
【００１７】
さらに、ユーザが予めシーン毎の重要度をエディタから入力するという特開平３−９０９６８号公報記載の装置についても、会議や講演を無人カメラで撮影しているような時には、カットチェンジやシーンチェンジによってシーンを切り分けることが非常に困難であるという問題がある。くわえて、撮影が終了した後に、エディタから重要度を入力するという作業は非常に煩わしく、会議や講演を記録するという用途には適さないという問題がある。
【００１８】
ところで、公知の技術として、記録時に情報の取捨選択を行ない、重要と認識された情報のみを記録したり、圧縮率を変化させて記録する装置が知られている。たとえば、特開平７−１２９１８７号公報には、音声取り込みキーを押したときの前後の音声を一定時間分だけ記録する装置が記載されている。また、市販されているテープレコーダの中には、無音区間は音声を記憶しないという無音区間検出機能を持ったものがある。
【００１９】
しかしながら、これらの装置は、一旦記録した後の情報を再圧縮するための手段を持たないため、情報の保存期間の長さによって段階的に圧縮率を変化させたり、記憶媒体の空き記憶容量の変化に応じて動的に圧縮率を変えたりといったことができず、記憶されている画像または音声情報を再圧縮する方法に比べて、圧縮効率が極めて悪いという問題があった。
【００２０】
また、特開平７−１２９１８７号公報記載の装置のように記録時に、リアルタイムに情報の取捨選択を行なう方法では、例えば、会議の中で最も数多く発言した人を特定し、この特定した人の発言部分の音声情報または画像情報のみを高品質で保存するといったようなことや、ユーザが指定した時間長になるように高い重要度を持ったシーンから順に抽出してダイジェストを作成するといったようなことはできない。すなわち、この公報記載の装置では、音声情報または画像情報の記録終了後に初めて得られる情報、または、記録しながらでは得られない情報を元にして、音声情報または画像情報の圧縮を行なうことができないという問題がある。
【００２１】
さらに、特開平７−１２９１８７号公報に述べられているように、トリガを検出した時の少し前の時系列情報を記録するためには、入力された時系列情報を一時記録するための記録用バッファメモリが必要となるため、装置が複雑かつ高価になるという問題があった。
【００２２】
この発明は、上記の問題点に鑑みて、入力される音声または画像信号のうち、特徴的な事象が起こっている重要期間の音声信号または画像信号のみを、記憶容量が限られた蓄積媒体の中に数多く記憶し、重要部分以外の音声信号または画像信号であっても少ないデータ量で長時間記憶できるようにし、さらに重要部分の最初から最後までを確実に再生できるようにすることを目的としている。
【００２３】
【課題を解決するための手段】
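To solve the above problems, the information storage device according to this invention comprises:
information input means for inputting audio information and/or image information to be stored;
time-series information storage means for compressing and storing the audio information and/or image information input from the information input means;
condition-match interval detection means for detecting a condition-match interval in which the audio information stored in the time-series information storage means matches a predetermined condition set in advance;
elapsed-time measurement means for storing time information indicating the time at which the audio information and/or image information was stored in the time-series information storage means, and for outputting a compression start instruction when the time elapsed since the time indicated by the stored time information reaches or exceeds a preset time; and
compression means which is activated by the compression start instruction from the elapsed-time measurement means and which recompresses the compressed audio information and/or image information stored in the time-series information storage means, using a different compression ratio or compression method for the condition-match interval detected by the condition-match interval detection means than for the other intervals.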
【００２４】
条件一致区間検出手段での条件一致検出動作は、時系列情報記憶手段に音声情報を記憶する際に行うこともできるし、時系列情報記憶手段の記憶情報を圧縮する際に、この時系列情報記憶手段から音声情報を読み出して行うこともできる。
【００２５】
前者の場合には、請求項２の発明のように、条件一致区間検出手段で検出された条件一致区間を示す区間情報と、当該区間情報に対応する前記音声情報または前記画像情報の前記時系列情報記憶手段における記憶位置との対応関係を記憶する対応関係記憶手段を設ける。
【００２６】
圧縮処理は、時系列情報記憶手段に音声情報または画像情報を記憶した後の任意の時点で開始することができるが、予め定めた時点で自動的に実施するようにすることもできる。その場合には、請求項１２の発明のように、音声情報または画像情報が時系列情報記憶手段に記憶された時刻を示す時刻情報を記憶する時刻情報記憶手段を設け、圧縮手段は、時刻情報記憶手段に記憶された時刻情報によって定められる時刻からの経過時間が、予め定められた時間以上になったときに、圧縮処理を実行する。
【００２７】
また、時系列情報記憶手段の蓄積媒体の使用容量が所定量を越えたとき、あるいは前記蓄積媒体の空き容量が所定量以下になったときに、圧縮を実行するようにすることもできる。
【００２８】
【作用】
上述の構成のこの発明においては、一旦、時系列情報記憶手段に記憶された音声情報または画像情報は、後の時点において、圧縮されて、蓄積媒体の使用容量の削減が計られるが、予め設定された所定の条件に合致する条件一致区間の音声情報または画像情報は、高品質を保って、圧縮される。このため、重要な情報は高品質を保って保存しながら、時系列情報記憶手段の使用容量を削減することができる。
【００２９】
【実施例】
以下、図を参照して、この発明の実施の形態について説明する。
［第１の実施の形態］
第１の実施の形態は、この発明による情報蓄積装置を会議記録に適用した場合である。
【００３０】
概して、数日前に行なわれた会議を後から参照する可能性に比べて、１ヶ月前に行なわれた会議を参照する可能性は極めて低い。参照する可能性の小さくなった映像情報などの会議情報を高品質で蓄積したままにしておくことは、メモリ容量を節約するという面で非常に非効率的であり、適当なタイミングで、情報を削除または間引き圧縮等を施し、情報量を削減することが望ましい。
【００３１】
しかし、昔の会議記録であっても、重要な場面については、話者が話しながら行なった動き（ジェスチャなど）や、話しぶりや、微妙なニュアンスを再現したいという要求がある。したがって、このような特徴的な事象が起こっている重要期間の音声信号または画像信号は、高品質のままで保存しておくようにすることが要求される。
【００３２】
第１の実施の形態では、会議について音声情報および映像情報を記録し、その記録時点から１ヶ月が経過したときに、記録した会議映像の中で、活発に議論が交わされていた会議の重要部分の映像だけを高品質のまま残し、その他の部分は高圧縮率で圧縮するという、圧縮処理を施す例について説明する。
【００３３】
この第１の実施の形態によれば、後述するように、活発に議論が交わされていた部分の映像を再生した場合には、スムースな動きの高品質の動画が再生され、その他の部分を再生した場合には、いわゆる駒落しであって、動きがぎこちない動画となる。しかし、活発に議論が交わされていた場面以外のあまり重要でない部分を高圧縮率で圧縮できるので、蓄積保存すべき情報量は非常に少なくなる。
【００３４】
図２は、この実施の形態の場合の会議風景を示すもので、複数の会議参加者１６のそれぞれに対してマイクロホン１５が設けられて、各会議参加者１６毎の発声音声がそれぞれのマイクロホン１５で収音される。そして、複数の会議参加者１６間での討議風景や、会議資料として提供された紙文書がビデオカメラ１７で撮影されるものである。
【００３５】
１０は、情報蓄積装置である。この実施の形態の場合には、この情報蓄積装置１０は、音声信号解析器１１と、会議情報蓄積処理装置１２と、蓄積媒体１３と、再生部１４とを備えている。
【００３６】
会議情報蓄積処理装置１２は、会議風景を撮影するカメラ１７からの画像信号と、マイクロホン１５からの音声信号とを、例えばディスクや半導体メモリからなる蓄積媒体１３に蓄積し、また、この蓄積媒体１３に蓄積された音声信号または画像信号を圧縮することができるものである。この実施の形態では、説明をわかりやすくするために、会議情報蓄積処理装置１２は、蓄積媒体１３に蓄積された画像情報のみを圧縮し、蓄積媒体１３に蓄積された音声情報は圧縮しないこととする。なお、この会議情報蓄積処理装置１２は、パーソナルコンピュータであってもよい。
【００３７】
会議情報蓄積処理装置１２は、音声入力端子および画像入力端子を備える。この実施の形態においては、マイクロホン１５により収音された複数の会議出席者１６の発言の音声信号は、一旦、音声信号解析器１１に入力され、この音声信号解析器１１の出力が会議情報蓄積処理装置１２の音声入力端子に入力される。
【００３８】
音声信号解析器１１は、複数のマイクロホン１５から入力された音声信号を解析し、入力音声信号がどのマイクロホンから入力されたのかを識別して、その識別結果を音声信号と共に会議情報蓄積処理装置１２に対し出力する。
【００３９】
また、カメラ１７で撮影された紙文書や会議風景の画像信号は、会議情報蓄積処理装置１２の画像入力端子に入力される。
【００４０】
会議情報蓄積処理装置１２は、図示しないユーザインターフェースを備えており、このユーザインターフェースを介したユーザからの再生要求に応じて、蓄積媒体１３に蓄積された画像信号による画像を、再生部１４の表示画面に表示させるとともに、蓄積媒体１３に蓄積された音声信号による再生音声を再生部１４に取り付けられたスピーカから放音させるようにする機能も備えている。
【００４１】
なお、会議情報蓄積処理装置１２をパーソナルコンピュータで構成した場合、このパーソナルコンピュータを介して、例えばＩＳＤＮによるネットワークに接続することにより、会議の音声情報や画像情報を遠隔地間で同時に共有し、あたかも同じ部屋で会議を行なっているような環境を実現することも可能である。
【００４２】
図１は、この実施の形態の情報蓄積装置１０を、その機能を中心にして示したブロック図である。すなわち、この実施の形態の情報蓄積装置は、システムバスに対して、音声情報入力部１、画像情報入力部２、条件一致区間検出部３、時系列情報記憶部４、対応関係記憶部５、圧縮部６、時刻情報記憶部７、再生部８、制御部９がそれぞれ接続されて構成される。また、この例では、音声情報入力部１は、条件一致区間検出部３にも接続される。制御部９は、全体の処理動作を制御するものである。
【００４３】
各部はそれぞれ別のブロックとして構成されていてもよいし、１つのブロックが幾つかの部を含むように構成されていてもよい。また、１つの部が、幾つかのブロックに分割されて実装されていても構わない。
【００４４】
音声情報入力部１は、マイクロホン１５からの音声信号を受けてデジタル音声信号に変換し、システムバスに送出すると共に、条件一致区間検出部３に送出する。
【００４５】
画像情報入力部２は、ビデオカメラ１７からの画像信号を受け付ける。ビデオカメラ１７からの画像信号がデジタル信号であれば、それを受け付けてシステムバスに送出する。また、入力画像信号がデジタル信号でなければ、画像情報入力部２は、入力画像信号をデジタル画像信号に変換してシステムバスに出力する。
【００４６】
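The condition-match interval detection unit 3 receives the audio signal from the audio information input unit 1 as a digital signal. It monitors this signal and detects audio intervals that match a predetermined condition. In this embodiment, an interval is detected as a condition-match interval when an audio signal at or above a predetermined level is present and a pattern of lively dialogue is detected in that signal. As a result, intervals in which the conference participants exchanged lively discussion are detected as condition-match intervals. The role of the condition-match interval detection unit 3 is played by the audio signal analyzer 11 together with part of the conference information storage processing device 12.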
【００４７】
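To detect whether an audio signal at or above the predetermined level is present, the condition-match interval detection unit 3, as shown in FIG. 3, recognizes the start of a speaker's utterance when the input audio level rises to or above a predetermined threshold level, and recognizes the end of the utterance when the audio level falls to or below the threshold level.
【００４８】
However, if the audio level change point F101, where the audio level crosses the threshold level, were itself used as the start or end of the utterance, the first and last portions of the utterance would fall outside the condition-match interval. Therefore, as shown in FIG. 3, the point F100, a fixed time T1 before the change point F101 at which the audio level changes from low to high, is taken as the utterance start point, and the point F102, a fixed time T2 after the change point F101 at which the audio level changes from high to low, is taken as the utterance end point.
【００４９】
In this embodiment, the audio level at a given time is a smoothed value of the audio levels around that time, for example the average of the instantaneous audio levels over a 2-second window around that time.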
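The threshold detection with the T1 and T2 margins described above can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the function names, the use of sample indices as the time unit, and the moving-average smoothing window are all assumptions made for the example.

```python
from typing import List, Tuple


def smooth(levels: List[float], half_window: int) -> List[float]:
    """Moving average over [i - half_window, i + half_window] (assumed smoothing)."""
    n = len(levels)
    out = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        out.append(sum(levels[lo:hi]) / (hi - lo))
    return out


def utterance_intervals(levels: List[float], threshold: float,
                        t1: int, t2: int,
                        half_window: int = 0) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of utterances.

    The start is moved t1 samples before the upward threshold crossing and
    the end t2 samples after the downward crossing, so that the first and
    last parts of an utterance are not cut off.
    """
    s = smooth(levels, half_window)
    intervals = []
    start = None
    for i, level in enumerate(s):
        if start is None and level >= threshold:
            start = max(0, i - t1)            # point F100 = crossing - T1
        elif start is not None and level < threshold:
            end = min(len(s) - 1, i + t2)     # point F102 = crossing + T2
            intervals.append((start, end))
            start = None
    if start is not None:                     # utterance still open at end of data
        intervals.append((start, len(s) - 1))
    return intervals
```

For example, `utterance_intervals([0, 0, 0, 5, 5, 5, 0, 0, 0, 0], threshold=3, t1=1, t2=2)` widens the raw crossing points at indices 3 and 6 to the interval from index 2 to index 8.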
【００５０】
この実施の形態では、図２に示されるように、マイクロホン１５を発言者毎に設置し、発言者各自のマイクロホンからの音声入力レベルを音声信号解析器１１で比較することで、音声信号解析器１１が、入力音声信号を発信した話者を特定する。
【００５１】
発言者を特定する方法としては、この他にも、音声信号の特徴（声紋など）から話者を特定してもよいし、画像情報による顔や口の動きから発言者を特定してもよい。その場合には、マイクロホンは、会議出席者のすべてに対応して複数本設ける必要はなく、１本あるいは会議出席者の数よりも少ない複数本でよい。また、複数のマイクロホンを設置し、それらのマイクロホンから入力される音声信号の位相差を解析して音源の位置を検知して、発言者を特定するようにすることもできる。
【００５２】
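The condition-match interval detection unit 3 judges that the shorter the time between one speaker finishing an utterance and another speaker starting one, the livelier the dialogue. It also judges the dialogue to be lively when another speaker starts speaking before the first speaker has finished.
【００５３】
FIG. 4 illustrates how the condition-match interval detection unit 3 recognizes intervals of lively dialogue, for the case where a shorter time between one speaker finishing and another starting is taken to indicate livelier dialogue. An audio signal at or above the predetermined level from each speaker is recognized as that speaker's utterance interval SP, and a pattern in which utterance intervals SP alternate between multiple speakers within a short time, as circled with dotted lines in FIG. 4, is detected as a lively dialogue pattern.
【００５４】
To detect such patterns of speakers alternating within a short time, the condition-match interval detection unit 3 detects whether the speaker changed within a preset time after one speaker finished an utterance. This preset time is, for example, 0.5 seconds, and may be made changeable by the user.
【００５５】
In this embodiment, a pattern in which utterance intervals SP partly overlap, because another speaker started before the first speaker finished, is also detected as a rapid speaker-change pattern.
【００５６】
The condition-match interval detection unit 3 then identifies an interval of lively dialogue according to whether rapid speaker-change patterns continued a predetermined number of times, for example three or more times. In the example of FIG. 4, rapid speaker-change patterns occur four times in succession in interval PP, so interval PP is detected as an interval of lively dialogue. That is, the start F200 of the utterance intervals containing these four successive rapid speaker-change patterns is taken as the start of the lively dialogue interval, and the end F201 of those utterance intervals is taken as its end.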
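The rule above (a speaker change within 0.5 seconds or with overlap, repeated at least three times in a row) might be sketched like this. The tuple layout, function name, and parameter names are hypothetical choices for the example; only the 0.5-second gap and the count of three come from the text.

```python
from typing import List, Tuple

# (speaker, start_seconds, end_seconds); this layout is an assumption.
Utterance = Tuple[str, float, float]


def lively_intervals(utterances: List[Utterance],
                     max_gap: float = 0.5,
                     min_changes: int = 3) -> List[Tuple[float, float]]:
    """Find intervals in which speakers alternate rapidly.

    A "rapid change" is a change of speaker where the next utterance starts
    no later than max_gap seconds after the previous one ends; a negative
    gap (overlapping speech) also counts. A run of at least min_changes
    consecutive rapid changes forms one lively interval, spanning from the
    start of the first utterance in the run to the latest end in the run.
    """
    utts = sorted(utterances, key=lambda u: u[1])
    result = []
    run_start = 0   # index of the first utterance of the current run
    changes = 0     # number of consecutive rapid changes seen so far

    def close_run(end_index: int) -> None:
        if changes >= min_changes:
            run = utts[run_start:end_index]
            result.append((run[0][1], max(u[2] for u in run)))

    for i in range(1, len(utts)):
        prev, cur = utts[i - 1], utts[i]
        rapid = cur[0] != prev[0] and cur[1] - prev[2] <= max_gap
        if rapid:
            if changes == 0:
                run_start = i - 1
            changes += 1
        else:
            close_run(i)
            changes = 0
    close_run(len(utts))
    return result
```

With four rapid changes followed by a long pause, the whole run is reported as a single interval, mirroring interval PP (F200 to F201) in FIG. 4.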
【００５７】
時系列情報記憶部４は、音声情報および画像情報を蓄積する記憶部であり、記憶媒体として、例えばディスク記憶媒体や半導体メモリを用いる。
【００５８】
対応関係記憶部５は、条件一致区間検出部３が検出した対話が活発な区間のそれぞれと、それぞれの対話が活発な区間に対応して時系列情報記憶部４に記憶される音声情報および画像情報の、前記時系列情報記憶部４における記憶アドレスとを対応させて記憶するものである。対応関係記憶部５も、例えばディスク記憶媒体や半導体メモリ等で構成される。
【００５９】
圧縮部６は、この実施の形態においては、前記時系列情報記憶部４に蓄積された画像情報のデータ圧縮を行なう。この場合、圧縮部６は、対応関係記憶部５からの条件一致区間を示す情報に基づいて、データ圧縮率またはデータ圧縮方法を動的に可変できるように構成されている。
【００６０】
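In this embodiment, the compression unit 6 assumes moving-image information and handles it in processing units of a predetermined length of time or a predetermined number of frames. For example, compression is performed with a sequence of 10 consecutive frames as one unit partial image sequence. For image information in intervals other than the condition-match intervals, thinning compression is performed: only the first of the 10 frames is kept and the other frames are discarded. In the condition-match intervals, no thinning is performed and all 10 frames are stored.
【００６１】
Consequently, when image information from outside the condition-match intervals is played back, the result is so-called dropped-frame video with jerky motion, but the amount of information is greatly reduced. When image information from a condition-match interval is played back, high-quality video with smooth motion is reproduced.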
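A minimal sketch of this frame thinning follows. It assumes frames are addressed by index and that condition-match intervals are given as half-open index ranges. It also assumes that a 10-frame unit that only partly overlaps a condition-match interval is kept whole, a case the text does not specify.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def thin_frames(frames: Sequence[Frame],
                match_ranges: List[Tuple[int, int]],
                unit: int = 10) -> List[Frame]:
    """Keep every frame of units touching a condition-match range,
    and only the first frame of every other unit.

    match_ranges holds half-open (start, end) frame-index ranges.
    """
    def in_match(index: int) -> bool:
        return any(start <= index < end for start, end in match_ranges)

    kept: List[Frame] = []
    for base in range(0, len(frames), unit):
        indices = range(base, min(base + unit, len(frames)))
        if any(in_match(i) for i in indices):
            kept.extend(frames[i] for i in indices)   # no thinning
        else:
            kept.append(frames[base])                 # keep first frame only
    return kept
```

For 40 frames with one condition-match range covering frames 10 to 19, 13 frames remain: one each from the first, third, and fourth units, and all ten from the second.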
【００６２】
時刻情報記憶部７は、入力された音声信号および画像信号が、時系列情報記憶部４に記録開始された時刻を記憶するためのもので、例えばディスク記憶媒体や半導体メモリ等で構成される。
【００６３】
さらに、時刻情報記憶部７は、前記記録開始時刻からの経過時間を測定する機能を持つ。このため、この時刻情報記憶部７には、図示しない時計回路部からの現在時刻情報が供給される。そして、この実施の形態では、この時刻情報記憶部７は、前記記録開始時刻からの経過時間が予め定められた所定時間以上となったときに、圧縮部６で時系列情報記憶部４の画像情報の前述したような圧縮を開始する契機となる圧縮トリガタイミング信号を出力する。
【００６４】
再生部８は、前述の図２のモニター装置１４により、時系列情報記憶部４に記憶されている音声信号や画像信号を再生する機能部である。
【００６５】
制御部９は、この情報蓄積装置１０の全体の処理動作を制御するものである。
【００６６】
［記録時の動作］
次に、以上のような構成の情報蓄積装置１０における記録時の動作について説明する。図５は、この実施の形態における記録時の動作を、その際の各種情報の流れ、および、各部の出力の流れと共に説明する図である。
【００６７】
会議が始まり、マイクロホン１５からの音声信号およびカメラ１７からの画像信号が、情報蓄積装置１０に供給されると、音声信号および画像信号は、時系列情報記憶部４に順次に蓄積記憶される。また、音声信号は、条件一致区間検出部３に入力される。
【００６８】
条件一致区間検出部３は、前述したように、マイクロホン１５からの音声情報の音声レベルと所定の閾値レベルとを比較して、会議出席者の発言開始点と発言終了点とを検出し、その間の区間を話者の発言区間ＳＰとする。そして、この発言区間ＳＰの複数の会議出席者間の、短時間の交替や一部重なりを検出して、対話が活発な区間を条件一致区間として検出する。そして、検出した条件一致区間の開始点および終了点の情報が、対応関係記憶部５に供給される。
【００６９】
図６は、条件一致区間検出部３の動作を説明するフローチャートである。
【００７０】
条件一致区間検出部３に、音声情報入力部１からの音声信号がデジタル信号として供給されると、ステップＳ１００において、前述の発言区間ＳＰの検出と、発話者の特定が行なわれる。発話者の特定方法としては、前述したように、発言者毎に設置されたマイクロホンからの音声入力レベルを音声信号解析器１１で比較することで実施される。
【００７１】
このステップＳ１００の後、ステップＳ１０１において、一部重なりを含む短時間に発言者が交替しているパターンを認識し、早い話者交代のパターンが検出された場合には、ステップＳ１０２に進み、そのパターンが所定回数以上継続したかどうかを判別する。前述したように、早い話者交代のパターンが３回以上連続して検出されたときに、そのパターンを含む発言区間を活発な対話が行なわれている区間であると認識するように予め条件が設定されていた場合には、前述した図４の例であれば、区間ＰＰを、対話が活発な区間として検出し、ステップＳ１０３に進む。
【００７２】
ステップＳ１０３では、対話が活発な区間として検出した区間を、条件一致区間として特定する。すなわち、例えば図４の例では、対話が活発な区間の始まりを、区間ＰＰの始まりＦ２００とし、対話が活発な区間の終りを、区間ＰＰの終わりＦ２０１として、区間ＰＰを対話が活発な区間（条件一致区間）であると特定する。なお、条件一致区間を特定する情報としては、当該区間の始まりあるいは終りの一方の情報と、区間の長さの情報であってもよい。
【００７３】
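Next, in step S104, the condition-match interval identified in step S103 is output to the correspondence storage unit 5, after which processing returns to step S100 to begin detecting a new condition-match interval. Likewise, when step S102 finds that the rapid speaker-change pattern has occurred fewer than the predetermined number of times, processing returns to step S100 to begin detecting a new condition-match interval.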
【００７４】
対応関係記憶部５は、上述のような条件一致区間検出部３からの条件一致区間を特定する情報、すなわち、この例では、条件一致区間の始まりと終りの情報を受けると、以下に説明するようにして、これらの情報と、当該条件一致区間に対応する時系列情報記憶部４の記憶アドレスとを対応付けて記憶する。
【００７５】
図７は、対応関係記憶部５の動作を説明するフローチャートである。この図７において、前述した記録動作に関与するステップは、ステップＳ２００〜Ｓ２０３の部分である。ステップＳ２０４、Ｓ２０５の部分は、後述する圧縮時の動作に関与する。
【００７６】
すなわち、この記録時においては、ステップＳ２００において、条件一致区間検出部３から、条件一致区間を示す情報が入力されたかどうかを検出し、条件一致区間の入力が検出されなかった場合には、ステップＳ２０４を経由してステップＳ２００に戻り、条件一致区間を示す情報の入力有無の検出を行う。
【００７７】
ステップＳ２００において、条件一致区間検出部３からの条件一致区間の入力が検出された場合には、ステップＳ２０１に進む。ステップＳ２０１では、前記条件一致区間に対応して時系列情報記憶部４に記憶されている音声情報または画像情報の、時系列情報記憶部４における記憶アドレスを取得するために、時系列情報記憶部４に対し、前記条件一致区間を示す情報と共に、記憶アドレスの問い合わせ要求を出力し、ステップＳ２０２でその返答を待つ。
【００７８】
時系列情報記憶部４からの返答が返されるとステップＳ２０３に進み、前記条件一致区間を特定する情報と、この条件一致区間に対応して時系列情報記憶部４に記憶されている音声情報および画像情報の、時系列情報記憶部４における記憶アドレスとを対応づけて記憶する。
【００７９】
ステップＳ２０３の次には、ステップＳ２０４を経由してステップＳ２００に戻り、次の条件一致区間を示す情報の入力有無の検出を行う。
【００８０】
図８は、条件一致区間検出部３の検出結果である条件一致区間と、この条件一致区間によって特定される時刻に入力された音声情報および画像情報の、前記時系列情報記憶部４における記憶アドレスとを対応づけて説明した図である。
【００８１】
図８に示すように、対話が活発な区間の開始時刻Ｆ２００は、時系列情報記憶部４の記憶アドレスＦ３００に対応しており、また、対話が活発な区間の終了時刻Ｆ２０１は、記憶アドレスＦ３０１に対応している。
【００８２】
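The correspondence storage unit 5 stores this correspondence in table form, as shown in FIG. 9. In this example, the start time F200 and end time F201 are relative times measured from the start of storage.
【００８３】
For example, as in FIG. 9, suppose that the start of a lively dialogue interval is detected 210 seconds after storage of the audio and image information began, and its end is detected at 240 seconds. The audio and image information stored between 210 and 240 seconds after the start of storage is then the corresponding stored information. In the example of FIG. 9, addresses 420 through 480 of the time-series information storage unit 4 are the storage addresses of the audio and image information corresponding to this condition-match interval.
【００８４】
In FIG. 9, ID is an identifier that distinguishes each detected condition-match interval; in this example it is a three-digit number.
【００８５】
The storage format of the correspondence storage unit 5 is not limited to the table format of FIG. 9; a list structure, stack structure, or the like may also be used.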
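One way to picture the table of FIG. 9 is as a list of records. The sketch below uses the 210 to 240 second interval and addresses 420 to 480 from the example; the class name, field names, lookup helper, and the ID value "001" are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MatchRecord:
    record_id: str    # three-digit identifier (the value used below is made up)
    start_s: int      # interval start, seconds from the start of storage
    end_s: int        # interval end, seconds from the start of storage
    start_addr: int   # first storage address of the interval
    end_addr: int     # last storage address of the interval


# One row, using the times and addresses of the example in the text.
table: List[MatchRecord] = [MatchRecord("001", 210, 240, 420, 480)]


def addresses_for(records: List[MatchRecord],
                  time_s: int) -> Optional[Tuple[int, int]]:
    """Return the address range of the condition-match interval that
    contains time_s, or None if time_s lies outside every interval."""
    for record in records:
        if record.start_s <= time_s <= record.end_s:
            return (record.start_addr, record.end_addr)
    return None
```

A compression step can then treat any time for which `addresses_for` returns `None` as belonging to an ordinary interval.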
【００８６】
次に、このときの対応する時系列情報記憶部４の動作を、図１０のフローチャートを参照して説明する。
【００８７】
すなわち、図１０は、時系列情報記憶部４の記録時の動作を説明するフローチャートである。まず、この記録動作が開始となると、時系列情報記憶部４は、ステップＳ３００において、音声情報および画像情報の記憶開始時刻を時刻情報記憶部７に出力して、記録させる。次に、ステップＳ３０１、ステップＳ３０２と順次に進み、入力される画像情報と音声情報との入力を受け、順次記憶する。
【００８８】
そして、次のステップＳ３０３では、対応関係記憶部から、条件一致区間に対応する記憶アドレスの要求が到来したか否かを判別し、当該要求が到来したことを検出したときにはステップＳ３０４に進む。ステップＳ３０４では、条件一致区間に対応する音声情報および画像情報の記憶アドレスを、対応関係記憶部５に返答する。
【００８９】
ステップＳ３０３で条件一致区間に対応する記憶アドレスの要求は到来していないと判別された後、またステップＳ３０４の後は、ステップＳ３０１に戻り、画像情報と音声情報の記憶を続ける。
【００９０】
時刻情報記憶部７は、時系列情報記憶部４の前記ステップＳ３００での処理による記憶開始時刻の情報を受信して、当該記憶開始時刻の記憶を行う。
【００９１】
図１１は、時刻情報記憶部７の動作を説明するフローチャートであり、また、図１２は、時刻情報記憶部７の記憶構造を説明するための図である。図１１において、ステップＳ４００およびステップＳ４０１が、記録時の処理であり、時系列情報記憶部４から供給された、音声情報および画像情報の記憶開始時刻をステップＳ４００において検出し、ステップＳ４０１においてこの記憶開始時刻を時刻情報記憶部７に記憶する。
【００９２】
後述するように、時刻情報記憶部７は、音声情報および画像情報が時系列情報記憶部４に記録されてからの経過時間（すなわち情報保存時間）が、所定の時間以上になった場合に、対応関係記憶部５に対し圧縮処理開始指示を出力する。図１１のステップＳ４０２およびステップＳ４０３は、その処理部であり、この圧縮開始指示処理については後述する。
【００９３】
時刻情報記憶部７は、音声情報および画像情報を格納したファイルの名前と、記憶開始時刻との関係を、図１２のようなテーブルで管理している。この例では、１つの会議の記録が１つのファイルに記録されている。ファイル名は各会議記録に付与されたファイル名称であり、図１２のＩＤは各会議記録ファイルを識別する識別子（この例では番号）である。
【００９４】
なお、この記憶開始時刻の記憶形式はテーブル形式に限られず、リスト構造やスタック構造等であっても構わない。さらに、音声情報および画像情報を格納したファイルやファイル名の中に、記憶開始時刻を特定する情報を記憶しておくように構成しても構わない。
【００９５】
さらに、ファイルサイズ、ファイル名、ファイル作成者などのファイル属性や、ディレクトリ作成者、ディレクトリ名などのディレクトリ属性に応じて、圧縮処理開始までの時間が自動的に変わるようにしてもよい。
【００９６】
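For example, as shown in FIG. 13, when the file size exceeds 5 MB, the time until compression starts is set to one month, and when it is less than 5 MB, to two months. Likewise, for files with the extension .AVI the time until compression starts is set to one month, and for files with the extension .mpg, to two months. In these cases the time until compression starts need not be specified for each file, which saves the user effort.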
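The two example rules could be expressed as small lookup functions. This is a sketch under stated assumptions: "5 MB" is taken as 5 × 1024 × 1024 bytes, a file of exactly 5 MB is treated like a smaller file, and the fallback for unlisted extensions is invented, since the text does not cover these cases.

```python
import os

FIVE_MB = 5 * 1024 * 1024  # assumed byte value of "5 MB"


def delay_months_by_size(size_bytes: int) -> int:
    """Months to wait before compression, chosen from the file size."""
    return 1 if size_bytes > FIVE_MB else 2


def delay_months_by_extension(file_name: str, default: int = 1) -> int:
    """Months to wait before compression, chosen from the file extension.

    The default for extensions other than .AVI and .mpg is an assumption.
    """
    extension = os.path.splitext(file_name)[1].lower()
    return {".avi": 1, ".mpg": 2}.get(extension, default)
```

For instance, `delay_months_by_extension("meeting.AVI")` gives 1 and `delay_months_by_extension("talk.mpg")` gives 2.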
【００９７】
以上のようにして、この実施の形態においては、会議が開始され、会議記録が開始されると、その開始時点の時刻が、時刻情報記憶部７に記憶されると共に、会議開始時点（記憶開始時点に対応）から画像情報および音声情報が時系列情報記憶部４に記憶される。
【００９８】
そして、会議進行中の音声情報について、条件一致区間検出部３で、条件一致区間が、対話が活発な区間として検出され、対応関係記憶部５に、その区間を特定する情報と、対応する時系列情報記憶部の記憶アドレスとが対応付けて記憶される。
【００９９】
［圧縮時の動作］
次に、圧縮時の動作について説明する。この第１の実施の形態では、時系列情報記憶部４に記憶した画像情報および／または音声情報は、記憶してから所定期間経過したときには、重要度が小さくなるとして情報圧縮して、時系列情報記憶部４のメモリに、空き容量を形成するようにするが、条件一致区間は、重要区間として、この区間は圧縮せず、あるいは、圧縮率を低くして高品質を保つようにする。
【０１００】
図１４は、この実施の形態における情報圧縮時の動作を、その際の各種情報の流れ、および、各部の出力の流れと共に説明する図である。
【０１０１】
時刻情報記憶部７は、音声情報および画像情報が時系列情報記憶部４に記録されてからの経過時間が、所定の時間以上になった場合に、対応関係記憶部５に対し圧縮処理開始指示を出力する。
【０１０２】
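That is, in step S402 of the processing routine of the time information storage unit 7 in FIG. 11, the current time supplied from a clock circuit (not shown) is compared with the storage start time held in the time information storage unit 7 to determine whether the information has been kept for the predetermined time. If so, processing proceeds to step S403, where the correspondence storage unit 5 is requested to start compression.
【０１０３】
After this request is issued, or when step S402 determines that the predetermined time has not yet elapsed, processing returns to step S400.
【０１０４】
For example, if the predetermined time is set to one month, the compression start request is issued one month after the start of storage, so information newly stored in the time-series information storage unit 4 is compressed one month later. For instance, the audio and image information in the file named "file10" shown in FIG. 12, recorded at 13:30 on April 25, 1996, is compressed at 13:30 on May 25, 1996.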
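The elapsed-time check of step S402 can be illustrated with calendar arithmetic. The helper names are hypothetical, and clamping the day for shorter months is an assumption the text does not address.

```python
import calendar
from datetime import datetime


def add_months(moment: datetime, months: int) -> datetime:
    """Return the same day and time `months` later, clamping the day
    to the length of the target month (e.g. Jan 31 -> Feb 29 in 1996)."""
    index = moment.month - 1 + months
    year = moment.year + index // 12
    month = index % 12 + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def compression_due(recorded_at: datetime, now: datetime,
                    months: int = 1) -> bool:
    """True once at least `months` have passed since recording started."""
    return now >= add_months(recorded_at, months)
```

Using the example of file10, a recording started at `datetime(1996, 4, 25, 13, 30)` becomes due at `datetime(1996, 5, 25, 13, 30)`.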
【０１０５】
この圧縮処理が実行されるまでの時間は、この例では、時刻情報記憶部７に対して固定的に与えるようにするが、この時間はユーザが変更できるようにすることができる。また、この圧縮処理の開始のタイミングは、設定された時間の近辺であればよく、システムがアイドリング状態になるのを待って圧縮処理を行なうように構成してもよい。また、圧縮を施すまでの時間を、各ファイル毎に、変えて設定してもよい。
【０１０６】
対応関係記憶部５は、時刻情報記憶部７から圧縮開始指示が入力されると、図７のステップＳ２０４でその入力を検出する。圧縮開始指示が検出された場合には、ステップＳ２０５に進み、対話の活発な区間を示す前記条件一致区間を特定する情報のそれぞれと、それぞれの条件一致区間に対応して前記時系列情報記憶部４に記憶されている音声情報および画像情報の、前記時系列情報記憶部４における記憶アドレスとを圧縮部６に出力する。すなわち、図９に示したテーブルの、一つの会議についての内容を一括して圧縮部６に出力する。
【０１０７】
なお、もちろん、各条件一致区間を特定する情報と、該条件一致区間に対応した記憶アドレスとの組を、１組１組づつ、順次圧縮部６に出力するように構成してもよい。また、音声情報および画像情報を格納した時系列情報記憶部４のファイルの中に、前記条件一致区間を特定する情報のそれぞれと、それぞれの条件一致区間に対応してファイルに記憶されている音声情報および画像情報の、該ファイルにおける記憶アドレスとを記憶しておくように構成しても構わない。
【０１０８】
対応関係記憶部５からの入力を受信した圧縮部６は、前記時系列情報記憶部４に蓄積された画像情報のデータ圧縮を行なう。この場合、圧縮部６は、対応関係記憶部５からの条件一致区間を示す情報に基づいて、データ圧縮率またはデータ圧縮方法を動的に可変して圧縮を実行する。
【０１０９】
この実施の形態の場合には、条件一致区間の情報については、データ圧縮を行わずに高品質を維持し、条件一致区間以外の区間の画像情報について、データ圧縮を行うようにする。このため、図１４に示すように、圧縮部６は、時系列情報記憶部４から、条件一致区間以外の区間の部分画像列を取得して、それをデータ圧縮し、圧縮後の圧縮画像列を時系列情報記憶部４に書き戻すようにする。
【０１１０】
図１５は、この圧縮部６の動作を説明するフローチャートである。
圧縮部６は、対応関係記憶部５からの圧縮開始要求を受信すると、ステップＳ５００によって、これを検出し、ステップＳ５０１に進む。ステップＳ５０１では、対応関係記憶部５から入力される、対話の活発な区間を示す条件一致区間のそれぞれを特定する情報と、それぞれの条件一致区間に対応して前記時系列情報記憶部４に記憶されている音声情報および画像情報の、前記時系列情報記憶部４における記憶アドレスとを入力し、圧縮部６の、図示しないワークメモリに記憶する。ワークメモリは、記憶媒体として、例えば半導体メモリを用いる。
【０１１１】
圧縮部６は、前記ワークメモリに記憶されている、条件一致区間と記憶アドレスの情報の複数組を参照して、時系列情報記憶部４に記憶されている画像情報の圧縮を行なう。
【０１１２】
ステップＳ５０２では、条件一致区間以外の１０フレームの部分画像列を、１つの単位部分画像列として、時系列情報記憶部４から圧縮部６に順次読み出す。この実施の形態においては、条件一致区間に対応した画像情報を圧縮しないために、条件一致区間以外の画像情報のみを読み出して、圧縮する。もっとも、条件一致区間に対応した画像情報も圧縮する場合には、条件一致区間の画像情報を含めて読み出して圧縮する必要があることは言うまでもない。
【０１１３】
ステップＳ５０３では、１０フレームの部分画像列の、この例では先頭の１フレームだけを残して、他の９フレームを消去するというフレーム間引き処理を行なう。そして、次のステップＳ５０４において、そのフレーム間引き後の圧縮画像列を時系列情報記憶部４に書き戻す。
【０１１４】
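Then, in step S505, it is determined whether compression of the file holding the conference recording has finished. If the whole file has been compressed, the compression unit 6 ends processing. If portions remain to be compressed, processing returns to step S502 and the compression is repeated.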
【０１１５】
図１６は、圧縮部６の動作を説明するための図である。図１６Ａにおいて、条件一致区間に対応した時系列情報記憶部４の記憶アドレスによって指定されるメモリ領域が、ａ３からａ６、および、ａ１１からａ１２であったとすると、圧縮部６によって圧縮されるのは、記憶メモリ領域ａ１からａ２、および、記憶メモリ領域ａ７からａ１０の画像列となる。ここで、記憶メモリ領域ａ１，ａ２…のそれぞれは、所定複数フレーム、例えば１０フレーム分の容量を備えるものである。
【０１１６】
上述したように、圧縮後でも、記憶メモリ領域ａ３からａ６、および、記憶メモリ領域ａ１１からａ１２に蓄積されていた条件一致区間の画像列は、圧縮前と比べて変化がない。
【０１１７】
一方、記憶メモリ領域ａ１からａ２、および、記憶メモリ領域ａ７からａ１０に蓄積されていた画像列は、フレーム間引き圧縮の対象となるので、記憶メモリ領域ａ１、ａ７、ａ８の内容は、圧縮画像列によって置き換えられる。そして、情報量が減ったことにより、図１６Ｂに示すように、時系列情報記憶部４の記憶媒体には、空きメモリ領域ａ２、ａ９、ａ１０が生成される。
【０１１８】
なお、時系列データが、記憶媒体内に連続して記憶されていることが望ましい場合には、生成された空きメモリの部分を前後の時系列データによって詰めるようにする等して、メモリの隙間をなくすようにすることができる。
【０１１９】
次に、この第１の実施の形態において、記録または圧縮された音声情報または画像情報を再生する場合の動作について説明する。この再生は、前述したように、再生部８において、制御部９からの制御の元に行われ、図示しないユーザインターフェースを通じた再生指示に基づいて実行される。
【０１２０】
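When recorded audio or image information is played back, users often want to change the playback speed or to rewind a little and play back slowly, so the conference information storage processing device 12 provides fast-forward, rewind, slow-playback, and pause functions. In addition, a slide bar corresponding to the time axis is provided on the display monitor screen; a pointer on the bar indicates the time currently being played, and the user can specify the playback position by sliding the bar.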
【０１２１】
また、再現する速度に関して、必ずしも記録された時刻情報の通りに再現する必要はなく、記録された順序関係だけは守って速度を上げて再現するようにすることができるし、条件一致区間のみをサーチして自動再生するようにすることもできる。たとえば、図８の時点Ｆ２００からＦ２０１の区間は、記憶された速度と同じ速度で再生し、それ以外の区間は倍速再生するようにする。
【０１２２】
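So that condition-match intervals can be seen at a glance, information indicating them may also be displayed on the time-axis slide bar. Detection results of the condition-match interval detection unit 3, such as the name or face photograph of the detected speaker, may also be superimposed on the slide bar.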
【０１２３】
以上説明した第１の実施の形態では、早い話者交代のパターンが所定個数以上連続して検出されたときに、そのパターンを含む発言区間の両端を、条件一致区間の両端とするようにしているが、早い話者交代のパターンを含む発言区間の開始時点の所定時間前の時点を条件一致区間の始まりとしてもよいし、早い話者交代のパターンを含む発言区間の所定個数前の発言区間を含めて条件一致区間としてもよい。
【０１２４】
また、早い話者交代のパターンを含む発言区間の終了時点の所定時間後の時点を条件一致区間の終わりとしてもよいし、早い話者交代のパターンを含む発言区間の所定個数後の発言区間を含めて条件一致区間としてもよい。
【０１２５】
また、「扉の閉まる音」というような、単発的な音声信号を条件一致区間検出部３で検出するようにすることもできる。この場合には、単発的な音声信号を検出した時点の所定時間前の時点を条件一致区間の開始点として検出し、該単発的な音声信号を検出した時点の所定時間後の時点を条件一致区間の終了点として検出するように構成する。
【０１２６】
また、この実施の形態では、条件一致区間検出部３は、入力音声信号から活発な対話のパターンを検出する場合について説明したが、この他にも、笑い声のパターン、拍手のパターンなどの特徴的な音声パターンを登録しておき、入力音声信号からこれらのパターンを認識し、これらのパターンを含む区間を条件一致区間として検出するように構成することもできる。この場合には、条件一致区間検出部３には、公知のパターン認識技術、例えば、音声信号のパワーまたは周波数成分の時間的遷移を解析する技術などを用いて、パターン認識を行なうパターン認識手段が設けられる。
【０１２７】
また、この実施の形態では、圧縮部６は、画像の間引き圧縮を行う構成になっているが、圧縮部６の構成は、画像情報の圧縮時に、記憶時間、フレーム内圧縮の圧縮率、フレーム間圧縮の圧縮率、間欠記録の時間間隔、色情報間引き率、輝度情報間引き率等の少なくとも一つを動的に変更する装置であればよい。
【０１２８】
特に、動画像情報を圧縮する方法としては、フレーム内での圧縮法とフレーム間の圧縮法があり、フレーム内の圧縮法としてはベクトル量子化を用いた方法と離散コサイン変換を用いた方法などがある。フレーム間の圧縮法としては前後フレームの画像情報の差分のみを記録する方法などがある。すなわち、単位時間あたりの情報量をより少ない情報量に変換する装置は、いずれも本発明でいう圧縮部６に相当する。
【０１２９】
また、この第１の実施の形態では、条件一致区間以外の区間の画像情報であっても、情報量の少ない駒落し映像として保存するように構成しているが、もちろん、条件一致区間以外の区間の画像情報または音声情報を、蓄積媒体から消去するようにしても構わない。
【０１３０】
また、条件一致区間の区間の情報と、条件一致区間以外の区間の情報とを、別々の蓄積媒体に分けて保存するようにしてもよい。たとえば、情報の記録時は、条件一致区間内の情報と、条件一致区間以外の区間の情報とを同一の磁気ディスクに蓄積するようにし、情報の圧縮時に、条件一致区間内の情報のみを前記磁気ディスクに残し、条件一致区間以外の区間の情報を光磁気ディスクや磁気テープに移動するように構成する。一般的に、光磁気ディスクや磁気テープは、磁気ディスクに比べて、情報へのアクセス速度は遅いが大量の情報を安価に蓄積することができるという特徴を有しているため、情報量の少なくなった条件一致区間以外の区間の情報を蓄積するために適している。
【０１３１】
さらに、この実施の形態では、音声情報は圧縮しない構成になっているが、音声情報を圧縮する場合には、音声信号の圧縮時に、記憶時間、サンプリング周波数、符号化ビット数の少なくとも一つを動的に変更するようにすればよい。
【０１３２】
なお、上述した実施の形態の装置に入力される時系列情報は、カメラ／マイクロホン／ビデオデッキ／テープレコーダ／センサ等から入力されたアナログ信号でもよいし、それを符号化したデジタル信号でもよい。さらには、計算機ネットワーク／計算機バスを通じて入力されるデジタル信号でもよい。すなわち、時間の経過とともに順次入力される情報は、いずれも本発明でいう時系列情報に相当する。
上述の第１の実施の形態においては、条件一致区間検出部３が音声情報から条件一致区間を検出し、この検出結果に基づいて、条件一致区間の画像情報は、他の区間よりも高画質を保つように、圧縮率を動的に変更して時系列情報記憶部４に記憶された画像情報を圧縮するように構成したことにより、画像情報のうちの、特徴的な事象が起こっている重要部分を限られた蓄積媒体の中に数多く記憶でき、かつ、重要部分以外の画像情報であっても少ないデータ量で長時間記憶できる効果がある。
また、入力される音声信号の有無または音声信号レベルを条件一致区間検出部３によって検出するように構成したので、音声が発せられている区間の音声または画像情報を、最初から最後まで高音質／高画質で保存でき、かつ、音声が発せられていない区間の音声または画像情報であっても少ないデータ量で記憶できる効果がある。
また、入力される音声の発信者または発信者の交替を条件一致区間検出部３によって検出するように構成したので、特定の発信者の音声または画像情報を、最初から最後まで高音質／高画質で保存でき、かつ、その他の発信者の音声または画像情報であっても少ないデータ量で記憶できる効果がある。
【０１３３】
［第２の実施の形態］
この第２の実施の形態においても、前述と同様に、説明を簡単にするために、圧縮対象は画像のみとして、以下に説明する。
【０１３４】
この第２の実施の形態では、入力画像情報を時系列情報記憶部４に蓄積する際に、例えば低周波数帯域と、高周波数帯域というように、周波数帯域別に記憶しておき、時刻情報記憶部７から圧縮開始指示を受信したときに、画像の高周波数帯域を削除することにより、画像情報の圧縮を行うように構成する。この第２の実施の形態は、画像の高周波数帯域は、いわゆる画像のディテールに関与する成分であり、これを削除しても基本的な画像内容の把握については影響が少ないことを利用するものである。
【０１３５】
第１の実施の形態では、図１４に示したように、時系列情報記憶部４の部分画像列を読み出し、圧縮部６で圧縮処理を施した後、再び時系列情報記憶部４に書き込む構成としたが、この第２の実施の形態の場合には、時系列情報記憶部４から部分画像列を読み出したり、また、画像圧縮処理を施したり、時系列情報記憶部４に書き戻したりする必要がなくなるため、圧縮処理時のシステムの負荷を軽減することができる。
【０１３６】
また、この第２の実施の形態では、前記の周波数帯域別に入力画像情報を記憶する場合において、条件一致区間と、条件一致区間以外の区間とでは、時系列情報記憶部４に蓄積する際の周波数帯域の分け方を変えるように構成する。
【０１３７】
具体的には、条件一致区間以外の区間の画像情報のみを周波数帯域別に記憶するようにし、条件一致区間の画像情報は、周波数帯域別の記憶はしない通常の方法で記憶する。
【０１３８】
図１７は、この第２の実施の形態における記録時の動作を、その際の各種情報の流れ、および、各部の出力の流れと共に説明した図である。この第２の実施の形態の構成は、第１の実施の形態について説明した図１および図５と比較すると、構成要素として、周波数帯域別画像生成部２１が追加されている点が異なっている。
【０１３９】
この周波数帯域別画像生成部２１は、この例では、ハイパスフィルタと、ローパスフィルタとで構成される。そして、この第２の実施の形態の場合、条件一致区間検出部３は、第１の実施の形態の場合と同様な方法で、入力音声情報から条件一致区間を検出し、その条件一致区間を特定する情報を、対応関係記憶部５に供給すると共に、周波数帯域別画像生成部２１に供給する。
【０１４０】
周波数帯域別画像生成部２１は、条件一致区間検出部３からの条件一致区間を特定する情報を受けて、その条件一致区間と、他の区間とで、時系列情報記憶部４に出力する画像信号を変更するようにする。
【０１４１】
図１８は、この第２の実施の形態における周波数帯域別画像生成部２１での処理を説明するフローチャートである。
【０１４２】
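As shown in FIG. 18, the frequency-band image generation unit 21 receives audio or image information in step S600. In the next step S601, it uses the information from the condition-match interval detection unit 3 that identifies the condition-match interval, in this example the start and end times of the interval, to determine whether the current time lies within a condition-match interval.
【０１４３】
For image information in an interval judged to be within a condition-match interval, processing proceeds from step S601 to step S603, where the image information is output unchanged to the time-series information storage unit 4 and stored there in the ordinary storage format, without division into frequency bands.
【０１４４】
If, on the other hand, step S601 determines that the current time is outside the condition-match intervals, processing proceeds to step S602, where the input image information is divided into high-frequency-band and low-frequency-band information to generate per-band image information. The generated per-band image information is output to the time-series information storage unit 4 and stored in step S603. Steps S600 to S603 are then repeated.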
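The idea of storing a signal as separate low and high bands, so that the high band can later be dropped without reading and rewriting the rest, can be illustrated on a one-dimensional row of pixel values. Note that this sketch swaps in a simple moving-average low-pass filter and defines the high band as the residual; the source only says that a low-pass filter and a high-pass filter are used, so the filter, the function name, and the `radius` parameter are assumptions.

```python
from typing import List, Tuple


def split_bands(samples: List[float],
                radius: int = 1) -> Tuple[List[float], List[float]]:
    """Split a 1-D signal into a low band (moving average) and a high band
    (the residual), so that low[i] + high[i] reproduces samples[i]."""
    n = len(samples)
    low = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        low.append(sum(samples[lo:hi]) / (hi - lo))
    high = [s - l for s, l in zip(samples, low)]
    return low, high
```

Keeping both lists allows exact reconstruction; discarding `high` leaves only the smoothed `low` version, which corresponds to the reduced-detail image that remains after compression.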
【０１４５】
図１９は、この第２の実施の形態における画像情報圧縮前の時系列情報記憶部４の記憶状態を説明する図である。図１９に示されるように、この第２の実施の形態の場合の時系列情報記憶部４は、条件一致区間の画像情報を記憶するメモリ部４Ｍａと、条件一致区間以外の区間の画像情報を記憶するメモリ部４Ｍｂとを備える。これらメモリ部４Ｍａおよび４Ｍｂは、それぞれ別々の記憶媒体であっても良いし、一つの記憶媒体のメモリ領域を分割したものであってもよい。
【０１４６】
時系列情報記憶部４の、条件一致区間の画像情報を記憶するメモリ部４Ｍａには、画像情報は周波数帯域分割されずに記憶されている。そして、条件一致区間以外の区間の画像情報を記憶するメモリ部４Ｍｂは、さらに、高域部記憶メモリと、低域部記憶メモリとに領域分割され、それぞれ画像情報の高周波数帯域成分と、低周波数帯域成分とが、対応付けられて記憶されている。すなわち、図１９において、高域部記憶メモリの領域ａ１〜ａ６の記憶内容と、低域部記憶メモリの領域ａ１〜ａ４の記憶内容とは、同一区間の画像信号の高周波数成分と、低周波数成分とを示している。時系列情報記憶部４は、この周波数帯域成分の対応関係も管理している。
【０１４７】
図１９において、各メモリ領域ａ１，ａ２，…内に「・」が付与されているのは、画像列が記憶されていることを示しており、「・」が無いメモリ領域は空きメモリ領域を意味している。
【０１４８】
この第２の実施の形態においても、時刻情報記憶部７で時系列情報記憶部４の記憶内容の保存期間を監視し、例えば１ヶ月のような所定の期間を保存期間が経過したときに、時刻情報記憶部７は圧縮処理開始指示を出力し、画像情報圧縮を実行させる。
【０１４９】
図２０は、この第２の実施の形態における情報圧縮時の動作を、その際の各種情報の流れ、および、各部の出力の流れと共に説明した図である。また、図２１は、この第２の実施の形態における圧縮処理を説明するフローチャートである。
【０１５０】
すなわち、この実施の形態の場合、時刻情報記憶部７からの圧縮開始指示は、圧縮部６に直接供給される。そして、圧縮部６では、図２１のフローチャートに示すように、この圧縮開始指示を受け取って、音声情報および画像情報が時系列情報記憶部４に記録されてからの経過時間、すなわち情報保存時間が、所定の時間以上になったことをステップＳ７００において検出した場合には、ステップＳ７０１に進む。ステップＳ７０１では、時系列情報記憶部４に対して、条件一致区間以外の区間を記憶しているメモリから、高周波数帯域を削除する処理を行なう指示を送る。
【０１５１】
この例では、時系列情報記憶部４は、この圧縮部６からの高周波数成分削除指示を受けて、条件一致区間以外の区間の画像情報を記憶するメモリ部４Ｍｂの高域部記憶メモリの記憶内容をすべて削除する。
【０１５２】
図２２は、画像情報圧縮後の時系列情報記憶部４の記憶状態を説明した図である。圧縮処理前の記憶状態が、前述の図１９に示したような状態であった場合には、圧縮部６からの高周波数成分削除指示を受けて、時系列情報記憶部４は、条件一致区間以外の区間の画像情報を記憶するメモリ部４Ｍｂの中の、高周波数帯域画像成分を記憶する高域部記憶メモリの領域ａ１〜ａ６から画像情報を全て削除する。この結果、時系列情報記憶部４においては、図２２において、網点で示した領域ａ１〜ａ６の部分が、空きメモリ領域として生成される。
【０１５３】
この生成された空きメモリ領域は、条件一致区間以外の区間を記憶する記憶メモリ領域として再利用されても構わないし、条件一致区間を記憶する記憶メモリ領域として充当するようにしても構わない。
【０１５４】
以上に説明した処理によって、圧縮処理を施した後には、活発に議論が交わされていた部分の映像を再生した場合には、スムースな動きの高品質の動画が再生され、その他の部分を再生した場合には、いわゆる低画質映像であって、画質の低い動画となる。しかし、活発に議論が交わされていた場面以外のあまり重要でない部分のみを高圧縮率で圧縮できるので、蓄積すべき情報量は非常に少なくなる。
【０１５５】
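The above description of the second embodiment covered storing the input image information by frequency band. However, as described for example in Japanese Unexamined Patent Publication No. H6-178250, when the input image information is stored in the time-series information storage unit 4, the image signal may instead be separated into a luminance signal component and color signal components, such as color-difference signals or the carrier chrominance signal (color subcarrier signal), and stored in separate areas, with only the color signal components erased when the time information storage unit 7 issues a compression start instruction. In this case too, partial image sequences need not be read out of or written back to the time-series information storage unit 4, so the system load during compression is reduced and compression is faster.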
【０１５６】
また、入力画像情報を周波数帯域別に時系列情報記憶部４に蓄積する際に、条件一致区間の画像情報と、条件一致区間以外の区間の画像情報の低周波数帯域成分とを、記憶媒体の連続した領域に記憶し、条件一致区間以外の区間の画像情報の高周波数帯域成分を、記憶媒体の別の領域に記憶するように構成してもよい。この場合には、圧縮時に条件一致区間以外の区間の画像情報の高周波数帯域成分を消去しても、圧縮後の時系列データが、記憶媒体内で連続するため、再生速度の低下を防ぐことができる。
【０１５７】
さらに、この第２の実施の形態では、音声情報は圧縮しない構成になっているが、音声情報を同様にして圧縮することもできる。例えば、特開平７−１５５１９号公報に記載されているように、入力音声情報を時系列情報記憶部４に蓄積する際に周波数帯別に記憶しておき、圧縮部６が、時刻情報記憶部７から圧縮開始指示を受信したときに、音声の高周波数帯域を削除するように構成してもよい。この場合も、条件一致区間以外の区間の音声情報の高周波数帯域成分を優先的に削除するように構成するとよい。
上述の第２の実施の形態においては、入力音声情報または入力画像情報を時系列情報記憶手段に蓄積する際に周波数帯域別に記憶しておき、圧縮時に、高周波数帯域を削除するように構成することにより、圧縮のために時系列情報記憶手段から情報を読み出したり、時系列情報記憶手段に情報を書き戻したりする必要がなくなるため、圧縮処理時のシステムの負荷を軽減できる効果がある。
また、入力音声情報または入力画像情報を周波数帯域別に時系列情報記憶手段に蓄積する際に、条件一致区間検出手段が検出した条件一致区間と、条件一致区間以外の区間とで、周波数帯域の分け方を変えて記憶するように構成した場合には、条件一致区間以外の区間の画像情報のみを周波数帯域別に記憶し、条件一致区間の画像情報は通常の方法（周波数帯域別の記憶はしない）で記憶するというように、入力音声情報または入力画像情報を周波数帯域別に分ける処理を少なくすることができるので、システムの負荷を軽減できる効果がある。
【０１５８】
［第３の実施の形態］
第１の実施の形態および第２の実施の形態では、音声情報または画像情報が時系列情報記憶部４に記録されてからの経過時間が、例えば１ヶ月というような所定の時間以上になった場合に、１度だけ圧縮処理を行う例について説明した。
【０１５９】
しかしながら、１度だけ圧縮処理を行うよりも、段階的に複数回に分けて圧縮を施した方が、より効果的に蓄積媒体を節約できる場合がある。例えば、会議を記録する場合、１週間前に行われた会議を後から参照する可能性に比べて、１ヶ月前に行われた会議を参照する可能性は低く、また同様に、１ヶ月前に行われた会議を後から参照する可能性に比べると、半年前に行われた会議を参照する可能性は低い。このように、後から参照される可能性がより低くなった場合に、より少ない情報量で蓄積するように構成すれば、効果的に蓄積媒体を節約できるようにすることができる。
【０１６０】
この第３の実施の形態では、画像情報が時系列情報記憶部４に記録されてからの経過時間に応じて、圧縮率または圧縮方法を変更し、情報を段階的に圧縮する例について説明する。ただし、昔の会議記録映像であっても、重要な場面については、画像信号を高品質のままで保存しておく必要があるため、第１の実施の形態または第２の実施の形態と同様、記録した会議映像の中で、この例では、活発に議論が交わされていた会議の重要部分の映像だけを高品質のまま残し、その他の部分を高圧縮率で圧縮するようにする。
【０１６１】
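In the third embodiment as well, the input image information is stored by frequency band when it is accumulated in the time-series information storage unit 4. A frequency-band image generation unit 21 is therefore provided, as in the second embodiment. In the third embodiment, however, the image information is divided into three bands, a high-frequency band, a mid-frequency band, and a low-frequency band, before being stored in the time-series information storage unit 4. The frequency-band image generation unit 21 in this case consists of a high-pass filter for the high band, a band-pass filter for the mid band, and a low-pass filter for the low band.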
【０１６２】
また、この第３の実施の形態においては、条件一致区間と、条件一致区間以外の区間との区別なく、画像信号は周波数帯域を３帯域に分けて記憶するようにする。
【０１６３】
図２３は、画像情報記録時（画像情報圧縮前）の時系列情報記憶部４の記憶状態を説明した図である。すなわち、この例では、時系列情報記憶部４の条件一致区間の画像情報を記憶するメモリ部４Ｍａおよび条件一致区間以外の区間の画像情報を記憶するメモリ部４Ｍｂのそれぞれは、図２３に示すように、高域部記憶メモリ、中域部記憶メモリ、低域部記憶メモリを有し、それぞれのメモリ領域に、該当区間の画像情報の高域成分、中域成分、低域成分が、それぞれ記憶されるものである。
【０１６４】
そして、この第３の実施の形態においても、時刻情報記憶部７は、記憶時からの時間経過を監視して、所定時間経過したときに圧縮開始指示を、圧縮部６に出力するようにするが、圧縮開始指示は、予め設定された複数の経過時間、例えば１週間後、１ヶ月後、半年後、のそれぞれの時点で出力するようにする。このとき、各圧縮開始指示に、それがどの時点の圧縮開始指示であって、いずれの周波数帯域成分を圧縮するかのデータを付加して圧縮部６に供給するようにする。
【０１６５】
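FIG. 24 shows an example of the compression time management table stored in the time information storage unit 7. As shown in FIG. 24, the image data erased first is the high-frequency-band portion of the intervals other than condition-match intervals, which is erased one week after the information is recorded. One month after recording, the mid-frequency-band portion of the non-matching intervals and the high-frequency-band portion of the condition-match intervals are erased. Half a year after recording, the low-frequency-band portion of the non-matching intervals and the mid-frequency-band portion of the condition-match intervals are erased. The low-frequency-band portion of the condition-match intervals is never erased automatically; it is erased only on an explicit instruction from the user.
【０１６６】
When the compression unit 6 receives a compression start instruction from the time information storage unit 7, it analyzes the content of the instruction and, based on the result, issues a compression instruction telling the time-series information storage unit 4 which storage memory to clear. The time-series information storage unit 4 carries out the staged compression of the image information in accordance with this instruction. Specifically, it erases the contents of each storage memory according to the erasure times in the table of FIG. 24.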
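The erasure schedule described for FIG. 24 can be written down as a small lookup table. The key names are hypothetical, and the day counts 30 and 182 are assumed stand-ins for "one month" and "half a year"; only the ordering of the stages comes from the text.

```python
from typing import Dict, Optional, Set, Tuple

# (interval kind, band) -> days after recording at which it is erased.
# "match" = condition-match intervals, "other" = all other intervals.
# None means the band is never erased automatically.
ERASE_AFTER_DAYS: Dict[Tuple[str, str], Optional[int]] = {
    ("other", "high"): 7,     # one week
    ("other", "mid"): 30,     # one month (assumed 30 days)
    ("match", "high"): 30,
    ("other", "low"): 182,    # half a year (assumed 182 days)
    ("match", "mid"): 182,
    ("match", "low"): None,   # kept until the user erases it explicitly
}


def remaining_bands(elapsed_days: int) -> Set[Tuple[str, str]]:
    """Return the (interval kind, band) pairs still stored after elapsed_days."""
    return {key for key, limit in ERASE_AFTER_DAYS.items()
            if limit is None or elapsed_days < limit}
```

After half a year, only the low band of the condition-match intervals remains, which matches the final storage state described for FIG. 27.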
【０１６７】
図２５、図２６、図２７は、画像情報圧縮前に図２３のように画像情報が記録されている状態から、それぞれ、１週間、１ヶ月、半年が経過したときの、時系列情報記憶部４の記憶状態を説明した図である。これらの図２３、図２５、図２６、図２７において、各メモリ領域ａ１，ａ２，…内に「・」が付与されているのは、画像列が記憶されていることを示しており、「・」が無いメモリ領域は空きメモリ領域を意味している。
【０１６８】
すなわち、１週間経過のときには、図２５に示すように、時系列情報記憶部４では、条件一致区間以外の区間の画像情報を記憶するメモリ部４Ｍｂの高域部記憶メモリの内容がすべて消去されて、空きメモリ領域とされる。
【０１６９】
また、１ヶ月経過のときには、図２６に示すように、条件一致区間の画像情報を記憶するメモリ部４Ｍａの高域部記憶メモリの内容と、条件一致区間以外の区間の画像情報を記憶するメモリ部４Ｍｂの中域部記憶メモリの内容がすべて消去される。
【０１７０】
さらに、半年経過のときには、図２７に示すように、条件一致区間の画像情報を記憶するメモリ部４Ｍａの中域部記憶メモリの内容と、条件一致区間以外の区間の画像情報を記憶するメモリ部４Ｍｂの低域部記憶メモリの内容がすべて消去される。この結果、時系列情報記憶部４の記憶内容は、条件一致区間の画像情報を記憶するメモリ部４Ｍａの低域部記憶メモリの内容のみが残る。
【０１７１】
こうして、これらの図２５〜図２７で示されるように、時系列情報記憶部４には、時間の経過に伴い、より少ない情報量で画像情報が蓄積されるようになる。
【０１７２】
なお、この第３の実施の形態の上述の例では、時刻情報記憶部７の圧縮時刻の管理にテーブルを使用したが、もちろん、管理テーブルの代わりに、リストやスタックの構造であっても構わない。
【０１７３】
さらに、時刻情報記憶部７で、テーブルやリスト等で圧縮時刻および圧縮対象を管理するのではなく、情報の保存時間をパラメータとした数式演算により、任意の時刻における情報の圧縮率を算出し、この圧縮率に関する情報を圧縮部６に送って情報圧縮を実行させるように構成することもできる。
【０１７４】
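For example, letting y be the information retention rate (%) and x the time (days elapsed), the time information storage unit 7 uses the expression
y = 90 exp(-Ax) + 10 ... (1)
where A is a constant and A > 0,
to obtain the information retention rate at a particular time, and supplies this rate to the compression unit 6 as information on the compression ratio. Here the information retention rate is the ratio of the amount of information at a particular time to the amount of information when it was first recorded.
【０１７５】
The compression unit 6 sets a compression ratio based on the information retention rate from the time information storage unit 7 and compresses the image information stored in the time-series information storage unit 4 at that ratio. In this case the time information storage unit 7 issues the compression start instruction repeatedly at a certain period, so that recompression is carried out in stages.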
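Expression (1) is easy to evaluate directly. In the sketch below the function name is hypothetical, and any value passed for the constant A is only an illustrative choice, since the text does not give one.

```python
import math


def retention_percent(elapsed_days: float, a: float) -> float:
    """Information retention rate y (%) from expression (1):
    y = 90 * exp(-A * x) + 10, with A > 0 and x in elapsed days."""
    if a <= 0:
        raise ValueError("A must be positive")
    return 90.0 * math.exp(-a * elapsed_days) + 10.0
```

The rate starts at 100% when x = 0 and decays toward a floor of 10%, so under this expression some information is always retained regardless of age.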
【０１７６】
上述の第３の実施の形態のように、音声情報または画像情報が時系列情報記憶手段に記録されてからの経過時間（すなわち情報保存時間）が、所定の時間以上になった場合に圧縮処理を開始するように構成した場合には、参照する可能性の大きい最近の音声または画像情報を、高音質／高画質で保存でき、かつ、昔に記録された音声または画像情報であっても少ないデータ量で記憶できる効果がある。
また、音声情報または画像情報が時系列情報記憶手段に記録されてからの経過時間（すなわち情報保存時間）に応じて段階的に圧縮を施すように構成されている場合には、後から参照される可能性がより低くなった情報は、より少ない情報量で蓄積できるので、より効果的に蓄積媒体を節約できる効果がある。
また、音声情報または画像情報のデータ量が予め定められた記憶容量に収まるように、圧縮量または圧縮方式を設定できるように構成したことにより、圧縮後のデータは、入力された画像情報の中の重要な部分のみを高画質で記憶した、所望記憶サイズのダイジェストとなる効果がある。
なお、この第３の実施の形態は、第２の実施の形態の変形として説明したが、第１の実施の形態の変形として実施することも、もちろんできる。
【０１７７】
［第４の実施の形態］
この第４の実施の形態は、条件一致区間検出部３での検出条件が、入力される音声信号の中に予め登録されたキーワードが出現したこと、あるいは予め登録された音声パターンが出現したこと、である場合である。
【０１７８】
まず、条件一致区間検出部３での検出条件が、入力される音声信号の中に予め登録されたキーワードが出現したことである場合について説明する。
【０１７９】
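In this case, the condition-match interval detection unit 3 comprises speech recognition means, a memory that stores registered keywords, and keyword match detection means that compares the speech recognition result with the keywords registered in the memory and detects a match. The user registers keywords in the memory in advance.
【０１８０】
During recording, the condition-match interval detection unit 3 converts the input audio signal into character string information using the speech recognition means and extracts words and phrases from it, for example by morphological analysis. The extracted words are then compared with character-string keywords registered in the memory in advance, such as "homework" (宿題), "action item" (アクションアイテム), "issue" (課題), "conclusion" (結論), "decision" (決定), "important" (重要), and "summary" (まとめ).
【０１８１】
When a word extracted from the input audio signal matches one of the registered keywords, the time at which the keyword was detected becomes the start point of a condition-match interval.
【０１８２】
To determine the end point of the condition-match interval, the condition-match interval detection unit 3 in this example holds, in a table like that of FIG. 28, a keyword validity period for each keyword string, which determines how long the image signal is to be recorded at high quality from the time the keyword is detected. More important keywords are assigned longer validity periods.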
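Given the output of a speech recognizer as timestamped words, the keyword rule reduces to a table lookup. The sketch assumes that recognition and word extraction have already been done elsewhere; the function name, the input layout, and the validity periods used in the example are hypothetical, because the values of FIG. 28 are not given in the text.

```python
from typing import Dict, List, Tuple


def keyword_intervals(recognized: List[Tuple[float, str]],
                      validity_seconds: Dict[str, float]
                      ) -> List[Tuple[float, float]]:
    """Turn recognized (time_seconds, word) pairs into condition-match
    intervals: each registered keyword opens an interval at its detection
    time that lasts for that keyword's validity period."""
    intervals = []
    for time_s, word in recognized:
        period = validity_seconds.get(word)
        if period is not None:
            intervals.append((time_s, time_s + period))
    return intervals
```

For example, with a made-up table `{"conclusion": 60, "homework": 30}`, the input `[(12.0, "conclusion"), (30.0, "weather")]` yields the single interval `(12.0, 72.0)`; unregistered words are ignored.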
【０１８３】
そして、時系列情報記憶部４に蓄積されている画像情報を、前述したように所定時間経過後に圧縮する際には、前記条件一致区間の開始点から終了点までの区間の画像情報を高画質で保存し、それ以外の区間を高圧縮率で圧縮するようにする。圧縮方式は、第１〜第３の実施の形態のいずれの方法も採用できる。
【０１８４】
また、さらに、各キーワード文字列毎に重要度を設定できるようにしておけば、各キーワード文字列の重要度に応じた異なる圧縮率で画像信号を圧縮することが可能である。
【０１８５】
次に、入力される音声信号の中に、予め登録された音声パターンが出現したことを検出条件として、条件一致区間検出部３が条件一致区間を検出する場合について説明する。
【０１８６】
音声認識によってキーワードを検出することが困難な場合にも、笑い声のパターン、拍手のパターン、活発な対話のパターンなどの特徴的な音声信号のパターンであれば、これらのパターンを認識できる場合がある。そこで、この特徴的な音声パターンが出現したことをも、検出条件として条件一致区間検出部３は検出するようにする。
【０１８７】
この場合には、条件一致区間検出部３には、笑い声のパターン、拍手のパターン、活発な対話のパターンなどの特徴的な音声信号パターンが予め登録されて記憶されるメモリが設けられる。そして、公知のパターン認識技術、例えば、音声信号のパワーまたは周波数成分の時間的遷移を解析する技術などを用いて、パターン認識を行なうパターン認識手段が設けられる。
【０１８８】
予め登録されている特徴的な音声信号のパターンと、順次入力される音声信号から抽出される音声信号のパターンとを比較して、その一致あるいはその類似度から、当該特徴パターンを認識するようにする。パターン認識の認識率を上げるために、話者毎に、音声パターンを登録しておいてもよい。
【０１８９】
入力音声信号から抽出された音声信号のパターンが、予め登録されている特徴的な音声信号パターンのいずれかと一致したと判定された場合には、音声信号パターンの検出時点は、条件一致区間の開始点となる。
【０１９０】
また、条件一致区間の終了点を定めるために、この例の場合の条件一致区間検出部３には、各音声信号パターン毎に、パターンが検出された時からどれだけの時間の画像信号を高画質で保存するかを決める音声信号パターン有効期間が、図２９のようなテーブルに設定されており、前記条件一致区間の開始点から、この終了点までの区間の画像情報を、高画質で保存すべき情報と判定する。
【０１９１】
そして、時系列情報記憶部４に蓄積されている画像情報を、前述したように所定時間経過後に圧縮する際に、前記条件一致区間の開始点から終了点までの区間の画像情報は、高画質で保存し、その他の区間の画像情報は、情報量を大幅に削減するように圧縮する。圧縮方式は、第１〜第３の実施の形態のいずれの方法も採用できる。
【０１９２】
この例の場合では、入力音声信号から抽出された音声信号のパターンが、予め登録されている特徴的な音声信号パターンのいずれかと一致したと判定された時点を、条件一致区間の開始点と判定したが、音声信号のパターンが検出された時点より前の画像情報を含めて高画質で保存することもできる。そのようにした場合には、例えば、笑い声のパターンや、拍手のパターンが出現する時点の前には、そのパターンが出現する原因が存在することが普通であるので、その原因となる事象を高画質で保存するようにすることができる。
【０１９３】
この場合には、特徴的な音声信号パターンが出現した時点より所定時間前の時点を、条件一致区間の開始点とすることで、パターンが出現した原因となる事象を高画質で保存し、それ以外の区間を高圧縮率で圧縮するように構成する。
以上のように、第４の実施の形態によれば、入力される音声情報の中に予め登録されたキーワードまたはパターンが出現したことを条件一致区間検出手段によって検出するように構成されるので、予め登録されたキーワードまたはパターンが頻繁に出現した期間に記憶された音声または画像情報を、最初から最後まで高音質／高画質で保存でき、かつ、その他の部分の音声または画像情報であっても少ないデータ量で記憶できる効果がある。
【０１９４】
［第５の実施の形態］
この第５の実施の形態は、条件一致区間検出部３が、外部センサによって予め定めた状態変化を検出する場合である。すなわち、この実施の形態では、音声信号からの条件一致区間の検出では困難な事象を条件として、条件一致区間を検出する場合や、入力される音声信号に含まれない情報に状態変化が起きた場合を条件として条件一致区間を検出するために、外部センサを設ける。
【０１９５】
以下に説明するこの実施の形態では、外部センサが、場所を検出する場合について説明する。すなわち、以下の例では、役員会議室、応接会議室、一般会議室のように、会議室に応じた重要度を与えておき、重要な会議室の会議記録ほど、高品質で情報保全を計るようにする。
【０１９６】
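Information on where the audio or image signal was input, that is, in which conference room the meeting took place, can be obtained by analyzing the position information output by a positioning device such as GPS (Global Positioning System). When GPS is used, the latitude and longitude of the place where the audio or image signal was input are measured and compared with the pre-stored latitude and longitude of each conference room, which identifies the conference room in which the audio or image signal was input.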
【０１９７】
また、ＧＰＳ以外にも、会議室や廊下などの任意の場所に、それぞれの場所に固有のビットパターンを発振する赤外線送信機を設置するという、特開平７−１４１３８９号公報記載の赤外線送受方式を用いることもできる。この場合には、音声信号または画像信号が入力されたときに、近くの赤外線送信機が発振するビットパターンを受信し、そのパターンから会議室を識別する。
【０１９８】
以下に説明する例では、赤外線送受方式を用いる場合について説明する。この場合、条件一致区間検出部３は、赤外線信号認識手段と、登録された場所名を記憶するメモリと、赤外線信号を認識した結果から判定された場所名とメモリに予め登録された場所名とを比較して両者の一致を検出する場所一致検出手段とを備える。メモリには、ユーザが予め場所名を登録しておく。
【０１９９】
そして、情報の記録時には、条件一致区間検出部３は、入力された赤外線信号を、赤外線信号認識手段により場所名に変換する。そして、この変換した場所名を、メモリに予め登録されている場所名と比較する。そして、条件一致区間検出部３は、場所を検出する場合においては、同じ場所に留まっていると認識された期間の最初を条件一致区間の開始点として検出し、同じ場所に留まっていると認識された期間の最後を条件一致区間の終了点として検出する。
【０２００】
対応関係記憶部５には、条件一致区間を特定する情報として、当該区間の開始点および終了点と、場所名とを記憶する。場所名に代えて対応する識別子を記憶するようにすることもできる。また、対応関係記憶部５は、それぞれの条件一致区間と、その区間に時系列情報記憶部４に記憶される音声信号および画像信号の記憶アドレスとを対応付けて記憶する。
【０２０１】
この例においては、時刻情報記憶部７は、記憶保持期間が所定期間以上となったときに、対応関係記憶部５に圧縮開始指示を出力する。対応関係記憶部５は、この圧縮開始指示を受けて、条件一致区間を特定する情報として、当該区間の情報と場所名あるいは場所の識別子を圧縮部６に送る。
【０２０２】
圧縮部６は、予め登録された会議室名（場所名）と、各会議室の重要度とを対応させて記憶しているテーブルを備える。図３０は、このテーブルの例である。圧縮部６は、対応関係記憶部５からの場所名あるいは識別子を用いて、このテーブルを参照し、当該条件一致区間の会議室名を検出する。そして、その会議室名に割り当てられた重要度を抽出し、この重要度に応じた圧縮率で、対応する条件一致区間の画像信号を圧縮する。すなわち、重要度の高い場所で記録された情報ほど、圧縮時に、高品質を保って保存する。
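The place-based detection and the importance lookup can be sketched as follows. Consecutive sensor readings that report the same place are grouped into one condition matching section, and each section is then mapped to a retained frame rate through an importance table. The room names, importance levels, and frame rates are hypothetical stand-ins for the table of Fig. 30, and `place_intervals` and `retained_fps` are illustrative names.

```python
from itertools import groupby

# Hypothetical importance table (stand-in for Fig. 30) and frame rates per level.
ROOM_IMPORTANCE = {"executive": 3, "reception": 2, "general": 1}
FPS_BY_IMPORTANCE = {3: 30, 2: 10, 1: 1}


def place_intervals(readings):
    """readings: list of (time_s, place) pairs sorted by time.

    Returns (start_s, end_s, place) for each run of readings at the same place:
    the first reading of a run is the start point, the last one the end point.
    """
    intervals = []
    for place, group in groupby(readings, key=lambda r: r[1]):
        times = [t for t, _ in group]
        intervals.append((times[0], times[-1], place))
    return intervals


def retained_fps(place):
    """Frame rate to keep when compressing a section recorded at this place."""
    importance = ROOM_IMPORTANCE.get(place, 1)  # unknown places get the lowest level
    return FPS_BY_IMPORTANCE[importance]
```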
【０２０３】
このようにすることで、例えば、役員会議室で行なわれた重要会議の記録映像を、他の会議室で行なわれた会議映像よりも、高画質で、より長く保存しておくというようなことができる。
【０２０４】
以上では、外部センサにより場所を検出する場合について説明したが、センサが人を判別するようにしてもよい。例えば、会議出席者に微弱無線発信機を取り付けると共に、会議室に無線受信機を取り付ける。そして、会議出席者が会議室に入室している期間を、前記無線受信機によって検知し、この期間だけを高画質で保存するように構成する。
【０２０５】
さらに、会議出席者ごとに異なる信号を前記微弱無線発信機によって発信するようにして、誰が入室しているかを識別することができるようにすれば、特定の人物が入室している期間だけを高画質で保存するように構成することもできる。
【０２０６】
また、単に物理的な場所、人名だけでなく、「ある会議に出席していた」、「ある人と一緒にいた」など、複数のセンサの検出結果を組み合せて得られる事象から、前記条件一致区間を特定するようにしてもよい。
【０２０７】
さらに、「扉の開閉」というような、単発的なセンサ入力信号（トリガ）を条件一致区間検出部３によって検出する場合には、トリガを検出した時点の所定時間前の時点を条件一致区間の開始点として検出し、該トリガを検出した時点の所定時間後の時点を条件一致区間の終了点として検出するように構成する。扉の開閉を検出するためには、扉に開閉検出センサを取り付けることで実施できる。
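For a one-shot trigger such as a door opening, the section boundaries are simply offsets around the trigger time. A minimal sketch, assuming offsets of 30 seconds before and 60 seconds after the trigger (the text only says "a predetermined time"), with the start clamped to the beginning of the recording:

```python
def trigger_interval(trigger_time_s, before_s=30.0, after_s=60.0):
    """Return (start_s, end_s) of the condition matching section around a trigger.

    before_s and after_s are assumed values for the predetermined offsets.
    """
    start = max(0.0, trigger_time_s - before_s)  # do not start before the recording
    return (start, trigger_time_s + after_s)
```

The same shape applies to the fourth embodiment's variant in which the section starts a predetermined time before a laughter or applause pattern is detected.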
以上のように、この第５の実施の形態の場合、音声信号または画像信号が入力された場所を条件一致区間検出手段によって検出するように構成した場合には、重要な会議を特定の会議室で行なっているような場合、重要な場所で撮影された重要事象の音声または画像を、高音質／高画質で保存でき、かつ、それ以外の場所で撮影された音声または画像情報であっても少ないデータ量で記憶できる効果がある。
また、外部センサによって特定の人を検出するように構成した場合には、特定の人の音声または画像情報を、最初から最後まで高音質／高画質で保存でき、かつ、その他の人の音声または画像情報であっても少ないデータ量で記憶できる効果がある。
【０２０８】
［第６の実施の形態］
この第６の実施の形態は、条件一致区間検出部３が、ビデオカメラ１７の動き（以下、カメラワークという）を検出する場合である。
【０２０９】
例えば、人物をズームアップで撮影しているような場合には、重要な画像を撮っていることが多く、カメラ１７がズームインしている期間の音声信号または画像信号は、高音質／高画質で記憶したいことが多い。そこで、以下に説明する例では、同じ倍率で撮影している区間を条件一致区間として、その倍率と共に検出するようにする。そして、倍率の高い条件一致区間ほど重要な画像であるとして、重要度を定め、倍率に応じて高品質になるように、後の情報圧縮を行うようにする。これにより、倍率の高い、カメラ１７がズームアップしている区間の画像は、高品質に保たれる。
【０２１０】
以下に、この第６の実施の形態の場合の一例について説明する。
【０２１１】
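The video camera 17 in this example can be set to one of three magnification modes, 1x, 5x, and 10x, and outputs information indicating the magnification as camera operation information in response to operation of the zoom ring. This camera operation information is supplied to the condition matching section detection unit 3. As described above, the condition matching section detection unit 3 uses this camera operation information to detect each section in which the camera magnification stays the same as a condition matching section, together with that magnification.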
【０２１２】
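That is, when detecting camera work, the condition matching section detection unit 3 detects the time at which the magnification in the camera operation signal changes as the start point of a condition matching section, and the time at which the magnification next changes as its end point. The end point of one condition matching section is therefore the same time as the start point of the next. The condition matching section information and the magnification information are stored in the correspondence storage unit 5 in association with the storage addresses, in the time-series information storage unit 4, of the image and audio information for that section.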
【０２１３】
図３１は、カメラ倍率と、条件一致区間の関係を示す図である。この図３１において、Ｔ０，Ｔ１，Ｔ２，Ｔ３はそれぞれ条件一致区間である。倍率が１倍である区間Ｔ０，Ｔ３は、ズームリングが操作されないノーマル倍率の区間である。図３１の例では、時点ｔ１でズームリングが操作されて、ズームイン操作開始となり、その始め区間Ｔ１は、倍率が５倍であり、時点ｔ２で倍率が１０倍にアップし、時点ｔ３で倍率が１倍となって、ズームイン操作終了となる。
【０２１４】
この実施の形態では、圧縮部６では、カメラの倍率の、１倍、５倍、１０倍の３つの倍率モードに対して、それぞれの倍率モード時の画像間引き圧縮率を、１フレーム／秒、５フレーム／秒、１０フレーム／秒、に設定している。
【０２１５】
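As in the preceding embodiments, compression is executed when the time elapsed since the image information was recorded in the time-series information storage unit 4 reaches or exceeds a predetermined time and the time information storage unit 7 issues a compression start instruction to the correspondence storage unit 5. The correspondence storage unit 5 then sends to the compression unit 6 the set consisting of each condition matching section's information, its magnification, and its storage address in the time-series information storage unit 4. Based on the magnification information it receives, the compression unit 6 in this case compresses section T1 of Fig. 31 at 5 frames/second, section T2 at 10 frames/second, and the remaining sections T0 and T3 at 1 frame/second.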
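The camera-work case can be sketched as splitting the recording at magnification change points and looking up a retained frame rate per section. The mapping of 1x, 5x, and 10x to 1, 5, and 10 frames/second is taken from the text; the function name, the event format, and the example times are assumptions.

```python
# Retained frame rate per magnification mode, as stated for this embodiment.
FPS_BY_MAGNIFICATION = {1: 1, 5: 5, 10: 10}


def zoom_sections(events, end_time_s):
    """events: list of (time_s, magnification) change points, sorted by time,
    with the first entry at the start of the recording.

    Returns (start_s, end_s, magnification, retained_fps) per section; the end
    of each section is the start of the next one.
    """
    sections = []
    following = events[1:] + [(end_time_s, None)]
    for (start, mag), (end, _) in zip(events, following):
        sections.append((start, end, mag, FPS_BY_MAGNIFICATION[mag]))
    return sections
```

With change points at t1, t2, and t3 as in Fig. 31, this yields the four sections T0 to T3 with 1, 5, 10, and 1 frames/second respectively.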
【０２１６】
以上のようにして、この実施の形態の場合には、カメラワークまたはカメラワークの変化に応じて、重要場面の画像信号と、重要でない場面の画像信号の圧縮率を変えて情報を保存することができる。
そして、カメラ操作信号またはカメラ操作信号の変化を条件一致区間検出部３によって検出するように構成した場合には、重要な音声または画像をアップでカメラ撮影しているような場合、ズームインしている期間の音声または画像を、高音質／高画質で保存でき、かつ、それ以外の期間の音声または画像情報であっても少ないデータ量で記憶できる効果がある。
【０２１７】
なお、条件一致区間検出部３は、カメラの操作情報からカメラワークまたはその変化を検知する場合に限らず、カメラからの画像信号から検知するようにすることもできる。
【０２１８】
カメラからの画像信号から検出することができるカメラワークとしては、パンニング、チルティング、ズーミング、ブーミング、トリミング、ドリーイング、カット開始、カット終了などがあり、これらのカメラワークを検出する際には、入力される画像信号を画像認識して検出するようにする。また、これらのカメラワークも、特開平６−１６５００９号公報や特開平７−２４５７５４号公報に記載されているように、カメラ操作に使用したボタンなどの操作信号を検出しても、もちろんよい。
【０２１９】
［第７の実施の形態］
この第７の実施の形態では、時系列情報記憶部４に記憶された音声情報または画像情報が、ユーザによって参照（アクセス）されたか否かという参照状態に基づいて、圧縮率または圧縮方法を変えて情報を圧縮する場合について説明する。
【０２２０】
一般に、頻繁に参照された情報は重要な情報であることから、頻繁に参照された区間の画像情報は高画質で保存し、参照される頻度が低かった画像情報は高圧縮率で圧縮し、少ない情報量で保存するようにする。
【０２２１】
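The reference state, that is, how frequently the audio or image information stored on the storage medium has been accessed by the user, is stored, and the compression rate is changed on the basis of that reference state. For this purpose, the seventh embodiment includes a reference state storage unit for storing the frequency of references by the user.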
【０２２２】
この実施の形態では、前記参照状態記憶部は、前記時系列情報記憶部４に記憶された画像情報の、ユーザによって映像再生された区間と、その区間の映像再生された回数とを、情報の参照状態として記憶する。
【０２２３】
図３２は、時系列情報記憶部４の記憶状態を説明した図である。図において、区間Ｔ２、Ｔ４、Ｔ６は、前述したいくつかの実施の形態の条件一致区間検出部３によって検出された条件一致区間である。すなわち、例えば第１の実施の形態の場合であれば、対話の活発な区間である。その外の区間Ｔ１、Ｔ３、Ｔ５、Ｔ７は、条件一致区間以外の区間である。これらの区間に関する情報は、前述したように対応関係記憶部５に記憶されている。
【０２２４】
図３３は、参照状態記憶部の記憶状態の例を示した図である。参照状態記憶部は、時系列情報記憶部４に画像情報が記憶されてから現時点までに、何回、前記区間Ｔ１〜Ｔ７の画像情報がアクセスされたか、すなわち、何回その区間の映像が再生されたかを記憶している。
【０２２５】
そして、画像情報が時系列情報記憶部４に記録されてからの経過時間（すなわち情報保存時間）が、所定の時間以上になった場合に、時刻情報記憶部７から圧縮開始指示が発生すると、参照状態記憶部は、図３３の参照回数のテーブルの情報を圧縮部６に送る。圧縮部６は、図３４に示される圧縮率設定テーブルを備える。この圧縮率設定テーブルは、参照回数に対して、条件一致区間および条件一致区間以外の区間のそれぞれに設定される圧縮率の対応テーブルである。
【０２２６】
圧縮部６は、この圧縮率設定テーブルを参照して、各区間毎の画像圧縮率を決定する。そして、圧縮部６は、この圧縮率で時系列情報記憶部４に記憶された画像情報を圧縮するようにする。
【０２２７】
例えば、図３２の区間Ｔ１は、条件一致区間以外の区間であり、図３３のテーブルで示されるように参照された回数が０回であるので、図３４の圧縮率設定テーブルに基づいて、画像圧縮率は９０％に設定される。すなわち、区間Ｔ１は、条件一致区間以外の区間であり、かつ、ユーザからアクセスされたことがない区間であるので、重要でない区間であることがわかる。したがって、圧縮時に、９０％の高圧縮率で圧縮される。
【０２２８】
一方、図３２の区間Ｔ６は、条件一致区間であり、図３３のテーブルで示されるように参照された回数が５回であるので、図３４の圧縮率設定テーブルに基づいて、画像圧縮率は１０％に設定される。すなわち、区間Ｔ６は、条件一致区間であり、かつ、ユーザから５回もアクセスされた区間であるので、非常に重要な区間であると見なせる。したがって、圧縮時には、ほとんど圧縮を行なわず、高画質で画像情報が保存される。
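The table lookup in this embodiment can be sketched as below. Only two entries are given in the text (a non-matching section with 0 references is compressed at 90%, and a matching section with 5 references at 10%); every other value in this table is hypothetical and stands in for the compression rate setting table of Fig. 34, and `compression_rate` is an illustrative name.

```python
# Rows: (minimum reference count, rate for matching sections, rate for others),
# in percent. Only (0 refs, other) = 90 and (5 refs, matching) = 10 come from
# the text; the remaining values are assumed.
RATE_TABLE = [
    (0, 50, 90),
    (1, 30, 70),
    (5, 10, 50),
]


def compression_rate(reference_count, is_matching_section):
    """Return the compression rate (percent) for a section."""
    rate = None
    for min_refs, matching_rate, other_rate in RATE_TABLE:
        if reference_count >= min_refs:
            rate = matching_rate if is_matching_section else other_rate
    return rate
```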
【０２２９】
この第７の実施の形態では、参照状態記憶部は、区間と参照回数の関係をテーブルの形式で記憶したが、もちろん、リストやスタック等、他の形式で記憶しても構わない。また、圧縮率設定テーブルの中の数値は、ユーザが設定できるようにしてもよい。
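As in the seventh embodiment described above, when the device is configured to compress information with a compression amount or compression method that depends on the reference state, that is, how frequently the audio or image information stored in the time-series information storage unit 4 has been referenced (accessed) by the user, frequently referenced information is regarded as important. Audio or image information in frequently referenced sections can therefore be kept at high quality, while audio or image information that was referenced infrequently can be compressed at a high compression rate and kept with a small amount of data.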
【０２３０】
［第８の実施の形態］
上述した第１、第２、第３、第４、第５、第６、および、第７の実施の形態では、音声情報または画像情報が時系列情報記憶部４に記録されてからの経過時間（すなわち情報保存時間）が、所定の時間以上になった場合に圧縮処理を開始するようにしたが、この第８の実施の形態では、時系列情報記憶部４における空き領域がある値以下になったと認識されたタイミング、または、時系列情報記憶部４における記憶量がある値以上になったと認識されたタイミングで、前記圧縮処理を開始するようにする。
【０２３１】
したがって、記憶時の処理動作は、前述の各実施の形態と同様であるが、情報圧縮時の動作が異なる。
【０２３２】
図３５は、第８の実施の形態における情報圧縮時の動作を、その際の各種情報の流れ、および、各部の出力の流れと共に説明する図である。この実施の形態の情報蓄積装置は、記憶量検出部３１を備えており、この記憶量検出部３１は、画像情報が予め登録した記憶容量を超えて記録されたことを検出した場合に、前記対応関係記憶部５に対し、圧縮処理開始指示を出力する。この圧縮処理開始指示の後の動作については、前述の各実施の形態と同様に行うことができる。
【０２３３】
図３６は、第８の実施の形態における記憶量検出部３１の処理のフローチャートである。ステップＳ８００において、情報記憶量が所定の量を超えたことが検出された場合には、ステップＳ８０１に進んで、対応関係記憶部５に、圧縮処理開始指示を出力する。例えば、記憶媒体の記憶容量の９０％を超えて情報を記録しようとしたときに、前記圧縮処理を実行するように、記憶量検出部３１に設定しておいた場合には、記憶量が記憶媒体の９０％に達したときに、記憶量検出部３１は、圧縮処理開始指示を出力する。
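Steps S800 and S801 amount to a threshold check on the used capacity. A minimal sketch, assuming capacities are tracked in bytes and using the 90% figure from the example in the text; integer arithmetic is used so that the comparison at exactly 90% is not affected by floating-point rounding:

```python
def should_start_compression(used_bytes, capacity_bytes, threshold_percent=90):
    """Return True when the stored amount has reached the configured share
    of the medium's capacity (step S800), meaning a compression start
    instruction should be issued (step S801)."""
    return used_bytes * 100 >= capacity_bytes * threshold_percent
```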
【０２３４】
記憶量検出部３１から圧縮開始指示を受信した対応関係記憶部５は、条件一致区間のそれぞれと、それぞれの条件一致区間に対応して時系列情報記憶部４に記憶されている画像情報の、当該時系列情報記憶部４における記憶アドレスとを圧縮部６に出力する。前述と同様にして、圧縮部６は、時系列情報記憶部４に蓄積された画像情報のデータ圧縮を行なう。もちろん、この場合、新たな画像信号を記録しながら、圧縮処理をバックグラウンドで実行しても構わない。
【０２３５】
また、この実施の形態の場合、画像情報のデータ量が予め定められた記憶容量に収まるように、圧縮率あるいは圧縮方式を設定するようにしてもよい。例えば、記憶媒体の記憶容量の９０％を超えて情報を記録しようとしたときに、前記圧縮処理によって、記憶媒体の使用量が３０％にまで減少するように設定しておく。この設定値から、条件一致区間および条件一致区間以外の区間の圧縮率を算出するように構成する。
【０２３６】
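For example, suppose that 10,000 frames of uncompressed images are stored in the time-series information storage unit 4, of which 2,000 frames belong to condition matching sections and 8,000 frames belong to other sections.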
【０２３７】
このときに、３０００フレームにまで画像情報を減らすようにフレーム間引き圧縮処理を施す場合について説明する。また、条件として、条件一致区間の圧縮率と条件一致区間以外の区間の圧縮率の比が、１：１０になるように予め定めてあったとする。
【０２３８】
この場合、条件一致区間の圧縮率をａとすれば、条件一致区間以外の区間の圧縮率は１０ａである。
【０２３９】
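2000a + 8000 × 10a = 3000
Collecting terms gives 82000a = 3000, so the compression rate a that satisfies this is approximately 0.0366, and the compression rates for the condition matching sections and for the other sections are 3.66% and 36.6%, respectively.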
【０２４０】
時系列情報記憶部４に記憶されている非圧縮画像１００００フレームを、条件一致区間と、条件一致区間以外の区間とで分けて、それぞれの圧縮率でフレーム間引き圧縮を施せば、所望の３０００フレームにまで画像情報を減らすことができる。
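The arithmetic of this worked example can be checked with a few lines. The sketch follows the equation exactly as written in the text, in which each rate multiplies a frame count to give the number of frames remaining; `solve_rates` is an illustrative name.

```python
def solve_rates(matching_frames, other_frames, target_frames, ratio=10):
    """Solve matching_frames*a + other_frames*(ratio*a) = target_frames for a.

    Returns (a, ratio * a): the rates for the condition matching sections and
    for the other sections, as used in the worked example.
    """
    a = target_frames / (matching_frames + ratio * other_frames)
    return a, ratio * a


a, b = solve_rates(2000, 8000, 3000)
print(f"a = {a:.4f}, 10a = {b:.3f}")
```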
以上のように、第８の実施の形態によれば、順次入力される音声または画像情報が、蓄積媒体の記憶容量を超えて入力される場合に、時系列情報記憶手段における空き領域がある値以下になったと認識されたタイミング、または、時系列情報記憶手段における記憶量がある値以上になったと認識されたタイミングで圧縮処理を開始するように構成したので、新たに入力される音声または画像情報が時系列情報記憶手段の記憶容量を超えて入力される場合にでも、入力を継続できる効果がある。
【０２４１】
［第９の実施の形態］
公知の技術として、記録時に情報の取捨選択を行ない、重要と認識された情報のみを記録したり、圧縮率を変化させて記録する装置が知られている。例えば、特開平７−１２９１８７号公報には、音声取り込みキーを押したときの前後の音声を一定時間分だけ記録する装置が記載されている。また、市販されているテープレコーダの中には、無音区間は音声を記憶しないという無音区間検出機能を持ったものがある。
【０２４２】
しかしながら、特開平７−１２９１８７号公報記載の装置のように記録時に情報の取捨選択を行なう方法では、例えば、会議の中で最も数多く発言した人を特定し、この特定した人の発言部分の音声情報または画像情報のみを高品質で保存するといったようなことや、ユーザが指定した時間長になるように高い重要度を持ったシーンから順に抽出してダイジェストを作成するといったようなことができない。すなわち、音声情報または画像情報の記録終了後に初めて得られる情報、または、記録しながらでは得られない情報を元にして、音声情報または画像情報の圧縮を行なうことができないという問題がある。
【０２４３】
この実施の形態では、音声情報または画像情報の記録終了後に初めて得られる情報に基づいて圧縮方法や圧縮率を設定する場合について説明する。
【０２４４】
例えば、会議の場面において、話者の発話が長く継続している場面は、連絡事項を伝達している場面であったり、まとまった意見を発言している場面であったり、議論のまとめを行なっている場面であったりと、重要な発言が述べられている場面であることが多い。そこで、１つの会議を撮影した後に、発言時間の長い場面から順に高い重要度を割り当て、情報圧縮時には、高い重要度を割り当てられた発言部分を高音質／高画質で保存し、重要度の低いその他の部分を高圧縮率で圧縮するようにする。
【０２４５】
また、他の例として、例えば、予め登録された音声キーワードが、会議の中でどの位の時間に渡って用いられたかを記憶しておき、使用時間の長いキーワードから順に高い重要度を割り当てるようにしてもよい。例えば、会議の場面においては、長時間議論された議論は重要な議論であることが多い。そこで、議論の内容を推定できるようなキーワードを予め登録しておき、そのキーワードを入力音声信号の中から検出するようにする。
【０２４６】
そして、特定のキーワードが長時間に渡って使用されたことを検出することにより、そのキーワードに対応した議論が長時間なされたと認識し、このキーワード出現区間を重要部分と見なす。情報圧縮時には、高い重要度が割り当てられた区間を高音質／高画質で保存し、重要度の低いその他の区間を高圧縮率で圧縮するようにする。
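As in the ninth embodiment described above, when the device is configured to determine the importance of audio or image information by combining the detection results of the condition matching section detection unit 3, and to compress the audio or image information with a compression amount or compression method that differs between condition matching sections and other sections according to that importance, audio or image information can be stored at a compression rate, or at an intermittent-recording interval, that reflects complex events made up of a combination of various events.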
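Importance that only becomes known after recording ends, such as which utterances were longest, can be assigned by sorting the detected sections once the whole meeting is available. A minimal sketch, assuming utterances are given as (start, end) pairs in seconds and that rank 1 denotes the highest importance:

```python
def rank_by_duration(utterances):
    """utterances: list of (start_s, end_s) sections detected in one meeting.

    Returns (start_s, end_s, rank) tuples ordered from the longest utterance
    (rank 1, highest importance) to the shortest.
    """
    ordered = sorted(utterances, key=lambda u: u[1] - u[0], reverse=True)
    return [(start, end, rank) for rank, (start, end) in enumerate(ordered, start=1)]
```

The same ranking could be applied to the total time during which each registered keyword was in use, with the rank then selecting a compression rate.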
【０２４７】
［第１０の実施の形態］
以上に説明した第１の実施の形態〜第９の実施の形態は、情報の入力時に、条件一致区間検出部３によって条件一致区間を検出し、その検出結果と、それぞれの条件一致区間に対応する音声情報または画像情報の時系列情報記憶部４における記憶位置とを対応付けて対応関係記憶部５に記憶し、圧縮部６は、対応関係記憶部５に記憶された対応関係情報に基づいて時系列情報記憶部４に記憶された前記音声情報または画像情報を圧縮するように構成していた。
【０２４８】
この第１０の実施の形態では、情報の入力時には条件一致区間を検出せず、情報の圧縮時に、条件一致区間検出部３によって条件一致区間を検出するようにする。この実施の形態の場合には、対応関係記憶部５が不要となる。
【０２４９】
図３７は、第１０の実施の形態における記録時の動作を、その際の各種情報の流れ、および、各部の出力の流れと共に説明する図である。この第１０の実施の形態の場合の記憶時には、入力音声情報および画像情報を、時系列情報記憶部４に順次に記憶する。時系列情報記憶部４は、記憶開始時刻を時刻情報記憶部７に記憶させるようにする。
【０２５０】
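Comparing Fig. 5, which illustrates the recording operation of the first embodiment, with Fig. 37, which illustrates the recording operation of the tenth embodiment, the processing in Fig. 37 is much simpler because there is no correspondence storage unit 5 and no condition matching section detection is performed during recording.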
【０２５１】
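Fig. 38 is a flowchart of the recording process in the tenth embodiment. First, in step S900, the recording start time is stored in the time information storage unit 7, and the process proceeds to step S901. In step S901, audio information and image information are input through the audio information input unit 1 and the image information input unit 2, and in step S902 the input audio and image information is stored in the time-series information storage unit 4. Steps S901 and S902 are then repeated.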
【０２５２】
図３９は、この第１０の実施の形態における圧縮時の動作を、その際の各種情報の流れ、および、各部の出力の流れと共に説明する図である。この第１０の実施の形態においては、例えば記憶時から所定時間経過して、時刻情報記憶部７から圧縮処理開始指示が発生すると、条件一致区間検出部３は、時系列情報記憶部４から音声情報を読み出して、前述したような条件一致区間の検出を行う。この実施の形態では、第１の実施の形態と同様に、対話の活発な区間を条件一致区間として検出する。そして、条件一致区間検出部３は、検出した条件一致区間の情報を時系列情報記憶部４に送ると共に、圧縮部６に送る。
【０２５３】
時系列情報記憶部４は、条件一致区間検出部３からの条件一致区間情報を受け取り、この条件一致区間に対応して時系列情報記憶部４に記憶されている音声情報または画像情報の、時系列情報記憶部４における記憶アドレスを計算する。そして、この記憶アドレスを圧縮部６に出力する。
【０２５４】
圧縮部６は、条件一致区間検出部３からの条件一致区間情報と、時系列情報記憶部４からの前記記憶アドレスとを受け取り、これらの入力情報に基づいて、条件一致区間および条件一致区間以外の区間の画像情報の圧縮率を決定する。すなわち、条件一致区間の画像情報の圧縮率は低く、条件一致区間以外の区間の画像情報の圧縮率は高く決定する。例えば、条件一致区間は圧縮せず、条件一致区間以外の区間の部分画像列を圧縮部６は、時系列情報記憶部４から受け取り、第１の実施の形態と同様に、１／１０に圧縮する。そして、その圧縮後の部分画像列を時系列情報記憶部４に書き戻すようにする。
【０２５５】
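Fig. 40 is a flowchart of the compression process in the tenth embodiment. The processing shown in this flowchart corresponds to the processing executed by the conference information storage processing device 12 of Fig. 2.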
【０２５６】
すなわち、この第１０の実施の形態においては、音声情報および画像情報が時系列情報記憶部４に記録されてからの経過時間が所定の時間以上になったことをステップＳ１０００において検出する。これは、図３９の機能ブロックでは時刻情報記憶部７にて行われる処理である。情報の保存期間が所定の時間を超過したときには、ステップＳ１００１に進み、圧縮処理開始指示を発生し、前述したように、条件一致区間検出部３に供給する。
【０２５７】
圧縮処理開始指示を受け取った条件一致区間検出部３は、ステップＳ１００２において、条件一致区間の検出を実行する。そして、ステップＳ１００３において、条件一致区間を検出したか否か判別し、その判別結果を時系列情報記憶部４に送る。条件一致区間検出部３が条件一致区間を検出したときにはステップＳ１００４に進み、低圧縮率で圧縮を実行する。第１の実施の形態と同様に、時系列情報記憶部４の条件一致区間の情報は圧縮しないようにしてもよい。
【０２５８】
一方、ステップＳ１００３で条件一致区間以外の区間であると判別されたときには、ステップＳ１００５に進み、時系列情報記憶部４に記憶された当該条件一致区間以外の区間の画像情報を読み出し、高圧縮率の圧縮、例えば１０フレームの内の最初の１フレームのみを残す間引きによる圧縮を施す。
【０２５９】
そして、ステップＳ１００４およびＳ１００５の後には、ステップＳ１００６に進み、圧縮後の画像情報を時系列情報記憶部４に書き戻す。そして、ステップＳ１００７に進み、時系列情報記憶部４に記憶した情報のすべてについて圧縮処理を終了したか否か判別し、終了していなければステップＳ１００３に戻って、このステップＳ１００３以降を繰り返す。また、すべての情報について圧縮処理を終了していれば、この圧縮処理ルーチンを終了する。
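The loop of steps S1003 to S1007 can be sketched over a list of frames. This sketch assumes that the condition matching sections have already been detected (step S1002) and are given as sorted, non-overlapping half-open index ranges, that matching sections are left uncompressed, and that other sections keep only the first frame of every 10; the function names are illustrative.

```python
def decimate(frames, keep_every=10):
    """Keep the first frame of every group of keep_every frames (step S1005)."""
    return frames[::keep_every]


def compress_recording(frames, matching_sections, keep_every=10):
    """frames: list of frames in recording order.
    matching_sections: sorted, non-overlapping (start_idx, end_idx) ranges,
    end exclusive, detected at compression time.

    Returns the frames to write back to storage (step S1006).
    """
    out = []
    pos = 0
    for start, end in matching_sections:
        out.extend(decimate(frames[pos:start], keep_every))  # non-matching part
        out.extend(frames[start:end])  # matching part kept as is (step S1004)
        pos = end
    out.extend(decimate(frames[pos:], keep_every))  # trailing non-matching part
    return out
```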
【０２６０】
［その他の変形例］
以上説明した第１の実施の形態から第１０の実施の形態は、各実施の形態の説明中に明記した態様のほかにも、適宜、各実施の形態を組み合わせて、この発明の情報蓄積装置を構成することも可能である。
【０２６１】
例えば、第３の実施の形態で説明した段階的圧縮方法を、第７の実施の形態で説明した参照状態記憶方法と組み合わせて用いることで、画像情報の参照状態に応じて、段階的に画像情報の圧縮率を高めるようにすることができる。
【０２６２】
また、第３の実施の形態は、第２の実施の形態の変形として説明したが、第１の実施の形態の変形として実施してもよい。
【０２６３】
さらに、第８の実施の形態で説明した記憶量検出部は、第１の実施の形態から第７の実施の形態まで、および、第９の実施の形態から第１０の実施の形態までと組み合わせて実施することができる。
【０２６４】
また、第１の実施の形態から第９の実施の形態までの実施の形態は、第１０の実施の形態で説明したように、対応関係記憶部を備えないようにして構成することもできる。
【０２６５】
【発明の効果】
以上説明したように、請求項１の発明による情報蓄積装置によれば、音声情報または画像情報のうちの、特徴的な事象が起こっている重要部分を限られた蓄積媒体の中に数多く記憶でき、かつ、重要部分以外の音声情報または画像情報であっても少ないデータ量で長時間記憶できる効果がある。
【０２６６】
そして、請求項２の発明の場合には、情報圧縮時における条件一致区間検出操作が不要となり、音声情報または画像情報を圧縮する際のシステムの負荷を軽減できる効果がある。
【０２６７】
また、請求項３の発明の場合には、音声信号の状態変化の検出が困難な事象が起きた場合や、入力される音声信号に含まれない情報に状態変化が起きた場合にでも、音声情報または画像情報のうちの、特徴的な事象が起こっている重要部分を限られた蓄積媒体の中に数多く記憶でき、かつ、重要部分以外の音声情報または画像情報であっても少ないデータ量で長時間記憶できる効果がある。
【０２６８】
また、請求項４の発明の場合には、情報圧縮時における条件一致区間検出操作が不要となり、音声情報または画像情報を圧縮する際のシステムの負荷を軽減できる効果がある。
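[Brief Description of the Drawings]
[Fig. 1] A functional block diagram showing the whole of the first embodiment of the information storage device according to the present invention.
[Fig. 2] A diagram outlining the system to which the embodiments of the information storage device according to the present invention are applied.
[Fig. 3] A diagram for explaining the audio level detection operation of the condition matching section detection unit of the first embodiment.
[Fig. 4] A diagram for explaining the operation by which the condition matching section detection unit of the first embodiment detects sections of active dialogue.
[Fig. 5] A diagram showing the flow of operation during information recording in the first embodiment.
[Fig. 6] A flowchart of the operation of the condition matching section detection unit in the first embodiment.
[Fig. 7] A flowchart of the operation of the correspondence storage unit in the first embodiment.
[Fig. 8] A diagram for explaining the correspondence between the detection results of the condition matching section detection unit and the memory addresses of the time-series information storage unit in the first embodiment.
[Fig. 9] A table that manages the correspondence between the detection results of the condition matching section detection unit and the memory addresses of the time-series information storage unit in the first embodiment.
[Fig. 10] A flowchart of the operation of the time-series information storage unit in the first embodiment.
[Fig. 11] A flowchart of the operation of the time information storage unit in the first embodiment.
[Fig. 12] A diagram explaining the storage structure of the time information storage unit in the first embodiment.
[Fig. 13] A diagram explaining another example of the storage structure of the time information storage unit in the first embodiment.
[Fig. 14] A diagram showing the flow of operation during information compression in the first embodiment.
[Fig. 15] A flowchart of the operation of the compression unit in the first embodiment.
[Fig. 16] A diagram comparing the storage state of the time-series information storage unit before and after compression in the first embodiment.
[Fig. 17] A diagram showing the flow of operation during information recording in the second embodiment.
[Fig. 18] A flowchart of the processing of the frequency-band-specific image generation unit in the second embodiment.
[Fig. 19] A diagram explaining the memory storage state of the time-series information storage unit before compression in the second embodiment.
[Fig. 20] A diagram showing the flow of operation during information compression in the second embodiment.
[Fig. 21] A flowchart of the compression processing in the second embodiment.
[Fig. 22] A diagram explaining the memory storage state of the time-series information storage unit after compression in the second embodiment.
[Fig. 23] A diagram explaining the memory storage state of the time-series information storage unit at the time of recording in the third embodiment.
[Fig. 24] A compression time management table that manages the times at which stepwise compression is executed in the third embodiment.
[Fig. 25] A diagram explaining the memory storage state of the time-series information storage unit after one week in the third embodiment.
[Fig. 26] A diagram explaining the memory storage state of the time-series information storage unit after one month in the third embodiment.
[Fig. 27] A diagram explaining the memory storage state of the time-series information storage unit after half a year in the third embodiment.
[Fig. 28] A table that manages keyword validity periods in the fourth embodiment, for the case of detecting that a pre-registered keyword has appeared in the audio signal.
[Fig. 29] A table that manages pattern validity periods in the fourth embodiment, for the case of detecting that a pre-registered audio signal pattern has appeared in the audio signal.
[Fig. 30] A table that associates places with their importance in the fifth embodiment, for the case where the condition matching section detection unit detects the place.
[Fig. 31] A diagram explaining the compression rate setting process in the sixth embodiment, for the case where the condition matching section detection unit detects camera work.
[Fig. 32] A diagram showing the condition matching sections of the time-series information storage unit in the seventh embodiment.
[Fig. 33] A table that manages the storage state of the reference state storage unit in the seventh embodiment.
[Fig. 34] A table showing the stored contents of the compression rate setting table in the seventh embodiment.
[Fig. 35] A diagram showing the flow of operation during information compression in the eighth embodiment.
[Fig. 36] A flowchart of the processing of the storage amount detection unit in the eighth embodiment.
[Fig. 37] A diagram showing the flow of operation during information recording in the tenth embodiment.
[Fig. 38] A flowchart of the information recording process in the tenth embodiment.
[Fig. 39] A diagram showing the flow of operation during information compression in the tenth embodiment.
[Fig. 40] A flowchart of the information compression process in the tenth embodiment.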
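[Explanation of Reference Numerals]
1 Audio information input unit
2 Image information input unit
3 Condition matching section detection unit
4 Time-series information storage unit
5 Correspondence storage unit
6 Compression unit
7 Time information storage unit
8 Playback unit
9 Control unit
10 Information storage device
11 Audio signal analyzer
12 Conference information storage processing device
13 Storage medium
14 Monitor device
15 Microphone
16 Conference participant
21 Frequency-band-specific image generation unit
31 Storage amount detection unit
[0001]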
[Technical Field of the Invention]
The present invention relates to an information storage device and an information storage method for storing information such as conversational speech from meetings and interviews and images of the meeting or interview scene, as in, for example, a meeting-minutes recording system or an interview recording system.
[0002]
[Prior art]
When recording the key points of meetings, lectures, news-gathering, interviews, telephone or videophone conversations, television footage, and the like, it has been common for a recorder to take notes by hand. For example, when recording a meeting, one attendee acts as secretary and either records every remark made by all participants or selects and records only the important items. However, it often becomes difficult to prepare the minutes afterwards because the content of the meeting or the course of the discussion cannot be recalled from the notes, or because it was not recorded who made a given remark.
[0003]
Devices have therefore been proposed that store records of meetings, lectures, news-gathering, interviews, telephone and videophone conversations, television footage, surveillance camera footage, and the like on digital disks, digital still cameras, video tapes, semiconductor memories, and similar media, and play them back. Storing information with such devices has the advantage that the input audio and images can be recorded without omission, compared with a method in which a recorder writes down only the key points of the information to be recorded.
[0004]
These devices include ones that record on the storage medium a digital signal transmitted over a computer network, ones that record an analog input signal from a video camera or microphone on the storage medium as is, and ones that encode the input into a digital signal before recording it.
[0005]
However, there is a problem that it is difficult to record a long-time audio signal and / or image signal in a storage medium having a limited recording capacity. This is because, in general, the storage capacity required for recording time-series data such as sequentially input audio signals or image signals over a long period of time becomes enormous.
[0006]
To address this problem, methods have been proposed that always compress the audio and image signals as they are stored on the storage medium, as in Video for Windows ("Microsoft Video for Windows 1.0 User's Guide", pp. 57-59, pp. 102-108). In that case, however, all input audio or image signals are generally stored at the same compression rate. As a result, a large amount of relatively unimportant information that is unlikely to be referred to after recording ends up being recorded, or important information cannot be recorded at good quality because of storage capacity limits.
[0007]
For example, suppose that an interview is being recorded for a long time using Video for Windows, and that the thinning compression rate has been set so that only one frame of the image signal is stored every 5 seconds in order to save storage capacity. In that case, even if the recorder later wants to play back a part that seemed important at the time of recording, only one frame every 5 seconds can be reproduced, so the speaker's movements while talking (gestures and the like), manner of speaking, and subtle nuances cannot be reproduced.
[0008]
Conversely, if every input image signal were stored at 30 frames per second, storing a long interview would require an enormous storage capacity and would be very difficult to achieve.
[0009]
Japanese Patent Application Laid-Open Nos. 5-64144 and 5-134907 therefore describe information storage devices that try to save storage capacity, when the amount of data stored on the image storage medium exceeds a predetermined amount, by compressing the already-stored image information or thinning out its frames, starting from the oldest frames. These devices regard more recently stored information as more important, and save storage capacity by overwriting earlier information with new input or by applying a higher compression rate the earlier the information was stored.
[0010]
Japanese Patent Laid-Open Nos. 2-305053 and 7-15519 describe audio information storage devices that secure free space on the storage medium by recompressing already-stored audio information when the free capacity of the storage medium is recognized to have fallen below a certain amount.
[0011]
In addition, the moving image recording apparatus described in Japanese Patent Laid-Open No. 6-149902 performs automatic scene change detection, regards longer scenes as more important, and extracts scenes in descending order of importance so that the result has the specified time length.
[0012]
If only the scene included in the digest generated by the apparatus described in this publication is left and the scene not included in the digest is deleted, the storage capacity can be saved without losing important information.
[0013]
On the other hand, Japanese Patent Application Laid-Open Nos. 3-90968 and 6-149902 propose devices that automatically generate a video digest of a time length specified by the user. In the device of Japanese Patent Laid-Open No. 3-90968, the user enters an importance level for each scene in advance through an editor, and when a digest is generated, scenes are extracted in descending order of importance until the time length specified by the user is reached. With this device too, keeping only the scenes included in the generated digest saves storage capacity without losing important information.
[0014]
[Problems to be solved by the invention]
However, the devices described in Japanese Patent Laid-Open Nos. 2-305053 and 7-15519 recompress the stored audio information at the same compression rate throughout, so they cannot selectively lower the compression rate for only the important parts of the stored content and keep those parts at high sound quality.
[0015]
Further, in an information storage device for recording meetings, lectures, news-gathering, interviews, and the like, simply treating new records as important and erasing old records as unnecessary, as in Japanese Patent Laid-Open No. 5-64144 or 5-134907, has the problem that records of important meetings or interviews are overwritten by new input merely because they were recorded earlier. In general, the importance of a meeting or interview cannot be judged solely from the date and time at which it took place.
[0016]
As for the device of Japanese Patent Laid-Open No. 6-149902, which determines the importance of a scene from its length, when a meeting or lecture is shot with an unattended camera it is very difficult to divide the footage into scenes by cut changes or scene changes, so the length of a scene cannot be detected. Moreover, in footage of a meeting or lecture, an important remark may occur even in a short scene, so importance cannot be determined from scene length alone.
[0017]
Similarly, with the device of Japanese Patent Laid-Open No. 3-90968, in which the user enters the importance of each scene in advance through an editor, it is very difficult to divide footage of a meeting or lecture shot with an unattended camera into scenes by cut changes or scene changes. In addition, entering importance levels through an editor after shooting is very laborious, which makes the device unsuitable for recording meetings and lectures.
[0018]
Incidentally, devices are known that select information at recording time and either record only the information recognized as important or record with a varying compression rate. For example, Japanese Patent Application Laid-Open No. 7-129187 describes a device that records a fixed duration of audio before and after the moment an audio capture key is pressed. Some commercially available tape recorders also have a silent-section detection function that does not record audio during silent sections.
[0019]
However, because these devices have no means of recompressing information once it has been recorded, they cannot change the compression rate stepwise according to how long the information has been stored, nor change it dynamically as the free capacity of the storage medium changes. Their compression efficiency is therefore very poor compared with methods that recompress stored image or audio information.
[0020]
Further, a method that selects information in real time during recording, as in the device of Japanese Patent Application Laid-Open No. 7-129187, cannot, for example, identify the person who spoke most often in a meeting and keep only that person's remarks at high quality, or create a digest by extracting scenes in descending order of importance until a user-specified time length is reached. In other words, the device described in that publication cannot compress audio or image information on the basis of information that becomes available only after recording has finished, or that cannot be obtained while recording.
[0021]
Further, recording the time-series information from just before a trigger is detected, as described in Japanese Patent Laid-Open No. 7-129187, requires a buffer memory for temporarily holding the input time-series information, which makes the device complex and expensive.
[0022]
In view of the above problems, an object of the present invention is to make it possible to store, in a storage medium of limited capacity, a large number of the audio or image signals from important periods in which characteristic events occur among the input audio or image signals, to store the audio or image signals outside those important parts for a long time with a small amount of data, and to ensure that the important parts can be reproduced from beginning to end.
[0023]
[Means for Solving the Problems]
To solve the above problems, the information storage device according to the present invention is characterized by comprising:
information input means for inputting audio information and/or image information to be stored;
time-series information storage means for compressing and storing the audio information and/or the image information input from the information input means;
condition matching section detecting means for detecting a condition matching section in which the audio information stored in the time-series information storage means matches a predetermined condition set in advance;
elapsed time measuring means that stores time information indicating the time at which the audio information and/or the image information was stored in the time-series information storage means, and outputs a compression processing start instruction when the time elapsed since the time indicated by the stored time information reaches or exceeds a preset time; and
compression means that is started by the compression processing start instruction from the elapsed time measuring means and compresses the amount of the compressed audio information and/or image information stored in the time-series information storage means, using a different compression rate or compression method for the condition matching section detected by the condition matching section detecting means than for other sections.
[0024]
The condition matching section detecting means can perform its detection when the audio information is stored in the time-series information storage means, or it can perform the detection by reading the audio information out of the time-series information storage means when the stored information is compressed.
[0025]
In the former case, as in the invention of claim 2, section information indicating the condition matching section detected by the condition matching section detecting means, and the time series of the audio information or the image information corresponding to the section information Correspondence relation storage means for storing correspondence relations with storage positions in the information storage means is provided.
[0026]
The compression process can be started at an arbitrary time after the audio information or the image information is stored in the time-series information storage means, but can also be automatically executed at a predetermined time. In that case, as in the invention of claim 12, there is provided time information storage means for storing time information indicating the time at which audio information or image information is stored in the time-series information storage means, and the compression means When the elapsed time from the time determined by the time information stored in the storage means becomes equal to or longer than a predetermined time, the compression process is executed.
[0027]
Further, the compression can be executed when the used capacity of the storage medium of the time-series information storage means exceeds a predetermined amount, or when the free capacity of the storage medium becomes less than the predetermined amount.
[0028]
[Action]
In the present invention having the above-described configuration, the audio information or the image information once stored in the time-series information storage unit is compressed at a later time to reduce the use capacity of the storage medium. Audio information or image information in a condition matching section that matches the predetermined condition is compressed with high quality. For this reason, it is possible to reduce the used capacity of the time-series information storage means while preserving important information with high quality.
[0029]
[Embodiments of the Invention]
Embodiments of the present invention will be described below with reference to the drawings.
[First Embodiment]
The first embodiment is a case where the information storage device according to the present invention is applied to conference recording.
[0030]
In general, a meeting held a month ago is far less likely to be referred to than a meeting held a few days ago. Keeping meeting information, such as video, that is unlikely to be referred to at high quality is extremely inefficient in terms of memory use, and it is desirable to reduce the amount of information at an appropriate time by deleting information or applying thinning compression.
[0031]
However, even for old conference records, there is a demand to reproduce the movements the speaker made while speaking (gestures, etc.), the manner of speaking, and subtle nuances. Therefore, the audio signal or image signal in an important period in which such a characteristic event occurs must be kept at high quality.
[0032]
In the first embodiment, audio information and video information of a conference are recorded. When one month has elapsed from the recording time, the portions of the recorded conference video in which discussion was active are treated as important, and compression processing is performed in which only the video of those portions is kept at high quality while the other portions are compressed at a high compression rate. An example of this processing is described below.
[0033]
According to the first embodiment, as will be described later, when the video of a portion in which discussion was active is played back, a high-quality moving image with smooth motion is reproduced, and when the other portions are played back, the result is a so-called frame-dropped moving image with jerky motion. However, since the less important portions other than the scenes of active discussion can be compressed at a high compression rate, the amount of information to be stored becomes very small.
[0034]
FIG. 2 shows a meeting scene in the case of this embodiment. A microphone 15 is provided for each of a plurality of conference participants 16, and the speech of each conference participant 16 is picked up by the respective microphone 15. The video camera 17 captures the discussion among the conference participants 16 and the paper documents provided as conference material.
[0035]
Reference numeral 10 denotes an information storage device. In the case of this embodiment, the information storage device 10 includes an audio signal analyzer 11, a conference information storage processing device 12, a storage medium 13, and a playback unit 14.
[0036]
The conference information storage processing device 12 stores the image signal from the camera 17 that captures the conference scene and the audio signal from the microphone 15 in a storage medium 13 made of, for example, a disk or a semiconductor memory, and can compress the audio signal or the image signal stored in the storage medium 13. In this embodiment, to keep the explanation simple, the conference information storage processing device 12 is assumed to compress only the image information stored in the storage medium 13 and not the audio information stored there. The conference information storage processing device 12 may be a personal computer.
[0037]
The conference information storage processing device 12 includes an audio input terminal and an image input terminal. In this embodiment, the audio signals of the speech of the conference attendees 16 picked up by the microphones 15 are first input to the audio signal analyzer 11, and the output of the audio signal analyzer 11 is input to the audio input terminal of the conference information storage processing device 12.
[0038]
The audio signal analyzer 11 analyzes the audio signals input from the plurality of microphones 15, identifies which microphone each input audio signal came from, and outputs the identification result together with the audio signal to the conference information storage processing device 12.
[0039]
The image signal of the paper documents and the meeting scene photographed by the camera 17 is input to the image input terminal of the conference information storage processing device 12.
[0040]
The conference information storage processing device 12 includes a user interface (not shown). In response to a playback request from the user via this user interface, it displays an image based on the image signal stored in the storage medium 13 on the screen of the playback unit 14, and it also has a function of emitting reproduced sound based on the audio signal stored in the storage medium 13 from a speaker attached to the playback unit 14.
[0041]
When the conference information storage processing device 12 is configured by a personal computer, connecting this personal computer to, for example, an ISDN network allows the conference audio information and image information to be shared simultaneously between remote locations, so that an environment can be realized in which the meeting feels as if it were held in the same room.
[0042]
FIG. 1 is a block diagram showing the information storage device 10 of this embodiment, focusing on its functions. The information storage device according to the present embodiment includes a voice information input unit 1, an image information input unit 2, a condition matching section detection unit 3, a time-series information storage unit 4, a correspondence relationship storage unit 5, a compression unit 6, a time information storage unit 7, a reproduction unit 8, and a control unit 9, which are connected to each other. In this example, the voice information input unit 1 is also connected to the condition matching section detection unit 3. The control unit 9 controls the entire processing operation.
[0043]
Each part may be configured as a separate block, or one block may be configured to include several parts. Moreover, one part may be divided into several blocks and mounted.
[0044]
The voice information input unit 1 receives the voice signal from the microphone 15, converts it into a digital voice signal, sends it to the system bus, and sends it to the condition matching section detection unit 3.
[0045]
The image information input unit 2 receives an image signal from the video camera 17. If the image signal from the video camera 17 is a digital signal, it is received and sent to the system bus. If the input image signal is not a digital signal, the image information input unit 2 converts the input image signal into a digital image signal and outputs it to the system bus.
[0046]
The audio signal from the audio information input unit 1 is supplied to the condition matching section detection unit 3 as a digital signal. The condition matching section detection unit 3 monitors the audio signal input to it and detects audio sections that match a predetermined condition. In this embodiment, a condition matching section is detected on the condition that an audio signal of a predetermined level or higher is input and an active conversation pattern is detected from the input audio signal. In this way, a section in which the conference participants actively exchanged discussion is detected as a condition matching section. The role of the condition matching section detection unit 3 is played by the audio signal analyzer 11 and part of the conference information storage processing device 12.
[0047]
To detect the presence or absence of an audio signal of a predetermined level or higher, the condition matching section detection unit 3 has, as shown in FIG. 3, a detection function that recognizes the start point of a speaker's speech by detecting that the input audio level has reached or exceeded a predetermined threshold level, and recognizes the end point of the speaker's speech by detecting that the audio level has fallen below the predetermined threshold level.
[0048]
However, as shown in FIG. 3, if the change point F101 itself, at which the audio level crosses the threshold level, were used as the speech start point or end point, the first and last parts of the speech would be left out of the condition matching section. Therefore, the time point F100, which is a certain time before the change point F101 at which the audio level changes from the low level to the high level, is set as the speech start point, and the time point F102, which is a certain time T2 after the change point F101 at which the audio level changes from the high level to the low level, is set as the speech end point.
[0049]
In this embodiment, the sound level at a certain time is a value obtained by smoothing the sound level before and after the time, for example, an average value of instantaneous sound levels for 2 seconds before and after the certain time.
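The threshold detection with smoothing and padded start and end points described above can be sketched as follows. This is only an illustration under stated assumptions: the audio level is assumed to be sampled at a fixed interval, all times are expressed in samples, and the names `smooth`, `detect_speech_sections`, `pre`, `post`, and `half_window` are hypothetical. A section still open at the end of the input is closed at the last sample, which is also an assumption.

```python
def smooth(levels, half_window):
    """Centered moving average over half_window samples before and after."""
    n = len(levels)
    out = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        out.append(sum(levels[lo:hi]) / (hi - lo))
    return out


def detect_speech_sections(levels, threshold, pre, post, half_window=0):
    """Return (start, end) sample indices of speech sections.

    The start is placed `pre` samples before the upward threshold crossing
    and the end `post` samples after the downward crossing, so the first
    and last parts of an utterance are not cut off.
    """
    smoothed = smooth(levels, half_window)
    last = len(levels) - 1
    sections = []
    start = None
    for i, level in enumerate(smoothed):
        if level >= threshold and start is None:
            start = i  # upward crossing
        elif level < threshold and start is not None:
            # downward crossing: close the section with padding
            sections.append((max(0, start - pre), min(last, i + post)))
            start = None
    if start is not None:
        sections.append((max(0, start - pre), last))
    return sections
```

For example, `detect_speech_sections([0, 0, 0, 5, 5, 5, 0, 0, 0, 0], threshold=3, pre=1, post=2)` returns `[(2, 8)]`. With large padding values, adjacent sections can overlap; whether to merge them is left open here.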
[0050]
In this embodiment, as shown in FIG. 2, a microphone 15 is installed for each speaker, and the audio signal analyzer 11 identifies the speaker who produced the input audio signal by comparing the voice input levels from the microphones of the individual speakers.
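A minimal sketch of identifying the speaker by comparing per-microphone input levels might look like this. The function name `identify_speaker`, the dictionary layout, and the rule of picking the loudest microphone at or above the threshold are assumptions made for illustration; the specification only states that the levels are compared.

```python
def identify_speaker(mic_levels, threshold):
    """Return the key of the loudest microphone, or None if nobody is speaking.

    mic_levels maps a microphone (one per speaker) to its current input level.
    """
    if not mic_levels:
        return None
    loudest = max(mic_levels, key=mic_levels.get)
    # Treat levels below the threshold as silence.
    return loudest if mic_levels[loudest] >= threshold else None
```

This simple rule cannot represent two people speaking at once; handling overlapping speech would require reporting every microphone whose level is at or above the threshold.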
[0051]
Besides this, the speaker may be identified from characteristics of the audio signal (such as a voiceprint), or from face or mouth movement based on image information. In that case, it is not necessary to provide a microphone for every meeting attendee, and a single microphone, or fewer microphones than the number of attendees, may be used. It is also possible to install a plurality of microphones, analyze the phase difference of the audio signals input from these microphones, detect the position of the sound source, and identify the speaker.
[0052]
The condition matching section detection unit 3 judges the conversation to be more active the shorter the time from when one speaker finishes speaking until another speaker starts speaking. It also judges that an active conversation is taking place when another speaker starts speaking before one speaker has finished.
[0053]
FIG. 4 illustrates the process by which the condition matching section detection unit 3 recognizes a section where conversation is active. The figure shows the case where the conversation is judged to be more active the shorter the time from when one speaker finishes speaking until another speaker starts speaking. An audio signal of a predetermined level or more from each speaker is recognized as a speech section SP of that speaker, and a pattern in which the speech section SP passes from one speaker to another within a short time, as encircled by dotted lines in the figure, is detected as an active conversation pattern.
[0054]
To detect such a pattern in which the speaker changes within a short time, the condition matching section detection unit 3 detects whether the speaker has changed within a predetermined set time after one speaker finishes speaking. For example, the set time is 0.5 seconds. This set time may be changed by the user.
[0055]
In this embodiment, a pattern in which speech sections SP partially overlap because another speaker starts speaking before one speaker finishes speaking is also detected as an early speaker change pattern.
[0056]
Then, the condition matching section detection unit 3 recognizes a section where the conversation is active according to whether the early speaker change pattern has continued a predetermined number of times, for example, three times or more. In the example shown in FIG. 4, the early speaker change pattern continues four times in the section PP, so this section PP is detected as an active conversation section. That is, the beginning F200 of the speech sections containing the four consecutive early speaker change patterns is set as the start point of the active conversation section, and the end F201 of those speech sections is set as the end point of the active conversation section.
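The detection of active conversation sections from a sequence of speech sections can be sketched as below. The tuple layout `(speaker, start, end)`, the function name `find_active_sections`, and the parameter names are hypothetical; the defaults of 0.5 seconds and three consecutive changes follow the example values in the text. Taking the latest end time among the speech sections in a run as the end point is an assumption made to handle overlapping speech.

```python
def find_active_sections(utterances, max_gap=0.5, min_changes=3):
    """Find sections where early speaker changes continue min_changes times or more.

    utterances: list of (speaker, start, end) tuples sorted by start time.
    An early speaker change is a change of speaker in which the next speech
    section starts no later than max_gap seconds after the previous one ends
    (a negative gap means the two sections partially overlap).
    """
    active = []
    run_start = None  # index of the first utterance in the current run
    changes = 0       # consecutive early speaker changes in the current run

    def close_run(end_index):
        run = utterances[run_start:end_index]
        active.append((run[0][1], max(u[2] for u in run)))

    for i in range(1, len(utterances)):
        prev, cur = utterances[i - 1], utterances[i]
        early = cur[0] != prev[0] and cur[1] - prev[2] <= max_gap
        if early:
            if changes == 0:
                run_start = i - 1
            changes += 1
        else:
            if changes >= min_changes:
                close_run(i)
            changes = 0
    if changes >= min_changes:
        close_run(len(utterances))
    return active
```

With the speech sections `[("A", 0.0, 5.0), ("B", 8.0, 10.0), ("C", 10.2, 12.0), ("A", 11.8, 14.0), ("B", 14.3, 16.0), ("C", 16.1, 18.0), ("A", 25.0, 27.0)]`, four early speaker changes occur in a row starting from the section at 8.0 seconds, so the function returns `[(8.0, 18.0)]`.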
[0057]
The time-series information storage unit 4 is a storage unit that accumulates audio information and image information, and uses, for example, a disk storage medium or a semiconductor memory as a storage medium.
[0058]
The correspondence relationship storage unit 5 stores information specifying each active conversation section detected by the condition matching section detection unit 3 in association with the storage addresses, in the time-series information storage unit 4, of the audio information and image information corresponding to that section. The correspondence relationship storage unit 5 is also composed of, for example, a disk storage medium or a semiconductor memory.
[0059]
In this embodiment, the compression unit 6 performs data compression of the image information stored in the time series information storage unit 4. In this case, the compression unit 6 is configured to be able to dynamically change the data compression rate or the data compression method based on the information indicating the condition matching section from the correspondence relationship storage unit 5.
[0060]
In this embodiment, the compression unit 6 assumes moving image information and handles it in processing units of a predetermined time length or a predetermined number of frames. For example, compression is performed with a sequence of 10 consecutive frames as one unit partial image sequence. For image information in sections other than the condition matching section, thinning compression is performed in which only the first of the 10 frames is kept and the other frames are discarded. In the condition matching section, on the other hand, no thinning is performed and all 10 frames are stored.
[0061]
Therefore, when the image information of sections other than the condition matching section is reproduced, the result is a so-called frame-dropped moving image with jerky motion, but the amount of information is very small. On the other hand, when the image information in the condition matching section is reproduced, a high-quality moving image with smooth motion is obtained.
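The thinning compression by 10-frame units can be sketched as follows. The function name `thin_frames` and the representation of condition matching sections as inclusive frame-index ranges are hypothetical. Keeping a whole unit whenever any of its frames falls inside a condition matching section is also an assumption, since the text does not say how units that straddle a section boundary are handled.

```python
def thin_frames(frames, keep_ranges, unit=10):
    """Thin a frame sequence unit by unit.

    frames: list of frames (any objects).
    keep_ranges: list of (first_index, last_index) inclusive frame-index
    ranges corresponding to condition matching sections.
    Units that touch a condition matching section are kept whole; in all
    other units only the first frame is kept.
    """
    def in_matching_section(index):
        return any(first <= index <= last for first, last in keep_ranges)

    result = []
    for base in range(0, len(frames), unit):
        indices = range(base, min(base + unit, len(frames)))
        if any(in_matching_section(i) for i in indices):
            result.extend(frames[i] for i in indices)  # no thinning
        else:
            result.append(frames[base])                # keep first frame only
    return result
```

For a 40-frame sequence in which frames 10 to 19 belong to a condition matching section, 13 frames remain: frame 0, frames 10 to 19, frame 20, and frame 30.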
[0062]
The time information storage unit 7 is used to store the time when the input audio signal and image signal are recorded in the time-series information storage unit 4, and is configured by, for example, a disk storage medium or a semiconductor memory.
[0063]
Further, the time information storage unit 7 has a function of measuring the elapsed time from the recording start time. For this purpose, the time information storage unit 7 is supplied with current time information from a clock circuit unit (not shown). In this embodiment, when the elapsed time from the recording start time becomes equal to or longer than a predetermined time, the time information storage unit 7 outputs a compression trigger timing signal that causes the compression unit 6 to start compressing the image information in the time-series information storage unit 4 as described above.
[0064]
The reproduction unit 8 is a functional unit that reproduces an audio signal and an image signal stored in the time-series information storage unit 4 by the monitor device 14 of FIG.
[0065]
The control unit 9 controls the overall processing operation of the information storage device 10.
[0066]
[Operation during recording]
Next, an operation during recording in the information storage device 10 having the above configuration will be described. FIG. 5 is a diagram for explaining the recording operation in this embodiment, together with the flow of various information and the flow of output of each unit at that time.
[0067]
When the conference starts and the audio signal from the microphone 15 and the image signal from the camera 17 are supplied to the information storage device 10, the audio signal and the image signal are sequentially stored in the time-series information storage unit 4. The audio signal is input to the condition matching section detection unit 3.
[0068]
As described above, the condition matching section detection unit 3 compares the audio level of the audio information from the microphone 15 with a predetermined threshold level, detects the speech start point and speech end point of each conference attendee, and sets the interval between them as that speaker's speech section SP. It then detects short-interval turn-taking or partial overlap of the speech sections SP among the conference attendees, and detects a section where conversation is active as a condition matching section. Information on the start point and end point of the detected condition matching section is supplied to the correspondence relationship storage unit 5.
[0069]
FIG. 6 is a flowchart for explaining the operation of the condition matching section detection unit 3.
[0070]
When the audio signal from the voice information input unit 1 is supplied as a digital signal to the condition matching section detection unit 3, the above-described speech sections SP are detected and the speaker is identified in step S100. As described above, the speaker is identified by the audio signal analyzer 11 comparing the voice input levels from the microphones installed for the individual speakers.
[0071]
After step S100, in step S101, a pattern in which the speaker changes within a short time, including partial overlap, is recognized. When such an early speaker change pattern is detected, the process proceeds to step S102, where it is determined whether the pattern has continued a predetermined number of times. If, as described above, the condition has been set in advance so that speech sections containing three or more consecutive early speaker change patterns are recognized as an active conversation section, then in the example of FIG. 4 described above the section PP is detected as a section where the conversation is active, and the process proceeds to step S103.
[0072]
In step S103, the section detected as an active conversation section is specified as the condition matching section. That is, in the example of FIG. 4, the beginning of the active conversation section is the beginning F200 of the section PP, the end of the active conversation section is the end F201 of the section PP, and the section PP is specified as the active conversation section (condition matching section). Note that the information specifying the condition matching section may instead consist of either the beginning or the end of the section together with the length of the section.
[0073]
Subsequently, in step S104, the condition matching section specified in step S103 is output to the correspondence relationship storage unit 5, and then the process returns to step S100 to start detection of a new condition matching section. When it is determined in step S102 that the early speaker change pattern has not continued the predetermined number of times, the process likewise returns to step S100 and detection of a new condition matching section is started.
[0074]
When the correspondence relationship storage unit 5 receives information specifying the condition matching section from the condition matching section detection unit 3 as described above, that is, in this example, information on the start and end of the condition matching section, it stores this information in association with the storage addresses of the time-series information storage unit 4 corresponding to the condition matching section, as described below.
[0075]
FIG. 7 is a flowchart for explaining the operation of the correspondence storage unit 5. In FIG. 7, the steps involved in the above-described recording operation are steps S200 to S203. Steps S204 and S205 are related to the operation at the time of compression described later.
[0076]
That is, at the time of recording, it is detected in step S200 whether information indicating a condition matching section has been input from the condition matching section detection unit 3. If no input of a condition matching section is detected, the process returns to step S200 via step S204, and the presence or absence of input of information indicating a condition matching section is detected again.
[0077]
If the input of a condition matching section from the condition matching section detection unit 3 is detected in step S200, the process proceeds to step S201. In step S201, in order to obtain the storage addresses in the time-series information storage unit 4 of the audio information or image information corresponding to the condition matching section, a storage address inquiry request is output to the time-series information storage unit 4 together with the information indicating the condition matching section, and a response is awaited in step S202.
[0078]
When a response is returned from the time-series information storage unit 4, the process proceeds to step S203, where the information specifying the condition matching section is stored in association with the storage addresses, in the time-series information storage unit 4, of the audio information and image information corresponding to the condition matching section.
[0079]
After step S203, the process returns to step S200 via step S204, and the presence or absence of input of information indicating the next condition matching section is detected.
[0080]
FIG. 8 is a diagram showing the correspondence between the condition matching section, which is the detection result of the condition matching section detection unit 3, and the storage addresses in the time-series information storage unit 4 of the audio information and image information input during the time specified by the condition matching section.
[0081]
As shown in FIG. 8, the start time F200 of the active conversation section corresponds to the storage address F300 of the time-series information storage unit 4, and the end time F201 of the active conversation section corresponds to the storage address F301.
[0082]
The correspondence relationship storage unit 5 stores this correspondence relationship in the form of a table as shown in FIG. In this example, the start time F200 and the end time F201 are relative times starting from the storage start time.
[0083]
For example, as shown in the example of FIG. 9, if the start of an active conversation section is detected 210 seconds after the start of storage of the audio information and image information, and the end of that section is detected at 240 seconds, the audio information and image information stored between 210 seconds and 240 seconds after the start of storage become the corresponding stored information. In the example of FIG. 9, addresses 420 to 480 of the time-series information storage unit 4 are the storage addresses of the audio information and image information corresponding to this condition matching section.
[0084]
In FIG. 9, ID is an identifier for identifying each of the detected condition matching sections, and is a three-digit number in this example.
[0085]
Note that the storage format of the correspondence relationship storage unit 5 is not limited to the table format of the example of FIG. 9, and may be a list structure, a stack structure, or the like.
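One possible in-memory form of the correspondence table is sketched below. The class and field names are hypothetical, and the address lookup is passed in as a function because the specification obtains the addresses by querying the time-series information storage unit 4. Numbering the three-digit identifiers from 001 is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SectionRecord:
    section_id: str     # three-digit identifier of the condition matching section
    start_time: int     # seconds from the storage start time
    end_time: int       # seconds from the storage start time
    start_address: int  # storage address for start_time
    end_address: int    # storage address for end_time


class CorrespondenceTable:
    def __init__(self, address_lookup):
        # address_lookup(time) stands in for the storage address inquiry
        # sent to the time-series information storage unit.
        self._address_lookup = address_lookup
        self._records = []

    def add_section(self, start_time, end_time):
        record = SectionRecord(
            section_id=f"{len(self._records) + 1:03d}",
            start_time=start_time,
            end_time=end_time,
            start_address=self._address_lookup(start_time),
            end_address=self._address_lookup(end_time),
        )
        self._records.append(record)
        return record

    def records(self):
        return list(self._records)
```

Using a lookup of two addresses per second, chosen here only because it reproduces the numbers of the example above, `CorrespondenceTable(lambda t: 2 * t).add_section(210, 240)` yields a record with addresses 420 and 480.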
[0086]
Next, the operation of the corresponding time-series information storage unit 4 at this time will be described with reference to the flowchart of FIG.
[0087]
That is, FIG. 10 is a flowchart for explaining the operation at the time of recording of the time-series information storage unit 4. First, when this recording operation is started, the time-series information storage unit 4 outputs the recording start time of the audio information and the image information to the time information storage unit 7 for recording in step S300. Next, the process proceeds sequentially to step S301 and step S302, and the input image information and audio information are received and stored sequentially.
[0088]
Then, in the next step S303, it is determined whether a request for the storage address corresponding to a condition matching section has arrived from the correspondence relationship storage unit 5, and when it is detected that the request has arrived, the process proceeds to step S304. In step S304, the storage addresses of the audio information and the image information corresponding to the condition matching section are returned to the correspondence relationship storage unit 5.
[0089]
When it is determined in step S303 that no request for a storage address corresponding to a condition matching section has arrived, or after step S304, the process returns to step S301 to continue storing image information and audio information.
[0090]
The time information storage unit 7 receives the information of the storage start time by the process in the step S300 of the time series information storage unit 4, and stores the storage start time.
[0091]
FIG. 11 is a flowchart for explaining the operation of the time information storage unit 7, and FIG. 12 is a diagram for explaining the storage structure of the time information storage unit 7. In FIG. 11, steps S400 and S401 are the processing at the time of recording. In step S400, the storage start time of the audio information and image information supplied from the time-series information storage unit 4 is detected, and in step S401 this storage start time is stored in the time information storage unit 7.
[0092]
As will be described later, the time information storage unit 7 outputs a compression processing start instruction to the correspondence relationship storage unit 5 when the elapsed time since the audio information and the image information were recorded in the time-series information storage unit 4 (that is, the information storage time) becomes equal to or longer than a predetermined time. Steps S402 and S403 in FIG. 11 are the processing for this, and the compression start instruction process will be described later.
[0093]
The time information storage unit 7 manages the relationship between the name of the file storing the audio information and the image information and the storage start time using a table as shown in FIG. In this example, one conference record is recorded in one file. The file name is a file name given to each conference record, and the ID in FIG. 12 is an identifier (number in this example) for identifying each conference record file.
[0094]
The storage format of the storage start time is not limited to the table format, and may be a list structure or a stack structure. Further, information specifying the storage start time may be stored in a file or file name storing audio information and image information.
[0095]
Furthermore, the time to start the compression process may be automatically changed according to file attributes such as file size, file name, and file creator, and directory attributes such as directory creator and directory name.
[0096]
For example, as shown in FIG. 13, when the file size exceeds 5 Mbytes, the time until the compression process starts is set to 1 month, and when the file size is less than 5 Mbytes, it is set to 2 months. Alternatively, if the file extension is .AVI, the time until the compression process starts is set to 1 month, and if the file extension is .mpg, it is set to 2 months. In these cases, there is no need to specify the time until the compression process starts for each file, which saves the user effort.
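The two attribute-based rules in this example could be sketched as follows. The function names are hypothetical. Treating 5 Mbytes as 5 × 1024 × 1024 bytes, treating a file of exactly 5 Mbytes like the smaller files, matching extensions case-insensitively, and returning a caller-supplied default for other extensions are all assumptions, since the text does not cover these cases.

```python
import os

FIVE_MBYTES = 5 * 1024 * 1024  # assumption: binary megabytes


def delay_months_by_size(size_bytes):
    """1 month for files larger than 5 Mbytes, otherwise 2 months."""
    return 1 if size_bytes > FIVE_MBYTES else 2


def delay_months_by_extension(file_name, default=None):
    """1 month for .AVI files, 2 months for .mpg files."""
    extension = os.path.splitext(file_name)[1].lower()
    return {".avi": 1, ".mpg": 2}.get(extension, default)
```

A fuller rule table could also consult the file creator or directory attributes mentioned above, with an explicit priority order among the rules.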
[0097]
As described above, in this embodiment, when a conference starts and conference recording begins, the start time is stored in the time information storage unit 7, and the image information and the audio information are stored in the time-series information storage unit 4 from the conference start time (storage start time).
[0098]
Then, from the audio information of the conference in progress, the condition matching section detection unit 3 detects sections where the conversation is active as condition matching sections, and the correspondence relationship storage unit 5 stores the information specifying each section in association with the corresponding storage addresses of the time-series information storage unit 4.
[0099]
[Operation during compression]
Next, the operation during compression will be described. In the first embodiment, when a predetermined period has elapsed since storage, the image information and/or audio information stored in the time-series information storage unit 4 is regarded as having declined in importance and is compressed, thereby freeing capacity in the memory of the time-series information storage unit 4. However, the condition matching sections are treated as important sections, and these sections are either not compressed or compressed at a lower compression rate to maintain high quality.
[0100]
FIG. 14 is a diagram for explaining the operation at the time of information compression in this embodiment, together with the flow of various information and the flow of output of each unit at that time.
[0101]
The time information storage unit 7 outputs an instruction to start the compression process to the correspondence relationship storage unit 5 when the elapsed time since the audio information and the image information were recorded in the time-series information storage unit 4 becomes equal to or longer than a predetermined time.
[0102]
That is, in step S402 of the processing routine of the time information storage unit 7 in FIG. 11, the current time supplied from a clock circuit unit (not shown) is compared with the storage start time stored in the time information storage unit 7, and it is determined whether the information storage time has reached the predetermined time. When it is determined that the predetermined time has elapsed, the process proceeds to step S403, and the correspondence relationship storage unit 5 is requested to start the compression process.
[0103]
After this request is issued, or when it is determined in step S402 that the predetermined time has not elapsed, the process returns to step S400.
[0104]
For example, when the predetermined time is set to one month, the compression processing start request is generated one month after the storage start time, and information newly accumulated in the time-series information storage unit 4 is compressed one month later. For example, the audio information and image information of the file name “file10” recorded at 13:30 on April 25, 1996, shown in FIG. 12, are subjected to the above-described compression processing at 13:30 on May 25, 1996.
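The elapsed-time check of step S402 can be sketched with calendar months as follows. The function names `add_months` and `compression_due` are hypothetical, and clamping the day to the last day of a shorter month (for example, January 31 plus one month) is an assumption, since the text does not say how such dates are treated.

```python
import calendar
from datetime import datetime


def add_months(moment, months):
    """Return `moment` shifted forward by whole calendar months."""
    index = moment.month - 1 + months
    year = moment.year + index // 12
    month = index % 12 + 1
    # Clamp the day for months shorter than the original day number.
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def compression_due(storage_start, now, months=1):
    """True when the predetermined storage period has elapsed."""
    return now >= add_months(storage_start, months)
```

For the file recorded at 13:30 on April 25, 1996, `add_months(datetime(1996, 4, 25, 13, 30), 1)` gives 13:30 on May 25, 1996, which matches the example above.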
[0105]
In this example, the time until the compression processing is executed is fixedly given to the time information storage unit 7, but this time can be changed by the user. The timing of starting the compression process may be in the vicinity of the set time, and the compression process may be performed after the system enters the idling state. Further, the time until compression is performed may be set differently for each file.
[0106]
When a compression start instruction is input from the time information storage unit 7, the correspondence relationship storage unit 5 detects the input in step S204 in FIG. 7. When the compression start instruction is detected, the process proceeds to step S205, where the information specifying each condition matching section indicating an active conversation section, and the storage addresses in the time-series information storage unit 4 of the audio information and image information corresponding to each condition matching section, are output to the compression unit 6. That is, the contents of one table of the form described above are output to the compression unit 6.
[0107]
Of course, the sets of information specifying each condition matching section and the storage addresses corresponding to that section may instead be output to the compression unit 6 one by one in sequence. Alternatively, the file in the time-series information storage unit 4 that stores the audio information and the image information may itself hold the information specifying each condition matching section together with the storage addresses, within that file, of the audio information and image information corresponding to each condition matching section.
[0108]
The compression unit 6 that has received the input from the correspondence relationship storage unit 5 performs data compression of the image information stored in the time-series information storage unit 4. In this case, the compression unit 6 executes compression by dynamically changing the data compression rate or the data compression method based on the information indicating the condition matching section from the correspondence relationship storage unit 5.
[0109]
In this embodiment, the information of the condition matching section is kept at high quality without being subjected to data compression, and data compression is performed only on image information in sections other than the condition matching section. For this reason, as shown in FIG. 14, the compression unit 6 acquires a partial image sequence of a section other than the condition matching section from the time-series information storage unit 4, compresses the data, and writes the compressed image sequence back to the time-series information storage unit 4.
[0110]
FIG. 15 is a flowchart for explaining the operation of the compression unit 6.
When receiving the compression start request from the correspondence storage unit 5, the compression unit 6 detects this in step S500 and proceeds to step S501. In step S501, the information specifying each condition matching section indicating an active section of the dialogue, and the storage addresses in the time series information storage unit 4 of the voice information and image information corresponding to each condition matching section, are input from the correspondence storage unit 5 and stored in a work memory (not shown) of the compression unit 6. The work memory uses, for example, a semiconductor memory as a storage medium.
[0111]
The compression unit 6 compresses the image information stored in the time series information storage unit 4 with reference to a plurality of sets of condition matching section and storage address information stored in the work memory.
[0112]
In step S502, the partial image sequence of 10 frames other than the condition matching section is sequentially read from the time series information storage unit 4 to the compression unit 6 as one unit partial image sequence. In this embodiment, in order not to compress the image information corresponding to the condition matching section, only the image information other than the condition matching section is read and compressed. Needless to say, when image information corresponding to a condition matching section is also compressed, it is necessary to read and compress the image information including the condition matching section.
[0113]
In step S503, frame thinning processing is performed in which, in this example, only the first frame of the 10-frame partial image sequence is left and the other 9 frames are deleted. In the next step S504, the compressed image sequence after the frame thinning is written back to the time-series information storage unit 4.
[0114]
In the next step S505, it is determined whether or not the compression process for the file in which the conference record is accumulated is completed. When the compression process for the entire file is completed, the process of the compression unit 6 is terminated. If there remains a portion to be compressed, the process returns to step S502 to repeat the compression process.
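The loop of steps S502 to S505 can be sketched as follows. This is only an illustration under stated assumptions: frames are modeled as a Python list, condition matching sections as hypothetical `(start, end)` frame-index pairs with an exclusive end, and the function name `thin_outside_sections` is not from the description.

```python
def thin_outside_sections(frames, sections, unit=10):
    """Keep every frame inside a condition matching section; outside them,
    keep only the first frame of each unit of `unit` consecutive frames."""

    def in_section(i):
        return any(start <= i < end for start, end in sections)

    kept = []
    run = 0  # position within the current run of non-matching frames
    for i, frame in enumerate(frames):
        if in_section(i):
            kept.append(frame)   # condition matching section: left untouched
            run = 0
        else:
            if run % unit == 0:  # S503: first frame of the 10-frame unit
                kept.append(frame)
            run += 1
    return kept                  # S504: sequence to be written back


# 60 frames, with frames 20..39 forming one condition matching section.
frames = list(range(60))
result = thin_outside_sections(frames, [(20, 40)])
```

In this example the 40 frames outside the section shrink to 4, while the 20 frames inside it are preserved, so 24 frames remain.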
[0115]
FIG. 16 is a diagram for explaining the operation of the compression unit 6. In FIG. 16A, if the memory areas designated by the storage addresses of the time-series information storage unit 4 corresponding to the condition matching sections are a3 to a6 and a11 to a12, the compression unit 6 compresses the storage memory areas a1 to a2 and the storage memory areas a7 to a10. Here, each of the storage memory areas a1, a2, ... has a capacity for a predetermined number of frames, for example, 10 frames.
[0116]
As described above, even after compression, the image sequences in the condition matching section accumulated in the storage memory areas a3 to a6 and the storage memory areas a11 to a12 are not changed compared to before compression.
[0117]
On the other hand, since the image sequences stored in the storage memory areas a1 to a2 and the storage memory areas a7 to a10 are subjected to frame thinning compression, the contents of the storage memory areas a1, a7, and a8 are replaced by compressed image sequences. Then, as the amount of information decreases, free memory areas a2, a9, and a10 are generated in the storage medium of the time-series information storage unit 4 as shown in FIG. 16B.
[0118]
If it is desirable that the time series data be stored contiguously in the storage medium, the generated free memory areas can be eliminated, for example by closing them up with the preceding and following time series data so that no gap remains in the memory.
[0119]
Next, an operation when reproducing recorded or compressed audio information or image information in the first embodiment will be described. As described above, this reproduction is performed in the reproduction unit 8 under the control of the control unit 9, and is executed based on a reproduction instruction through a user interface (not shown).
[0120]
When playing back recorded audio information or image information, it is often necessary to change the playback speed or to rewind a little and play back slowly, so the conference information storage processing device 13 is provided with a fast forward function, a rewind function, a slow playback function, and a pause function. In addition, a slide bar corresponding to the time axis is provided on the screen of the display monitor; a pointer indicating the current playback time is displayed on the slide bar, and the playback position can be specified by sliding the bar through a user input operation.
[0121]
In addition, the playback speed need not follow the recorded time information: playback may be sped up while only the recorded order is preserved, and it is also possible to search for and automatically play back only the condition matching sections. For example, the section from time F200 to F201 in FIG. 8 is played back at the recorded speed, and the other sections are played back at double speed.
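As a rough illustration of section-dependent playback speed, the sketch below plays condition matching sections at the recorded speed and everything else at double speed. The function names, the representation of sections as non-overlapping `(start, end)` pairs in seconds, and the sample values are assumptions made for this example.

```python
def speed_at(t, sections, fast=2.0):
    """Playback speed multiplier at recorded time t (seconds)."""
    inside = any(start <= t < end for start, end in sections)
    return 1.0 if inside else fast


def playback_duration(total, sections, fast=2.0):
    """Time needed to play a recording of length `total` seconds when
    condition matching sections run at 1x and the rest at `fast`x.
    Assumes sections do not overlap and lie within the recording."""
    inside = sum(end - start for start, end in sections)
    return inside + (total - inside) / fast


# A 100-second recording with one condition matching section from 20 s to 50 s.
duration = playback_duration(100.0, [(20.0, 50.0)])  # 30 s + 70 s / 2
```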
[0122]
In addition, information indicating the condition matching sections may be displayed on the slide bar on the time axis so that the condition matching sections can be seen at a glance. Further, the detection result obtained for each condition matching section, such as the name or a face photograph of the speaker detected in that section, may be displayed on the slide bar on the time axis.
[0123]
In the first embodiment described above, when a predetermined number or more of early speaker change patterns are detected in succession, both ends of the speech sections including those patterns are set as both ends of the condition matching section. However, a point in time a predetermined time before the start point of the speech section including the early speaker change pattern may be set as the start of the condition matching section, or a predetermined number of speech sections preceding the speech section including the early speaker change pattern may also be included in the condition matching section.
[0124]
Likewise, the end point of the condition matching section may be a point in time a predetermined time after the end of the speech section including the early speaker change pattern, or a predetermined number of speech sections following the speech section including the early speaker change pattern may also be included in the condition matching section.
[0125]
In addition, the condition matching section detection unit 3 may detect a single sound signal such as a “door closing sound”. In this case, a point in time a predetermined time before the point at which the single sound signal is detected is taken as the start point of the condition matching section, and a point in time a predetermined time after the point at which the single sound signal is detected is taken as the end point of the condition matching section.
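The derivation of a condition matching section from a single detected sound can be sketched as below. The function name and parameters are hypothetical, and merging overlapping sections into one is an assumption of this sketch that the description does not specify.

```python
def sections_from_events(event_times, before, after):
    """Turn detection times (seconds) into condition matching sections that
    start `before` seconds earlier and end `after` seconds later."""
    sections = []
    for t in sorted(event_times):
        start, end = max(0.0, t - before), t + after
        if sections and start <= sections[-1][1]:
            # Assumption: overlapping sections are merged into a single one.
            sections[-1] = (sections[-1][0], max(sections[-1][1], end))
        else:
            sections.append((start, end))
    return sections


# Sounds detected at 30 s, 35 s, and 120 s; keep 10 s before and 20 s after each.
sections = sections_from_events([30.0, 35.0, 120.0], before=10.0, after=20.0)
```

Here the first two detections produce one merged section from 20 s to 55 s, and the third produces a section from 110 s to 140 s.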
[0126]
Further, in this embodiment, the case where the condition matching section detection unit 3 detects an active dialogue pattern from the input voice signal has been described. However, it is also possible to register other characteristic voice patterns, such as a laughter pattern or a clapping pattern, recognize these patterns in the input voice signal, and detect a section including them as a condition matching section. In this case, the condition matching section detection unit 3 is provided with pattern recognition means that performs pattern recognition using a known pattern recognition technique, for example, a technique for analyzing the temporal transition of the power or frequency components of an audio signal.
[0127]
In this embodiment, the compression unit 6 is configured to perform thinning-out compression of the image. However, the compression unit 6 may be any device that, when compressing image information, dynamically changes at least one of the storage time, the compression rate of intra-frame compression, the compression rate of inter-frame compression, the time interval of intermittent recording, the color information thinning rate, the luminance information thinning rate, and the like.
[0128]
In particular, methods for compressing moving image information include intra-frame compression methods and inter-frame compression methods. Intra-frame compression methods include a method using vector quantization and a method using the discrete cosine transform. Inter-frame compression methods include a method of recording only the difference between the image information of preceding and succeeding frames. That is, any device that converts the information amount per unit time into a smaller information amount corresponds to the compression unit 6 in the present invention.
[0129]
In the first embodiment, the image information of sections other than the condition matching section is stored as frame-dropped video with a small amount of information. Alternatively, the image information or audio information of such sections may be deleted from the storage medium.
[0130]
Further, the information of the condition matching section and the information of sections other than the condition matching section may be stored separately in different storage media. For example, at the time of recording, the information of the condition matching section and the information of the other sections are stored on the same magnetic disk, and at the time of compression, only the information of the condition matching section is left on the magnetic disk while the information of the other sections is moved to a magneto-optical disk or a magnetic tape. In general, magneto-optical disks and magnetic tapes can store a large amount of information at low cost, although access to the information is slower than with magnetic disks. They are therefore suitable for storing the information of sections other than the condition matching section.
[0131]
Furthermore, in this embodiment, the audio information is not compressed. However, when the audio information is compressed, at least one of the storage time, the sampling frequency, and the number of encoded bits may be changed dynamically when the audio signal is compressed.
[0132]
Note that the time-series information input to the apparatus according to the above-described embodiment may be an analog signal input from a camera / microphone / video deck / tape recorder / sensor, or a digital signal obtained by encoding the analog signal. Further, it may be a digital signal input through a computer network / computer bus. That is, any information that is sequentially input with the passage of time corresponds to time-series information in the present invention.
In the first embodiment described above, the condition matching section detection unit 3 detects a condition matching section from the voice information, and, based on the detection result, the image information stored in the time-series information storage unit 4 is compressed with a dynamically changed compression rate so that the image information of the condition matching section has a higher image quality than that of the other sections. As a result, many important portions in which a characteristic event occurs can be stored in a limited storage medium, and image information other than the important portions can also be stored for a long time with a small amount of data.
Further, since the condition matching section detection unit 3 detects the presence or absence of the input audio signal or the audio signal level, the audio or image information of a section in which sound is being emitted can be stored from beginning to end with high sound quality and high image quality, while the audio or image information of a section in which no sound is emitted can be stored with a small amount of data.
Also, since the condition matching section detection unit 3 detects the speaker of the input voice or a change of speaker, the voice or image information of a specific speaker can be stored from beginning to end with high sound quality and high image quality, while the voice or image information of other speakers can be stored with a small amount of data.
[0133]
[Second Embodiment]
Also in the second embodiment, in the same manner as described above, the following description will be made assuming that only the image is the compression target in order to simplify the description.
[0134]
In the second embodiment, when the input image information is stored in the time-series information storage unit 4, it is stored separately for each frequency band, such as a low frequency band and a high frequency band, and when the compression start instruction is received from the time information storage unit 7, the image information is compressed by deleting the high frequency band of the image. The second embodiment exploits the fact that the high frequency band of an image is a component related to so-called image detail, so that deleting it has little influence on grasping the basic image content.
[0135]
In the first embodiment, as shown in FIG. 14, the partial image sequence in the time-series information storage unit 4 is read out, subjected to compression processing by the compression unit 6, and then written to the time-series information storage unit 4 again. In the second embodiment, by contrast, it is not necessary to read the partial image sequence from the time-series information storage unit 4, apply image compression processing, and write it back to the time-series information storage unit 4, so the load on the system during the compression process can be reduced.
[0136]
In the second embodiment, when the input image information is stored in the time-series information storage unit 4 for each frequency band, the way of dividing the frequency bands is changed between the condition matching section and the sections other than the condition matching section.
[0137]
Specifically, only the image information of the sections other than the condition matching section is stored for each frequency band, and the image information of the condition matching section is stored by a normal method that does not store by frequency band.
[0138]
FIG. 17 is a diagram for explaining the recording operation in the second embodiment, together with the flow of various information and the flow of output of each unit. The configuration of the second embodiment differs from the configuration of FIG. 1 and FIG. 5 described in the first embodiment in that a frequency band-specific image generation unit 21 is added as a component.
[0139]
In this example, the frequency band-specific image generation unit 21 includes a high-pass filter and a low-pass filter. In the second embodiment, the condition matching section detection unit 3 detects the condition matching section from the input voice information in the same manner as in the first embodiment, and the information specifying the detected condition matching section is supplied to the correspondence storage unit 5 and also to the frequency band-specific image generation unit 21.
[0140]
The frequency band-specific image generation unit 21 receives the information specifying the condition matching section from the condition matching section detection unit 3 and changes the signal it outputs to the time-series information storage unit 4 between the condition matching section and the other sections.
[0141]
FIG. 18 is a flowchart for explaining processing in the frequency band-specific image generation unit 21 in the second embodiment.
[0142]
As shown in FIG. 18, the frequency band-specific image generation unit 21 receives input of audio information or image information in step S600. Then, in the next step S601, it determines whether the input is within the condition matching section, based on the information specifying the condition matching section from the condition matching section detection unit 3, that is, in this example, the information on the start and end of the condition matching section.
[0143]
For the image information of a section determined to be within the condition matching section, the process proceeds from step S601 to step S603, and the image information is output to the time-series information storage unit 4 as it is, so that the input image information is stored in the time-series information storage unit 4 in the normal storage format, which is not divided into frequency bands.
[0144]
On the other hand, if it is determined in step S601 that the section is other than the condition matching section, the process proceeds to step S602, where the input image information is divided into high frequency band information and low frequency band information to generate frequency band-specific image information. The generated frequency band-specific image information is output to the time-series information storage unit 4 and stored in step S603. Thereafter, steps S600 to S603 are repeated.
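The branch of steps S601 to S603 can be illustrated with a small sketch. Everything here is an assumption for illustration: one row of pixel values stands in for a frame, a 3-tap integer moving average stands in for the low-pass filter, the residual stands in for the high-pass output, and the dictionary keys only mimic the memory units described for this embodiment.

```python
def split_bands(row):
    """Split a row of integer pixel values into a low band (3-tap moving
    average with clamped edges) and a high band (the residual)."""
    n = len(row)
    low = []
    for i in range(n):
        left = row[max(i - 1, 0)]
        right = row[min(i + 1, n - 1)]
        low.append((left + row[i] + right) // 3)
    high = [x - l for x, l in zip(row, low)]
    return low, high


def store_frame(store, row, in_matching_section):
    """S601: choose the storage format according to the section type."""
    if in_matching_section:
        store["matching"].append(row)    # S603: stored without band division
    else:
        low, high = split_bands(row)     # S602: band-specific image information
        store["other_low"].append(low)   # S603
        store["other_high"].append(high)


store = {"matching": [], "other_low": [], "other_high": []}
row = [10, 12, 50, 52, 11]
store_frame(store, row, in_matching_section=True)
store_frame(store, row, in_matching_section=False)
```

Because the high band in this sketch is defined as the residual, adding the two bands element by element restores the original row, so nothing is lost until the high band is deleted.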
[0145]
FIG. 19 is a diagram for explaining the storage state of the time-series information storage unit 4 before compression of image information in the second embodiment. As shown in FIG. 19, the time-series information storage unit 4 in the case of the second embodiment includes a memory unit 4Ma that stores image information of a condition matching section, and image information of sections other than the condition matching section. A memory unit 4Mb for storage. These memory units 4Ma and 4Mb may be separate storage media, or may be obtained by dividing the memory area of one storage medium.
[0146]
In the time series information storage unit 4, the memory unit 4Ma that stores the image information of the condition matching section stores the image information without dividing it into frequency bands. The memory unit 4Mb that stores the image information of sections other than the condition matching section is further divided into a high-frequency storage memory and a low-frequency storage memory, and the frequency band components are stored in association with each other. That is, in FIG. 19, the contents of the areas a1 to a6 of the high-frequency storage memory and the contents of the areas a1 to a4 of the low-frequency storage memory are the high frequency component and the low frequency component of the image signal of the same section. The time-series information storage unit 4 also manages the correspondence between the frequency band components.
[0147]
In FIG. 19, a “·” in a memory area a1, a2, ... indicates that an image sequence is stored there, and a memory area without “·” is a free memory area.
[0148]
Also in the second embodiment, the time information storage unit 7 monitors the storage period of the stored contents of the time series information storage unit 4, and when the storage period has passed a predetermined period such as one month, for example, The time information storage unit 7 outputs a compression processing start instruction and causes image information compression to be executed.
[0149]
FIG. 20 is a diagram for explaining the operation at the time of information compression in the second embodiment, together with the flow of various information and the flow of output of each unit at that time. FIG. 21 is a flowchart for explaining the compression processing in the second embodiment.
[0150]
That is, in this embodiment, the compression start instruction from the time information storage unit 7 is supplied directly to the compression unit 6. Then, as shown in the flowchart of FIG. 21, when the compression unit 6 receives this compression start instruction and thereby detects in step S700 that the elapsed time since the audio information and the image information were recorded in the time-series information storage unit 4, that is, the information storage time, has reached the predetermined time, the process proceeds to step S701. In step S701, an instruction to delete the high frequency band from the memory storing the sections other than the condition matching section is sent to the time-series information storage unit 4.
[0151]
In this example, on receiving the high frequency component deletion instruction from the compression unit 6, the time-series information storage unit 4 deletes all the contents of the high-frequency storage memory of the memory unit 4Mb, which stores the image information of sections other than the condition matching section.
[0152]
FIG. 22 is a diagram illustrating the storage state of the time-series information storage unit 4 after image information compression. When the storage state before the compression process is as shown in FIG. 19 described above, the time-series information storage unit 4, on receiving the high frequency component deletion instruction from the compression unit 6, deletes all of the image information from the areas a1 to a6 of the high-frequency storage memory, which stores the high frequency band image components, in the memory unit 4Mb that stores the image information of sections other than the condition matching section. As a result, in the time-series information storage unit 4, the areas a1 to a6 indicated by halftone dots in FIG. 22 become free memory areas.
[0153]
The generated free memory area may be reused as a storage memory area for storing a section other than the condition matching section, or may be used as a storage memory area for storing the condition matching section.
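The point that this compression needs no read-out or write-back can be made concrete with a minimal sketch. The class `TimeSeriesStore` and its attribute names are hypothetical stand-ins for the memory units 4Ma and 4Mb, and each list element stands for one memory area.

```python
class TimeSeriesStore:
    """Toy model of the storage layout before compression in this embodiment."""

    def __init__(self):
        self.matching = []    # memory unit 4Ma: frames stored without band division
        self.other_low = []   # memory unit 4Mb, low-frequency storage memory
        self.other_high = []  # memory unit 4Mb, high-frequency storage memory

    def delete_high_band(self):
        """Compression step: drop the high band of non-matching sections.
        No frame data is read, re-encoded, or written back."""
        freed = len(self.other_high)
        self.other_high.clear()
        return freed  # number of memory areas that became free


# Six high-band areas and four low-band areas are in use before compression.
store = TimeSeriesStore()
store.other_high = [f"high-a{i}" for i in range(1, 7)]
store.other_low = [f"low-a{i}" for i in range(1, 5)]
freed = store.delete_high_band()
```

The freed areas could then be handed back to either memory unit, as noted above.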
[0154]
After compression by the above-described processing, when the video of a part that was actively discussed is played back, high-quality video with smooth motion is reproduced, whereas when the other parts are played back, the video is so-called low-quality video. However, since only the unimportant parts other than the actively discussed scenes are compressed at a high compression rate, the amount of information to be stored becomes very small.
[0155]
In the above description of the second embodiment, the method of storing input image information for each frequency band has been described. Alternatively, for example, as described in Japanese Patent Laid-Open No. 6-178250, when the input image information is accumulated in the time-series information storage unit 4, the image signal may be divided into a luminance signal component and a color signal component, such as a color difference signal or a carrier color signal (color subcarrier signal), with the components stored in separate areas, and only the color signal component may be deleted when a compression start instruction is generated from the time information storage unit 7. In this case as well, since it is not necessary to read out the partial image sequence from the time-series information storage unit 4 or write it back, the system load during the compression process can be reduced, and the speed of the compression process can be increased.
[0156]
Further, when the input image information is accumulated in the time-series information storage unit 4 for each frequency band, the image information of the condition matching section and the low frequency band component of the image information of sections other than the condition matching section may be stored contiguously in the storage medium, and the high frequency band component of the image information of sections other than the condition matching section may be stored in another area of the storage medium. In this case, even if the high frequency band component of the image information of sections other than the condition matching section is deleted during compression, the time-series data after compression remains contiguous in the storage medium, so that a drop in reproduction speed can be prevented.
[0157]
Furthermore, in the second embodiment, the audio information is not compressed, but the audio information can be compressed in the same manner. For example, as described in Japanese Patent Application Laid-Open No. 7-15519, the input voice information may be stored for each frequency band when it is stored in the time-series information storage unit 4, and the high frequency band of the audio may be deleted when the compression unit 6 receives a compression start instruction from the time information storage unit 7. In this case as well, the high frequency band component of the audio information of sections other than the condition matching section may be deleted preferentially.
In the second embodiment described above, the input audio information or the input image information is stored for each frequency band when it is accumulated in the time-series information storage means, and the high frequency band is deleted during compression. This eliminates the need to read information out of the time-series information storage means or to write information back to it, and thus has the effect of reducing the load on the system during the compression process.
Further, the way of dividing the frequency bands may differ between the condition matching section detected by the condition matching section detection unit and the other sections when the input audio information or the input image information is stored in the time-series information storage unit by frequency band. For example, when only the image information of sections other than the condition matching section is stored for each frequency band and the image information of the condition matching section is stored by a normal method (not divided by frequency band), the processing for dividing the input voice information or the input image information by frequency band is reduced, which has the effect of reducing the load on the system.
[0158]
[Third Embodiment]
In the first embodiment and the second embodiment, examples have been described in which the compression process is performed only once, when the elapsed time since the audio information or the image information was recorded in the time-series information storage unit 4 reaches a predetermined time, for example, one month.
[0159]
However, there are cases where the storage medium can be saved more effectively by performing the compression in several stages rather than only once. For example, when recording meetings, a meeting held a month ago is less likely to be referred to than a meeting held a week ago, and similarly, a meeting held six months ago is less likely to be referred to than one held more recently. Thus, as the possibility of later reference becomes lower, the storage medium can be saved effectively by storing the information with a smaller amount of data.
[0160]
In the third embodiment, an example will be described in which the compression rate or the compression method is changed in accordance with the elapsed time since the image information was recorded in the time-series information storage unit 4, so that the information is compressed stepwise. However, even for an old conference recording, the image signal of important scenes needs to be kept in high quality. Therefore, as in the first and second embodiments, in this example only the video of the important parts of the recorded conference, where discussion was active, is kept in high quality, and the other parts are compressed at a high compression rate.
[0161]
Also in the third embodiment, the input image information is stored for each frequency band when it is stored in the time-series information storage unit 4. For this reason, as in the second embodiment, a frequency band-specific image generation unit 21 is provided. In the third embodiment, the image information is divided into three bands, namely a high frequency band, a middle frequency band, and a low frequency band, and stored in the time-series information storage unit 4. The frequency band-specific image generation unit 21 in this case includes a high-pass filter for the high band, a band-pass filter for the middle band, and a low-pass filter for the low band.
[0162]
Further, in the third embodiment, the image signal is stored with the frequency band divided into three bands without distinction between the condition matching section and the section other than the condition matching section.
[0163]
FIG. 23 is a diagram for explaining the storage state of the time-series information storage unit 4 when image information is recorded (before image information compression). In this example, as shown in the figure, the memory unit 4Ma that stores the image information of the condition matching section and the memory unit 4Mb that stores the image information of sections other than the condition matching section each comprise a high-frequency storage memory, a mid-frequency storage memory, and a low-frequency storage memory, and the high frequency, middle frequency, and low frequency components of the image information of the corresponding section are stored in the respective memory areas.
[0164]
Also in the third embodiment, the time information storage unit 7 monitors the passage of time from the time of storage and outputs a compression start instruction to the compression unit 6 when a predetermined time has elapsed. However, the compression start instruction is output at each of a plurality of preset elapsed times, for example, one week, one month, and six months after storage. Each compression start instruction is sent to the compression unit 6 together with data indicating which elapsed time the instruction corresponds to and which frequency band component is to be compressed.
[0165]
FIG. 24 is a diagram illustrating an example of a compression time management table stored in the time information storage unit 7. As shown in FIG. 24, for example, the image data to be erased first is the high frequency band portion of sections other than the condition matching section, which is erased when one week has elapsed after the information is recorded. When one month has passed since the information was recorded, the middle frequency band portion of sections other than the condition matching section and the high frequency band portion of the condition matching section are erased. When half a year has passed, the low frequency band portion of sections other than the condition matching section and the middle frequency band portion of the condition matching section are erased. Note that the low frequency band portion of the condition matching section is not automatically erased unless the user gives an explicit deletion instruction.
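The staged erase schedule described for FIG. 24 can be written down as a small lookup table. This is a sketch only: the names `SCHEDULE`, `erased_by`, and `remaining` are hypothetical, and representing one month as 30 days and half a year as 182 days is an assumption of the example.

```python
from datetime import timedelta

# (elapsed time, section kind, frequency band) rows of the erase schedule.
# "matching" = condition matching section, "other" = all other sections.
SCHEDULE = [
    (timedelta(weeks=1), "other", "high"),
    (timedelta(days=30), "other", "mid"),      # assumption: 1 month = 30 days
    (timedelta(days=30), "matching", "high"),
    (timedelta(days=182), "other", "low"),     # assumption: half a year = 182 days
    (timedelta(days=182), "matching", "mid"),
    # ("matching", "low") has no row: it is never erased automatically.
]

ALL_PARTS = {(kind, band)
             for kind in ("matching", "other")
             for band in ("high", "mid", "low")}


def erased_by(elapsed):
    """Parts whose erase time has been reached after `elapsed`."""
    return {(kind, band) for due, kind, band in SCHEDULE if elapsed >= due}


def remaining(elapsed):
    """Parts still stored after `elapsed`."""
    return ALL_PARTS - erased_by(elapsed)


after_week = remaining(timedelta(days=8))
after_half_year = remaining(timedelta(days=200))
```

After half a year, only the low frequency band of the condition matching section is left, which matches the statement that this portion survives unless the user deletes it explicitly.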
[0166]
When the compression unit 6 receives the compression start instruction from the time information storage unit 7, it analyzes the content of the instruction and, based on the analysis result, issues to the time-series information storage unit 4 a compression instruction specifying which storage memory is to have its contents erased. The time-series information storage unit 4 executes stepwise compression of the image information in response to the compression instruction. More specifically, the contents of each storage memory are erased according to the erase times of the table shown in FIG. 24.
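The staged erasure described for FIG. 24 can be sketched as a lookup from (section kind, frequency band) to an erase time. This is only an illustration: the names `ERASE_AFTER_DAYS`, `bands_to_erase`, and `remaining_bands` are hypothetical, and the day counts for "one month" (30) and "half a year" (180) are assumptions, since the text gives only the durations in words.

```python
# Hypothetical encoding of the schedule described for FIG. 24.
# Key: (section kind, frequency band). Value: days until erasure, or None
# when the memory is never erased automatically.
# "match" = condition matching section, "other" = any other section.
# 30 and 180 days are assumed stand-ins for "one month" and "half a year".
ERASE_AFTER_DAYS = {
    ("other", "high"): 7,
    ("other", "mid"): 30,
    ("match", "high"): 30,
    ("other", "low"): 180,
    ("match", "mid"): 180,
    ("match", "low"): None,
}


def bands_to_erase(elapsed_days):
    """Return the (section, band) memories whose erase time has been reached."""
    return sorted(
        key
        for key, days in ERASE_AFTER_DAYS.items()
        if days is not None and elapsed_days >= days
    )


def remaining_bands(elapsed_days):
    """Return the (section, band) memories that still hold image data."""
    erased = set(bands_to_erase(elapsed_days))
    return sorted(key for key in ERASE_AFTER_DAYS if key not in erased)
```

Under these assumptions, calling `remaining_bands(180)` leaves only the low-frequency memory of the condition matching section, which matches the final state the text describes.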
[0167]
FIGS. 25, 26, and 27 show the storage state of the time-series information storage unit 4 when one week, one month, and half a year, respectively, have passed since the image information was recorded as shown in FIG. 23 (before image information compression). In FIG. 23, FIG. 25, FIG. 26, and FIG. 27, a mark in a memory area a1, a2, ... indicates that an image sequence is stored there. A memory area without the mark is an empty memory area.
[0168]
That is, when one week has elapsed, as shown in FIG. 25, the time-series information storage unit 4 erases all the contents of the high-frequency storage memory of the memory unit 4Mb, which stores image information of sections other than the condition matching section, and makes it a free memory area.
[0169]
In addition, when one month has elapsed, as shown in FIG. 26, all the contents of the high-frequency storage memory of the memory unit 4Ma, which stores the image information of the condition matching section, and of the mid-frequency storage memory of the memory unit 4Mb, which stores the image information of sections other than the condition matching section, are erased.
[0170]
Furthermore, when half a year has elapsed, as shown in FIG. 27, all the contents of the mid-frequency storage memory of the memory unit 4Ma, which stores the image information of the condition matching section, and of the low-frequency storage memory of the memory unit 4Mb, which stores the image information of sections other than the condition matching section, are erased. As a result, only the contents of the low-frequency storage memory of the memory unit 4Ma remain as the stored contents of the time-series information storage unit 4.
[0171]
In this way, as shown in FIGS. 25 to 27, the time-series information storage unit 4 accumulates image information with a smaller amount of information as time elapses.
[0172]
In the above-described example of the third embodiment, a table is used for managing the compression times in the time information storage unit 7. Of course, a list or stack structure may be used instead of the management table.
[0173]
Furthermore, instead of managing the compression time and the compression target in a table or list, the time information storage unit 7 may calculate the compression rate of the information at an arbitrary time from a mathematical expression that takes the information storage time as a parameter, and send information on that compression rate to the compression unit 6 to perform information compression.
[0174]
For example, where y is the information amount retention rate (%) and x is time (elapsed days), the time information storage unit 7 obtains the information amount retention rate at a specific time from the following arithmetic expression (1):
y = 90 exp(-Ax) + 10    (1)
where A is a constant and A > 0.
The information amount retention rate so obtained is supplied to the compression unit 6 as information on the compression rate. Here, the information amount retention rate is the ratio of the information amount at a specific time to the information amount when the information was first recorded.
[0175]
The compression unit 6 sets a compression rate based on the information amount retention rate from the time information storage unit 7 and compresses the image information accumulated in the time-series information storage unit 4 at that compression rate. In this case, the time information storage unit 7 generates a compression start instruction repeatedly at a fixed cycle, so that recompression is executed step by step at that cycle.
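A minimal sketch of expression (1) follows, to show how the retention rate falls from 100% toward a floor of 10% as the elapsed days grow. The function names and the value A = 0.05 used in the usage note are assumptions for illustration; the text fixes only the form of the expression and the condition A > 0.

```python
import math


def retention_rate(elapsed_days, a):
    """Information amount retention rate y (%) from expression (1).

    elapsed_days is x and a is the constant A (A > 0).
    """
    if a <= 0:
        raise ValueError("A must be positive")
    return 90.0 * math.exp(-a * elapsed_days) + 10.0


def target_size(original_size, elapsed_days, a):
    """Data amount to keep after elapsed_days, given the recorded amount."""
    return original_size * retention_rate(elapsed_days, a) / 100.0
```

For instance, with the assumed constant A = 0.05, `retention_rate(0, 0.05)` is 100.0, and the value decreases monotonically toward 10 on each later recompression cycle.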
[0176]
As in the third embodiment described above, when the compression processing is configured to start once the elapsed time since the audio information or image information was recorded in the time-series information storage means (that is, the information storage time) reaches or exceeds a predetermined time, recent audio or image information that is likely to be referenced can be stored with high sound quality and high image quality, while audio or image information recorded further in the past can be stored with a small amount of data.
In addition, when the audio information or image information is configured to be compressed in stages according to the elapsed time since it was recorded in the time-series information storage unit (that is, the information storage time), information that is less likely to be referenced later can be stored with a smaller amount of information, so the storage medium can be used more economically.
Further, since the compression amount or the compression method can be set so that the data amount of the audio information or image information fits within a predetermined storage capacity, the compressed data forms a digest of a desired storage size in which only the important parts of the input image information are stored with high image quality.
Although the third embodiment has been described as a modification of the second embodiment, it can of course also be implemented as a modification of the first embodiment.
[0177]
[Fourth Embodiment]
The fourth embodiment is a case where the detection condition in the condition matching section detection unit 3 is that a pre-registered keyword, or a pre-registered voice pattern, appears in the input audio signal.
[0178]
First, a case will be described in which the detection condition in the condition matching section detection unit 3 is that a keyword registered in advance appears in the input audio signal.
[0179]
In this case, the condition matching section detection unit 3 comprises speech recognition means, a memory for storing registered keywords, and keyword match detection means that compares the speech recognition result with the keywords registered in advance in the memory and detects a match between them. Keywords are registered in the memory in advance by the user.
[0180]
At the time of information recording, the condition matching section detection unit 3 sequentially converts the input voice signal into character string information by the voice recognition means, and extracts words from the character string information by performing morphological analysis. Then, the extracted word / phrase is compared with character string keywords registered in advance in the memory, such as “homework”, “action item”, “task”, “conclusion”, “decision”, “important”, and “summary”.
[0181]
When the word / phrase extracted from the input speech signal matches any of the character string keywords registered in advance, the time point when the character string keyword is detected becomes the start point of the condition matching section.
[0182]
In order to determine the end point of the condition matching section, the condition matching section detection unit 3 in this example holds, in a table as shown in FIG. 28, a keyword validity period that determines, for each keyword character string, how long after the keyword is detected the image signal is to be stored with high image quality. A longer keyword validity period is assigned to a more important keyword.
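The keyword-based start and end points described above can be sketched as follows. The table name `KEYWORD_VALIDITY_S`, the function name, and every validity period in seconds are hypothetical; the values of FIG. 28 are not reproduced in the text. The input is assumed to be a sequence of (time, word) pairs already produced by speech recognition and morphological analysis.

```python
# Hypothetical keyword validity periods in seconds. The keywords are taken
# from the examples in the text; the durations are made-up placeholders.
KEYWORD_VALIDITY_S = {
    "conclusion": 120.0,
    "action item": 90.0,
    "important": 60.0,
}


def keyword_sections(recognized_words):
    """Build condition matching sections from recognized words.

    recognized_words: iterable of (time_s, word) pairs in time order.
    Returns a list of (start_s, end_s, keyword). The start point is the
    detection time; the end point is the start plus the validity period.
    """
    sections = []
    for time_s, word in recognized_words:
        validity = KEYWORD_VALIDITY_S.get(word)
        if validity is not None:
            sections.append((time_s, time_s + validity, word))
    return sections
```

This sketch leaves overlapping sections unmerged; whether and how overlaps are combined is a design choice the text does not specify.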
[0183]
When the image information stored in the time-series information storage unit 4 is compressed after a predetermined time as described above, the image information of the section from the start point to the end point of the condition matching section is kept at high image quality, and the other sections are compressed at a high compression rate. As the compression method, any one of the methods of the first to third embodiments can be adopted.
[0184]
Further, if the importance level can be set for each keyword character string, the image signal can be compressed at different compression rates depending on the importance level of each keyword character string.
[0185]
Next, a case will be described in which the condition matching section detection unit 3 detects a condition matching section on the assumption that a pre-registered voice pattern appears in the input voice signal.
[0186]
Even when it is difficult to detect keywords by voice recognition, characteristic voice signal patterns such as laughter patterns, applause patterns, and active conversation patterns can sometimes be recognized. Therefore, the condition matching section detection unit 3 also uses the appearance of such a characteristic voice pattern as a detection condition.
[0187]
In this case, the condition matching section detection unit 3 is provided with a memory in which characteristic voice signal patterns such as a laughter pattern, an applause pattern, and an active conversation pattern are registered and stored in advance. It is also provided with pattern recognition means that performs pattern recognition using a known pattern recognition technique, for example, a technique for analyzing the temporal transition of the power or frequency components of an audio signal.
[0188]
The pattern recognition means compares the characteristic audio signal patterns registered in advance with the pattern extracted from the sequentially input audio signal, and recognizes a characteristic pattern from their coincidence or similarity. To increase the recognition rate, a voice pattern may be registered for each speaker.
[0189]
When it is determined that the pattern extracted from the input audio signal matches one of the characteristic audio signal patterns registered in advance, the time at which that pattern is detected becomes the start point of the condition matching section.
[0190]
In addition, in order to determine the end point of the condition matching section, the condition matching section detection unit 3 in this example holds, in a table as shown in FIG. 29, an audio signal pattern effective period that determines, for each audio signal pattern, how long after the pattern is detected the image signal is to be stored with high image quality. The image information of the section from the start point to the end point of the condition matching section is judged to be information that should be stored with high image quality.
[0191]
When the image information stored in the time-series information storage unit 4 is compressed after a predetermined time as described above, the image information of the section from the start point to the end point of the condition matching section is kept at high image quality, and the image information of other sections is compressed so as to greatly reduce the amount of information. As the compression method, any one of the methods of the first to third embodiments can be adopted.
[0192]
In this example, the time point at which the sound signal pattern extracted from the input sound signal is determined to match one of the pre-registered characteristic sound signal patterns is taken as the start point of the condition matching section. However, the image information preceding the time at which the pattern is detected can also be stored with high image quality. For example, the event that caused a laughter or applause pattern usually occurs before the pattern appears, and that preceding portion can also be saved with high image quality.
[0193]
In this case, the time point a predetermined time before the characteristic audio signal pattern appears is set as the start point of the condition matching section, so that the event causing the pattern can be stored with high image quality. The other sections are compressed at a high compression rate.
As described above, according to the fourth embodiment, the condition matching section detecting means is configured to detect that a keyword or pattern registered in advance appears in the input voice information. Voice or image information from a period in which such keywords or patterns frequently appear can therefore be stored with high sound quality and high image quality from beginning to end, while voice or image information from other periods can be stored with a small amount of data.
[0194]
[Fifth Embodiment]
In the fifth embodiment, the condition matching section detection unit 3 detects a predetermined state change by means of an external sensor. That is, in this embodiment an external sensor is provided so that a condition matching section can be detected when it is difficult to detect one from the audio signal, or when the section is to be detected on the condition that a state change has occurred in information not included in the input audio signal.
[0195]
In the embodiment described below, a case where the external sensor detects a location will be described. In the following example, importance levels are assigned to conference rooms, such as boardrooms, reception rooms, and general meeting rooms, and information is compressed according to the importance of the room in which it was recorded.
[0196]
Information on which conference room the audio signal or image signal was input in can be obtained by analyzing position information output from a position measuring device such as a GPS (Global Positioning System) receiver. When GPS is used, the latitude and longitude of the place where the audio signal or image signal is input are measured and compared with the pre-stored latitude and longitude of each conference room, so that the conference room in which the audio signal or image signal was input can be identified.
[0197]
Besides GPS, the infrared transmission / reception system described in Japanese Patent Application Laid-Open No. 7-141389 can also be used. In that system, infrared transmitters that each emit a bit pattern specific to their location are installed at arbitrary locations such as conference rooms and hallways. In this case, when an audio signal or an image signal is input, the bit pattern emitted by a nearby infrared transmitter is received, and the conference room is identified from the pattern.
[0198]
In the example described below, a case where the infrared transmission / reception system is used will be described. In this case, the condition matching section detection unit 3 comprises infrared signal recognition means, a memory that stores registered location names, and place coincidence detecting means that compares the location name determined from the recognized infrared signal with the location names registered in advance in the memory and detects a match between them. Location names are registered in the memory in advance by the user.
[0199]
At the time of recording information, the condition matching section detection unit 3 converts the input infrared signal into a place name by the infrared signal recognition means. The converted place name is then compared with the place names registered in advance in the memory. When detecting a place, the condition matching section detection unit 3 detects the beginning of the period during which the device is recognized as staying at the same place as the start point of the condition matching section, and the end of that period as the end point of the condition matching section.
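One way to sketch this start and end point detection is to group consecutive readings at the same registered place into a single section. The function name, the representation of readings as (time, place name) pairs, and the choice to end a section at the last reading taken at that place are all assumptions for illustration.

```python
def location_sections(readings, registered_places):
    """Group consecutive readings at one registered place into sections.

    readings: list of (time_s, place_name) pairs in time order, where
        place_name is the name decoded from the infrared bit pattern.
    registered_places: set of place names registered in advance.
    Returns a list of (start_s, end_s, place_name).
    """
    sections = []
    current = None  # [start_s, last_seen_s, place_name] of the open section
    for time_s, place in readings:
        if current is not None and place == current[2]:
            current[1] = time_s  # still at the same place: extend the section
            continue
        if current is not None:
            sections.append(tuple(current))  # place changed: close the section
            current = None
        if place in registered_places:
            current = [time_s, time_s, place]  # open a new section
    if current is not None:
        sections.append(tuple(current))
    return sections
```

Readings at unregistered places (a hallway, for example) close any open section without opening a new one.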
[0200]
The correspondence storage unit 5 stores the start point and end point of the section and the place name as information for specifying the condition matching section. A corresponding identifier may be stored instead of the place name. Further, the correspondence relationship storage unit 5 stores each condition matching section and the storage addresses of the audio signal and the image signal stored in the time series information storage unit 4 in association with the section.
[0201]
In this example, the time information storage unit 7 outputs a compression start instruction to the correspondence relationship storage unit 5 when the storage retention period is equal to or longer than a predetermined period. In response to this compression start instruction, the correspondence relationship storage unit 5 sends information on the section and a place name or a place identifier to the compression unit 6 as information for specifying the condition matching section.
[0202]
The compressing unit 6 includes a table that stores a conference room name (location name) registered in advance and an importance level of each conference room in association with each other. FIG. 30 is an example of this table. The compression unit 6 refers to this table using the place name or identifier from the correspondence relationship storage unit 5 and detects the meeting room name in the condition matching section. Then, the importance assigned to the conference room name is extracted, and the image signal in the corresponding condition matching section is compressed at a compression rate corresponding to the importance. In other words, information recorded at a location of higher importance is stored with higher quality during compression.
[0203]
In this way, for example, the recorded video of an important meeting held in a boardroom can be stored for a longer time with higher image quality than video of meetings held in other meeting rooms.
[0204]
Although the case where a place is detected by an external sensor has been described above, the sensor may instead identify a person. For example, a weak wireless transmitter is attached to each conference attendee and a wireless receiver is installed in the conference room. The period during which an attendee is in the conference room is detected by the wireless receiver, and only this period is stored with high image quality.
[0205]
Furthermore, if the weak wireless transmitter transmits a different signal for each attendee so that it is possible to identify who has entered the room, the device can also be configured to store with high image quality only the period during which a specific person is in the room.
[0206]
In addition, the section may be specified not only from a physical location or a person, but also from an event obtained by combining the detection results of multiple sensors, such as "was attending a certain meeting" or "was with a certain person".
[0207]
Furthermore, when a single sensor input signal (trigger) such as "opening / closing of a door" is detected by the condition matching section detection unit 3, the unit may be configured to detect the time point a predetermined time before the trigger as the start point of the condition matching section, and the time point a predetermined time after the trigger as the end point of the condition matching section. Opening and closing of the door can be detected by attaching an opening / closing detection sensor to the door.
As described above, in the fifth embodiment, when the condition matching section detecting means is configured to detect the place where the audio signal or image signal is input, then, for example, if important meetings are held in a specific meeting room, the sound or images of an important event recorded at that important place can be stored with high sound quality and high image quality, while sound or image information recorded at any other place can be stored with a small amount of data.
In addition, when a specific person is detected by the external sensor, the sound or image information of that person can be stored from beginning to end with high sound quality and high image quality, while the sound or image information of other persons can be stored with a small amount of data.
[0208]
[Sixth Embodiment]
In the sixth embodiment, the condition matching section detection unit 3 detects the motion of the video camera 17 (hereinafter referred to as camera work).
[0209]
For example, when a person is shot zoomed in, the image is often an important one, and the user often wants to keep the audio signal or image signal from the period during which the camera 17 is zoomed in with high sound quality and high image quality. Therefore, in the example described below, a section shot at a constant magnification is detected as a condition matching section, together with that magnification. The importance of each condition matching section is then determined on the assumption that a higher magnification indicates a more important image, and subsequent information compression preserves higher quality for higher magnifications. As a result, the image of a section in which the camera 17 is zoomed in at high magnification is kept at high quality.
[0210]
An example in the case of this sixth embodiment will be described below.
[0211]
In this example, the video camera 17 can be set to one of three magnification modes, 1 ×, 5 ×, and 10 ×, and outputs information indicating the magnification as camera operation information in accordance with the operation of the zoom ring. This camera operation information is supplied to the condition matching section detection unit 3. As described above, the condition matching section detection unit 3 detects, from this camera operation information, each section with a constant camera magnification as a condition matching section, together with the magnification.
[0212]
That is, when detecting camera work, the condition matching section detecting unit 3 detects the time of a change point in the magnification indicated by the camera operation signal as the start point of a condition matching section, and the time at which the magnification next changes as the end point of that condition matching section. The end point of one condition matching section is therefore the same time as the start point of the next. The information on the condition matching section and the magnification information are stored in the correspondence storage unit 5 in association with the storage addresses, in the time-series information storage unit 4, of the image information and voice information of that condition matching section.
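The change-point rule above, in which each section ends exactly where the next one starts, can be sketched as follows. The function name and the representation of the camera operation information as (time, magnification) samples are assumptions; `end_time_s` stands for the time at which recording stops, which closes the last section.

```python
def magnification_sections(samples, end_time_s):
    """Split a recording into sections of constant camera magnification.

    samples: list of (time_s, magnification) pairs in time order.
    end_time_s: time at which recording ends (closes the last section).
    Returns a list of (start_s, end_s, magnification); the end point of
    each section equals the start point of the next one.
    """
    sections = []
    start_s = None
    current = None
    for time_s, magnification in samples:
        if current is None:
            start_s, current = time_s, magnification
        elif magnification != current:
            # Change point: close the running section and open the next.
            sections.append((start_s, time_s, current))
            start_s, current = time_s, magnification
    if current is not None:
        sections.append((start_s, end_time_s, current))
    return sections
```

Feeding it samples that go 1 ×, 5 ×, 10 ×, and back to 1 × yields four sections, which corresponds to the T0 to T3 pattern described for FIG. 31.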
[0213]
FIG. 31 is a diagram illustrating the relationship between the camera magnification and the condition matching sections. In FIG. 31, T0, T1, T2, and T3 are condition matching sections. Sections T0 and T3, where the magnification is 1 ×, are normal magnification sections in which the zoom ring is not operated. In the example of FIG. 31, the zoom ring is operated at time t1 and the zoom-in operation starts. In the first section T1 the magnification is 5 ×, the magnification is increased to 10 × at time t2, and at time t3 the magnification returns to 1 ×, ending the zoom-in operation.
[0214]
In this embodiment, for the three camera magnification modes of 1 ×, 5 ×, and 10 ×, the compression unit 6 sets the image thinning compression rate to 1 frame / second, 5 frames / second, and 10 frames / second, respectively.
[0215]
As in the embodiments described above, when the elapsed time since the image information was recorded in the time-series information storage unit 4 becomes equal to or longer than a predetermined time, the time information storage unit 7 gives a compression start instruction to the correspondence relationship storage unit 5, and compression is executed. At this time, the correspondence relationship storage unit 5 sends to the compression unit 6, for each condition matching section, the section information, the magnification, and the storage address in the time-series information storage unit 4 as a set. In this case, the compression unit 6 compresses the section T1 in FIG. 31 at 5 frames / second, the section T2 at 10 frames / second, and the other sections T0 and T3 at 1 frame / second.
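The frame thinning by magnification can be sketched as keeping every n-th frame of a section. The retained frame rates (1, 5, and 10 frames / second) come from the text; the function name and the source rate of 30 frames / second are assumptions for illustration.

```python
# Retained frames per second for each magnification mode, as in the text.
KEEP_FPS = {1: 1, 5: 5, 10: 10}


def thin_frames(frame_indices, magnification, source_fps=30):
    """Thin one section's frames down to the rate set for its magnification.

    frame_indices: frame identifiers of the section, in order.
    source_fps: recording frame rate (30 is an assumed value).
    Keeps every (source_fps // KEEP_FPS[magnification])-th frame.
    """
    step = source_fps // KEEP_FPS[magnification]
    return [frame for n, frame in enumerate(frame_indices) if n % step == 0]
```

With a two-second section of 60 frames, this keeps 2 frames at 1 ×, 10 frames at 5 ×, and 20 frames at 10 ×.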
[0216]
As described above, in this embodiment, information can be stored with different compression rates for the image signal of an important scene and the image signal of an unimportant scene, according to the camera work or a change in the camera work.
When the condition matching section detection unit 3 is configured to detect the camera operation signal or a change in it, then, in a case where important sound or images are shot with the camera zoomed in, the sound or images of the period during which the camera is zoomed in can be stored with high sound quality and high image quality, while the sound or image information of other periods can be stored with a small amount of data.
[0217]
The condition matching section detection unit 3 is not limited to detecting camera work or its change from the operation information of the camera, but can also detect from the image signal from the camera.
[0218]
Camera work that can be detected from the image signal from the camera includes panning, tilting, zooming, booming, trimming, dollying, cut start, and cut end. These kinds of camera work are detected by applying image recognition to the input image signal. Of course, they may also be detected from operation signals such as those of the buttons used for camera operation, as described in JP-A-6-165209 and JP-A-7-245754.
[0219]
[Seventh Embodiment]
The seventh embodiment describes a case where information is compressed with a compression rate or compression method that is changed based on a reference state, that is, on whether the audio information or image information stored in the time-series information storage unit 4 has been referenced (accessed) by the user.
[0220]
In general, frequently referenced information is important information. The image information of frequently referenced sections is therefore stored with high image quality, while image information that is referenced less often is compressed at a high compression rate and stored with a small amount of information.
[0221]
A reference state indicating how often the audio information or image information stored in the storage medium is accessed by the user is stored, and the compression rate is changed based on the reference state. For this purpose, the seventh embodiment includes a reference state storage unit for storing the reference frequency from the user.
[0222]
In this embodiment, the reference state storage unit stores, as the reference state, which sections of the image information stored in the time-series information storage unit 4 have been reproduced by the user and how many times each has been reproduced.
[0223]
FIG. 32 is a diagram for explaining the storage state of the time-series information storage unit 4. In the figure, sections T2, T4, and T6 are condition matching sections detected by the condition matching section detection unit 3 of any of the embodiments described above. For example, in the case of the first embodiment, they are sections of active dialogue. The other sections T1, T3, T5, and T7 are sections other than condition matching sections. Information regarding these sections is stored in the correspondence storage unit 5 as described above.
[0224]
FIG. 33 is a diagram illustrating an example of the storage state of the reference state storage unit. The reference state storage unit stores how many times the image information of each of the sections T1 to T7 has been accessed, that is, how many times the video of that section has been reproduced, from the time the image information was stored in the time-series information storage unit 4 until the present.
[0225]
When the elapsed time since the image information was recorded in the time-series information storage unit 4 (that is, the information storage time) becomes equal to or longer than a predetermined time and a compression start instruction is generated by the time information storage unit 7, the reference state storage unit sends the information of the reference count table of FIG. 33 to the compression unit 6. The compression unit 6 holds a compression rate setting table, shown in the drawings. This table gives, for each reference count, the compression rate to be set for condition matching sections and for sections other than condition matching sections.
[0226]
The compression unit 6 refers to the compression rate setting table and determines the image compression rate for each section. Then, the compression unit 6 compresses the image information stored in the time series information storage unit 4 at this compression rate.
[0227]
For example, the section T1 in FIG. 32 is a section other than a condition matching section, and its reference count is 0 as shown in the table of FIG. 33. Based on the compression rate setting table, its compression rate is therefore set to 90%. That is, since the section T1 is not a condition matching section and has never been accessed by the user, it is judged to be an unimportant section, and at the time of compression it is compressed at a high compression rate of 90%.
[0228]
On the other hand, the section T6 in FIG. 32 is a condition matching section, and its reference count is five as shown in the table of FIG. 33. Based on the compression rate setting table, its compression rate is therefore set to 10%. That is, since the section T6 is a condition matching section and has been accessed five times by the user, it can be regarded as a very important section, and at the time of compression its image information is stored with high image quality and little compression.
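The lookup in the compression rate setting table can be sketched as below. Only two entries are given in the text: 90% for a section other than a condition matching section with 0 references, and 10% for a condition matching section with 5 references. Every other number in `RATE_TABLE`, the threshold layout, and the function name are hypothetical placeholders.

```python
# Rows: (minimum reference count,
#        compression rate % for a condition matching section,
#        compression rate % for any other section).
# Only the 10% (5 references, matching) and 90% (0 references, other)
# entries come from the text; the rest are made-up placeholders.
RATE_TABLE = [
    (5, 10, 30),
    (1, 30, 60),
    (0, 50, 90),
]


def compression_rate(reference_count, is_condition_match):
    """Return the compression rate (%) for one section."""
    for min_count, match_rate, other_rate in RATE_TABLE:
        if reference_count >= min_count:
            return match_rate if is_condition_match else other_rate
    raise ValueError("reference_count must be non-negative")
```

With this table, section T1 (not matching, 0 references) gets 90% and section T6 (matching, 5 references) gets 10%, as in the two worked cases above.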
[0229]
In the seventh embodiment, the reference state storage unit stores the relationship between the section and the reference count in the form of a table, but of course it may be stored in another form such as a list or a stack. The numerical values in the compression rate setting table may be set by the user.
As in the seventh embodiment described above, when information is compressed by changing the compression amount or compression method based on a reference state indicating how often the audio information or image information stored in the time-series information storage unit 4 is referenced (accessed) by the user, then, on the view that frequently referenced information is important, the audio or image information of frequently referenced sections can be stored with high quality, while audio or image information that is referenced less often can be compressed at a high compression rate and stored with a small amount of information.
[0230]
[Eighth Embodiment]
In the first through seventh embodiments described above, the compression process is started when the elapsed time since the audio information or image information was recorded in the time-series information storage unit 4 (that is, the information storage time) becomes equal to or longer than a predetermined time. In the eighth embodiment, the compression process is instead started when it is recognized that the free area in the time-series information storage unit 4 has fallen below a certain value, or that the storage amount in the time-series information storage unit 4 has reached or exceeded a certain value.
[0231]
Therefore, the processing operation at the time of storage is the same as that of the above-described embodiments, but the operation at the time of information compression is different.
[0232]
FIG. 35 is a diagram for explaining the operation at the time of information compression in the eighth embodiment, together with the flow of various information and the flow of output of each unit at that time. The information storage device of this embodiment includes a storage amount detection unit 31. When the storage amount detection unit 31 detects that image information has been recorded in excess of a pre-registered storage capacity, it outputs a compression processing start instruction to the correspondence storage unit 5. The operation after this compression processing start instruction can be performed in the same manner as in the above-described embodiments.
[0233]
FIG. 36 is a flowchart of the processing of the storage amount detection unit 31 in the eighth embodiment. If it is detected in step S800 that the information storage amount exceeds the predetermined amount, the process proceeds to step S801, and a compression processing start instruction is output to the correspondence storage unit 5. For example, if the storage amount detection unit 31 is set to execute the compression process when information has been recorded in excess of 90% of the capacity of the storage medium, the storage amount detection unit 31 outputs a compression processing start instruction when the storage amount reaches 90% of the capacity of the storage medium.
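As a rough illustration of the check in steps S800 and S801, the following sketch compares the current storage amount against a registered threshold. The function and parameter names are hypothetical, and treating exactly 90% as having reached the threshold is an assumption based on the wording "when the storage amount reaches 90%".

```python
def should_start_compression(used_bytes, capacity_bytes, threshold_percent=90):
    """Return True when a compression start instruction should be issued (S800 -> S801).

    Integer arithmetic is used so that exactly reaching the threshold is
    detected without floating-point rounding concerns.
    """
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    if used_bytes < 0:
        raise ValueError("used_bytes must be non-negative")
    return used_bytes * 100 >= capacity_bytes * threshold_percent


# Example with a hypothetical 1000-byte medium.
print(should_start_compression(899, 1000))
print(should_start_compression(900, 1000))
```

A caller standing in for the storage amount detection unit 31 would invoke this after each write and, on a True result, notify the component that starts compression.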
[0234]
On receiving the compression start instruction from the storage amount detection unit 31, the correspondence storage unit 5 outputs to the compression unit 6 the condition matching sections and the storage addresses, in the time-series information storage unit 4, of the image information corresponding to each condition matching section. As in the embodiments above, the compression unit 6 then compresses the image information stored in the time-series information storage unit 4. Of course, in this case, the compression process may be executed in the background while a new image signal is being recorded.
[0235]
In the case of this embodiment, the compression rate or the compression method may be set so that the data amount of the image information fits within a predetermined storage capacity. For example, the device may be set so that, when information has been recorded in excess of 90% of the capacity of the storage medium, the compression process reduces the usage of the storage medium to 30%. The compression rates of the condition matching sections and of the other sections are calculated from this set value.
[0236]
For example, assume that 10,000 frames of uncompressed images are accumulated in the time-series information storage unit 4, of which 2,000 frames belong to condition matching sections and 8,000 frames belong to sections other than condition matching sections.
[0237]
Consider the case where frame thinning compression is performed so as to reduce this image information to 3,000 frames. As a condition, assume that the ratio between the compression rate of the condition matching sections and the compression rate of the other sections has been set to 1:10 in advance.
[0238]
In this case, if the compression rate of the condition matching sections is a, the compression rate of the other sections is 10a.
[0239]
2000a + 8000 × 10a = 3000
Since the compression rate a that satisfies this equation is approximately 0.0366, the compression rate of the condition matching sections and the compression rate of the other sections are 3.66% and 36.6%, respectively.
[0240]
If the 10,000 uncompressed frames stored in the time-series information storage unit 4 are divided into condition matching sections and other sections, and frame thinning compression is performed on each at its respective compression rate, the image information can be reduced to the desired 3,000 frames.
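The arithmetic of the worked example above can be checked with a short script. It simply solves the equation as written, 2000a + 8000 × 10a = 3000, for a; the function and parameter names are illustrative and do not come from the description.

```python
def solve_compression_rates(match_frames, other_frames, target_frames, ratio=10):
    """Solve match_frames * a + other_frames * (ratio * a) = target_frames for a.

    Returns (a, ratio * a): the rate for the condition matching sections and
    the rate for the other sections, using the equation exactly as stated in
    the worked example.
    """
    denominator = match_frames + ratio * other_frames
    if denominator <= 0:
        raise ValueError("frame counts must give a positive denominator")
    a = target_frames / denominator
    return a, ratio * a


a, other_rate = solve_compression_rates(2000, 8000, 3000)
print(f"a = {a:.4f}")                      # a = 0.0366
print(f"{a:.2%} and {other_rate:.1%}")     # 3.66% and 36.6%
print(round(2000 * a + 8000 * other_rate)) # 3000
```

Changing `target_frames` or `ratio` shows how the two rates move together when a different storage target or a different preset ratio is chosen.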
As described above, according to the eighth embodiment, the compression process is started when it is recognized that the free area in the time-series information storage unit has fallen to or below a certain value, or that the storage amount in the time-series information storage unit has reached or exceeded a certain value. Consequently, even when newly input audio or image information would exceed the storage capacity of the time-series information storage unit, input can be continued.
[0241]
[Ninth Embodiment]
Apparatuses are known that select information at the time of recording, either recording only the information recognized as important or changing the compression rate. For example, Japanese Patent Application Laid-Open No. 7-129187 describes an apparatus that records audio for a predetermined time before and after a voice capture key is pressed. In addition, some commercially available tape recorders have a silent section detection function that does not record audio during silent sections.
[0242]
However, with a method that selects information at the time of recording, as in the apparatus described in Japanese Patent Application Laid-Open No. 7-129187, it is not possible, for example, to identify the person who spoke the most in a conference and store only the audio or image information of that person's speech with high quality, or to create a digest of a user-specified length by extracting scenes in descending order of importance. That is, there is a problem in that audio or image information cannot be compressed on the basis of information that becomes available only after the recording has finished, or that cannot be obtained while recording is in progress.
[0243]
In this embodiment, a case where a compression method and a compression rate are set based on information obtained for the first time after recording of audio information or image information will be described.
[0244]
For example, in a meeting, a scene in which one speaker's utterance continues for a long time is often a scene in which important remarks are made, such as a scene in which notices are being communicated, opinions are being consolidated, or the discussion is being summarized. Therefore, after one meeting has been recorded, importance is assigned in descending order of utterance duration, starting with the scene containing the longest utterance. When the information is compressed, the utterance portions assigned high importance are stored with high sound quality and high image quality, and the other, less important portions are compressed at a high compression rate.
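The ranking step described above can be sketched as a sort over utterance sections once the whole meeting has been recorded. The `Utterance` record and its field names are hypothetical; how utterance boundaries are detected is outside this sketch.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """One detected utterance section (hypothetical record layout)."""
    speaker: str
    start_s: float
    end_s: float

    @property
    def duration_s(self):
        return self.end_s - self.start_s


def rank_by_duration(utterances):
    """Return (rank, utterance) pairs, rank 1 being the longest utterance.

    This can only run after recording, because the longest utterance is not
    known until every utterance has been seen.
    """
    ordered = sorted(utterances, key=lambda u: u.duration_s, reverse=True)
    return list(enumerate(ordered, start=1))


meeting = [
    Utterance("A", 0.0, 12.0),
    Utterance("B", 12.0, 95.0),
    Utterance("A", 95.0, 130.0),
]
for rank, u in rank_by_duration(meeting):
    print(rank, u.speaker, u.duration_s)
```

A compression step could then map low rank numbers to low compression rates and high rank numbers to high compression rates.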
[0245]
As another example, the device may record how long each pre-registered voice keyword was in use during a meeting and assign higher importance to keywords in descending order of usage time. For example, in a meeting, topics that were discussed for a long time are often important ones. Therefore, keywords from which the content of the discussion can be inferred are registered in advance, and those keywords are detected from the input audio signal.
[0246]
Then, by detecting that a specific keyword has been used for a long time, it is recognized that the discussion corresponding to the keyword has been made for a long time, and this keyword appearance section is regarded as an important part. At the time of information compression, a section assigned a high importance is stored with high sound quality / high image quality, and other sections with a low importance are compressed with a high compression rate.
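A minimal sketch of the keyword-based importance assignment follows. It assumes that keyword detection has already produced a list of appearance sections, each given as a (keyword, start, end) tuple in seconds; that tuple format and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def keyword_usage_ranking(detections):
    """Total the usage time per keyword and sort keywords, longest first.

    detections: iterable of (keyword, start_s, end_s) appearance sections.
    Returns a list of (keyword, total_seconds) pairs.
    """
    totals = defaultdict(float)
    for keyword, start_s, end_s in detections:
        if end_s < start_s:
            raise ValueError("section end precedes its start")
        totals[keyword] += end_s - start_s
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


detections = [
    ("budget", 0.0, 120.0),
    ("schedule", 130.0, 160.0),
    ("budget", 200.0, 260.0),
]
print(keyword_usage_ranking(detections))
```

The appearance sections of the top-ranked keywords would then be treated as the important parts and kept at high quality during compression.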
When, as in the seventh embodiment described above, the importance of the audio information or the image information is determined by combining the detection results of the condition matching section detection unit 3, and the audio information or image information is compressed with the compression amount or compression method changed between the condition matching sections and the other sections on the basis of this importance, there is an effect that audio or image information can be stored at a compression rate or an intermittent recording time interval that reflects complex events combining various events.
[0247]
[Tenth embodiment]
In the first to ninth embodiments described above, the condition matching section detection unit 3 detects condition matching sections when information is input, and the correspondence storage unit 5 stores the correspondence between each detected condition matching section and the storage position, in the time-series information storage unit 4, of the corresponding audio information or image information. The compression unit 6 then compresses the audio information or the image information stored in the time-series information storage unit 4 on the basis of the correspondence information stored in the correspondence storage unit 5.
[0248]
In the tenth embodiment, the condition matching section is not detected when information is input, and the condition matching section is detected by the condition matching section detection unit 3 when information is compressed. In the case of this embodiment, the correspondence storage unit 5 is not necessary.
[0249]
FIG. 37 is a diagram for explaining the recording operation in the tenth embodiment together with the flow of various information and the flow of output of each unit at that time. At the time of storage in the case of the tenth embodiment, input voice information and image information are sequentially stored in the time-series information storage unit 4. The time series information storage unit 4 stores the storage start time in the time information storage unit 7.
[0250]
Comparing FIG. 3, which explains the operation at the time of storage in the first embodiment, with FIG. 37, which explains the operation at the time of storage in the tenth embodiment, shows that in FIG. 37 the condition matching section detection process is not performed at the time of storage, so the process is greatly simplified.
[0251]
FIG. 38 is a flowchart of processing during recording in the tenth embodiment. First, in step S900, the recording start time is stored in the time information storage unit 7, and the process proceeds to step S901. In step S901, audio information and image information are input by the audio information input unit 1 and the image information input unit 2, and in step S902, the input audio information and image information are stored in the time-series information storage unit 4. Steps S901 and S902 are then repeated.
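The recording flow of steps S900 to S902 can be sketched as a simple loop. The names here are hypothetical: `source` stands in for the audio and image input units and is assumed to yield (audio, image) pairs, `storage` stands in for the time-series information storage unit 4, and `time_info` stands in for the time information storage unit 7. Ending the loop when the source is exhausted is also an assumption.

```python
import time


def record(source, storage, time_info, clock=time.time):
    """Record input without any condition matching section detection."""
    time_info["start_time"] = clock()   # S900: store the recording start time
    for audio, image in source:         # S901: input audio and image information
        storage.append((audio, image))  # S902: store them in time-series order
    return storage


storage = []
time_info = {}
record([(b"audio0", b"image0"), (b"audio1", b"image1")], storage, time_info)
print(len(storage), "start_time" in time_info)
```

The loop contains no detection step, which reflects the simplification of the tenth embodiment: detection is deferred to compression time.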
[0252]
FIG. 39 is a diagram for explaining the operation at the time of compression in the tenth embodiment, together with the flow of various information and the flow of output of each unit at that time. In the tenth embodiment, for example, when a predetermined time has elapsed since the time of storage and the time information storage unit 7 issues a compression processing start instruction, the condition matching section detection unit 3 reads the audio information from the time-series information storage unit 4 and detects condition matching sections as described above. In this embodiment, as in the first embodiment, a section in which dialogue is active is detected as a condition matching section. The condition matching section detection unit 3 then sends information on the detected condition matching sections to both the time-series information storage unit 4 and the compression unit 6.
[0253]
The time-series information storage unit 4 receives the condition matching section information from the condition matching section detection unit 3 and calculates the storage address, in the time-series information storage unit 4, of the audio information or the image information corresponding to each condition matching section. It then outputs the storage address to the compression unit 6.
[0254]
The compression unit 6 receives the condition matching section information from the condition matching section detection unit 3 and the storage address from the time-series information storage unit 4 and, based on this input, determines the compression rates of the image information in the condition matching sections and in the other sections. That is, the compression rate is set low for the image information in the condition matching sections and high for the image information in the other sections. For example, the condition matching sections are left uncompressed, while the compression unit 6 reads the partial image sequence of each section other than a condition matching section from the time-series information storage unit 4 and compresses it to 1/10, as in the first embodiment. The compressed partial image sequence is then written back to the time-series information storage unit 4.
[0255]
FIG. 40 is a flowchart of processing during compression in the case of the tenth embodiment. The processing shown in this flowchart corresponds to the processing content executed by the conference information accumulation processing unit 12 in FIG.
[0256]
That is, in the tenth embodiment, it is detected in step S1000 that the elapsed time since the audio information and the image information are recorded in the time-series information storage unit 4 has become a predetermined time or more. This is processing performed in the time information storage unit 7 in the functional block of FIG. When the information storage period exceeds a predetermined time, the process proceeds to step S1001, where a compression processing start instruction is generated and supplied to the condition matching section detection unit 3 as described above.
[0257]
In step S1002, the condition matching section detection unit 3 that has received the compression processing start instruction detects a condition matching section. In step S1003, it is determined whether a condition matching section is detected, and the determination result is sent to the time-series information storage unit 4. When the condition matching section detection unit 3 detects a condition matching section, the process proceeds to step S1004, and compression is performed at a low compression rate. As in the first embodiment, the information on the condition matching section in the time-series information storage unit 4 may not be compressed.
[0258]
On the other hand, when it is determined in step S1003 that the section is not a condition matching section, the process proceeds to step S1005, where the image information of that section stored in the time-series information storage unit 4 is read and compressed at a high compression rate, for example by thinning that leaves only the first frame of every 10 frames.
[0259]
After steps S1004 and S1005, the process proceeds to step S1006, and the compressed image information is written back to the time-series information storage unit 4. Then, the process proceeds to step S1007, where it is determined whether or not the compression process has been completed for all the information stored in the time-series information storage unit 4. If not completed, the process returns to step S1003, and this step S1003 and subsequent steps are repeated. If the compression process has been completed for all information, this compression process routine is terminated.
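The per-section loop of steps S1002 to S1007 can be sketched as follows. The `Section` record, its `audio_level` field, and the `detect` callback are hypothetical stand-ins: the callback plays the role of the condition matching section detection unit 3, and the elapsed-time trigger of steps S1000 and S1001 is assumed to have fired already. Condition matching sections are left uncompressed here, which the text permits as one option.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    """One stored section (hypothetical layout)."""
    audio_level: float   # stand-in summary of the section's audio
    frames: List[int]    # frame identifiers standing in for image data


def compress_stored_sections(sections, detect, keep_every=10):
    """Detect condition matching sections at compression time and thin the rest."""
    for section in sections:                    # repeated until S1007 finds no more data
        if detect(section):                     # S1002, S1003: condition matching section?
            compressed = list(section.frames)   # S1004: keep as is (low compression)
        else:
            # S1005: frame thinning, keeping the first of every keep_every frames
            compressed = section.frames[::keep_every]
        section.frames = compressed             # S1006: write back
    return sections


sections = [Section(0.9, list(range(20))), Section(0.1, list(range(20)))]
compress_stored_sections(sections, detect=lambda s: s.audio_level >= 0.5)
print([len(s.frames) for s in sections])        # [20, 2]
```

Passing the detector as a callback keeps the loop independent of the detection condition, so a different condition can be tried without changing the compression code.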
[0260]
[Other variations]
In addition to the forms specified in the description of each of the first to tenth embodiments above, the information storage device of the present invention may also be configured by appropriately combining the embodiments.
[0261]
For example, by using the stepwise compression method described in the third embodiment in combination with the reference state storage method described in the seventh embodiment, the compression rate of the image information can be increased stepwise according to the reference state of the image information.
[0262]
Moreover, although the third embodiment was described as a modification of the second embodiment, it may also be implemented as a modification of the first embodiment.
[0263]
Furthermore, the storage amount detection unit described in the eighth embodiment can be implemented in combination with the first to seventh embodiments and with the ninth and tenth embodiments.
[0264]
Further, the embodiments from the first embodiment to the ninth embodiment can be configured not to include the correspondence storage unit as described in the tenth embodiment.
[0265]
[Effect of the invention]
As described above, according to the information storage device of the first aspect of the present invention, it is possible to store a large number of important portions of audio information or image information in which characteristic events occur in a limited storage medium. Moreover, there is an effect that even audio information or image information other than the important part can be stored for a long time with a small amount of data.
[0266]
In the case of the invention of claim 2, the condition matching section detection operation at the time of information compression becomes unnecessary, and there is an effect that the load on the system when compressing audio information or image information can be reduced.
[0267]
Further, in the case of the invention of claim 3, even when an event occurs for which a change in the state of the audio signal is difficult to detect, or when a state change occurs in information not included in the input audio signal, a large number of important portions of audio information or image information in which characteristic events occur can be stored in a limited storage medium, and audio information or image information other than the important portions can be stored for a long time with a small amount of data.
[0268]
Further, in the case of the invention of claim 4, the condition matching section detection operation at the time of information compression becomes unnecessary, and there is an effect that it is possible to reduce the load on the system when compressing audio information or image information.
[Brief description of the drawings]
FIG. 1 is a functional block diagram showing the entirety of a first embodiment of an information storage device according to the present invention.
FIG. 2 is a diagram illustrating an outline of a system to which an embodiment of an information storage device according to the present invention is applied.
FIG. 3 is a diagram for explaining the sound level detection operation of the condition matching section detection unit according to the first embodiment.
FIG. 4 is a diagram for explaining an operation of detecting an active conversation section by the condition matching section detection unit according to the first embodiment.
FIG. 5 is a diagram showing a flow of operation during information recording according to the first embodiment.
FIG. 6 is a flowchart of the operation of a condition matching section detection unit in the first embodiment.
FIG. 7 is a flowchart of the operation of a correspondence relationship storage unit in the first embodiment.
FIG. 8 is a diagram for explaining a correspondence relationship between a detection result of a condition matching section detection unit and a memory address of a time-series information storage unit in the first embodiment.
FIG. 9 is a table for managing the correspondence between the detection result of the condition matching section detection unit and the memory address of the time-series information storage unit in the first embodiment.
FIG. 10 is a flowchart of the operation of the time-series information storage unit in the first embodiment.
FIG. 11 is a flowchart of the operation of a time information storage unit in the first embodiment.
FIG. 12 is a diagram illustrating a storage structure of a time information storage unit in the first embodiment.
FIG. 13 is a diagram illustrating another example of the storage structure of the time information storage unit in the first embodiment.
FIG. 14 is a diagram illustrating a flow of operation during information compression according to the first embodiment.
FIG. 15 is a flowchart of the operation of the compression unit in the first embodiment.
FIG. 16 is a diagram for comparing and explaining storage states of a time-series information storage unit before compression and after compression in the first embodiment.
FIG. 17 is a diagram illustrating a flow of operations during information recording according to the second embodiment.
FIG. 18 is a flowchart of processing performed by a frequency band-specific image generation unit according to the second embodiment.
FIG. 19 is a diagram illustrating a memory storage state before compression processing of a time-series information storage unit in the second embodiment.
FIG. 20 is a diagram illustrating an operation flow when compressing information according to the second embodiment.
FIG. 21 is a flowchart of compression processing in the second embodiment.
FIG. 22 is a diagram illustrating a memory storage state after compression processing in a time-series information storage unit according to the second embodiment.
FIG. 23 is a diagram for explaining a memory storage state at the time of information recording in the time-series information storage unit in the third embodiment.
FIG. 24 is a compression time management table for managing the time for executing stepwise compression in the third embodiment.
FIG. 25 is a diagram illustrating a memory storage state after one week has elapsed in the time-series information storage unit according to the third embodiment.
FIG. 26 is a diagram illustrating a memory storage state after one month has elapsed in the time-series information storage unit according to the third embodiment.
FIG. 27 is a diagram for explaining a memory storage state after six months in the time-series information storage unit in the third embodiment.
FIG. 28 is a table for managing keyword validity periods when it is detected that a keyword registered in advance in an audio signal appears in the fourth embodiment.
FIG. 29 is a table for managing a pattern valid period when it is detected that an audio signal pattern registered in advance in an audio signal appears in the fourth embodiment.
FIG. 30 is a table for managing the location and the importance of the location in association with each other when the condition matching section detection unit detects the location in the fifth embodiment.
FIG. 31 is a diagram illustrating compression rate setting processing when the condition matching section detection unit detects camera work in the sixth embodiment.
FIG. 32 is a diagram illustrating a condition matching section of a time-series information storage unit in the seventh embodiment.
FIG. 33 is a table for managing the storage state of the reference state storage unit in the seventh embodiment.
FIG. 34 is a table for managing the storage state of the compression rate setting table in the seventh embodiment.
FIG. 35 is a diagram illustrating a flow of operation during information compression according to the eighth embodiment.
FIG. 36 is a flowchart of processing of a storage amount detection unit in the eighth embodiment.
FIG. 37 is a diagram showing a flow of operations at the time of information recording in the tenth embodiment.
FIG. 38 is a flowchart of information recording processing in the tenth embodiment.
FIG. 39 is a diagram showing a flow of operations during information compression in the tenth embodiment.
FIG. 40 is a flowchart of information compression processing in the tenth embodiment.
[Explanation of symbols]
1 Voice information input section
2 Image information input section
3 Condition matching section detector
4 Time series information storage
5 Correspondence storage unit
6 Compression unit
7 Time information storage
8 Playback section
9 Control unit
10 Information storage device
11 Audio signal analyzer
12 Conference information storage processing device
13 Storage media
14 Monitor device
15 Microphone
16 Meeting participants
21 Image generator by frequency band
31 Storage amount detection unit

Claims

Information input means for inputting audio information and / or image information to be stored;
Time-series information storage means for compressing and storing the audio information and / or the image information input from the information input means;
Condition matching section detection means for detecting a condition matching section in which the voice information stored in the time series information storage means matches a predetermined condition set in advance;
Elapsed time measuring means for storing time information indicating the time at which the audio information and / or the image information was stored in the time-series information storage means, and for outputting a compression processing start instruction when the elapsed time from the time indicated by the stored time information reaches or exceeds a preset time;
Compression means which is activated by the compression processing start instruction from the elapsed time measuring means, and which recompresses the data amount of the compressed audio information and / or image information stored in the time-series information storage means, changing the compression rate or the compression method between the condition matching section detected by the condition matching section detecting means and other sections;
An information storage device comprising:

Information input means for inputting audio information and / or image information to be stored;
Time-series information storage means for compressing and storing the audio information and / or the image information input from the information input means;
Condition matching section detection means for detecting a condition matching section in which the voice information input from the information input means matches a predetermined condition set in advance;
Correspondence storage means for storing the correspondence between section information indicating the condition matching section detected by the condition matching section detection means and the storage position, in the time-series information storage means, of the audio information and / or the image information corresponding to the section information;
Elapsed time measuring means for storing time information indicating the time at which the audio information and / or the image information was stored in the time-series information storage means, and for outputting a compression processing start instruction when the elapsed time from the time indicated by the stored time information reaches or exceeds a preset time;
Compression means which is activated by the compression processing start instruction from the elapsed time measuring means, and which, based on the correspondence stored in the correspondence storage means between the section information and the storage position in the time-series information storage means, recompresses the data amount of the compressed audio information and / or image information stored in the time-series information storage means, changing the compression rate or the compression method between the condition matching section and other sections;
An information storage device comprising:

Information input means for inputting audio information and / or image information to be stored;
Sensor information detecting means for detecting information from an external sensor;
Time-series information storage means for compressing and storing the audio information and / or the image information input from the information input means, and for storing the sensor information from the sensor information detecting means;
Condition matching section detection means for detecting a condition matching section where the sensor information stored in the time series information storage means matches a predetermined condition set in advance;
Elapsed time measuring means for storing time information indicating the time at which the audio information and / or the image information was stored in the time-series information storage means, and for outputting a compression processing start instruction when the elapsed time from the time indicated by the stored time information reaches or exceeds a preset time;
Compression means which is activated by the compression processing start instruction from the elapsed time measuring means, and which recompresses the data amount of the compressed audio information and / or image information stored in the time-series information storage means, changing the compression rate or the compression method between the condition matching section detected by the condition matching section detecting means and other sections;
An information storage device comprising:

Information input means for inputting audio information and / or image information to be stored;
Time-series information storage means for compressing and storing the audio information and / or the image information input from the information input means;
Sensor information detecting means for detecting information from an external sensor;
Condition matching section detection means for detecting a condition matching section in which sensor information from the sensor information detection means matches a predetermined condition set in advance;
Correspondence storage means for storing the correspondence between section information indicating the condition matching section detected by the condition matching section detection means and the storage position, in the time-series information storage means, of the audio information and / or the image information corresponding to the section information;
Elapsed time measuring means for storing time information indicating the time at which the audio information and / or the image information was stored in the time-series information storage means, and for outputting a compression processing start instruction when the elapsed time from the time indicated by the stored time information reaches or exceeds a preset time;
Compression means which is activated by the compression processing start instruction from the elapsed time measuring means, and which, based on the correspondence stored in the correspondence storage means between the section information and the storage position in the time-series information storage means, recompresses the data amount of the compressed audio information and / or image information stored in the time-series information storage means, changing the compression rate or the compression method between the condition matching section and other sections;
An information storage device comprising:

In claim 1, claim 2, claim 3, or claim 4,
An information storage device wherein the compression means subjects the image information of the condition matching section stored in the time-series information storage means to data compression that maintains higher quality than that applied to the image information of other sections.

In claim 1, claim 2, or claim 5,
An information storage device wherein the condition matching section detecting means compares a voice signal level of the voice information with a predetermined threshold, and detects a start point or an end point of the condition matching section based on the comparison result.

The information storage device according to claim 1, claim 2, or claim 5,
wherein the condition matching section detection means detects a specific speaker or a change of speaker in the audio information, and detects a start point or an end point of the condition matching section based on the detection result.

The information storage device according to claim 1, claim 2, or claim 5,
wherein the condition matching section detection means detects a predetermined specific keyword or specific pattern in the audio information, and detects a start point or an end point of the condition matching section based on the detection result.

The information storage device according to claim 1, claim 2, or claim 5,
wherein the sensor information detection means detects information on the location where the audio information and / or the image information is input or the location where the sensor information is detected, and
the condition matching section detection means detects a start point or an end point of the condition matching section based on the sensor information.

The information storage device according to claim 3, claim 4, or claim 5,
wherein the sensor information detection means detects a specific person by means of the external sensor, and
the condition matching section detection means detects a start point or an end point of the condition matching section based on the sensor information.

The information storage device according to claim 3, claim 4, or claim 5,
wherein the sensor information detection means detects a camera operation signal or a change in the camera operation signal, and
the condition matching section detection means detects a start point or an end point of the condition matching section based on the sensor information.

The information storage device according to claim 1, claim 2, claim 3, claim 4, or claim 5,
wherein the compression means performs the recompression process when it is recognized that the free space in the time-series information storage means has fallen to or below a certain value, or that the storage amount in the time-series information storage means has reached or exceeded a certain value.
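
As an illustrative sketch only (not part of the claims), the two trigger conditions above reduce to a simple check. The function and parameter names (`should_recompress`, `min_free`, `max_used`) are assumptions for this example.

```python
from typing import Optional


def should_recompress(capacity: int, used: int,
                      min_free: Optional[int] = None,
                      max_used: Optional[int] = None) -> bool:
    """Return True when either storage-based trigger condition holds.

    All sizes are in bytes. Hypothetical helper, not taken from the patent.
    """
    free = capacity - used
    if min_free is not None and free <= min_free:
        return True  # free space has fallen to or below the set value
    if max_used is not None and used >= max_used:
        return True  # stored amount has reached or exceeded the set value
    return False
```

On a file-system-backed store, the inputs could be taken from Python's `shutil.disk_usage(path)`, which reports total, used, and free bytes.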

The information storage device according to claim 2, claim 4, or claim 5,
wherein the compression means sets the compression rate or compression method of the audio information and / or the image information in the condition matching section stored in the correspondence storage means, or in the other sections, so that the data amount of the audio information and / or the image information fits within a predetermined storage capacity.
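
As an illustrative sketch only (not part of the claims), one way to fit the stored data within a predetermined capacity is to fix the fraction of data kept in the condition matching sections and solve for the fraction that the other sections may keep. The names and the linear size model are assumptions for this example; real codecs do not scale exactly linearly.

```python
def other_section_keep_ratio(matched_bytes: int, other_bytes: int,
                             capacity_bytes: int,
                             matched_keep: float = 1.0) -> float:
    """Fraction of the non-matching sections' data that can be kept.

    Assumes the recompressed size is proportional to the keep ratio.
    Hypothetical helper, not taken from the patent.
    """
    # Bytes left over once the condition matching sections are accounted for.
    budget = capacity_bytes - matched_bytes * matched_keep
    if budget < 0:
        raise ValueError("condition matching sections alone exceed the capacity")
    if other_bytes == 0:
        return 1.0
    return min(1.0, budget / other_bytes)
```

For example, with 40 bytes of condition matching data kept in full, 160 bytes of other data, and a capacity of 100 bytes, the other sections may keep 60 / 160 = 0.375 of their data.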

The information storage device according to claim 1, claim 2, claim 4, or claim 5,
wherein the time-series information storage means stores the audio information and / or the image information input from the information input means separately for each frequency band, and
the compression means, at the time of compression, erases from the storage medium at least the high frequency band of the sections other than the condition matching section of the audio information and / or the image information stored in the time-series information storage means.
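
As an illustrative sketch only (not part of the claims), band-wise storage makes this kind of recompression a matter of discarding stored bands. The dictionary layout (`"matched"`, `"bands"` ordered from low to high frequency) is an assumption for this example.

```python
from typing import Dict, List


def drop_high_bands(sections: List[Dict], keep_bands: int) -> int:
    """Erase bands above `keep_bands` in non-matching sections.

    Each section is {"matched": bool, "bands": [bytes, ...]} with bands
    ordered from low to high frequency. Returns the number of bytes freed.
    Hypothetical layout, not taken from the patent.
    """
    freed = 0
    for section in sections:
        if section["matched"]:
            continue  # condition matching sections keep every band
        for band in section["bands"][keep_bands:]:
            freed += len(band)
        section["bands"] = section["bands"][:keep_bands]
    return freed
```

Because each band is stored separately, no decoding or re-encoding is needed; the high bands are simply removed from the store.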

The information storage device according to claim 2, claim 4, or claim 5,
wherein the time-series information storage means, when storing the audio information and / or the image information input from the information input means for each frequency band, uses different frequency band divisions for the condition matching section detected by the condition matching section detection means and for the sections other than the condition matching section, and
the compression means, at the time of compression, deletes at least the high frequency band of the sections other than the condition matching section of the audio information and / or the image information stored in the time-series information storage means.

The information storage device according to claim 1, claim 2, claim 3, claim 4, or claim 5,
wherein the time-series information storage means stores the image information input from the information input means separately as a luminance signal and a color signal, and
the compression means, at the time of compression, deletes at least the color information of the sections other than the condition matching section of the image information stored in the time-series information storage means.

The information storage device according to claim 1, claim 2, claim 3, claim 4, or claim 5,
further comprising reference state storage means for storing information that distinguishes a section that has been referred to by a user at least a predetermined number of times within a predetermined time from a section that has not been referred to by the user at least the predetermined number of times within the predetermined time,
wherein the compression means, based on the information stored in the reference state storage means, recompresses the data amount of the compressed audio information and / or image information stored in the time-series information storage means by changing the compression rate or the compression method between the sections that have been referred to by the user at least the predetermined number of times within the predetermined time and the sections that have not.
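
As an illustrative sketch only (not part of the claims), the reference state could be derived from a per-section log of access times. The log format and the names `frequently_referenced`, `window`, and `min_count` are assumptions for this example.

```python
from typing import Dict, List


def frequently_referenced(access_times: Dict[str, List[float]], now: float,
                          window: float, min_count: int) -> Dict[str, bool]:
    """Flag sections referred to at least `min_count` times within `window`.

    `access_times` maps a section id to the times (seconds) at which the
    user played it back. Hypothetical structure, not taken from the patent.
    """
    result: Dict[str, bool] = {}
    for section_id, times in access_times.items():
        # Count only references that fall inside the look-back window.
        recent = [t for t in times if now - window <= t <= now]
        result[section_id] = len(recent) >= min_count
    return result
```

Sections flagged `True` would then be recompressed at a rate or with a method that preserves more quality than the sections flagged `False`.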

The information storage device according to claim 1, claim 2, claim 3, claim 4, or claim 5,
wherein the compression means determines the importance of the audio information and / or the image information by combining the detection results of the condition matching section detection means, and, based on this importance, compresses the data amount of the compressed audio information and / or image information stored in the time-series information storage means by changing the compression amount or the compression method between the section determined from the detection results of the condition matching section detection means and the other sections.
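
As an illustrative sketch only (not part of the claims), one simple way to combine several detection results into an importance value is a weighted sum, which is then mapped to a quality tier. The detector names, weights, tier labels, and cut-off values below are all assumptions for this example; the claim does not specify how the results are combined.

```python
from typing import Dict


def importance(detections: Dict[str, bool], weights: Dict[str, float]) -> float:
    """Sum the weights of every detector that fired for a section.

    Hypothetical combination rule, not taken from the patent.
    """
    return sum(weights.get(name, 0.0) for name, hit in detections.items() if hit)


def quality_tier(score: float) -> str:
    """Map an importance score to an assumed quality tier."""
    if score >= 3.0:
        return "high"    # keep close to the original quality
    if score >= 1.0:
        return "medium"
    return "low"         # compress most aggressively
```

For instance, a section where both the audio-level detector and the keyword detector fired would score higher, and so be compressed less, than a section where only one of them fired.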

A time-series information storage step of compressing audio information and / or image information input through the information input means and storing the compressed information in the time-series information storage means;
A condition matching section detecting step for detecting a section in which the audio information stored in the time-series information storage step matches a predetermined condition set in advance;
A time information storage step of storing time information indicating a time at which the audio information and / or the image information is stored in the time-series information storage means;
An elapsed time measuring step of measuring whether an elapsed time from the time indicated by the time information stored in the time information storage step is equal to or greater than a preset time;
A compression step, activated when the elapsed time measuring step detects that the elapsed time is equal to or greater than the preset time, of recompressing the data amount of the compressed audio information and / or image information stored in the time-series information storage means by changing the compression rate or the compression method between the condition matching section detected in the condition matching section detecting step and the other sections.
An information storage method comprising the above steps.
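
As an illustrative sketch only (not part of the claims), the elapsed-time-triggered recompression in the steps above can be modelled with frame thinning standing in for the actual compression method. The `Section` class, the thinning steps, and the `recompressed` flag are assumptions for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    frames: List[int]           # stand-in for the stored compressed frames
    matched: bool               # True for a condition matching section
    stored_at: float            # time information (seconds)
    recompressed: bool = False  # guards against thinning the same data twice


def recompress_if_due(sections: List[Section], now: float, max_age: float,
                      matched_step: int = 1, other_step: int = 5) -> None:
    """Thin frames of sections whose age has reached `max_age`.

    Condition matching sections keep every `matched_step`-th frame, other
    sections every `other_step`-th frame. Hypothetical policy.
    """
    for sec in sections:
        if sec.recompressed or now - sec.stored_at < max_age:
            continue  # already processed, or not old enough yet
        step = matched_step if sec.matched else other_step
        sec.frames = sec.frames[::step]
        sec.recompressed = True
```

With the default steps, a condition matching section keeps all of its frames, while an equally old non-matching section keeps one frame in five.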

A time-series information storage step of compressing audio information and / or image information input through the information input means and storing the compressed information in the time-series information storage means;
A condition matching section detecting step for detecting a section in which the input audio information matches a predetermined condition set in advance;
A correspondence storage step of storing a correspondence relationship between the section information indicating the condition matching section detected in the condition matching section detecting step and the storage position, in the time-series information storage means, of the audio information and / or the image information corresponding to the section information;
A time information storage step of storing time information indicating a time at which the audio information and / or the image information is stored in the time-series information storage means;
An elapsed time measuring step of measuring whether an elapsed time from the time indicated by the time information stored in the time information storage step is equal to or greater than a preset time;
A compression step, performed when the elapsed time measuring step detects that the elapsed time is equal to or greater than the preset time, of recompressing the data amount of the compressed audio information and / or image information stored in the time-series information storage means by changing the compression rate or the compression method between the condition matching section and the other sections, based on the correspondence relationship between the section information stored in the correspondence storage step and the storage position in the time-series information storage means.
An information storage method comprising the above steps.
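
As an illustrative sketch only (not part of the claims), the correspondence relationship between section information and storage positions can be held as a small table. The tuple layout and the `frame_offsets` list (byte offset of each frame, with one extra entry marking the end of the data) are assumptions for this example.

```python
from typing import List, Tuple


def build_correspondence(
    sections: List[Tuple[int, int]],
    frame_offsets: List[int],
) -> List[Tuple[int, int, int, int]]:
    """Map each (start_frame, end_frame) section to its byte range.

    `end_frame` is exclusive, and `frame_offsets[i]` is the byte offset at
    which frame i begins. Hypothetical layout, not taken from the patent.
    """
    table = []
    for start, end in sections:
        # Record where the section's data begins and ends in the store.
        table.append((start, end, frame_offsets[start], frame_offsets[end]))
    return table
```

The compression step can then look up the byte range of each condition matching section and treat everything outside those ranges as the other sections.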

A sensor information detection step for detecting information from an external sensor;
A time-series information storage step for storing in the time-series information storage means information obtained by compressing audio information and / or image information input through the information input means, and sensor information detected in the sensor information detection step;
A condition matching section detecting step for detecting a section in which the sensor information stored in the time-series information storage means matches a predetermined condition set in advance;
A time information storage step of storing time information indicating a time at which the audio information and / or the image information is stored in the time-series information storage means;
An elapsed time measuring step of measuring whether an elapsed time from the time indicated by the time information stored in the time information storage step is equal to or greater than a preset time;
A compression step, activated when the elapsed time measuring step detects that the elapsed time is equal to or greater than the preset time, of recompressing the data amount of the compressed audio information and / or image information stored in the time-series information storage means by changing the compression rate or the compression method between the condition matching section detected in the condition matching section detecting step and the other sections.
An information storage method comprising the above steps.

A time-series information storage step of compressing audio information and / or image information input through the information input means and storing the compressed information in the time-series information storage means;
A sensor information detection step for detecting information from an external sensor;
A condition matching section detecting step for detecting a condition matching section in which the sensor information detected in the sensor information detecting step matches a predetermined condition set in advance;
A correspondence storage step of storing a correspondence relationship between the section information indicating the condition matching section detected in the condition matching section detecting step and the storage position, in the time-series information storage means, of the audio information and / or the image information corresponding to the section information;
A time information storage step of storing time information indicating a time at which the audio information and / or the image information is stored in the time-series information storage means;
An elapsed time measuring step of measuring whether an elapsed time from the time indicated by the time information stored in the time information storage step is equal to or greater than a preset time;
A compression step, performed when the elapsed time measuring step detects that the elapsed time is equal to or greater than the preset time, of recompressing the data amount of the compressed audio information and / or image information stored in the time-series information storage means by changing the compression rate or the compression method between the condition matching section and the other sections, based on the correspondence relationship between the section information stored in the correspondence storage step and the storage position in the time-series information storage means.
An information storage method comprising the above steps.