JPH0895596A

JPH0895596A - Quick-look and quick-listening device and method thereof

Info

Publication number: JPH0895596A
Application number: JP6227545A
Authority: JP
Inventors: Kenichi Minami; 憲一南; Akito Akutsu; 明人阿久津; Yoshinobu Tonomura; 佳伸外村
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1994-09-22
Filing date: 1994-09-22
Publication date: 1996-04-12

Abstract

PURPOSE: To provide a quick-look and quick-listening device and method with which the outline and atmosphere of video and sound can be grasped in a short time by using sound information. CONSTITUTION: Video information (consisting of image information and sound information) or sound information is input in real time from a video/sound input unit 101. The video or sound information input in real time is stored in a video/sound/feature storage unit 105, and the stored video or sound information is input. A sound feature extraction unit 103 extracts various feature quantities from the sound information in the input video or sound information. A feature management unit 104 processes the video or sound information on the basis of the extracted feature quantities. The processed video or sound information is output from an interface unit 106. The feature-based processing realizes quick-look and quick-listening.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a quick-look and quick-listening device and method suited to grasping the content of video and sound in a short time.

[0002]

[Prior Art] One quick-look technique for video information is the video browser (Oba: "Moving image list display technology: video browser", 1994 National Convention of the Institute of Television Engineers of Japan, pp. 219-220, 1991). It shortens viewing time through operations such as fast-forwarding the video or playing back only the portions immediately before and after cut points, which are the breaks in the video. Another technique improves the at-a-glance browsability of video by arranging the still images that immediately follow the cut points side by side on a display or printing them on paper, so that the outline of the video can be grasped at a glance (Tonomura et al.: "PaperVideo: Fixing video information on paper", 1994 IEICE Spring National Convention, A-315, 1994).

[0003] Among techniques that use sound, there is a method called word spotting (Lynn Wilcox et al., "Wordspotting for Voice Editing and Audio Indexing", Georgia Institute of Technology, 1992). A spoken word is input through a microphone and recognized, and a storage medium such as a tape is then searched for the places where the same word occurs.

[0004]

[Problems to Be Solved by the Invention] To grasp the content of video or sound, or to learn in a short time what kind of video or sound is stored, one has to play it back or fast-forward through it. With such methods, however, it takes considerable time to reach a desired portion or to get a feel for the overall atmosphere. The video browser described above is one possible solution, but simple fast-forwarding cannot reproduce the sound, and because the browser depends only on structural features of the video, the images selected for browsing do not necessarily reflect the events that occurred in the video. In addition, the sound breaks off abruptly and unnaturally at the joins, and when the time must be shortened further, the problem arises of how to select only the important items from a large number of cuts and images. The word spotting technique described above requires the word to be known in advance, so it makes no provision for browsing at all.

[0005] The present invention has been made to solve the above problems. Its object is to provide a quick-look and quick-listening device and method with which the outline and atmosphere of video or sound can be grasped in a short time by using sound information.

[0006] More specifically, the invention provides a technique that makes it possible to select sections of video or sound that reflect events occurring in the video or sound and, when several sound sections are played back, makes the joins between sections natural with few abrupt breaks, so that the outline and atmosphere of the video or sound can be grasped in an even shorter time. It also provides a technique that uses sound to group multiple cuts and images into a few sets and associates sound feature quantities with events, enabling quick-look or quick-listening tailored to the event the user wants. Furthermore, even when the desired event is not clearly known, sound sections corresponding to different events can be played back cyclically while being swapped at certain intervals, which makes the technique well suited to browsing.

[0007]

[Means for Solving the Problems] To achieve the above object, the invention of claim 1 is a quick-look and quick-listening device comprising: a video/sound information input unit that inputs, in real time, video information consisting of image information and sound information, or sound information; a video/sound/feature storage unit that stores the video or sound information input in real time, outputs the stored video or sound information, and stores feature quantities of the video or sound information; a video/sound management control unit that extracts various feature quantities from the sound information in the input video or sound information and manages those feature quantities; and an interface unit that outputs the video or sound information whose feature quantities are managed.

[0008] In the invention of claim 2, the video/sound management control unit comprises a sound feature extraction unit that extracts various feature quantities from the sound information, and a feature management unit that manages the feature quantities and processes and converts the sound information on the basis of them.

[0009] In the invention of claim 3, the sound feature extraction unit includes sound information change detection means that extracts various feature quantities from the sound information and detects, from the sound information, sound sections in which a feature quantity satisfying a given condition is present.

[0010] In the invention of claim 4, the feature management unit includes event association means that associates a sound section having a given feature quantity with an event that occurred in the video or sound.

[0011] In the invention of claim 5, the feature management unit includes joining means that joins together the sound sections that have a specific feature quantity or correspond to a specific event.

[0012] In the invention of claim 6, the feature management unit includes reproduction means that, when the input is video information, synchronizes the image information corresponding to the sound sections.

[0013] In the invention of claim 7, the sound information change detection means calculates a frequency spectrum from the sound information, detects peaks from the envelope of the spectrum, and detects, from the sound information, sound sections in which the feature quantities of the peaks satisfy a given condition.

[0014] In the invention of claim 8, the event association means associates a sound section with an event by comparing the feature quantities of the peaks with the feature quantities of the peaks in typical events.

[0015] In the invention of claim 9, the joining means has sound section selection means that calculates, for each feature quantity or event, the total time required to join and play back the sound sections in the sound information that have that specific feature quantity or event, and selects the sound sections closest to the time limit in accordance with the limit on playback time.

[0016] In the invention of claim 10, the joining means uses, as joining points, the places in the selected sound sections where the power of the sound information is low.

[0017] In the invention of claim 11, the joining means lowers the power at both ends of each selected sound section and uses those ends as joining points.

[0018] In the invention of claim 12, the reproduction means has means that plays back the sets of sound sections, classified by the sound information change extraction means according to given feature quantities or by the joining means according to the specific events associated with them, while periodically varying the amplitude of each set.

[0019] In the invention of claim 13, the quick-look and quick-listening method first inputs, in real time, video information consisting of image information and sound information, or sound information, and stores the video or sound information input in real time, or, when the input is not in real time, inputs the stored video or sound information; next extracts various feature quantities from the sound information in the input video or sound information; next processes the video or sound information on the basis of the extracted feature quantities; and then outputs the processed video or sound information.

[0020] In the invention of claim 14, the quick-look and quick-listening method uses, in the course of processing the video or sound information, a sound information change detection method that detects, from the sound information, sound sections in which, among the various feature quantities extracted from the sound information, a feature quantity satisfying a given condition is present.

[0021] In the invention of claim 15, the quick-look and quick-listening method uses an event association method that associates a sound section having a given feature quantity with an event that occurred in the video or sound.

[0022] In the invention of claim 16, the quick-look and quick-listening method uses a joining method that joins together the sound sections that have a specific feature quantity or correspond to a specific event.

[0023] In the invention of claim 17, the quick-look and quick-listening method uses a reproduction method that, when the input is video information, synchronizes the image information corresponding to the sound sections.

[0024] In the invention of claim 18, the sound information change detection method calculates a frequency spectrum from the sound information, detects peaks from the envelope of the spectrum, and detects, from the sound information, sound sections in which the feature quantities of the peaks satisfy a given condition.

[0025] In the invention of claim 19, the event association method associates a sound section with an event by comparing the feature quantities of the peaks with the feature quantities of the peaks in typical events.

[0026] In the invention of claim 20, the joining method uses a sound section selection method that calculates, for each feature quantity or event, the total time required to join and play back the sound sections in the sound information that have that specific feature quantity or event, and selects the sound sections closest to the time limit in accordance with the limit on playback time.

[0027] In the invention of claim 21, the joining method uses, as joining points, the places in the selected sound sections where the power of the sound information is low.

[0028] In the invention of claim 22, the joining method lowers the power at both ends of each selected sound section and uses those ends as joining points.

[0029] In the invention of claim 23, the quick-look and quick-listening method uses a reproduction method that plays back the sets of sound sections, classified by the sound information change extraction method according to given feature quantities or by the joining method according to the specific events associated with them, while periodically varying the amplitude of each set.

[0030]

[Operation] The quick-look and quick-listening device and method of claims 1, 2, and 13 of the present invention input video information (information consisting of image information and sound information) or sound information in real time, store the video or sound information input in real time, input the stored video or sound information, extract various feature quantities from the sound information in the input video or sound information, manage and process the video or sound information on the basis of those feature quantities, and output the managed and processed video or sound information. Video or sound information can thus be input, managed and processed, and output regardless of whether it is real-time or stored information, and the feature-based management and processing realizes quick-look and quick-listening.

[0031] The sound feature extraction unit of claim 3 extracts various feature quantities from the sound information and detects, from the sound information, sound sections in which a feature quantity satisfying a given condition is present, which allows the sound information to be segmented.

[0032] The feature management unit of claim 4 associates sound sections having a given feature quantity with events that occurred in the video or sound, which allows the sound information to be classified.

[0033] The feature management unit of claim 5 joins together the sound sections that have a specific feature quantity or correspond to a specific event, thereby gathering the desired sound sections.

[0034] The feature management unit of claim 6, when the input is video information, synchronizes the image information corresponding to the sound sections, so that playback proceeds without the images and sound drifting apart.

[0035] The quick-look and quick-listening method of claim 14 extracts various feature quantities from the sound information and detects, from the sound information, sound sections in which a feature quantity satisfying a given condition is present, which allows the sound information to be segmented.

[0036] The quick-look and quick-listening method of claim 15 associates sound sections having a given feature quantity with events that occurred in the video or sound, which allows the sound information to be classified.

[0037] The quick-look and quick-listening method of claim 16 joins together the sound sections that have a specific feature quantity or correspond to a specific event, thereby gathering the desired sound sections.

[0038] The quick-look and quick-listening method of claim 17, when the input is video information, synchronizes the image information corresponding to the sound sections, so that playback proceeds without the images and sound drifting apart.

[0039] The sound information change detection means and method of claims 7 and 18 calculate a frequency spectrum from the sound information, detect peaks from the envelope of the spectrum, and detect, from the sound information, sound sections in which the feature quantities of the peaks satisfy a given condition, thereby detecting changes in the sound and hence its features.

[0040] The event association means and method of claims 8 and 19 associate a sound section with an event by comparing the feature quantities of the peaks with the feature quantities of the peaks in typical events.

[0041] The joining means and method of claims 9 and 20 calculate, for each feature quantity or event, the total time required to join and play back the sound sections in the sound information that have that specific feature quantity or event, and select the sound sections closest to the time limit in accordance with the limit on playback time. The playback time can thereby be varied, realizing quick-look and quick-listening that fits the time limit.
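
One possible reading of this selection step is sketched below in Python. The text does not spell out the exact selection rule, so the sketch assumes that sections arrive as `(label, start, end)` tuples in seconds and that the label whose joined duration is closest to the limit is chosen. The function names and the tuple layout are hypothetical and are not taken from the patent.

```python
def total_durations(sections):
    """Sum the playback time of the sections for each feature or event label.

    `sections` is a list of (label, start_seconds, end_seconds) tuples;
    this layout is an assumption made for the sketch.
    """
    totals = {}
    for label, start, end in sections:
        totals[label] = totals.get(label, 0.0) + (end - start)
    return totals


def choose_label(sections, limit_seconds):
    """Pick the label whose joined duration is closest to the time limit."""
    totals = total_durations(sections)
    return min(totals, key=lambda label: abs(totals[label] - limit_seconds))


sections = [("voice", 0, 12), ("music", 12, 40), ("voice", 40, 50), ("noise", 50, 55)]
print(total_durations(sections))
print(choose_label(sections, 20))
```

Other readings are possible, for example picking individual sections greedily until the limit is reached. The per-label total is used here only because the paragraph computes the sum per feature quantity or event.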

[0042] The joining means and method of claims 10 and 21 use, as joining points, the places where the power of the sound information is low, which eliminates the unnatural abrupt breaks that arise when the selected sound sections are joined.

[0043] The joining means and method of claims 11 and 22 lower the power at both ends of each sound section and use those ends as joining points, which eliminates the unnatural abrupt breaks that arise when the selected sound sections are joined.

[0044] The reproduction means and method of claims 12 and 23 periodically vary the amplitude of each set of sound sections, the sets having been classified by the sound information change extraction means according to given feature quantities or by the joining means according to the specific events associated with them. Sounds corresponding to different features or events are thereby played back in turn at certain intervals, so that the user can grasp in a short time what kinds of features or events are contained.
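
A minimal Python sketch of periodic amplitude variation follows. The weighting function actually used is the one in FIG. 8, which is not reproduced in this text, so the raised-cosine weight below is purely an assumption, as are the names `periodic_weight` and `mix_periodically`. Each track stands for the joined sections of one feature or event class, and the weights are phase-shifted so that the classes become audible in turn.

```python
import math


def periodic_weight(t, period, phase):
    """Raised-cosine weight in [0, 1] (assumed shape, not FIG. 8 itself)."""
    return 0.5 * (1.0 + math.cos(2.0 * math.pi * (t / period - phase)))


def mix_periodically(tracks, rate, period):
    """Mix equal-rate tracks, cycling the emphasis from one track to the next.

    tracks: list of sample lists, one per feature or event class
    rate:   samples per second
    period: seconds for one full cycle through all tracks
    """
    n = min(len(track) for track in tracks)
    mixed = []
    for i in range(n):
        t = i / rate
        mixed.append(sum(
            periodic_weight(t, period, k / len(tracks)) * track[i]
            for k, track in enumerate(tracks)
        ))
    return mixed
```

With two tracks the two weights always sum to one, so the overall level stays roughly constant while the emphasis alternates between the classes.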

[0045]

[Embodiment] An embodiment of the present invention will now be described with reference to the drawings.

[0046] FIG. 1 is a block diagram showing the schematic configuration of a quick-look and quick-listening device according to an embodiment of the present invention. The device of this embodiment consists of a video/sound input unit 101 that inputs video or sound information; a video/sound/feature storage unit 105 that successively stores the video, sound, and feature quantities input in real time; a video/sound management control unit 102 that manages and controls the video, sound, and feature quantities; and an interface unit 106 that presents the output video and sound. The video/sound management control unit 102 consists of a sound feature extraction unit 103 that extracts features of the sound information and a feature management unit 104 that processes, converts, and plays back the video or sound on the basis of the feature quantities. The sound feature extraction unit 103 and the feature management unit 104 operate in parallel or in time division, so video or sound can be processed, converted, and played back while features are being extracted in real time.

[0047] FIG. 2 is a block diagram showing the configuration of the video/sound management control unit 102. The sound feature extraction unit 103 consists of sound information change detection means 201, which detects sound sections whose sound features satisfy a given condition. The feature management unit 104 consists of event association means 202, which associates the sound sections found by the sound information change detection means 201 with events that occurred in the video or sound; joining means 203, which joins the sound sections together; and reproduction means 204, which synchronizes the image information corresponding to the sound sections.

[0048] FIG. 3 is a flowchart showing the processing when the sound feature extraction unit 103 and the feature management unit 104 of the video/sound management control unit 102 are implemented in software on a computer or the like. In this case, sound information change detection processing 301 is performed first, followed by event association processing 302, joining processing 303, and reproduction processing 304, in that order.

[0049] Next, the operation of this embodiment will be described.

[0050] FIG. 4 is a flowchart showing the processing of the sound information change detection means 201 or the sound information change detection processing 301. FIG. 5 shows how the frequency spectrum and the spectral envelope are calculated from the sound waveform in this embodiment. FIG. 6 shows the peaks of the spectral envelope and their feature quantities in this embodiment. FIG. 7 shows the temporal change in waveform power used by the joining means in this embodiment. FIG. 8 shows the weighting function used by the reproduction means in this embodiment.

[0051] Video or sound information is input through the video/sound input unit 101, and when the video or sound is input in real time, it is stored successively by the video/sound/feature storage unit 105. Input video information is separated into image information and sound information, and the sound information is analyzed by the sound information change detection means 201. When the input is sound information only, it is analyzed by the sound information change detection means 201 as it is. The sound information change detection means 201 first applies the spectrum calculation processing 401 of FIG. 4. If the waveform of the input sound information is as shown at 501 in FIG. 5, it is cut into frames of a few milliseconds to a few tens of milliseconds, as shown at 502, and FFT (fast Fourier transform) processing is applied. As shown at 503, the FFT-processed waveform undulates sharply in the power direction and is therefore unsuitable for peak detection. The frequency spectrum envelope calculation processing 402 of FIG. 4 is therefore applied: the FFT cepstrum is obtained, liftering is applied, and FFT processing is applied once more to calculate the envelope of the spectrum. The calculated envelope looks like 504. Besides the FFT, the cepstrum could also be calculated using LPC (linear predictive analysis).
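
The envelope calculation in this paragraph (FFT, cepstrum, liftering, FFT again) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions rather than the embodiment's implementation: the Hanning window, the natural logarithm, the `1e-12` offset, and the lifter length `n_lifter` are all choices made for the sketch, and the function name is hypothetical.

```python
import numpy as np


def spectral_envelope(frame, n_lifter=20):
    """Return (log_spectrum, log_envelope) for one frame via the FFT cepstrum.

    frame:    1-D array holding one frame of the waveform
    n_lifter: number of low-quefrency cepstral coefficients kept (assumed value)
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    windowed = frame * np.hanning(n)            # taper the frame (assumption)
    spectrum = np.fft.fft(windowed)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small offset avoids log(0)
    cepstrum = np.fft.ifft(log_mag).real        # real cepstrum of the frame

    # Liftering: keep only the low-quefrency part, which carries the envelope.
    lifter = np.zeros(n)
    lifter[:n_lifter] = 1.0
    lifter[n - n_lifter + 1:] = 1.0             # mirror half keeps the result real

    envelope = np.fft.fft(cepstrum * lifter).real
    half = n // 2 + 1                           # non-negative frequencies only
    return log_mag[:half], envelope[:half]
```

At a 16 kHz sampling rate, a 512-sample frame is 32 ms, which falls within the "few tens of milliseconds" range mentioned above. A smaller `n_lifter` gives a smoother envelope with fewer peaks.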

[0052] Next, the peak detection processing shown at 403 in FIG. 4 is applied to the envelope obtained. Because liftering has smoothed the envelope, the peaks can easily be found from its local maxima and local minima. Examples of the detected peaks are shown at 604 to 608 in FIG. 6.
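
Finding peaks from local maxima and minima of a smoothed envelope can be sketched as below. The function name is hypothetical, and strict comparisons are an assumption of the sketch: flat plateaus are simply ignored.

```python
def find_extrema(envelope):
    """Return (maxima, minima) as lists of bin indices of strict local extrema."""
    maxima, minima = [], []
    for k in range(1, len(envelope) - 1):
        prev, cur, nxt = envelope[k - 1], envelope[k], envelope[k + 1]
        if prev < cur > nxt:
            maxima.append(k)      # local maximum: a spectral peak
        elif prev > cur < nxt:
            minima.append(k)      # local minimum: a valley between peaks
    return maxima, minima


maxima, minima = find_extrema([0, 2, 1, 3, 0.5, 0.7, 0.2])
print(maxima, minima)   # [1, 3, 5] [2, 4]
```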

[0053] The frame cut out from the sound waveform is shifted along the time axis by a few milliseconds to a few tens of milliseconds, and the same processing is repeated. Peak variation calculation processing 404 is then applied to the peaks obtained at each time in this way. This processing determines the temporal changes of various parameters defined relative to the peaks. Possible parameters include the peak power 601, the peak frequency 602, and the power difference 603 between a maximum peak and the adjacent minimum peak, all shown in FIG. 6, as well as the number of peaks.
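
The parameters listed here and their frame-to-frame changes can be sketched as follows. The sketch makes several assumptions not found in the text: only the strongest peak of each frame is tracked, the "adjacent" minimum is taken to be the nearest local minimum, and `bin_hz` (the frequency spacing of envelope bins) and all function names are hypothetical.

```python
def local_extrema(env):
    """Indices of strict local maxima and minima of an envelope."""
    inner = range(1, len(env) - 1)
    maxima = [k for k in inner if env[k - 1] < env[k] > env[k + 1]]
    minima = [k for k in inner if env[k - 1] > env[k] < env[k + 1]]
    return maxima, minima


def frame_parameters(env, bin_hz):
    """Parameters of the strongest peak in one frame, or None if no peak."""
    maxima, minima = local_extrema(env)
    if not maxima:
        return None
    top = max(maxima, key=lambda k: env[k])
    if minima:
        nearest = min(minima, key=lambda m: abs(m - top))
        drop = env[top] - env[nearest]    # peak-to-valley power difference
    else:
        drop = 0.0
    return {"power": env[top], "frequency": top * bin_hz,
            "drop": drop, "count": len(maxima)}


def parameter_changes(envelopes, bin_hz):
    """Per-frame parameters plus their differences between successive frames."""
    frames = [p for p in (frame_parameters(e, bin_hz) for e in envelopes) if p]
    keys = ("power", "frequency", "drop", "count")
    deltas = {key: [b[key] - a[key] for a, b in zip(frames, frames[1:])]
              for key in keys}
    return frames, deltas
```

The `deltas` dictionary holds one list per parameter, which is the kind of temporal change a later classification step can inspect.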

【００５４】[0054]

From the temporal changes in these parameters, the event associating means 202 associates sound sections with events that occurred in the video or sound. For example, if the video or sound contains music, characteristic features appear: the temporal variation of the peak frequency 602 stays at or below a certain threshold, or the peak power 601 rises at a constant period. A sound section showing such features is therefore associated with music. Likewise, a section in which the peak frequency changes smoothly, or in which the difference between the maximum and minimum peaks is large, is associated with a human voice, and a section in which the peak power or the number of peaks fluctuates finely is associated with crowd noise or other noise.
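
A rule-based association of this kind could be sketched as below. This is an assumption-laden illustration, not the method of the embodiment: the function name `classify_section`, every threshold value, the order in which the rules are tested, and the use of sign changes as a measure of "fine fluctuation" are all hypothetical. Only a subset of the cues is implemented; the periodic rise of peak power for music is omitted.

```python
def classify_section(freqs, powers, diffs, counts,
                     freq_stable_hz=20.0, diff_large=10.0, jitter_ratio=0.5):
    """Label one sound section as 'music', 'voice', 'noise', or 'unknown'.

    Inputs are per-frame sequences of peak frequency, peak power,
    max-to-min power difference, and peak count. Thresholds are assumed.
    """
    def steps(values):
        return [b - a for a, b in zip(values, values[1:])]

    def sign_flip_ratio(values):
        # Fraction of consecutive changes that reverse direction.
        d = [s for s in steps(values) if s != 0]
        if len(d) < 2:
            return 0.0
        flips = sum(1 for a, b in zip(d, d[1:]) if a * b < 0)
        return flips / (len(d) - 1)

    freq_steps = steps(freqs)
    # Music: the peak frequency barely moves from frame to frame.
    if freq_steps and max(abs(s) for s in freq_steps) <= freq_stable_hz:
        return "music"
    # Noise or crowd: peak power or peak count jitters up and down.
    if max(sign_flip_ratio(powers), sign_flip_ratio(counts)) >= jitter_ratio:
        return "noise"
    # Voice: large average difference between maximum and minimum peaks.
    if diffs and sum(diffs) / len(diffs) >= diff_large:
        return "voice"
    return "unknown"
```

In practice the thresholds would have to be tuned on representative material, for example by comparing against peak features measured in typical events.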

【００５５】[0055]

Next, when for example only the human-voice sections are to be reproduced, the connecting means 203 selects the sound sections that the event associating means 202 associated with a human voice and joins them together. If 702 in FIG. 7 is one of the selected sound sections, joining at points near the start and end of the section where the power is low, such as 703 and 704, reduces unnaturally abrupt breaks in the sound. Another method is to forcibly lower the power at both ends of the sound section 702 before joining. The power can be calculated, for example, by dividing the sound waveform into ranges of a few ms and taking the sum of squared amplitudes in each range. An example of the calculated power is shown at 701.
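
The power calculation and the low-power joining described above can be illustrated with the following sketch. The names `short_time_power`, `trim_to_quiet_edges`, and `join_sections`, and the `window` and `search` parameters (window length in samples, and how many windows to look on either side of a boundary), are hypothetical choices made for the example.

```python
def short_time_power(samples, window):
    """Sum of squared amplitudes over consecutive full windows."""
    return [sum(s * s for s in samples[i:i + window])
            for i in range(0, len(samples) - window + 1, window)]

def trim_to_quiet_edges(samples, start, end, window, search):
    """Move a section's start and end to the quietest nearby window."""
    power = short_time_power(samples, window)

    def quietest(position):
        w = min(position // window, len(power) - 1)
        lo = max(0, w - search)
        hi = min(len(power) - 1, w + search)
        best = min(range(lo, hi + 1), key=lambda k: power[k])
        return best * window  # sample index where that window begins

    return quietest(start), quietest(end)

def join_sections(samples, sections, window, search):
    """Concatenate sections after snapping their edges to quiet points."""
    joined = []
    for start, end in sections:
        s, e = trim_to_quiet_edges(samples, start, end, window, search)
        if e > s:
            joined.extend(samples[s:e])
    return joined
```

Snapping boundaries to low-power windows means each cut falls where the signal is already quiet, so the discontinuity at the joint is small.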

【００５６】[0056]

The sound sections to be reproduced need not be limited to joined sections having a certain feature amount; they may also be fixed intervals before and after points at which the sound information changes abruptly.

【００５７】[0057]

The set of sound sections generated by the connecting means 203 is reproduced by the reproducing means 204. If the input information is video, each sound section is reproduced together with its corresponding images. If the input information is sound only, the sections are reproduced as they are.

【００５８】[0058]

The reproducing means also has the following browsing function. The sound sections joined for each feature amount or each specific event are each given a periodically varying weighting in the amplitude direction, such as 801 to 804 in FIG. 8, and are reproduced with a phase difference such as that shown at 805. FIG. 8 shows the case of four classes of sound sections, but the number of sound sections, the weighting function, the phase difference, and so on can be set arbitrarily. Reproduced in this way, the sound sections corresponding to different features or events are heard in alternation.
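
One way to picture this browsing function is the sketch below. The source does not give the shape of the weighting functions of FIG. 8, so the raised-cosine gain, the equal phase spacing between streams, and the names `browse_mix`, `groups`, and `period` are all assumptions made for illustration.

```python
import math

def browse_mix(groups, period, length):
    """Mix sound streams using periodic, phase-shifted gains.

    groups: list of non-empty sample lists, one per feature or event class.
    period: gain period in samples. length: number of output samples.
    """
    n = len(groups)
    out = []
    for t in range(length):
        total = 0.0
        for k, stream in enumerate(groups):
            # Each stream's gain is shifted by k/n of a period (assumed).
            phase = 2 * math.pi * (t / period - k / n)
            gain = 0.5 * (1 + math.cos(phase))  # raised cosine in [0, 1]
            total += gain * stream[t % len(stream)]
        out.append(total)
    return out
```

With this particular gain and two or more streams, the gains sum to a constant, so the overall loudness stays steady while the streams fade in and out in turn.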

【００５９】[0059]

[Effects of the Invention] As described above, the present invention has the following effects.

【００６０】[0060]

The inventions of claims 1 to 12 capture features from sound information, associate them with events that occurred in the video or sound, and reproduce the sound sections that correspond to a desired event or whose feature amounts satisfy a certain condition. This enables quick viewing or quick listening that reflects the content of the video or sound.

【００６１】[0061]

The connecting means and method of claims 9 and 20 join sound sections having a specific feature amount or event contained in the sound information, calculate for each feature amount or event the total time that reproduction would take, and select the sound sections closest to the reproduction time limit specified by the user. This enables quick viewing or quick listening that fits the limit on reproduction time.
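
The selection by reproduction time limit can be sketched as follows, reading the description as "choose the feature or event class whose total duration is closest to the limit". That reading, the function name `select_by_time_limit`, and the tuple layout `(label, start_sec, end_sec)` are assumptions for the example.

```python
def select_by_time_limit(sections, limit):
    """Pick the class whose total playing time is closest to `limit`.

    sections: list of (label, start_sec, end_sec) tuples.
    Returns (chosen_label, sections_of_that_label, totals_per_label).
    """
    totals = {}
    for label, start, end in sections:
        totals[label] = totals.get(label, 0.0) + (end - start)
    best = min(totals, key=lambda lab: abs(totals[lab] - limit))
    chosen = [s for s in sections if s[0] == best]
    return best, chosen, totals
```

For example, with 15 seconds of voice, 40 seconds of music, and 5 seconds of noise, a limit of 12 seconds selects the voice sections.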

【００６２】[0062]

The connecting means and method of claims 10 and 21 use points where the power of the sound information is low as joining points when joining the selected sound sections, which reduces unnaturally abrupt breaks in the sound.

【００６３】[0063]

The connecting means and method of claims 11 and 22 lower the power at both ends of each selected sound section before joining, which reduces unnaturally abrupt breaks in the sound.
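
Lowering the power at both ends of a section can be illustrated with a simple fade. The linear ramp, the function name `fade_edges`, and the `fade_len` parameter are assumptions; the source only states that the power at both ends is lowered.

```python
def fade_edges(section, fade_len):
    """Return a copy of `section` with linear fade-in and fade-out."""
    n = len(section)
    fade_len = min(fade_len, n // 2)  # the two fades must not overlap
    out = list(section)
    for i in range(fade_len):
        gain = i / fade_len           # ramps from 0 towards 1
        out[i] *= gain                # fade-in at the start
        out[n - 1 - i] *= gain        # fade-out at the end
    return out
```

Because every faded section starts and ends at zero amplitude, sections can be concatenated without a step discontinuity at the joint.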

【００６４】[0064]

The reproducing means and method of claims 12 and 23 reproduce the sets of sound sections, classified by feature amount or by specific event, while periodically varying the amplitude of each set, making it possible to grasp in a short time what kinds of feature amounts or events are contained.

[Brief description of drawings]

FIG. 1 is a block diagram showing the schematic configuration of a quick-look and quick-listening device according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of the video/sound management control unit according to an embodiment of the present invention.

FIG. 3 is a flowchart showing the processing flow when the operations of the sound feature extraction unit and the feature amount management unit according to an embodiment of the present invention are implemented in software on a computer or the like.

FIG. 4 is a flowchart explaining the processing flow in this embodiment, from the frequency spectrum calculation process for sound information through the calculation of peak variation in the frequency spectrum envelope.

FIG. 5 is a diagram showing, for this embodiment, an example waveform of sound information, the frequency spectrum calculated by FFT, and the frequency spectrum envelope calculated by FFT after liftering the cepstrum.

FIG. 6 is a diagram showing the envelope of the frequency spectrum of sound information and the various parameters used to capture changes in the sound information in this embodiment.

FIG. 7 is a diagram showing the power of sound information and the low-power joining portions in this embodiment.

FIG. 8 is a diagram showing the weighting functions and phase differences of sound sections in this embodiment.

[Explanation of symbols]

101 ... Video/sound input unit
102 ... Video/sound management control unit
103 ... Sound feature extraction unit
104 ... Feature amount management unit
105 ... Video/sound/feature amount storage unit
106 ... Interface unit
201 ... Sound information change detecting means
202 ... Event associating means
203 ... Connecting means
204 ... Reproducing means
301 ... Sound information change detection process
302 ... Event association process
303 ... Connecting process
304 ... Reproduction process
401 ... Frequency spectrum calculation process
402 ... Frequency spectrum envelope calculation process
403 ... Peak detection process
404 ... Peak variation calculation process

Continuation of the front page: (51) Int.Cl.⁶ H04N 5/928

Claims

[Claims]

1. A quick-look and quick-listening device comprising: a video/sound input unit that inputs, in real time, video information consisting of image information and sound information, or sound information; a video/sound/feature amount storage unit that stores the video or sound information input in real time, outputs the stored video or sound information, and accumulates feature amounts of the video or sound information; a video/sound management control unit that extracts various feature amounts from the sound information in the input video or sound information and manages the feature amounts; and an interface unit that outputs the video or sound information whose feature amounts are managed.

2. The quick-look and quick-listening device according to claim 1, wherein the video/sound management control unit comprises a sound feature extraction unit that extracts various feature amounts from sound information, and a feature amount management unit that manages the feature amounts and processes and converts the sound information based on the feature amounts.

3. The quick-look and quick-listening device according to claim 2, wherein the sound feature extraction unit comprises sound information change detecting means for extracting various feature amounts from sound information and detecting, from the sound information, a sound section in which a feature amount satisfying a certain condition exists.

4. The quick-look and quick-listening device, wherein the feature amount management unit comprises event associating means for associating a sound section having a certain feature amount with an event occurring in the video or sound.

5. The quick-look and quick-listening device, wherein the feature amount management unit comprises connecting means for joining together sound sections that have a specific feature amount or correspond to a specific event.

6. The quick-look and quick-listening device according to claim 5, wherein the feature amount management unit comprises reproducing means for synchronizing the image information corresponding to a sound section when the input is video information.

7. The quick-look and quick-listening device according to claim 3, wherein the sound information change detecting means calculates a frequency spectrum from the sound information, detects peaks from the envelope of the spectrum, and detects, from the sound information, a sound section in which a feature amount of the peaks satisfies a certain condition.

8. The quick-look and quick-listening device according to claim 4, wherein the event associating means associates the sound section with an event by comparing the feature amount of the peaks with the feature amount of the peaks in a typical event.

9. The quick-look and quick-listening device according to claim 5, wherein the connecting means comprises sound section selecting means that joins sound sections having a specific feature amount or event contained in the sound information, calculates for each feature amount or event the total time required for reproduction, and selects, in accordance with a limit on reproduction time, the sound section closest to that time limit.

10. The quick-look and quick-listening device according to claim 5, wherein the connecting means uses, as a joining point, a portion of the selected sound section where the power of the sound information is low.

11. The quick-look and quick-listening device, wherein the connecting means lowers the power at both ends of the selected sound section and uses those ends as joining points.

12. The quick-look and quick-listening device according to claim 4, 5, or 6, further comprising reproducing means for reproducing sets of sound sections, classified by a certain feature amount by the sound information change extracting means or by a specific event associated by the connecting means, while periodically varying the amplitude of each set.

13. A quick-look and quick-listening method comprising: first, inputting video information consisting of image information and sound information, or sound information, in real time and accumulating the video or sound information input in real time, or, when not in real time, inputting accumulated video or sound information; next, extracting various feature amounts from the sound information in the input video or sound information; next, processing the video or sound information based on the extracted feature amounts; and next, outputting the processed video or sound information.

14. The quick-look and quick-listening method according to claim 13, wherein the step of processing the video or sound information uses a sound information change detection method that detects, from the sound information, a sound section in which, among the various feature amounts extracted from the sound information, a feature amount satisfying a certain condition exists.

15. The quick-look and quick-listening method according to claim 14, wherein an event association method is used that associates a sound section having a certain feature amount with an event occurring in the video or sound.

16. The quick-look and quick-listening method according to claim 14 or 15, wherein a connecting method is used that joins together sound sections that have a specific feature amount or correspond to a specific event.

17. The quick-look and quick-listening method according to claim 14, 15, or 16, wherein a reproducing method is used that synchronizes the image information corresponding to a sound section when the input is video information.

18. The quick-look and quick-listening method according to claim 14, wherein the sound information change detection method calculates a frequency spectrum from the sound information, detects peaks from the envelope of the spectrum, and detects, from the sound information, a sound section in which a feature amount of the peaks satisfies a certain condition.

19. The quick-look and quick-listening method according to claim 15, wherein the event association method associates the sound section with an event by comparing the feature amount of the peaks with the feature amount of the peaks in a typical event.

20. The quick-look and quick-listening method according to claim 16, wherein the connecting method uses a sound section selection method that joins sound sections having a specific feature amount or event contained in the sound information, calculates for each feature amount or event the total time required for reproduction, and selects, in accordance with a limit on reproduction time, the sound section closest to that time limit.

21. The quick-look and quick-listening method according to claim 20, wherein the connecting method uses, as a joining point, a portion of the selected sound section where the power of the sound information is low.

22. The quick-look and quick-listening method according to claim 20, wherein the connecting method lowers the power at both ends of the selected sound section and uses those ends as joining points.

23. The quick-look and quick-listening method according to claim 15, 16, or 17, wherein a reproducing method is used that reproduces sets of sound sections, classified by a certain feature amount by the sound information change extraction method or by a specific event associated by the connecting method, while periodically varying the amplitude of each set.