JP2002116784A

JP2002116784A - Information signal processing device, information signal processing method, information signal recording and reproducing device and information signal recording medium

Info

Publication number: JP2002116784A
Application number: JP2000308330A
Authority: JP
Inventors: Noboru Murabayashi; 昇村林; Takao Takahashi; 孝夫高橋
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2000-10-06
Filing date: 2000-10-06
Publication date: 2002-04-19

Abstract

PROBLEM TO BE SOLVED: To determine an attribute of an input signal, namely whether a signal from a broadcast program or the like is a musical tone signal. SOLUTION: The device comprises: frequency analyzing means for frequency-analyzing a predetermined input information signal in each predetermined section; peak frequency detecting means for successively detecting, from the analysis signal of the frequency analyzing means, a predetermined plurality of peak frequencies corresponding to predetermined maxima of the analysis values within a predetermined frequency range in each predetermined section; signal deciding means for deciding whether a signal is also detected, in another section that precedes or follows and is contiguous with the predetermined section, within a predetermined frequency range that includes the frequency detected by the peak frequency detecting means in that section; signal detecting means for detecting the analysis signal level at each frequency detected by the peak frequency detecting means, or the signal level of the input signal in the predetermined section; and further signal deciding means for deciding the attribute of the signal from the outputs of the above signal deciding means and the signal detecting means.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to, for example, a device and a method that receive a broadcast program and frequency-analyze its audio signal to determine an attribute of the input signal, namely whether or not it is a musical tone signal. It also relates to a device, a method, and a recording medium that, when the broadcast program is recorded on a magnetic recording medium such as a semiconductor memory or a magnetic disk, generate an identification signal according to the determination result and use that identification signal to control reproduction, performing special reproduction such as digest reproduction.

[0002]

[Prior Art] In recent years, devices using various data compression schemes have been developed for consumer audio-visual equipment such as video and audio devices, and have become easy to use. As it has become easy to record large amounts of information on various recording media, demand has also arisen for grasping the recorded content of these media efficiently. For example, there is a video reproducing device that detects silent sections in the audio and skips them, so that a videotape is reproduced in a substantially shorter time.

[0003] In such a device the effective reproduction speed is at most about double speed. To grasp an outline of the recording in an even shorter time, one could also detect part of the audio signal, in addition to the silent sections, and skip that part as well. For example, in a broadcast news program, the sections containing human voice, such as an announcer's, are the important information, whereas the music played at the beginning or in the middle of the program is considered to have low priority as news information.

[0004] Comparing the characteristics of human voice signals and musical tone signals, the spectral characteristics of a human voice signal, for example, vary in a complex manner with vowels and consonants, gender, age, and so on. A musical tone signal, by contrast, is basically generated by musical instruments, apart from special or complex cases, so its characteristics are considered easier to capture.

[0005] Techniques for detecting music sections from an audio signal are described in, for example, Kenichi Minami et al., "Video Indexing Using Sound Information and Its Application," Transactions of the Institute of Electronics, Information and Communication Engineers, D-2, Vol. J81-D-2, No. 3, pp. 529-537, March 1998, and Kenichi Minami et al., "Video Browsing Interface Using Sound Information," Technical Report of the Institute of Television Engineers of Japan, Vol. 19, No. 7, pp. 1-6, 1995.

[0006] These references state that the conventional musical tone detection techniques exclude rhythm-only music and are aimed at detecting musical tone signals whose spectral peaks show marked continuity.

[0007]

[Problem to Be Solved by the Invention] However, when one considers a recording and reproducing device that records and reproduces broadcast programs, or shooting with a camera-integrated video camera, signals of many different music genres are input. Aiming to detect only some musical tone signals is therefore not very practical, and a musical tone detection technique that can handle as wide a range of music genres as possible is more effective. With the conventional techniques described above, depending on the music genre, signals whose spectral peaks do not show marked continuity under spectral analysis can be difficult to detect.

[0008] Accordingly, from the above viewpoint, the problem to be solved by the present invention is to provide a musical tone detection technique that can handle as wide a range of music genres as possible.

[0009]

[Means for Solving the Problem] To solve the above problem, the information signal processing device, information signal processing method, information signal recording and reproducing device, and information signal recording medium of the present invention are configured as follows.

[0010] (1) An information signal processing device comprising: frequency analysis means for frequency-analyzing a predetermined input information signal in each predetermined section; peak frequency detection means for sequentially detecting, from the analysis signal of the frequency analysis means, a predetermined plurality of peak frequencies corresponding to predetermined maxima of the analysis values within a predetermined frequency range in each predetermined section; signal determination means for determining whether a signal is also detected, in another section that precedes or follows and is contiguous with the predetermined section, within a predetermined frequency range that includes the frequency detected by the peak frequency detection means in that section; signal detection means for detecting the analysis signal level at each detection frequency detected by the peak frequency detection means, or the signal level of the input signal in the predetermined section; and signal determination means for determining the attribute of the signal from the signals from the above signal determination means and the signal detection means.
(2) The information signal processing device according to (1), wherein the predetermined information signal is an audio signal.
(3) The information signal processing device according to (1), wherein the frequency analysis is an FFT (fast Fourier transform), a DCT (discrete cosine transform), or a similar frequency analysis, performed after the information signal has been subjected to predetermined averaging or predetermined decimation.
(4) The information signal processing device according to (1), wherein the signal detection means detects a signal-to-noise ratio from the detected signal level and a predetermined noise level within a predetermined frequency band.

[0011] (5) An information signal processing method comprising: frequency-analyzing a predetermined input information signal in each predetermined section; sequentially detecting, from the resulting analysis signal, a predetermined plurality of peak frequencies corresponding to predetermined maxima of the analysis values within a predetermined frequency range in each predetermined section; determining whether a signal is also detected, in another section that precedes or follows and is contiguous with the predetermined section, within a predetermined frequency range that includes the frequency detected in that section; and determining the attribute of the input information signal from that signal determination.
(6) The information signal processing method according to (5), wherein the predetermined information signal is an audio signal.
(7) The information signal processing method according to (5), wherein the frequency analysis is an FFT (fast Fourier transform), a DCT (discrete cosine transform), or a similar frequency analysis, performed after the information signal has been subjected to predetermined averaging or predetermined decimation.

[0012] (8) An information signal recording and reproducing device comprising: frequency analysis means for frequency-analyzing an input information signal from predetermined input means in each predetermined section; peak frequency detection means for detecting, from the analysis signal from the frequency analysis means, a predetermined plurality of peak frequencies corresponding to predetermined maxima of the analysis values within a predetermined frequency range in each predetermined section; signal determination means for determining whether a signal is also detected, in another section that precedes or follows and is contiguous with the predetermined section, within a predetermined frequency range that includes the frequency detected by the peak frequency detection means in that section; signal detection means for detecting the signal level at each of the plurality of detection frequencies detected by the peak frequency detection means; signal determination means for determining the attribute of the signal from the signals from the above signal determination means and the signal detection means; identification signal generation means for generating a predetermined identification signal according to the signal from the determination means; recording means for recording the signal from the identification signal generation means and the information signal on a predetermined recording medium; reproducing means for reproducing the predetermined information signal and the predetermined identification signal recorded on the recording medium; and reproduction control means for controlling reproduction from the recording medium according to the reproduced identification signal.
(9) The information signal recording and reproducing device according to (8), wherein the predetermined information signal is an audio signal.
(10) The information signal recording device according to (8), wherein the frequency analysis is an FFT (fast Fourier transform), a DCT (discrete cosine transform), or a similar frequency analysis, performed after the information signal has been subjected to predetermined averaging or predetermined decimation.
(11) The information signal recording and reproducing device according to (8), wherein the signal detection means detects a signal-to-noise ratio from the detected signal level and a predetermined noise level within a predetermined frequency band.

[0013] (12) An information signal recording medium characterized in that: a predetermined information signal is frequency-analyzed in each predetermined section; a predetermined plurality of peak frequencies corresponding to predetermined maxima of the analysis values within a predetermined frequency range are sequentially detected in each predetermined section from the resulting analysis signal; it is determined whether a signal is also detected, in another section that precedes or follows and is contiguous with the predetermined section, within a predetermined frequency range that includes the frequency detected in that section; and, when that signal determination finds a predetermined section of the information signal to be a signal with a predetermined attribute, a predetermined identification signal identifying that section is recorded in a predetermined area of the recording medium.
(13) The information signal recording medium according to (12), wherein the information signal is an audio signal.
(14) The information signal recording device according to (12), wherein the frequency analysis is an FFT (fast Fourier transform), a DCT (discrete cosine transform), or a similar frequency analysis, performed after the information signal has been subjected to predetermined averaging or predetermined decimation.

[0014] In this configuration, a predetermined input information signal such as an audio signal is frequency-analyzed in each predetermined section, for example a section that is an integer multiple of one frame period of the corresponding video signal. From the analysis signal, the peak frequencies corresponding to predetermined maxima of the analysis values within a predetermined frequency range are then detected in order in each section: the first peak frequency, the second peak frequency, and so on. It is then determined whether a signal is also detected, in other sections contiguous with that section, within a predetermined frequency range that includes each frequency detected in the section. In addition, the analysis signal level at each detected peak frequency, or the signal level of the input signal in the predetermined section, is detected so that cases such as a poor S/N or a very small signal at or below a predetermined level can be identified, and musical tones are detected efficiently.

[0015] Furthermore, by recording an identification signal that identifies musical tone sections, according to the above detection result, on a predetermined recording medium together with the audio signal and the like, and detecting it at reproduction time, an effective digest reproduction system, or an editing system that, for example, edits musical tone sections efficiently, can be realized.

[0016]

[Embodiments of the Invention] Next, embodiments of the information signal processing device, information signal processing method, information signal recording and reproducing device, and information signal recording medium according to the present invention will be described with reference to the drawings.

[0017] First, to outline the present invention: the input audio signal is subjected, in each predetermined section, to predetermined frequency analysis such as FFT (fast Fourier transform) or DCT (discrete cosine transform) processing, and a plurality of predetermined peak frequencies are sequentially detected from the frequency analysis signal within a predetermined frequency range. It is detected whether those peak frequencies are also detected in other sections contiguous with the section, and the signal-to-noise ratio at each peak frequency, or the input signal level in the section, is detected. Whether the input information signal is a predetermined musical tone signal is then determined from these results together with the peak detection result.

[0018] A predetermined identification signal is generated according to this determination result, and the identification signal is recorded together with the information signal on a predetermined recording medium such as a magnetic tape, magneto-optical disk, hard disk, or semiconductor memory. At reproduction, the identification signal is used to control reproduction of the sections in which a musical tone signal was detected, and a predetermined digest reproduction is performed.

[0019] Next, specific embodiments of the present invention outlined above will be described with reference to the drawings, in the following order.
(1) Operating principle of the present invention
(2) Musical tone detection system block configuration example 1 according to the present invention
(3) Musical tone detection system block configuration example 2 according to the present invention
(4) Recording and reproducing device block configuration example according to the present invention
(5) Operation flowchart example 1
(6) Operation flowchart example 2

[0020] (1) Operating principle of the present invention
Before describing the operating principle of the present invention, a conventional method of detecting a musical tone signal is described. FIG. 10 is a conceptual diagram of a prior-art method of detecting a musical tone signal. Consider dividing the audio signal into predetermined sections, as shown in FIG. 10(1), performing FFT (fast Fourier transform) processing on each section, as shown in FIG. 10(2), and detecting the spectral peak frequency.

[0021] FIG. 10(3) assumes that the audio signal of FIG. 10(1) is a musical tone signal and, for clarity, plots the spectral peak frequencies from the FFT analysis results of FIG. 10(2) along the time axis. Because a music signal with clearly detectable spectral peaks is assumed here, the spectral peak frequency f2 (Hz) continues throughout sections A1 to A4, as shown by the black circles in FIG. 10(3), and the signal can easily be determined to be a musical tone signal.

[0022] Looking at the second peak frequency in the FFT analysis results of FIG. 10(2), the peak also continues at f4, a harmonic of the first peak frequency. When the second peak frequency is likewise plotted along the time axis, it can be seen to continue as well, as shown by the white circles in FIG. 10(3).

[0023] However, musical tone signals in audio from broadcast programs and the like combine the sounds of various instruments, so their spectral characteristics are complex. Even for signals that a person clearly recognizes as musical tones, the peak frequency sometimes does not show the continuity seen in FIG. 10(3).

[0024] Next, the operating principle of the present invention is described with reference to FIG. 1. FIG. 1(2) shows the FFT analysis of each audio section of FIG. 1(1). In FIG. 1(3), the black circles plot the first spectral peak frequency and the white circles plot the second spectral peak frequency. FIG. 1(3) shows that the first peak frequency is continuous across sections A2 to A3 but not across all of sections A1 to A4. In many actual musical tone signals, the spectral peak does not continue clearly in this way.

[0025] However, consider how the peak frequencies of interest behave over time. As shown in FIG. 1(2), for example, the first peak frequency is f2 (Hz) in section A1 of FIG. 1(1), f3 (Hz) in section A2, f3 (Hz) in section A3, and f4 (Hz) in section A4. The second peak frequency is f3 (Hz) in section A1, f2 (Hz) in section A2, f4 (Hz) in section A3, and f3 (Hz) in section A4. When these first and second peak frequencies are plotted along the time axis as in FIG. 1(3), neither the first nor the second peak frequency by itself is continuous over the long span A1 to A4. For a musical tone signal, however, a spectral component that was, for example, the first peak frequency in one section has a high probability of also being present in the other sections.
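The FIG. 1 example can be sketched in code. The following is a minimal illustration, not the patented implementation: each section is represented by the list of its detected peak frequencies (first peak first), the integers 2, 3, 4 stand in for f2, f3, f4, and the function name `continuation_length` and its `tolerance` parameter are hypothetical.

```python
def continuation_length(sections, start, freq, tolerance=0):
    """Count the contiguous sections around `start` whose peak list contains `freq`.

    `sections` is a list of per-section peak lists; `tolerance` is the allowed
    frequency deviation (an illustrative stand-in for the "predetermined
    frequency range that includes the detection frequency").
    """
    def has_peak(i):
        return any(abs(p - freq) <= tolerance for p in sections[i])

    if not has_peak(start):
        return 0
    length = 1
    i = start - 1
    while i >= 0 and has_peak(i):            # extend toward earlier sections
        length += 1
        i -= 1
    i = start + 1
    while i < len(sections) and has_peak(i):  # extend toward later sections
        length += 1
        i += 1
    return length


# Sections A1..A4 from the text: [first peak, second peak]
sections = [[2, 3], [3, 2], [3, 4], [4, 3]]

f3_length = continuation_length(sections, start=1, freq=3)
f2_length = continuation_length(sections, start=0, freq=2)

# Looking at the first peak alone, f3 only spans A2 to A3.
first_peak_only = [[s[0]] for s in sections]
f3_first_only = continuation_length(first_peak_only, start=1, freq=3)
```

With both peak orders taken into account, f3 is found in all four sections, whereas tracking the first peak alone sees it in only two. This is the effect the paragraph above describes.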

[0026] For simplicity, the above description covers only the first and second peak frequencies, but higher-order peak frequencies, such as the third and fourth, may also continue.

[0027] A conceptual diagram of a spectrum obtained by actual FFT analysis is shown in FIG. 2. The first peak frequency is f6 (Hz), but the second peak frequency is f18 (Hz), not f7 (Hz). Likewise, the third peak frequency is f28 (Hz), not f5 (Hz). In addition, a small power spectrum value from the FFT analysis may be noise, so in the case of FIG. 2, for example, spectral peaks at or below the threshold Th are not treated as peaks. In the example of FIG. 2, the fourth peak frequency would be f35 (Hz), but because it is at or below the predetermined threshold Th, no fourth peak frequency is detected.
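One plausible reading of FIG. 2 is that a "peak" is a local maximum of the power spectrum, so that bins on the shoulder of a stronger peak (f7 and f5 next to f6) are not counted even if their values are large. The sketch below follows that reading, which is an assumption and is not stated in the text. The function name `ranked_peaks`, the toy spectrum values, and the threshold value are all hypothetical.

```python
def ranked_peaks(power, threshold, max_peaks=9):
    """Return bin indices of local maxima strictly above `threshold`, strongest first."""
    candidates = []
    for k in range(1, len(power) - 1):
        is_local_max = power[k] > power[k - 1] and power[k] >= power[k + 1]
        if is_local_max and power[k] > threshold:
            candidates.append(k)
    candidates.sort(key=lambda k: power[k], reverse=True)
    return candidates[:max_peaks]


# Toy spectrum shaped like the FIG. 2 description (values are made up).
power = [0.0] * 40
power[5], power[6], power[7] = 6.0, 10.0, 8.0   # main peak at bin 6 with shoulders
power[18] = 7.0
power[28] = 5.0
power[35] = 1.0                                  # local maximum below the threshold
TH = 2.0

peaks = ranked_peaks(power, TH)
```

Here bins 7 and 5 hold larger values than bins 18 and 28 but are skipped because they are not local maxima, and bin 35 is skipped because it does not exceed the threshold. The default `max_peaks=9` mirrors the nine peak orders mentioned later in the description.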

[0028] With music signals, such as those with a fast tempo or those in which many instruments are mixed, as in an orchestra, the time-direction characteristics of the spectrum described above do not always appear clearly. To improve the detection accuracy further, detecting the rhythmicity of the musical tone signal in addition to its spectral characteristics is therefore also considered.

[0029] FIG. 3 is a conceptual diagram of detecting rhythmicity from an audio signal. FIG. 3(1) is an example of an audio signal waveform, and FIG. 3(2) is an example of the waveform obtained by envelope detection of that signal. For a rhythmic musical tone signal, the envelope-detected waveform shows periodicity, and computing its autocorrelation function yields a periodic waveform with reduced noise, as in FIG. 3(3). This waveform is, for example, FFT-analyzed to detect its peak frequency and S/N. A waveform with a good S/N can be determined to be rhythmic, and in some cases the detected rhythm frequency could also be used to determine whether the signal is noise or music.
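The envelope and autocorrelation steps can be sketched as follows. This is a simplified illustration under stated assumptions: the envelope detector is a full-wave rectifier followed by a moving average (the text does not specify the detector), and the rhythm period is read directly from the strongest autocorrelation lag rather than from a subsequent FFT as the text suggests. All function names, the window length, and the test signal are hypothetical.

```python
import math


def envelope(samples, window):
    """Full-wave rectify, then smooth with a moving average over `window` samples."""
    rectified = [abs(x) for x in samples]
    out, acc = [], 0.0
    for i, x in enumerate(rectified):
        acc += x
        if i >= window:
            acc -= rectified[i - window]
        out.append(acc / window)
    return out


def autocorrelation(values, max_lag):
    """Normalized autocorrelation of the mean-removed sequence for lags 0..max_lag."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    energy = sum(c * c for c in centered)
    return [
        sum(centered[i] * centered[i + lag] for i in range(len(centered) - lag)) / energy
        for lag in range(max_lag + 1)
    ]


def dominant_period(acf, min_lag):
    """Lag (in samples) of the strongest autocorrelation value at or beyond `min_lag`."""
    return max(range(min_lag, len(acf)), key=acf.__getitem__)


# Synthetic "beat": a 10-sample tone burst repeated every 50 samples.
signal = [math.sin(2 * math.pi * t / 4 + 0.3) if t % 50 < 10 else 0.0
          for t in range(400)]
env = envelope(signal, window=8)
acf = autocorrelation(env, max_lag=120)
period = dominant_period(acf, min_lag=20)   # min_lag skips the trivial peak at lag 0
```

For this synthetic signal the strongest lag equals the burst spacing of 50 samples. How strongly that lag stands out from the rest of the autocorrelation is one possible proxy for the "S/N" the paragraph refers to.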

[0030] As described above, by detecting rhythm characteristics in addition to the time characteristics from higher-order spectral peak analysis, and determining from both results whether the signal is a musical tone, detection covering a wider range of music genres can be realized with a relatively simple configuration.

[0031] The FFT analysis for spectral peak detection and the FFT analysis in the rhythm detection system described above may instead be performed by DCT (discrete cosine transform) processing. The same processing could also be performed by another frequency analysis system, such as a signal frequency characteristic analysis filter that combines a plurality of band-pass filters (BPFs).

[0032] (2) Musical tone detection system block configuration example 1 according to the present invention
FIG. 4 shows an example block configuration of the musical tone detection system according to the present invention. It consists of a block that detects the continuity of the n-th order spectral peaks described above and a block that detects rhythmicity from the envelope waveform of the input audio signal.

[0033] [2.1] Description of the spectral peak continuity detection block
First, the spectral peak continuation length detection block is described. Because processing is performed for each predetermined audio section, that is, for each predetermined set of data samples, the input audio signal is fed to the predetermined audio section detection system 20, which extracts a predetermined number of data samples.

[0034] To speed up processing, the musical tone detection signal processing described below may be applied to audio data that has first been reduced by predetermined data averaging appropriate to the sampling frequency.

[0035] For example, when the input is an audio signal A/D-converted at a sampling frequency of 48 kHz with 16 quantization bits, the data could be averaged down to about 1/8 and then several seconds' worth of audio data processed.
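A minimal sketch of the 1/8 averaging step, assuming it means replacing each run of eight consecutive samples with their mean (the text does not spell out the averaging method). The function name `average_decimate` and the placeholder data are hypothetical.

```python
def average_decimate(samples, factor=8):
    """Replace each run of `factor` samples with its mean; a trailing partial run is dropped."""
    usable = len(samples) - len(samples) % factor
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]


SAMPLE_RATE_HZ = 48_000
reduced_rate_hz = SAMPLE_RATE_HZ // 8         # effective rate after 1/8 averaging

one_second = list(range(SAMPLE_RATE_HZ))      # placeholder ramp standing in for PCM samples
reduced = average_decimate(one_second, factor=8)
```

Under this assumption, one second of 48 kHz audio shrinks to 6,000 values, so a section of a few seconds stays within a few tens of thousands of samples for the later frequency analysis. Block averaging also acts as a crude low-pass filter, which limits the usable frequency range accordingly.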

[0036] When a person listens to an audio signal, it is difficult to judge from a momentary sound, in an extreme case about 0.1 second, whether it is music or human conversation. Only after listening for a certain length of time can a person judge whether the sound is music or conversation.

【００３７】It is therefore natural for the signal processing, too, to make its determination only after processing data spanning seconds. Detection accuracy is expected to improve when a reasonable number of audio data samples is processed, but too many samples lengthen the time needed to reach a determination. The number of audio data samples to process is therefore set to a suitable value corresponding to roughly a few seconds of data.

【００３８】The audio data from the predetermined audio section detection system 20 undergoes a predetermined FFT analysis in the FFT processing system 21. The subsequent power spectrum detection system 22 then computes the power spectrum P(f) from the real part dreal and the imaginary part dimag of the FFT analysis data at frequency f (Hz):
$$P(f) = \mathrm{dreal} \times \mathrm{dreal} + \mathrm{dimag} \times \mathrm{dimag} \qquad (2.1)$$
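Equation (2.1) can be illustrated with a short sketch. A naive discrete Fourier transform stands in for the FFT analysis here so that the example needs only the standard library; the function names `dft` and `power_spectrum` are hypothetical.

```python
import cmath
import math


def dft(samples):
    """Naive O(N^2) discrete Fourier transform (stand-in for the FFT analysis)."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples))
            for k in range(n)]


def power_spectrum(samples):
    """P(f) = dreal*dreal + dimag*dimag for every frequency bin, as in (2.1)."""
    return [c.real * c.real + c.imag * c.imag for c in dft(samples)]


# A cosine with 2 cycles in 16 samples concentrates its power in bin 2.
tone = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
p = power_spectrum(tone)
```

For real work an FFT routine would replace `dft`; the power computation itself is unchanged.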

【００３９】The power spectrum signal is fed to the peak frequency detection systems 401 to 409, one per order, to detect the first- through ninth-order spectral peak frequencies.
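A possible reading of the first- through ninth-order peak detection is sketched below. It assumes that the n-th order peak is the n-th largest local maximum of the power spectrum; the patent text in this section does not spell out the ordering rule, so both that rule and the name `top_peaks` are assumptions.

```python
def top_peaks(power, count=9):
    """Return the bin indices of the `count` largest local maxima, strongest first."""
    local_max = [k for k in range(1, len(power) - 1)
                 if power[k] > power[k - 1] and power[k] >= power[k + 1]]
    return sorted(local_max, key=lambda k: power[k], reverse=True)[:count]


# Bins 3, 5 and 7 hold the three strongest local maxima of this toy spectrum.
example = top_peaks([0, 1, 0, 5, 0, 3, 0, 2, 0], count=3)
```

A bin index k converts to a frequency as k × (sampling rate) / (number of samples in the section).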

【００４０】The S/N at each peak frequency is detected by the corresponding S/N detection system 501 to 509.

【００４１】The signals from the peak frequency detection systems 401 to 409 and from the S/N detection systems 501 to 509 are fed to the spectral peak continuation length detection system 30, which detects the continuation length of the spectral peaks as explained in the description of the operating principle.
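The continuation length detection can be sketched as follows, assuming that a peak "continues" when its frequency in one section lies within a fixed tolerance of its frequency in the preceding section. The tolerance value, the run-length measure, and the name `longest_continuation` are assumptions made for illustration.

```python
def longest_continuation(peak_freqs, tolerance_hz=10.0):
    """Longest run of consecutive sections whose peak frequency stays within tolerance.

    `peak_freqs` holds one peak frequency (Hz) per analysis section.
    """
    if not peak_freqs:
        return 0
    best = run = 1
    for prev, cur in zip(peak_freqs, peak_freqs[1:]):
        run = run + 1 if abs(cur - prev) <= tolerance_hz else 1
        best = max(best, run)
    return best


# The peak stays near 440 Hz for three sections, then jumps to about 880 Hz.
run_length = longest_continuation([440.0, 442.0, 441.0, 880.0, 882.0])
```

In the block diagram this measurement would be repeated for each of the nine peak orders.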

【００４２】[2.2] Description of the rhythmicity detection block
Next, the rhythmicity detection block for the input audio signal is described. The envelope detection system 24 extracts an envelope waveform from the audio data supplied by the predetermined audio section detection system 20 through a predetermined envelope detection process, and the subsequent autocorrelation function calculation system 25 performs a predetermined autocorrelation calculation.

【００４３】Since the spectrum of a correlated signal is itself correlated, the autocorrelation function may alternatively be computed from the FFT-processed signal instead of the envelope detection signal, as shown in the other embodiment of the rhythmicity detection block in FIG. 4(2).

【００４４】The autocorrelated signal undergoes a predetermined FFT analysis in the next FFT processing system 26, and the power spectrum detection system 27 detects a predetermined power spectrum. The peak frequency detection system 28 then detects the peak frequency of the signal from the power spectrum detection system 27, after which the S/N detection system 29 detects a predetermined S/N.
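The rhythmicity chain (envelope, autocorrelation, spectrum, peak, S/N) might look like the sketch below. Several choices are assumptions rather than the patent's method: the envelope detector is rectification followed by a moving average, the autocorrelation is circular and computed on the mean-removed envelope, a naive DFT replaces the FFT, and S/N is taken as peak power divided by the mean power of the other non-DC bins.

```python
import cmath
import math


def envelope(samples, window=4):
    """Rectify, then smooth with a trailing moving average."""
    rect = [abs(x) for x in samples]
    out = []
    for i in range(len(rect)):
        chunk = rect[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def autocorrelation(x):
    """Circular autocorrelation of the mean-removed sequence."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    return [sum(c[t] * c[(t + lag) % n] for t in range(n)) for lag in range(n)]


def half_power_spectrum(x):
    """Power in bins 0..N/2 using a naive DFT."""
    n = len(x)
    out = []
    for k in range(n // 2 + 1):
        s = sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(x))
        out.append(s.real * s.real + s.imag * s.imag)
    return out


def rhythm_peak(env):
    """Return (peak_bin, snr) for the autocorrelation spectrum, DC excluded."""
    p = half_power_spectrum(autocorrelation(env))
    peak_bin = max(range(1, len(p)), key=p.__getitem__)
    others = [p[k] for k in range(1, len(p)) if k != peak_bin]
    noise = max(sum(others) / len(others), 1e-12)  # guard against division by zero
    return peak_bin, p[peak_bin] / noise


# An envelope that pulses 8 times per 64 samples peaks at bin 8.
pulsing = [1 + math.cos(2 * math.pi * 8 * t / 64) for t in range(64)]
peak_bin, snr = rhythm_peak(pulsing)
```

A strongly periodic envelope yields one dominant spectral line and therefore a high S/N, which is the property the S/N detection system 29 is described as measuring.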

【００４５】Consider now the difference between the waveforms of conversational speech and of music (musical tones). FIG. 5(1) is a conceptual diagram of a conversational speech waveform. A person speaking normally pauses for breath, so the waveform level drops abruptly at certain times, as at portions A, B, C, ..., H, I of waveform (1).

【００４６】By contrast, as shown in FIG. 5(2), a music (musical tone) signal typically comes from the playing of instruments, and abrupt level drops like those in a conversational speech waveform are considered rare. The signal level detection system 23 therefore detects the level of the audio signal, and the detected level drops are used as a parameter for musical tone detection.
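One way to turn level drops into a number between 0 and 1 is sketched below. The framing, the 10% threshold relative to the loudest frame, and the mapping of the result onto a level-drop score are all assumptions of this sketch; the patent only says that level drops are detected and used as a parameter.

```python
def level_drop_ratio(samples, frame=4, drop_ratio=0.1):
    """Fraction of frames whose mean absolute level falls below
    `drop_ratio` times the loudest frame. Returns 1.0 for pure silence."""
    levels = [sum(abs(x) for x in samples[i:i + frame]) / frame
              for i in range(0, len(samples) - frame + 1, frame)]
    if not levels:
        return 0.0
    peak = max(levels)
    if peak == 0:
        return 1.0  # entirely silent input
    quiet = sum(1 for lv in levels if lv < drop_ratio * peak)
    return quiet / len(levels)


# One of four frames is a pause.
speech_like = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
ratio = level_drop_ratio(speech_like)
```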

【００４７】The signals from the spectral peak continuation length detection system 30, the S/N detection system 29, and the audio level detection system 23 are fed to the overall determination system 31, which determines from the individual detection results whether the input audio signal is a musical tone.

【００４８】The individual detection results may be weighted in making the overall determination. As a starting point, let W be the overall determination value, Sp (0 ≤ Sp ≤ 1) the spectral peak continuation length detection result, Rh (0 ≤ Rh ≤ 1) the rhythmicity detection result, and Ld (0 ≤ Ld ≤ 1) the audio level drop detection result. Defining, for convenience, the determination parameter
$$W = \mathrm{Sp} + \mathrm{Rh} + (1 - \mathrm{Ld}) \qquad (2.2)$$
gives 0 ≤ W ≤ 3, and the closer W is to 3, the more likely the audio signal is a musical tone signal.

【００４９】A predetermined threshold Wth is then set, and an audio section that satisfies
$$W > \mathrm{Wth} \qquad (2.3)$$
can be determined to be a musical tone section.
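Equations (2.2) and (2.3) translate directly into code. In this sketch the function names and the default threshold of 2.0 are hypothetical; the patent leaves the value of Wth open.

```python
def overall_value(sp, rh, ld):
    """W = Sp + Rh + (1 - Ld), equation (2.2); each input lies in [0, 1]."""
    return sp + rh + (1.0 - ld)


def is_musical_tone(sp, rh, ld, w_th=2.0):
    """Equation (2.3): the section counts as a musical tone when W > Wth."""
    return overall_value(sp, rh, ld) > w_th


# Continuous peaks, clear rhythm, no level drops: W reaches its maximum of 3.
w_max = overall_value(1, 1, 0)
```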

【００５０】Sp is set to 1 when the spectral peak continues for at least a predetermined section length, and to 0 when no continuity is observed.

【００５１】Rh is treated in the same way: Rh = 1 when rhythmicity is clearly detected, and Rh = 0 when no rhythmicity is observed.

【００５２】Ld is set to 0 when the audio waveform shows no level drop at all over the predetermined section, and to 1 when level drops are detected frequently.

【００５３】To apply the weighting mentioned above, weighting coefficients α (0 < α < 1), β (0 < β < 1), and γ (0 < γ < 1) are introduced and W is computed as
$$W = \alpha \cdot \mathrm{Sp} + \beta \cdot \mathrm{Rh} + \gamma \cdot (1 - \mathrm{Ld}) \qquad (2.4)$$
The closer W is to its upper limit (3 as all weights approach 1), the more likely the audio signal is judged to be a music signal, and the identification signal used in the recording and reproducing apparatus described later may be generated according to this result.

【００５４】A level drop in the audio signal is easy to detect, and a silent section can be judged not to be a music signal without running the series of signal processing steps of the present invention. The audio level detection may therefore be performed first and the level drop parameter applied as a factor to the whole expression:
$$W = \gamma \cdot (1 - \mathrm{Ld}) \cdot (\alpha \cdot \mathrm{Sp} + \beta \cdot \mathrm{Rh}) \qquad (2.5)$$
In this case, the closer W is to its upper limit (2 as all weights approach 1), the higher the probability that the signal is a musical tone.
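The weighted forms (2.4) and (2.5) can be compared side by side in a short sketch. The function names and the example weights are hypothetical. Note that with weights strictly below 1 the largest attainable values are α + β + γ for (2.4) and γ(α + β) for (2.5), so a threshold would be chosen relative to those bounds.

```python
def weighted_sum(sp, rh, ld, alpha, beta, gamma):
    """Equation (2.4): W = alpha*Sp + beta*Rh + gamma*(1 - Ld)."""
    return alpha * sp + beta * rh + gamma * (1.0 - ld)


def gated(sp, rh, ld, alpha, beta, gamma):
    """Equation (2.5): the level-drop factor multiplies the whole expression."""
    return gamma * (1.0 - ld) * (alpha * sp + beta * rh)


weights = dict(alpha=0.5, beta=0.25, gamma=0.5)  # illustrative values only
w_sum = weighted_sum(1, 1, 0, **weights)
w_gated = gated(1, 1, 0, **weights)
w_silent = gated(1, 1, 1, **weights)
```

The gated form forces W to 0 whenever Ld = 1, which matches the remark that silent sections need no further analysis.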

【００５５】(3) Example 2 of the musical tone detection block configuration according to the present invention
FIG. 6 shows another embodiment of the musical tone detection block. When a recording and reproducing apparatus to which the present invention is applied, described later, takes its input video and audio signals from a tuner and records and reproduces broadcast programs, the audio signal may contain buzz or other noise depending on tuner performance and reception conditions.

【００５６】In an extreme case, buzz is a sustained droning sound ("boo") whose spectral peaks also persist, so it may be hard to distinguish from the spectral continuation characteristic of a music signal.

【００５７】The embodiment shown in FIG. 6 is an example block configuration designed to reduce false detections by detecting buzz that may be present in the low-frequency range.

【００５８】The audio signal is fed to the predetermined audio section detection system 20 described above. Audio data of the predetermined number of samples, after the predetermined averaging, is supplied to the spectral peak continuity detection block systems 200 and 600, the rhythmicity detection block system 300, and the audio level detection system 23.

【００５９】The spectral peak continuity detection block systems 200 and 600 and the rhythmicity detection block system 300 are the same as the blocks described with reference to FIG. 4, so their description is omitted.

【００６０】Compared with the input audio data of the spectral peak continuity detection block 200, the input audio data of the spectral peak continuity detection block 600 has been further decimated or averaged, so in terms of frequency characteristics its bandwidth is lower.

【００６１】For example, assume as above an audio signal with a sampling frequency of 48 kHz and 16-bit quantization. The input audio data of the spectral peak continuity detection block system 200 is the data averaged to about 1/8, and the input audio data of the spectral peak continuity detection block system 600 is obtained by further decimating or averaging the data fed to block system 200 to about 1/4.

【００６２】That is, with the sampling frequency assumed to be 48 kHz, the frequency band of the input audio data of the spectral peak continuity detection block system 200 is approximately
$$(48\ \mathrm{kHz} / 2) / 8 = 3\ \mathrm{kHz}$$

【００６３】The input audio data of the spectral peak continuity detection block system 600 is decimated or averaged to about 1/4, so its frequency band is approximately
$$3\ \mathrm{kHz} / 4 = 750\ \mathrm{Hz}$$
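The two bandwidth figures follow from the Nyquist limit, as the small calculation below shows. The helper name `bandwidth_hz` is hypothetical.

```python
def bandwidth_hz(sampling_hz, reduction):
    """Nyquist bandwidth after reducing the sample rate by `reduction`."""
    return sampling_hz / 2 / reduction


band_200 = bandwidth_hz(48_000, 8)   # input band of block system 200
band_600 = band_200 / 4              # block system 600: a further 1/4 reduction
```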

【００６４】The input audio data band of the spectral peak continuity detection block system 200 is thus about 3 kHz. The fundamental frequencies of typical music signals are considered to fall within a band of about that width, so reducing the number of audio samples by averaging to speed up the signal processing, as described above, is not expected to cause problems.

【００６５】The frequency band of the input audio data of the spectral peak continuity detection block system 600 is about 750 Hz. Because the frequency spectrum of buzz lies in the low range, buzz can be detected in this band.

【００６６】The output signals of the spectral peak continuity detection block systems 200 and 600, the rhythmicity detection block 300, and the audio level detection system 23 are fed to the overall determination system 31, which determines whether the input audio signal is a musical tone signal.

【００６７】As with the block configuration of FIG. 4 described above, the overall determination may be made by, for example, weighting the detection results from the individual detection systems.

【００６８】As in the case of (2.4), let W be the overall determination result, Sp1 the result of the spectral peak continuity detection block system 200 with weighting coefficient α1, Sp2 the result of the spectral peak continuity detection block system 600 with weighting coefficient α2, Rh the result of the rhythmicity detection block system 300 with weighting coefficient β, and Ld the result of the audio level detection system 23 with weighting coefficient γ. Then
$$W = \alpha_1 \cdot \mathrm{Sp1} + \alpha_2 \cdot (1 - \mathrm{Sp2}) + \beta \cdot \mathrm{Rh} + \gamma \cdot (1 - \mathrm{Ld}) \qquad (3.1)$$
In this case, the closer W is to its upper limit (4 as all weights approach 1), the more likely the audio signal in that section is judged to be a musical tone signal.

【００６９】Alternatively, following (2.5),
$$W = \gamma \cdot (1 - \mathrm{Ld}) \cdot (\alpha_1 \cdot \mathrm{Sp1} + \alpha_2 \cdot (1 - \mathrm{Sp2}) + \beta \cdot \mathrm{Rh}) \qquad (3.2)$$
In this case, the closer W is to its upper limit (3 as all weights approach 1), the more likely the audio signal in that section is judged to be a musical tone signal.
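The effect of the buzz term in (3.1) and (3.2) can be seen in a small sketch. The function names and the equal weights of 0.5 are illustrative assumptions. Sp2 enters as (1 − Sp2), so a persistent low-band peak (Sp2 = 1) lowers W.

```python
def weighted_sum_with_buzz(sp1, sp2, rh, ld, a1, a2, beta, gamma):
    """Equation (3.1)."""
    return a1 * sp1 + a2 * (1.0 - sp2) + beta * rh + gamma * (1.0 - ld)


def gated_with_buzz(sp1, sp2, rh, ld, a1, a2, beta, gamma):
    """Equation (3.2): the level-drop factor multiplies the whole expression."""
    return gamma * (1.0 - ld) * (a1 * sp1 + a2 * (1.0 - sp2) + beta * rh)


w = dict(a1=0.5, a2=0.5, beta=0.5, gamma=0.5)  # illustrative weights only
clean_music = weighted_sum_with_buzz(1, 0, 1, 0, **w)
with_buzz = weighted_sum_with_buzz(1, 1, 1, 0, **w)
```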

【００７０】(4) Example block configuration of the recording and reproducing apparatus according to the present invention
FIG. 7 is an example block diagram of an embodiment in which the present invention is applied to an information signal recording and reproducing apparatus. The recording system is described first.

【００７１】The video signal is fed to the video signal A/D conversion processing system 1 and the audio signal to the audio signal A/D conversion processing system 11, where each undergoes a predetermined A/D conversion. After A/D conversion, the video signal is fed to the video signal encoding processing system 2, undergoes a predetermined band compression such as MPEG, and is fed to the multiplexing processing system 3. After A/D conversion, the audio signal is fed to the audio signal encoding processing system 10 and to the musical tone detection processing system 16.

【００７２】The audio signal encoding processing system 10 applies a predetermined band compression such as MPEG audio, and its output is fed to the multiplexing processing system 3.

【００７３】The musical tone detection processing system 16 performs the predetermined musical tone detection and determination processing described for the musical tone detection block above. Its output is fed to the system controller system 17, under whose control the identification signal generation system 12 generates a predetermined identification signal, which is fed to the multiplexing processing system 3.

【００７４】In the multiplexing processing system 3, the input signals such as video data, audio data, and identification data are combined by a predetermined multiplexing process. The recording processing system 4 then applies predetermined signal processing such as adding error correction codes and interleaving, and the result is recorded on the predetermined recording medium 5.

【００７５】Next, the reproducing system is described. The signal reproduced from the recording medium 5 undergoes predetermined error correction, deinterleaving, and similar processing in the reproduction signal processing system 6. The reproduction data separation processing system 7 then separates it into video data, audio data, and identification data by predetermined signal processing.

【００７６】The separated video data signal is fed to the video data decoding processing system 8 for predetermined decoding, then undergoes a predetermined D/A conversion in the video signal D/A conversion processing system 9 and is output as a video signal.

【００７７】Similarly, the separated audio data signal is fed to the audio data decoding processing system 14 for predetermined decoding, then undergoes a predetermined D/A conversion in the audio signal D/A conversion processing system 15 and is output as an audio signal.

【００７８】The identification data signal separated by the reproduction data separation processing system 7 undergoes a predetermined detection process in the identification signal detection system 18, and the output signal is fed to the system controller system 17.

【００７９】According to the detected identification signal, the system controller system 17 controls the reproduction operation of the recording medium 5 through the recording medium control system 13 and performs predetermined special reproduction operations, such as skipping the detected musical tone sections or reproducing only the detected musical tone sections.

【００８０】(5) Operation flowchart example 1
FIG. 8 is an operation flowchart of the present invention, and the operation is described with reference to it.

【００８１】The process starts at S0. At S1 the loop counter is initialized to m = 0, and at S2 the audio signal is input.

【００８２】At S3, data for a predetermined section, or a predetermined number of samples, is extracted from the input audio signal, and at S4 the peak detection loop counter is initialized to n = 0.

【００８３】At S5, the audio data extracted for each predetermined section undergoes a predetermined FFT analysis; at S6 the power spectrum is calculated; and at S7 the n-th order peak frequency is detected.

【００８４】At S8 the detected n-th order peak frequency is stored in a predetermined memory, and at S9 n is incremented by 1 to detect the next peak.

【００８５】At S10 it is determined whether the predetermined number of n-th order peak frequencies has been detected. If not, the process returns to S7 and continues peak frequency detection.

【００８６】If S10 determines that all the predetermined peak frequencies have been detected, the process moves to S11, where the count of processed sections is incremented by 1, and S12 determines whether processing for the predetermined time ma has finished.

【００８７】If processing for the predetermined time has not finished at S12, the process returns to S3 and performs the same peak frequency detection for the next predetermined section.

【００８８】If S12 determines that detection for the predetermined time has finished, the process moves to S13, where the counter for the spectral peak continuation length routine is initialized to n = 0.

【００８９】At S14 the continuation length of the n-th order peak is initialized to kn = 0. At S15 the spectral peak continuation length over the ma sections is detected. If S16 finds the predetermined continuity, the continuation length counter is incremented by 1 at S17; otherwise the process moves to S18.

【００９０】At S18 the count of n-th order peak continuity detections is incremented by 1, and S19 determines whether continuation length detection has finished for the predetermined peaks. If not, the process returns to S15 and continues continuation length detection.

【００９１】If S19 determines that continuation length detection has finished, the S/N at the detected peak frequencies is detected at S20, and the detected S/N level is evaluated at S21.

【００９２】Here the S/N is detected after peak detection and continuation length detection, but it may instead be detected when the power spectrum is detected at S6.

【００９３】If the S/N is at or above a predetermined level at S21, the spectral peak continuation length is evaluated at S22. If S23 finds it at or above the predetermined continuation length, S24 sets the spectral continuation length determination to <1>, meaning the signal appears to be a musical tone; otherwise S25 sets it to <0>.

【００９４】If S21 determines that the S/N is below the predetermined level, <0> is set at S25.
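The decision at steps S21 to S25 reduces to two threshold checks, sketched below. The threshold values, the use of decibels for S/N, and the name `spectral_decision` are assumptions; the patent refers only to "predetermined" levels.

```python
def spectral_decision(snr_db, continuation, min_snr_db=10.0, min_continuation=5):
    """Return 1 when both S/N and continuation length reach their thresholds, else 0."""
    if snr_db < min_snr_db:
        return 0  # S21 fails: go straight to S25
    # S22/S23: evaluate the continuation length, then S24 (1) or S25 (0)
    return 1 if continuation >= min_continuation else 0


decision = spectral_decision(snr_db=20.0, continuation=8)
```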

【００９５】Besides determining musical tones from the spectral continuation length detection method alone, as above, rhythmicity may also be detected as an additional determination parameter. This is described next.

【００９６】(6) Operation flowchart example 2
FIG. 9 is another example of an operation flowchart according to the present invention, different from the one above. It adds rhythmicity to the determination parameters alongside the spectral continuity determination.

【００９７】The process starts at P0 and moves to the spectral continuity detection routine at P1. This routine is the same as described for operation flowchart example 1, so its description is omitted here.

【００９８】After spectral continuity detection, the audio signal of the predetermined section is envelope-detected at P2 and its autocorrelation function is calculated at P3.

【００９９】The signal whose autocorrelation function was calculated at P3 undergoes FFT analysis at P4, the power spectrum is calculated at P5, and the S/N of the first peak frequency is detected at P6.

【０１００】At P7 the detected S/N level is evaluated. If it is at or above a predetermined level, the rhythmicity determination R is set to <1> at P8, meaning rhythmicity is observed.

【０１０１】If the S/N is below the predetermined level at P7, little rhythmicity is considered present and the determination R is set to <0>.

【０１０２】After rhythmicity has been determined in this way, the process moves to the overall musical tone determination at P10, which determines from the results of the spectral continuation length determination and the rhythmicity determination whether the input audio signal is a music signal.

【０１０３】At P11, if the earlier spectral continuation length determination is <1> (recognized as a musical tone signal), the process moves to P12. If the rhythmicity determination is also <1>, the overall determination G is set to <11>, indicating that the signal is most likely a music signal.

【０１０４】If at P12 spectral continuation is observed but rhythmicity is not, the overall determination G is set to <10>, indicating the next most likely music signal.

【０１０５】If the spectral continuity determination at P11 is <0>, the process moves to P14. If only rhythmicity is observed, P16 sets the overall determination G to <01>, a low-priority result indicating that the signal might still be a music signal.

【０１０６】If at P14 neither spectral continuity nor rhythmicity is observed, the overall determination G is set to <00>, meaning the signal is not a music signal.
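The four-way overall determination G can be written as a small function that mirrors the branches at P11, P12, and P14. Representing the codes as two-character strings and the name `overall_determination` are choices made for this sketch.

```python
def overall_determination(spectral, rhythm):
    """Combine the two binary determinations into the code G."""
    if spectral == 1:                           # P11: spectral continuation found
        return "11" if rhythm == 1 else "10"    # P12: check rhythmicity
    return "01" if rhythm == 1 else "00"        # P14: rhythmicity only, or neither


codes = [overall_determination(s, r) for s in (1, 0) for r in (1, 0)]
```

The resulting order "11", "10", "01", "00" runs from most to least likely to be music.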

【０１０７】In this operation flowchart, the spectral continuation length determination and the rhythmicity determination thus yield four levels of likelihood that the audio signal is a music signal. The levels may be subdivided further, or the signal may be judged to be music only when both the spectral continuation length and rhythmicity determinations are positive.

【０１０８】With four determination levels, when for example the audio signal and an identification signal are recorded on a recording medium and the identification signal is reproduced at playback to perform special reproduction, four identification signals may likewise be defined according to the special reproduction time or reproduction speed, and the reproduction operation controlled according to these four identification signals.

【０１０９】For example, in an operation that skips music sections, to complete special reproduction in the shortest time, all sections in which an identification signal corresponding to an overall determination G of <11>, <10>, or <01> is detected during reproduction are skipped.

【０１１０】For the next shortest special reproduction time, sections whose identification signal corresponds to G of <11> or <10> are skipped. To skip only the sections most likely to be music, only sections whose identification signal corresponds to G of <11> are skipped.
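The three skip policies can be expressed as sets of codes to skip. In this sketch the mode names and the function `sections_to_play` are hypothetical, and each section is assumed to carry one G code as its identification signal.

```python
# Which G codes are skipped in each playback mode (mode names are illustrative).
SKIP_CODES = {
    "shortest": {"11", "10", "01"},   # skip anything that might be music
    "shorter": {"11", "10"},          # skip likely music
    "music_only": {"11"},             # skip only the most certain music
}


def sections_to_play(section_codes, mode):
    """Return the indices of sections that are reproduced (not skipped)."""
    skip = SKIP_CODES[mode]
    return [i for i, code in enumerate(section_codes) if code not in skip]


recorded = ["00", "11", "10", "01", "00"]
played = sections_to_play(recorded, "shorter")
```

Inverting the membership test gives the opposite operation of reproducing only the music sections.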

【０１１１】Likewise, when recording and reproducing a music or popular song program from a broadcast, the identification signal sections corresponding to the overall determination G above may be reproduced according to the special reproduction time.

【０１１２】[0112]

【発明の効果】[Effects of the Invention] The present invention makes it easy to detect musical tone sections in broadcast programs and the like. For example, when a music or popular song program is recorded and reproduced, indexes can be placed automatically at the music sections and those portions reproduced as a digest at playback, so that music and song programs can be enjoyed efficiently and effectively.

【０１１３】Musical tones can also be detected efficiently in musical tone signals whose rhythmicity is not pronounced, so a musical tone detection system that does not depend on music genre can be realized with a relatively simple configuration.

[Brief description of the drawings]

【図１】FIG. 1 is an explanatory diagram showing the operating principle of the information signal processing according to the present invention.

【図２】FIG. 2 is a conceptual diagram showing an example of a power spectrum obtained by FFT analysis.

【図３】FIG. 3 is a conceptual diagram showing an example of a rhythm detection method for a musical tone signal.

【図４】FIG. 4 is a block diagram showing an example of the musical tone detection system.

【図５】FIG. 5 is a conceptual diagram showing the difference between a conversational speech waveform and a music (musical tone) waveform.

【図６】FIG. 6 is a conceptual diagram showing a configuration example of another embodiment of the musical tone detection block.

【図７】FIG. 7 is a block diagram showing an example of the recording and reproducing apparatus of the present invention.

【図８】FIG. 8 is a flowchart showing an example of the operation of the present invention.

【図９】FIG. 9 is a flowchart showing another example of the operation of the present invention.

【図１０】FIG. 10 is a conceptual diagram showing musical tone detection in the prior art.

[Explanation of symbols]

1: video signal A/D conversion system
2: video signal encoding system
3: multiplex processing system
4: recording signal processing system
5: recording medium
6: reproduction signal processing system
7: reproduction data separation processing system
8: video data encoding system
9: video signal D/A conversion system
10: audio data encoding system
11: audio signal A/D conversion system
12: identification signal generation system
13: recording medium control system
14: audio data decoding system
15: audio data D/A processing system
16: musical tone detection system
17: system controller system
18: identification signal detection system
20: predetermined audio section detection system
21: FFT (fast Fourier transform) processing system
22: power spectrum detection system
23: signal level detection system
24: envelope detection system
25: autocorrelation function calculation system
26: FFT (fast Fourier transform) processing system
27: power spectrum detection system
28: peak frequency detection system
29: S/N detection system
30: spectrum peak duration detection system
31: overall judgment system
32: FFT (fast Fourier transform) processing system
200: spectrum peak continuity detection block system
300: rhythmicity detection block system
401: first spectrum peak frequency detection system
402: second spectrum peak frequency detection system
403: third spectrum peak frequency detection system
409: ninth spectrum peak frequency detection system
501: S/N detection system
502: S/N detection system
503: S/N detection system
509: S/N detection system
600: spectrum peak continuity detection block system

Continued from the front page
(51) Int. Cl.⁷, identification symbol, FI, theme code (reference): // G10L 101:027
F-terms (reference): 5C053 GB11 GB22 JA12 JA21 LA07 LA14; 5D015 AA06 CC01 CC03 DD03; 5D044 AB05 AB07 BC01 CC05 DE19 DE40 DE49 EF05 FG18 FG23 GK12 HL11

Claims

[Claims]

1. An information signal processing device comprising: frequency analysis means for frequency-analyzing a predetermined input information signal for each predetermined section; peak frequency detecting means for sequentially detecting, from the analysis signal of the frequency analysis means, a predetermined plurality of peak frequencies corresponding to predetermined maxima of the analysis value within a predetermined frequency range for each predetermined section; signal determination means for determining whether a signal at a detection frequency from the peak frequency detecting means in a given predetermined section is also detected, within a predetermined frequency range including that detection frequency, in another section contiguous with that section in the forward or backward direction; signal detection means for detecting the analysis signal level at each detection frequency detected by the peak frequency detecting means, or the signal level of the input signal in a predetermined section; and signal determination means for determining an attribute of the signal from the signal from the signal determination means and the signal from the signal detection means.

2. The information signal processing apparatus according to claim 1, wherein said predetermined information signal is an audio signal.

3. The information signal processing apparatus according to claim 1, wherein the frequency analysis is performed by subjecting the information signal to a predetermined averaging process or a predetermined thinning process and then performing FFT (fast Fourier transform) processing, DCT (discrete cosine transform) processing, or a frequency analysis similar thereto.

4. The information signal processing apparatus according to claim 1, wherein said signal detecting means detects a signal-to-noise ratio from a detected signal level and a predetermined noise level within a predetermined frequency band.

5. An information signal processing method comprising: frequency-analyzing a predetermined input information signal for each predetermined section; detecting, from the analysis signal of the frequency analysis, a predetermined plurality of peak frequencies corresponding to predetermined maxima of the analysis value within a predetermined frequency range for each predetermined section; determining whether a signal at a detection frequency from the peak frequency detection in a given predetermined section is also detected, within a predetermined frequency range including that detection frequency, in another section contiguous with that section in the forward or backward direction; and determining an attribute of the input information signal from that signal determination.

6. The information signal processing method according to claim 5, wherein said predetermined information signal is an audio signal.

7. The information signal processing method according to claim 5, wherein the frequency analysis is performed by subjecting the information signal to a predetermined averaging process or a predetermined thinning process and then performing FFT (fast Fourier transform) processing, DCT (discrete cosine transform) processing, or a frequency analysis similar thereto.

8. An information signal recording/reproducing apparatus comprising: frequency analysis means for frequency-analyzing, for each predetermined section, an input information signal from predetermined input means; peak frequency detecting means for detecting, from the analysis signal from the frequency analysis means, a predetermined plurality of peak frequencies corresponding to predetermined maxima of the analysis value within a predetermined frequency range for each predetermined section; signal determination means for determining whether a signal at a detection frequency from the peak frequency detecting means in a given predetermined section is also detected, within a predetermined frequency range including that detection frequency, in another section contiguous with that section in the forward or backward direction; signal detection means for detecting the signal levels at the plurality of detection frequencies detected by the peak frequency detecting means; signal determination means for determining an attribute of the signal from the signals from the signal determination means and the signal detection means; identification signal generating means for generating an identification signal; recording means for recording the signal from the identification signal generating means and the information signal on a predetermined recording medium; reproducing means for reproducing a predetermined information signal and a predetermined identification signal recorded on the recording medium; and reproduction control means for controlling reproduction from the recording medium according to the reproduced identification signal.

9. An information signal recording / reproducing apparatus according to claim 8, wherein said predetermined information signal is an audio signal.

10. The information signal recording/reproducing apparatus according to claim 8, wherein the frequency analysis is performed by subjecting the information signal to a predetermined averaging process or a predetermined thinning process and then performing FFT (fast Fourier transform) processing, DCT (discrete cosine transform) processing, or a frequency analysis similar thereto.

11. An information signal recording / reproducing apparatus according to claim 8, wherein said signal detecting means detects a signal-to-noise ratio from a detected signal level and a predetermined noise level within a predetermined frequency band.

12. An information signal recording medium, characterized in that: a predetermined information signal is frequency-analyzed for each predetermined section; a predetermined plurality of peak frequencies corresponding to predetermined maxima of the analysis value within a predetermined frequency range are sequentially detected for each predetermined section from the analysis signal of the frequency analysis; it is determined whether a signal at a detection frequency from the peak frequency detection in a given predetermined section is also detected, within a predetermined frequency range including that detection frequency, in another section contiguous with that section in the forward or backward direction; and, when a predetermined section of the information signal is determined by that signal determination to be a signal of a predetermined attribute, a predetermined identification signal for identifying that section is recorded in a predetermined area of the recording medium.

13. The information signal recording medium according to claim 12, wherein said information signal is an audio signal.

14. The information signal recording medium according to claim 12, wherein the frequency analysis is performed by subjecting the information signal to a predetermined averaging process or a predetermined thinning process and then performing FFT (fast Fourier transform) processing, DCT (discrete cosine transform) processing, or a frequency analysis similar thereto.
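
The detection steps recited in the claims above (frequency analysis per section, detection of several peak frequencies, a signal level check, and a continuity check against the adjacent sections) can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function names, the naive DFT used in place of an FFT, the use of the band's mean power as the noise level, and the default values for the peak count, the S/N threshold, and the frequency tolerance are all assumptions.

```python
import cmath
import math


def power_spectrum(frame):
    """Power of each DFT bin below Nyquist (naive O(n^2) DFT, for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
        for k in range(n // 2)
    ]


def peak_bins(spectrum, count, min_snr_db):
    """Up to `count` strongest local maxima whose level clears the S/N threshold.

    Assumption: the noise level is the mean power over the whole band.
    """
    noise = sum(spectrum) / len(spectrum)
    maxima = [
        k for k in range(1, len(spectrum) - 1)
        if spectrum[k] > spectrum[k - 1] and spectrum[k] >= spectrum[k + 1]
        and spectrum[k] > 0 and 10 * math.log10(spectrum[k] / noise) >= min_snr_db
    ]
    return sorted(maxima, key=lambda k: spectrum[k], reverse=True)[:count]


def continuity_flags(peaks_per_section, tolerance):
    """Flag each section whose peaks recur, within `tolerance` bins,
    in the preceding or following section."""
    flags = []
    for i, peaks in enumerate(peaks_per_section):
        neighbours = []
        if i > 0:
            neighbours += peaks_per_section[i - 1]
        if i + 1 < len(peaks_per_section):
            neighbours += peaks_per_section[i + 1]
        flags.append(any(abs(p - q) <= tolerance for p in peaks for q in neighbours))
    return flags


def classify(frames, count=3, min_snr_db=6.0, tolerance=1):
    """Return one flag per section: True where a spectral peak persists."""
    peaks = [peak_bins(power_spectrum(f), count, min_snr_db) for f in frames]
    return continuity_flags(peaks, tolerance)


def tone(bin_index, n=32):
    """Test signal: one sinusoid centred exactly on a DFT bin."""
    return [math.sin(2 * math.pi * bin_index * t / n) for t in range(n)]


steady = classify([tone(5)] * 4)                   # same peak in every section
hopping = classify([tone(3), tone(9), tone(14)])   # peak moves between sections
```

In this sketch a sustained tone is flagged in every section, while a signal whose peak frequency jumps from section to section is not flagged at all; a practical system would combine these flags with the level information to reach the final attribute decision.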