JP6035785B2

JP6035785B2 - Acoustic analysis apparatus and acoustic analysis method

Info

Publication number: JP6035785B2
Application number: JP2012051634A
Authority: JP
Inventors: 慶太有元; リカルド・マークサー; ジョルディ・ジェイナー; 近藤　多伸; 多伸近藤
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2012-03-08
Filing date: 2012-03-08
Publication date: 2016-11-30
Anticipated expiration: 2032-03-08
Also published as: JP2013186312A

Description

The present invention relates to a technique for analyzing an acoustic signal.

Techniques for detecting the pitch (fundamental frequency) of an acoustic signal have been proposed in the past. For example, Non-Patent Document 1 and Patent Document 1 disclose techniques for detecting pitch in an acoustic signal in which a plurality of sounds are mixed.

A. P. Klapuri, "Multiple fundamental frequency estimation based on harmonicity and spectral smoothness," IEEE Trans. Speech and Audio Proc., 11(6), 804-816, 2003

JP 2001-125562 A

In the techniques of Non-Patent Document 1 and Patent Document 1, a plurality of candidates are estimated for the pitch of an acoustic signal, but no configuration has been proposed in which a user selects a desired pitch from those candidates. Consequently, it is not possible to exclude the acoustic component of an erroneously detected pitch with the aid of an instruction from the user, or to separate an acoustic component desired by the user (for example, a singing voice or an instrument sound) from the acoustic signal. In view of these circumstances, an object of the present invention is to enable a user to easily select a pitch identified from an acoustic signal.

The means adopted by the present invention to solve the above problem are described below. To facilitate understanding of the present invention, the following description notes in parentheses the correspondence between elements of the present invention and elements of the embodiments described later; this is not intended to limit the scope of the present invention to the examples given in the embodiments.

The acoustic analysis apparatus of the present invention comprises: display control means (for example, the display control unit 24) for causing a display device to display an analysis image in which a plurality of candidate trajectories (for example, candidate trajectories P0), each expressing the temporal change of a pitch identified from an acoustic signal (for example, the acoustic signal X), are arranged in a frequency-time domain; instruction receiving means (for example, the instruction receiving unit 26) for receiving from a user the designation of an instruction trajectory (for example, the instruction trajectory PA) in the frequency-time domain; and trajectory setting means (for example, the trajectory setting unit 28) for displaying, in the frequency-time domain, a selected trajectory (for example, the selected trajectory PB) that follows the candidate trajectory at the position corresponding to the instruction trajectory. In this configuration, a selected trajectory that follows a candidate trajectory displayed in the frequency-time domain is set in response to an instruction from the user (the designation of the instruction trajectory), so the user can intuitively and easily select the pitch of a specific acoustic component of the acoustic signal. Moreover, because the selected trajectory follows the candidate trajectory at the position corresponding to the instruction trajectory designated by the user, an appropriate selected trajectory can be obtained even if the instruction trajectory is not drawn to match the candidate trajectory exactly.

An acoustic analysis apparatus according to a preferred aspect of the present invention comprises acoustic processing means (for example, the acoustic processing unit 30) for generating a separated signal (for example, the separated signal Y) having the pitch corresponding to the selected trajectory. In this configuration, a separated signal having the pitch corresponding to the selected trajectory is generated, so the user can designate an instruction trajectory while listening to the reproduced sound corresponding to the selected trajectory. Specifically, the acoustic processing means generates the separated signal by applying to the acoustic signal a separation filter that separates the acoustic component corresponding to the selected trajectory. This configuration has the advantage that the acoustic component corresponding to the selected trajectory can be reproduced with high sound quality. In another aspect, the acoustic processing means generates a separated signal representing a pure tone at the frequency corresponding to the selected trajectory. That is, the acoustic processing means generates the separated signal independently of the acoustic signal. This aspect has the advantage that the processing load on the acoustic processing means is lower than in the configuration that applies a separation filter to the acoustic signal.
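The pure-tone aspect described above can be illustrated with a minimal sketch. It assumes the selected trajectory has already been converted to one frequency value in Hz per unit period; the function name `synthesize_pure_tone` and its parameters are hypothetical and do not appear in the specification.

```python
import math

def synthesize_pure_tone(pitch_hz, frame_len, sample_rate):
    """Generate a sinusoid whose frequency follows one pitch value per frame.

    pitch_hz    : list with one frequency in Hz per unit period
                  (None marks a frame with no selected trajectory).
    frame_len   : number of samples per unit period (assumed value).
    sample_rate : sampling frequency in Hz (assumed value).
    """
    samples = []
    phase = 0.0  # carried across frames so the tone has no phase jumps
    for f0 in pitch_hz:
        for _ in range(frame_len):
            if f0 is None:
                samples.append(0.0)
            else:
                samples.append(math.sin(phase))
                phase += 2.0 * math.pi * f0 / sample_rate
    return samples
```

Because the tone is synthesized from the trajectory alone, the acoustic signal itself is never filtered, which is consistent with the lower processing load noted above.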

In a preferred aspect of the present invention, the display control means displays, in the frequency-time domain, the distribution of pitch likelihoods calculated from the acoustic signal (for example, the pitch likelihoods L(k,m)) such that the trajectories of the pitch-likelihood peaks form the candidate trajectories, and the trajectory setting means sets, as the selected trajectory, the sequence of points on the candidate trajectory at which the pitch likelihood is maximal within a predetermined range (for example, the range R) in the frequency-axis direction that contains each point on the instruction trajectory. This aspect has the advantage that an appropriate selected trajectory can be set by simple processing. In another aspect, the display control means displays the distribution of pitch likelihoods in the same manner, and the trajectory setting means sets a distribution of weight values (for example, the distribution D of weight values w) in the frequency-axis direction for each point on the instruction trajectory, and sets, as the selected trajectory, the sequence of points on the candidate trajectory at which the value obtained by adding the weight value at that point to the pitch likelihood of that point is maximal. In this aspect, for example, even a point on a candidate trajectory with a low pitch likelihood is chosen for the selected trajectory when its weight value is large, so a selected trajectory that sufficiently reflects the instruction trajectory designated by the user can be set.
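The weighted variant can be sketched as follows. The shape of the distribution D is not stated in this passage, so the sketch assumes a Gaussian centered on the instructed frequency bin; `snap_weighted`, `sigma`, and `gain` are hypothetical names chosen for illustration.

```python
import math

def snap_weighted(likelihood, candidates, instruction, sigma=2.0, gain=1.0):
    """Choose, per frame, the candidate bin maximizing L(k, m) + w.

    likelihood  : likelihood[m][k] is the pitch likelihood L(k, m).
    candidates  : candidates[m] lists the bins k lying on candidate
                  trajectories in frame m.
    instruction : dict mapping frame m to the instructed bin k_A.
    sigma, gain : width and height of the assumed Gaussian weight
                  distribution D (both are illustrative assumptions).
    """
    selected = {}
    for m, k_a in instruction.items():
        if not candidates[m]:
            continue  # no candidate trajectory passes through this frame

        def score(k, m=m, k_a=k_a):
            weight = gain * math.exp(-0.5 * ((k - k_a) / sigma) ** 2)
            return likelihood[m][k] + weight

        selected[m] = max(candidates[m], key=score)
    return selected
```

With a large `gain`, a nearby low-likelihood candidate wins over a distant high-likelihood one, which is the behavior the paragraph above describes; with `gain=0` the choice falls back to the highest likelihood alone.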

An acoustic analysis apparatus according to a preferred aspect of the present invention comprises trajectory classification means (for example, the trajectory classification unit 32) for dividing the plurality of selected trajectories set by the trajectory setting means into a plurality of trajectory groups, and the trajectory setting means makes the display mode of each of the selected trajectories differ for each trajectory group into which the trajectory classification means has divided them. In this aspect, each selected trajectory is displayed in a display mode specific to its trajectory group, so the user can easily identify the selected trajectories of a desired trajectory group. A specific example of this aspect is described later as, for example, the second embodiment. In this specification, the "display mode" of an image means a property of the image that the user can distinguish visually, and typically includes the brightness, color, and hue of the image.

In a preferred aspect of the present invention, when an end point of the instruction trajectory (for example, the end point eA) is designated in a region (for example, the region C) surrounding an end point of a candidate trajectory (for example, the end point e0), the trajectory setting means sets a selected trajectory that follows the candidate trajectory and has the end point of that candidate trajectory as its own end point. In this aspect, a selected trajectory ending at the end point of the candidate trajectory is set whenever the end point of the instruction trajectory is designated near it, so a selected trajectory whose end point is the onset or the offset of an acoustic component of the acoustic signal can be set easily. A specific example of this aspect is described later as, for example, the third embodiment.
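The end-point behavior can be sketched as a simple proximity test. The shape and size of the region C are not given in this passage, so the sketch assumes a rectangle centered on each candidate end point; `snap_endpoint`, `radius_t`, and `radius_f` are hypothetical names.

```python
def snap_endpoint(end_a, candidate_ends, radius_t, radius_f):
    """Return the candidate end point e0 whose region C contains eA.

    end_a          : (time, frequency) of the instruction-trajectory end point eA.
    candidate_ends : list of (time, frequency) end points e0 of candidate
                     trajectories.
    radius_t, radius_f : half-extent of the assumed rectangular region C
                     along the time axis and the frequency axis.
    If eA lies in no region C, eA itself is returned unchanged.
    """
    for e0 in candidate_ends:
        if abs(end_a[0] - e0[0]) <= radius_t and abs(end_a[1] - e0[1]) <= radius_f:
            return e0
    return end_a
```

The sketch returns the first matching end point; choosing the nearest one when several regions overlap would be an equally plausible design.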

In a preferred aspect of the present invention, the display control means displays, in different display modes, the stationary part of each candidate trajectory that is parallel to the time axis (for example, the stationary part Q1) and the variable part other than the stationary part (for example, the variable part Q2). In this aspect, the stationary part and the variable part of each candidate trajectory are displayed in different display modes, so the user can intuitively distinguish acoustic components whose pitch tends to fluctuate from acoustic components whose pitch tends to remain steady. A specific example of this aspect is described later as, for example, the fourth embodiment. Furthermore, a configuration in which the display control means highlights the candidate trajectories corresponding to a specific harmonic structure has the advantage that the user can intuitively identify, for example, the candidate trajectories corresponding to a specific musical instrument. A specific example of this aspect is described later as, for example, the fifth embodiment.
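As an illustration of how a stationary part Q1 might be told apart from a variable part Q2, the sketch below labels each step between consecutive points of a candidate trajectory. Treating "parallel to the time axis" as "frequency bin unchanged within a tolerance" is an assumption, as are the names `label_parts` and `tolerance`.

```python
def label_parts(freq_bins, tolerance=0):
    """Label each step of a candidate trajectory as stationary or variable.

    freq_bins : frequency bin of the trajectory in each consecutive frame.
    tolerance : largest bin change still treated as parallel to the
                time axis (assumed parameter).
    Returns one label per pair of consecutive frames.
    """
    labels = []
    for prev, cur in zip(freq_bins, freq_bins[1:]):
        if abs(cur - prev) <= tolerance:
            labels.append("stationary")
        else:
            labels.append("variable")
    return labels
```

A display routine could then map the two labels to two different colors or brightness levels.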

In a preferred aspect of the present invention, when the user instructs that one existing selected trajectory be moved in the frequency-axis direction, the trajectory setting means sets a selected trajectory that follows a candidate trajectory in a harmonic relationship with that selected trajectory. In this aspect, the new selected trajectory follows a candidate trajectory that is harmonically related to the existing one, so it is easy both to correct a selected trajectory that was set at a frequency one octave away from the desired pitch and to select the individual frequencies that make up a harmonic structure. A specific example of this aspect is described later as, for example, the sixth embodiment.

The acoustic analysis apparatus according to each of the above aspects may be realized by hardware (electronic circuitry) such as a DSP (Digital Signal Processor) dedicated to processing acoustic signals, or by cooperation between a general-purpose arithmetic processing device such as a CPU (Central Processing Unit) and a program. The program of the present invention causes a computer to execute: display control processing for causing a display device to display an analysis image in which a plurality of candidate trajectories expressing the temporal change of a pitch identified from an acoustic signal are arranged in a frequency-time domain; instruction receiving processing for receiving from a user the designation of an instruction trajectory in the frequency-time domain; and trajectory setting processing for displaying, in the frequency-time domain, a selected trajectory that follows the candidate trajectory at the position corresponding to the instruction trajectory. This program provides the same operation and effects as the acoustic analysis apparatus of the present invention. The program of the present invention may be provided stored on a computer-readable recording medium and installed on a computer, or may be provided by distribution over a communication network and installed on a computer.

Block diagram of an acoustic analysis apparatus according to a first embodiment of the present invention.
Schematic diagram of an analysis image.
Explanatory diagram of the operation of the trajectory setting unit.
Block diagram of an acoustic analysis apparatus according to a second embodiment.
Explanatory diagram of the operation of the trajectory setting unit in a third embodiment.
Schematic diagram of an analysis image in a fourth embodiment.
Explanatory diagram of the operation of the trajectory setting unit in a sixth embodiment.
Explanatory diagram of the operation of the trajectory setting unit in a modification.

<First Embodiment>
FIG. 1 is a block diagram of an acoustic analysis apparatus 100 according to a first embodiment of the present invention. As shown in FIG. 1, the acoustic analysis apparatus 100 is realized by a computer system comprising an arithmetic processing device 10, a storage device 12, a display device 14, an input device 16, and a sound emitting device 18.

The storage device 12 stores a program PGM executed by the arithmetic processing device 10 and various data used by the arithmetic processing device 10. The storage device 12 of the first embodiment stores an acoustic signal X. The acoustic signal X is a sample sequence representing a mixture of a plurality of acoustic components whose pitches (fundamental frequencies) may differ. Any known recording medium, such as a semiconductor recording medium or a magnetic recording medium, or a combination of several types of recording media, may be used as the storage device 12. The acoustic analysis apparatus 100 of the first embodiment is a signal processing apparatus that analyzes the pitch of the acoustic signal X stored in the storage device 12.

The display device 14 (for example, a liquid crystal display panel) displays images as instructed by the arithmetic processing device 10. The input device 16 is a device that receives instructions from the user to the acoustic analysis apparatus 100 and includes, for example, a plurality of controls operated by the user. The sound emitting device 18 (a loudspeaker or headphones) reproduces sound waves corresponding to the separated signal Y generated by the arithmetic processing device 10. The separated signal Y is a signal in which a specific acoustic component of the acoustic signal X is emphasized (ideally, extracted). For convenience, the D/A converter that converts the separated signal Y generated by the arithmetic processing device 10 from digital to analog is not illustrated.

By executing the program PGM stored in the storage device 12, the arithmetic processing device 10 realizes a plurality of functions for analyzing the acoustic signal X (an analysis processing unit 22, a display control unit 24, an instruction receiving unit 26, a trajectory setting unit 28, and an acoustic processing unit 30). A configuration in which the functions of the arithmetic processing device 10 are distributed across a plurality of devices, or a configuration in which a dedicated electronic circuit (DSP) realizes some of the functions of the arithmetic processing device 10, may also be adopted.

The analysis processing unit 22 calculates, from the acoustic signal X, a pitch likelihood L(k,m) for each frequency on the frequency axis in each unit period (frame) on the time axis. The symbol k denotes any one frequency (frequency band) on the frequency axis, and the symbol m denotes any one unit period (frame) on the time axis. Each pitch likelihood L(k,m) corresponds to the likelihood that the k-th frequency on the frequency axis is a pitch (fundamental frequency) of the acoustic signal X in the m-th unit period. Any known technique (for example, the technique of Non-Patent Document 1 or Patent Document 1) may be used by the analysis processing unit 22 to calculate the pitch likelihoods L(k,m).
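The specification leaves the likelihood calculation to known techniques. Purely as an illustration of what L(k,m) might look like for a single frame, the sketch below uses plain harmonic summation over a magnitude spectrum; this is a stand-in chosen for brevity, not the method of Non-Patent Document 1 or Patent Document 1, and the name `harmonic_sum_likelihood` and the 1/h weighting are assumptions.

```python
def harmonic_sum_likelihood(magnitude, n_harmonics=4):
    """Compute a simple pitch salience for every bin of one frame.

    magnitude   : magnitude spectrum of unit period m, indexed by bin.
    n_harmonics : number of harmonics summed per candidate (assumed).
    Returns a list whose k-th entry plays the role of L(k, m): the sum of
    the magnitudes at bins k, 2k, 3k, ... weighted by 1/h.
    """
    n = len(magnitude)
    likelihood = [0.0] * n
    for k in range(1, n):  # bin 0 (DC) cannot be a pitch
        total = 0.0
        for h in range(1, n_harmonics + 1):
            idx = h * k
            if idx >= n:
                break
            total += magnitude[idx] / h
        likelihood[k] = total
    return likelihood
```

Calling this function once per unit period and stacking the results gives a two-dimensional array indexed by frame m and bin k.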

The display control unit 24 causes the display device 14 to display the image 50 of FIG. 2 (hereinafter, the "analysis image"), which lets the user see the temporal change of the pitches estimated from the acoustic signal X. As shown in FIG. 2, the analysis image 50 is a GUI (Graphical User Interface) in which a frequency-time region 52 is arranged. The frequency-time region 52 is a coordinate plane on which a frequency axis AF and a time axis AT that intersect each other are set.

The display control unit 24 displays the result of the analysis by the analysis processing unit 22 in the frequency-time region 52. That is, as shown in FIG. 2, the display control unit 24 displays in the frequency-time region 52 the distribution of the pitch likelihoods L(k,m) that the analysis processing unit 22 calculates for each unit period. Specifically, the display mode (for example, brightness, saturation, or hue) of each point in the frequency-time region 52 is set variably according to the pitch likelihood L(k,m), so that the user can visually grasp how high or low each pitch likelihood L(k,m) is. For example, points in the frequency-time region 52 with a higher pitch likelihood L(k,m) are displayed at a higher gradation level. Therefore, as can be seen from FIG. 2, trajectories P0 (hereinafter, "candidate trajectories") formed by arranging the peaks of the pitch likelihood L(k,m) of each unit period along the time axis AT are displayed in the frequency-time region 52 in a state that the user can identify visually. Because the acoustic signal X represents a mixture of a plurality of acoustic components, a plurality of candidate trajectories P0 that differ in position along the frequency axis AF and along the time axis AT are arranged in the frequency-time region 52. The candidate trajectories P0 may overlap one another on the time axis AT. A frequency at which the pitch likelihood L(k,m) peaks on the frequency axis AF (each frequency on a candidate trajectory P0) is highly likely to be a pitch of the acoustic signal X.
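The two ideas in this paragraph, likelihood peaks forming candidate trajectories and likelihood values mapped to a display gradation, can be sketched as follows. The peak criterion, the optional threshold, and the 8-bit gray mapping are assumptions for illustration; `frame_peaks` and `to_gray` are hypothetical names.

```python
def frame_peaks(frame, threshold=0.0):
    """Return the bins k at which L(k, m) is a local maximum in one frame.

    frame     : pitch likelihoods of one unit period, indexed by bin.
    threshold : minimum likelihood for a peak to count (assumed parameter).
    """
    return [
        k
        for k in range(1, len(frame) - 1)
        if frame[k] > threshold
        and frame[k] > frame[k - 1]
        and frame[k] >= frame[k + 1]
    ]


def to_gray(value, max_value):
    """Map a likelihood to an 8-bit gray level; higher likelihood is brighter."""
    if max_value <= 0:
        return 0
    ratio = min(max(value / max_value, 0.0), 1.0)
    return round(255 * ratio)
```

Linking the peaks of consecutive frames that lie close together in frequency would then yield the individual candidate trajectories P0.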

The display control unit 24 variably controls the display magnification of the frequency-time region 52 in response to instructions from the user via the input device 16. The display control unit 24 also displays in the analysis image 50 the distribution of the pitch likelihoods L(k,m) for a partial section of the acoustic signal X on the time axis, and changes the section whose pitch likelihoods L(k,m) are displayed in the analysis image 50 in response to instructions from the user (that is, it scrolls the frequency-time region 52 along the time axis AT).

By operating the input device 16 as appropriate while referring to the distribution of the pitch likelihoods L(k,m) (the candidate trajectories P0) in the frequency-time region 52, the user can designate a desired trajectory PA (hereinafter, the "instruction trajectory") at any position in the frequency-time region 52, as shown in part (A) of FIG. 3. Specifically, the user designates the instruction trajectory PA by dragging the mouse of the input device 16 roughly along the candidate trajectory P0 that is close to the frequency of the acoustic component of the acoustic signal X to be separated. It is sufficient for the user to draw the instruction trajectory PA roughly along the desired candidate trajectory P0; the instruction trajectory PA need not match the candidate trajectory P0 exactly. The instruction receiving unit 26 of FIG. 1 is the element that receives the designation of the instruction trajectory PA in the frequency-time region 52 from the user. Although the instruction trajectory PA is drawn in part (A) of FIG. 3 for convenience, it is not displayed in the actual analysis image 50. It is, however, also possible to display the instruction trajectory PA together with the candidate trajectories P0.

The trajectory setting unit 28 of FIG. 1 sets a selected trajectory PB according to the candidate trajectories P0 displayed on the display device 14 and the instruction trajectory PA that the instruction receiving unit 26 has received from the user, and arranges it in the frequency-time region 52. The selected trajectory PB and the candidate trajectories P0 are displayed in different display modes (for example, brightness, saturation, or hue) so that the user can distinguish them visually.

The selected trajectory PB of the first embodiment is a trajectory (a curve or a polyline) that follows the candidate trajectory P0 at the position corresponding to the instruction trajectory PA in the frequency-time region 52. As shown in part (B) of FIG. 3, for each point pA on the instruction trajectory PA received from the user by the instruction receiving unit 26, the trajectory setting unit 28 of the first embodiment finds the point p0 on a candidate trajectory P0 at which the pitch likelihood L(k,m) is maximal within a predetermined range R in the direction of the frequency axis AF that contains the point pA, and sets the sequence of these points p0 as the selected trajectory PB. The selected trajectory PB therefore spans the same section on the time axis as the instruction trajectory PA (the section between the end points of the instruction trajectory PA).

As can be understood from the above description, the instruction trajectory PA that the user roughly designates in the frequency-time region 52 is snapped to an existing candidate trajectory P0. However, when no candidate trajectory P0 exists within the range R containing the point pA on the instruction trajectory PA (that is, when the instruction trajectory PA is far from every candidate trajectory P0), the trajectory setting unit 28 adopts the instruction trajectory PA designated by the user as the selected trajectory PB as it is. Because the trajectory setting unit 28 sets a selected trajectory PB each time an instruction trajectory PA is designated, a plurality of selected trajectories PB are arranged in the frequency-time region 52.
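The snapping procedure of the first embodiment, including the fallback to the instruction trajectory, can be sketched as follows. The sketch assumes the range R is centered on the point pA and applies the fallback frame by frame; both are interpretations, and `set_selected_trajectory` and `half_range` are hypothetical names.

```python
def set_selected_trajectory(likelihood, candidates, instruction, half_range):
    """Snap an instruction trajectory PA to nearby candidate trajectories.

    likelihood  : likelihood[m][k] is the pitch likelihood L(k, m).
    candidates  : candidates[m] lists the bins k on candidate trajectories
                  P0 in frame m.
    instruction : dict mapping frame m to the instructed bin (point pA).
    half_range  : half-width of the range R in bins (assumed to be
                  centered on pA).
    Returns a dict mapping each frame of PA to the bin of the selected
    trajectory PB.
    """
    selected = {}
    for m, k_a in instruction.items():
        in_range = [k for k in candidates[m] if abs(k - k_a) <= half_range]
        if in_range:
            # Point p0: the candidate in R with the highest likelihood.
            selected[m] = max(in_range, key=lambda k: likelihood[m][k])
        else:
            # No candidate trajectory inside R: keep the instructed point.
            selected[m] = k_a
    return selected
```

The result covers exactly the frames of the instruction trajectory, matching the statement that PB spans the same section on the time axis as PA.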

The user can select one or more of the selected trajectories PB arranged in the frequency-time region 52 as processing targets and instruct that they be edited. Specifically, the trajectory setting unit 28 performs editing operations such as extending, shortening, deleting, copying, and pasting on the selected trajectories PB chosen by the user. Any method may be used to choose the selected trajectories PB to be edited in response to instructions from the user. For example, a configuration may be adopted in which the trajectory setting unit 28 selects each selected trajectory PB inside a region of the frequency-time region 52 that the user designates with a rubber band (by controlling the diagonal of a rectangular region) or freehand. Also suitable are a configuration in which the trajectory setting unit 28 selects each selected trajectory PB within a range that the user designates on the frequency axis AF or the time axis AT, and a configuration in which the trajectory setting unit 28 selects the selected trajectories PB whose duration falls within a numerical range designated by the user. A plurality of selected trajectories PB can also be selected by designating them one after another while holding down a predetermined control of the input device 16.
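Rubber-band selection can be sketched as a containment test against the rectangle the user drags out. Whether a trajectory must lie entirely inside the rectangle or merely touch it is not stated, so the sketch assumes full containment; `select_in_rectangle` and its parameters are hypothetical.

```python
def select_in_rectangle(trajectories, t_min, t_max, f_min, f_max):
    """Return the names of the selected trajectories PB inside a rectangle.

    trajectories : dict mapping a trajectory name to its list of
                   (time, frequency) points.
    t_min, t_max : extent of the rubber-band rectangle on the time axis AT.
    f_min, f_max : extent of the rectangle on the frequency axis AF.
    A trajectory is chosen only if every one of its points lies inside
    the rectangle (assumed rule).
    """
    return [
        name
        for name, points in trajectories.items()
        if points
        and all(t_min <= t <= t_max and f_min <= f <= f_max for t, f in points)
    ]
```

Restricting the test to one axis only (ignoring either the time bounds or the frequency bounds) would give the axis-range selection mentioned in the same paragraph.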

When the user operates the input device 16 to instruct that the selected trajectories PB be saved (exported), the trajectory setting unit 28 generates a file specifying each selected trajectory PB currently set in the frequency-time region 52 (hereinafter, a "trajectory file") and stores it in the storage device 12. Conversely, when the user operates the input device 16 to instruct that a trajectory file be read (imported), the trajectory setting unit 28 arranges in the frequency-time region 52 each selected trajectory PB specified by the trajectory file stored in the storage device 12. The above description takes as its example the reading of a trajectory file that was designated and saved (exported) for the analysis result of the analysis processing unit 22, but it is also possible to read a trajectory file representing the temporal trajectory of a pitch identified by any of various known pitch estimation techniques (that is, a trajectory file other than one generated by the trajectory setting unit 28).
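The specification does not define the layout of the trajectory file. As one possible illustration only, the sketch below serializes the selected trajectories as JSON text; the field names, the version tag, and the functions `dump_trajectories` and `load_trajectories` are all assumptions.

```python
import json

def dump_trajectories(trajectories):
    """Serialize selected trajectories PB to JSON text (assumed format).

    trajectories : list of trajectories, each a list of (frame, hz) pairs.
    """
    payload = {
        "version": 1,  # hypothetical format tag
        "trajectories": [[[m, hz] for m, hz in t] for t in trajectories],
    }
    return json.dumps(payload)


def load_trajectories(text):
    """Parse JSON text produced by dump_trajectories back into tuples."""
    payload = json.loads(text)
    return [
        [(int(m), float(hz)) for m, hz in t]
        for t in payload["trajectories"]
    ]
```

Writing the returned text to the storage device and reading it back corresponds to the export and import operations; a file produced by another pitch estimator could be imported the same way as long as it follows the same layout.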

The acoustic processing unit 30 in FIG. 1 generates a separated signal Y of the pitch corresponding to the selected trajectory PB set by the trajectory setting unit 28. Specifically, the acoustic processing unit 30 starts generating the separated signal Y when the user, by operating the input device 16, instructs the acoustic analysis device 100 to separate the acoustic signal X. The separated signal Y of the first embodiment is an acoustic signal in which the acoustic component of the acoustic signal X at the pitch indicated by the selected trajectory PB is emphasized.

Specifically, the acoustic processing unit 30 generates, for each unit period, a filter for emphasizing the acoustic component corresponding to each selected trajectory PB (hereinafter referred to as a "separation filter") and applies the filters to the acoustic signal X in sequence to generate the separated signal Y. A separation filter is a series of coefficient values corresponding to different frequencies on the frequency axis. The separation filter of the first embodiment is a binary mask in which the coefficient values corresponding to the pitch indicated by the selected trajectory PB and to its harmonic frequencies are set to 1 and the remaining coefficient values are set to 0. The acoustic processing unit 30 generates the separated signal Y by multiplying the frequency spectrum of each unit period of the acoustic signal X by the separation filter and converting the result to the time domain. The separated signal Y generated by the acoustic processing unit 30 is supplied to the sound emitting device 18 and reproduced as sound waves. The user can therefore listen to reproduced sound in which the acoustic component at the pitch of the selected trajectory PB, which corresponds to the designated trajectory PA that the user specified, is emphasized.
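The binary-mask separation filter described above can be sketched for a single unit period as follows. This is a minimal illustration rather than the implementation of the acoustic processing unit 30: the function names, the uniform bin spacing `bin_hz`, and the optional widening `tol_bins` are assumptions introduced here, and the conversion back to the time domain is omitted.

```python
def binary_mask(f0_hz, n_bins, bin_hz, tol_bins=0):
    """Build a 0/1 coefficient series for one unit period.

    Coefficients at the pitch f0_hz and at its integer multiples are 1,
    all others are 0. bin_hz (spacing between frequency bins) and
    tol_bins (extra bins set to 1 around each harmonic) are assumptions
    made for this sketch.
    """
    mask = [0] * n_bins
    if f0_hz <= 0:
        return mask  # no pitch in this unit period: pass nothing
    order = 1
    while True:
        center = round(order * f0_hz / bin_hz)
        if center >= n_bins:
            break
        lo = max(0, center - tol_bins)
        hi = min(n_bins, center + tol_bins + 1)
        for b in range(lo, hi):
            mask[b] = 1
        order += 1
    return mask


def apply_mask(spectrum, mask):
    """Multiply the spectrum of one unit period by the mask, bin by bin."""
    return [x * m for x, m in zip(spectrum, mask)]


if __name__ == "__main__":
    mask = binary_mask(f0_hz=100.0, n_bins=10, bin_hz=50.0)
    print(mask)
    print(apply_mask([1 + 1j] * 10, mask))
```

Applying `binary_mask` once per unit period, with `f0_hz` taken from the selected trajectory PB at that time, yields the sequence of filters that the text describes.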

Meanwhile, as shown in FIG. 2, the display control unit 24 arranges a playback indicator (cursor) 54 indicating the current playback position in the frequency-time region 52 and moves the playback indicator 54 along the time axis as playback of the separated signal Y progresses. By listening to the reproduced sound of the separated signal Y while viewing the analysis image 50 (the relationship between each selected trajectory PB and the playback indicator 54), the user can intuitively grasp how each candidate trajectory P0 and each selected trajectory PB in the frequency-time region 52 relates to the onset and offset times and the pitch of each acoustic component of the acoustic signal X. In parallel with listening to the reproduced sound of the separated signal Y and viewing the analysis image 50, the user can also instruct the acoustic analysis device 100 to edit each selected trajectory PB or to add a new selected trajectory PB (by specifying a designated trajectory PA).

In the first embodiment described above, a candidate trajectory P0 arranged in the frequency-time region 52 is selected as a selected trajectory PB in response to an instruction from the user (specification of a designated trajectory PA), so the user can intuitively and easily select the pitch of a specific acoustic component of the acoustic signal X. In the first embodiment in particular, the selected trajectory PB is set by snapping the designated trajectory PA specified by the user to a candidate trajectory P0, so the effect that the user can easily select the desired selected trajectory PB is especially pronounced.

<Second Embodiment>
A second embodiment of the present invention is described below. For elements in each of the embodiments illustrated below whose operation and function are equivalent to those of the first embodiment, the reference signs used in the description of the first embodiment are reused and detailed descriptions are omitted as appropriate.

FIG. 4 is a block diagram of the acoustic analysis device 100 according to the second embodiment. As shown in FIG. 4, the acoustic analysis device 100 of the second embodiment has a configuration in which a trajectory classification unit 32 is added to the first embodiment. The trajectory classification unit 32 divides the plurality of selected trajectories PB that the trajectory setting unit 28 has set in the frequency-time region 52 into a plurality of groups G (hereinafter referred to as "trajectory groups"). One trajectory group G contains one or more selected trajectories PB. Specifically, the trajectory classification unit 32 classifies into one trajectory group G a plurality of selected trajectories PB that the user freely designates by operating the input device 16. For example, the user can select, as the members of one trajectory group G, a plurality of selected trajectories PB presumed to belong to the acoustic component of a desired instrument (for example, a plurality of selected trajectories PB lying within the range of that instrument). The selected trajectories PB to be classified into each trajectory group G can be chosen by the same methods as in the first embodiment (for example, a configuration that selects the selected trajectories PB inside a region that the user specifies in the frequency-time region 52 with a rubber band or freehand).

The display control unit 24 makes the display mode of each of the plurality of selected trajectories PB arranged in the frequency-time region 52 differ from one trajectory group G to another. The user can also specify, individually for each trajectory group G, whether each process that can be executed on the selected trajectories PB is enabled or disabled. Specifically, the display control unit 24 controls whether the selected trajectories PB are displayed, for each trajectory group G, in accordance with an instruction from the user. Thus, for example, only the selected trajectories PB of the trajectory group G designated by the user are displayed in the frequency-time region 52, and the selected trajectories PB of the remaining trajectory groups G are hidden. As understood from the above description, as far as the display of the selected trajectories PB is concerned, a separate layer can be said to be set for each trajectory group G. The analysis image 50 is formed by stacking the layers on which the selected trajectories PB of the respective trajectory groups G are arranged. In addition, the acoustic processing unit 30 controls whether emphasis is applied in the separated signal Y, for each trajectory group G, in accordance with an instruction from the user. For example, the acoustic processing unit 30 generates the separated signal Y by selectively emphasizing only those acoustic components of the acoustic signal X that correspond to the selected trajectories PB of the trajectory group G designated by the user.

The second embodiment achieves the same effects as the first embodiment. In the second embodiment, moreover, the selected trajectories PB are classified into a plurality of trajectory groups G and displayed in a different display mode for each trajectory group G, which has the advantage that the user can intuitively identify just the selected trajectories PB of a particular trajectory group G (for example, the selected trajectories PB corresponding to the acoustic component of a particular instrument). Because whether each process is applied is controlled individually for each trajectory group G, there is the further advantage that varied operations become easy, such as selectively displaying and editing only the selected trajectories PB of a particular trajectory group G while listening to the reproduced sound of the acoustic components corresponding to those selected trajectories PB.

<Third Embodiment>
FIG. 5 is an explanatory diagram of the operation of the trajectory setting unit 28 in the third embodiment. As shown in part (A) of FIG. 5, the display control unit 24 of the third embodiment identifies each end point e0 of the candidate trajectories P0 corresponding to the pitch likelihoods L(k,m) of the acoustic signal X. An end point e0 is a point in the frequency-time region 52 at which the pitch likelihood L(k,m) changes discontinuously or abruptly (increases or decreases) in the time direction. As understood from the above description, the end points e0 correspond to the start and end points of the sounding of each acoustic component of the acoustic signal X.

As shown in part (A) of FIG. 5, when the user specifies an end point eA of the designated trajectory PA within a surrounding region C of an end point e0 of a candidate trajectory P0 (for example, a circular region of predetermined diameter centered on the end point e0), the trajectory setting unit 28 sets, as shown in part (B) of FIG. 5, a selected trajectory PB that runs along that candidate trajectory P0 and has the end point e0 of the candidate trajectory P0 as its end point. That is, the end point eA of the designated trajectory PA specified by the user is snapped to the end point e0 of the candidate trajectory P0. When the end point eA of the designated trajectory PA is specified outside the surrounding region C of the end point e0 of every candidate trajectory P0, a selected trajectory PB whose end point is the end point eA of the designated trajectory PA is set.
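The end point snapping rule above can be sketched as follows. This is a hedged illustration only: the function name, the representation of points as (time, frequency) pairs in display coordinates, and the choice of the nearest end point when several surrounding regions C overlap are assumptions that the text does not specify.

```python
import math


def snap_end_point(e_a, candidate_end_points, radius):
    """Snap the user's end point eA to a candidate end point e0.

    e_a and each element of candidate_end_points are (time, frequency)
    pairs in display coordinates (an assumption of this sketch).
    radius is the radius of the circular surrounding region C.
    Returns the nearest e0 whose region C contains eA, or eA itself
    when eA lies outside every region C.
    """
    best = None
    best_dist = radius
    for e0 in candidate_end_points:
        dist = math.hypot(e0[0] - e_a[0], e0[1] - e_a[1])
        if dist <= best_dist:
            best, best_dist = e0, dist
    return best if best is not None else e_a


if __name__ == "__main__":
    ends = [(12.0, 11.0), (50.0, 50.0)]
    print(snap_end_point((10.0, 10.0), ends, radius=5.0))  # inside C of the first e0
    print(snap_end_point((30.0, 30.0), ends, radius=5.0))  # outside every region C
```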

The third embodiment achieves the same effects as the first embodiment. In the third embodiment, moreover, the end point of the selected trajectory PB is automatically adjusted to the end point e0 of the candidate trajectory P0, so the user can easily set a selected trajectory PB that spans from the start point to the end point of the sounding of each acoustic component of the acoustic signal X. This has the advantage that each acoustic component of the acoustic signal X can be emphasized accurately in the separated signal Y from the start to the end of its sounding. The third embodiment can also be applied to the second embodiment.

<Fourth Embodiment>
FIG. 6 is a schematic diagram of the analysis image 50 in the fourth embodiment. The selected trajectories PB are omitted from FIG. 6 for convenience. As shown in FIG. 6, the display control unit 24 of the fourth embodiment displays, in different display modes (solid and broken lines in FIG. 6), the portion Q1 of each candidate trajectory P0 in the frequency-time region 52 that extends substantially parallel to the time axis AT (hereinafter referred to as the "steady portion") and the portion Q2 other than the steady portion Q1 (hereinafter referred to as the "variable portion"). Specifically, the steady portion Q1 of each candidate trajectory P0 is highlighted relative to the variable portion Q2 (for example, displayed with a thicker line or at a higher gradation). Any method may be used by the display control unit 24 to distinguish the steady portion Q1 from the variable portion Q2; a suitable example is to judge as a steady portion Q1 any section of the candidate trajectory P0 in which the frequency stays within a band of predetermined width continuously over a predetermined length. As in the embodiments described above, a selected trajectory PB along a candidate trajectory P0 is set in accordance with the designated trajectory PA specified by the user.
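The example criterion for a steady portion Q1 (frequency staying within a band of predetermined width over a predetermined length) can be sketched as below. This is one possible reading, not the method of the display control unit 24: the greedy left-to-right scan, the function name, and the representation of a candidate trajectory as one frequency value per unit period are assumptions.

```python
def steady_sections(freqs, min_len, band_width):
    """Find steady portions Q1 in one candidate trajectory.

    freqs: frequency of the trajectory in each unit period (assumed
    representation). A section is steady when max - min of its
    frequencies is at most band_width and it spans at least min_len
    unit periods. Returns (start, end) index pairs, end exclusive.
    Everything outside the returned sections is a variable portion Q2.
    """
    sections = []
    n = len(freqs)
    start = 0
    while start < n:
        lo = hi = freqs[start]
        end = start + 1
        # Extend the section while the band constraint still holds.
        while end < n:
            new_lo = min(lo, freqs[end])
            new_hi = max(hi, freqs[end])
            if new_hi - new_lo > band_width:
                break
            lo, hi = new_lo, new_hi
            end += 1
        if end - start >= min_len:
            sections.append((start, end))
            start = end
        else:
            start += 1
    return sections


if __name__ == "__main__":
    track = [440, 441, 440, 439, 440, 460, 480, 500, 500, 500, 501]
    print(steady_sections(track, min_len=4, band_width=3))
```

A greedy scan keeps the sketch short; a display that needs the longest possible steady sections might prefer a different segmentation.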

The degree to which pitch varies over time differs from one acoustic component to another (that is, from one part of the music to another). For example, a singing voice carrying the main melody of a song tends to vary in pitch because of vibrato, portamento, and the like, whereas instrument sounds accompanying the song (for example, the sounds of a piano, guitar, or bass) broadly tend to hold a stable pitch within a single note. The fourth embodiment has the advantage that the user can specify a designated trajectory PA (and hence a selected trajectory PB) while visually distinguishing steady portions Q1 (for example, portions likely to be accompaniment) from variable portions Q2 (for example, portions likely to be the singing voice of the main melody).

<Fifth Embodiment>
With the acoustic analysis device 100 of the fifth embodiment, the user can designate a specific harmonic structure by operating the input device 16 as appropriate. For example, the harmonic structure of a closed-pipe instrument such as a clarinet can be designated.

In the frequency-time region 52 of the analysis image 50 that the display control unit 24 displays on the display device 14, candidate trajectories P0 can appear not only at the pitch of each acoustic component of the acoustic signal X but also at positions on the frequency axis AF corresponding to the harmonic frequencies of each pitch. The display control unit 24 highlights, among the plurality of candidate trajectories P0 in the frequency-time region 52, the candidate trajectories P0 that correspond to the harmonic structure designated by the user. For example, in the harmonic structure of a closed-pipe instrument, odd-order harmonics tend to be more prominent than even-order harmonics. Accordingly, when the user designates the harmonic structure of a closed-pipe instrument, the display control unit 24 highlights the candidate trajectories P0 located at odd multiples of the frequency of a particular candidate trajectory P0 (for example, a candidate trajectory P0 on which a selected trajectory PB has been set). In the harmonic structure of wind instruments such as the oboe, the lower-order harmonic components tend to be more prominent than the fundamental component. Accordingly, when the user designates the harmonic structure of a wind instrument such as the oboe, the display control unit 24 highlights the candidate trajectories P0 for which the pitch likelihood L(k,m) of the lower-order harmonic components is higher than the pitch likelihood L(k,m) of the fundamental component.

The fifth embodiment achieves the same effects as the first embodiment. In the fifth embodiment, moreover, the candidate trajectories P0 corresponding to a specific harmonic structure are highlighted among the plurality of candidate trajectories P0 in the frequency-time region 52, which has the advantage that the user can easily select the selected trajectories PB corresponding to the desired harmonic structure. The fifth embodiment can also be applied to the second through fourth embodiments.

<Sixth Embodiment>
FIG. 7 is an explanatory diagram of the operation of the trajectory setting unit 28 in the sixth embodiment. By operating the input device 16, the user can instruct that a desired selected trajectory PB in the frequency-time region 52 be moved in the direction of the frequency axis AF (for example, by dragging an existing selected trajectory PB with the mouse of the input device 16).

When movement in the direction of the frequency axis AF is instructed for the selected trajectory PB[1] illustrated in part (A) of FIG. 7, the trajectory setting unit 28 sets, as illustrated in part (B) of FIG. 7, a selected trajectory PB[2] along a candidate trajectory P0 that is in a harmonic relationship with the selected trajectory PB[1] (that is, at an integer multiple of the pitch indicated by the selected trajectory PB[1]). Specifically, among the plurality of candidate trajectories P0 corresponding to the frequencies that are integer multiples of the pitch indicated by the selected trajectory PB[1] before the move, the trajectory setting unit 28 sets the selected trajectory PB[2] along the candidate trajectory P0 located near the destination specified by the user. In other words, the selected trajectory PB[1] is snapped to the candidate trajectory P0 at the destination. The section of the selected trajectory PB[2] on the time axis AT is the same as that of the selected trajectory PB[1]. As in the third embodiment, however, the end points of the selected trajectory PB[2] can also be adjusted to the end points e0 of the destination candidate trajectory P0. As understood from part (B) of FIG. 7, once the destination selected trajectory PB[2] is set, the source selected trajectory PB[1] is erased. In the above example the selected trajectory PB[1] is automatically snapped to the destination candidate trajectory P0, but a configuration that simply translates the selected trajectory PB[1] to a harmonically related frequency (that is, a configuration in which the shape of the selected trajectory PB does not change before and after the move) may also be adopted.
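The choice of the destination candidate trajectory P0 described above can be sketched as follows. This is a simplified illustration under stated assumptions: each trajectory is summarized by a single representative frequency, a candidate counts as harmonically related when it lies within a relative tolerance `rel_tol` of an integer multiple, and multiples below 2 are skipped so that the source itself is never chosen. None of these details come from the text.

```python
def snap_to_harmonic(source_hz, target_hz, candidates_hz, rel_tol=0.03):
    """Pick the destination for moving a selected trajectory PB[1].

    source_hz: representative pitch of PB[1] before the move.
    target_hz: frequency of the destination the user dragged to.
    candidates_hz: representative frequencies of candidate trajectories P0.
    Returns the candidate nearest to target_hz among those lying at an
    integer multiple (2 or higher) of source_hz, or None if there is none.
    rel_tol is an assumed tolerance for "at an integer multiple".
    """
    best = None
    for cand in candidates_hz:
        order = round(cand / source_hz)
        if order < 2:
            continue  # assumption: the source frequency itself is not a destination
        ideal = order * source_hz
        if abs(cand - ideal) > rel_tol * ideal:
            continue  # not in a harmonic relationship with PB[1]
        if best is None or abs(cand - target_hz) < abs(best - target_hz):
            best = cand
    return best


if __name__ == "__main__":
    candidates = [220.0, 441.0, 520.0, 662.0]
    print(snap_to_harmonic(220.0, 450.0, candidates))
    print(snap_to_harmonic(220.0, 700.0, candidates))
```

The simple translation variant mentioned in the text would instead multiply every frequency of PB[1] by the chosen integer, leaving the shape of the trajectory unchanged.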

The sixth embodiment achieves the same effects as the first embodiment. In the sixth embodiment, moreover, an existing selected trajectory PB[1] is moved to a selected trajectory PB[2] in a harmonic relationship with it. When a selected trajectory PB has been set at a frequency one octave away from the pitch the user wants (for example, when the user specified the designated trajectory PA at a position one octave off the pitch of the desired acoustic component), this has the advantage that the position of the selected trajectory PB can easily be corrected to the frequency the user wants. The sixth embodiment can also be applied to the second through fifth embodiments.

The above example illustrates a move in which the selected trajectory PB[1] before the move is erased, but it is also possible to set the selected trajectory PB[2] after the move while keeping the selected trajectory PB[1] (duplication). By repeating the duplication of a selected trajectory PB several times, selected trajectories PB corresponding to the fundamental component and to each harmonic component can be set easily. A configuration in which the trajectory classification unit 32 of the second embodiment classifies the selected trajectory PB[1] and each selected trajectory PB[2] duplicated from it into one trajectory group G is also suitable. As understood from the above description, "movement" of a selected trajectory PB is a concept that covers both "movement" in the narrow sense, in which the selected trajectory PB before the operation is erased, and "duplication", in which the selected trajectory PB before the move is kept.

The above example illustrates a configuration in which a selected trajectory PB[2] in a harmonic relationship of any order with the selected trajectory PB[1] can be set, but a configuration in which whether the selected trajectory PB[1] may be moved can be set for each harmonic order, in accordance with an instruction from the user, may also be adopted. For example, for a closed-pipe instrument such as a clarinet, setting the moved selected trajectory PB[2] at the frequency of an odd-order harmonic of the frequency of the selected trajectory PB[1] is permitted, whereas setting the selected trajectory PB[2] at the frequency of an even-order harmonic is prohibited. This configuration has the advantage that a plurality of selected trajectories PB corresponding to a specific harmonic structure can be set easily.

<Modifications>
Each of the embodiments illustrated above can be modified in various ways. Specific modifications are illustrated below. Two or more modifications freely selected from the following examples may be combined as appropriate to the extent that they do not contradict one another.

(1) In each of the embodiments described above, the separated signal Y is generated by applying a separation filter corresponding to each selected trajectory PB to the acoustic signal X, but the method by which the acoustic processing unit 30 generates the separated signal Y may be changed as appropriate. For example, the embodiments described above illustrate, as the separation filter, a binary filter in which each coefficient value takes one of two values, but a filter in which each coefficient value takes multiple values (for example, a Wiener filter) can also be used as the separation filter.

A configuration in which the acoustic processing unit 30 generates, independently of the acoustic signal X, a separated signal Y representing a pure tone (beep) at the frequency corresponding to the selected trajectory PB may also be adopted. The acoustic processing unit 30 can likewise generate, independently of the acoustic signal X, a separated signal Y that is a mixture of a pure tone at the frequency corresponding to the selected trajectory PB and pure tones at its harmonic frequencies. A configuration that generates the separated signal Y in this simplified way (without performing acoustic processing on the acoustic signal X) has the advantage that the processing load on the acoustic processing unit 30 is lower than in a configuration that generates separation filters and applies them to the acoustic signal X. The acoustic processing unit 30 can also selectively execute either the generation of the separated signal Y using separation filters or the simplified generation of a pure-tone separated signal Y, in accordance with an instruction from the user. As understood from the above description, the acoustic processing unit 30 is comprehensively expressed as an element that generates a separated signal Y of the pitch corresponding to the selected trajectory PB.
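The simplified generation of a pure-tone separated signal Y can be sketched as follows. This is an illustrative sketch rather than the method of the acoustic processing unit 30: the function name, the per-unit-period pitch list with `None` for silence, the equal weighting of harmonics, and the phase-accumulator design are assumptions.

```python
import math


def synth_tones(f0_per_period, period_len, sample_rate, n_harmonics=1):
    """Synthesize a tone that follows a selected trajectory PB.

    f0_per_period: pitch in Hz for each unit period, or None where the
    trajectory is absent (assumed representation).
    period_len: samples per unit period.
    n_harmonics: 1 gives a pure tone; larger values mix in pure tones at
    integer multiples of the pitch with equal weight (an assumption).
    A running phase per harmonic avoids clicks when the pitch changes.
    """
    phases = [0.0] * n_harmonics
    out = []
    for f0 in f0_per_period:
        for _ in range(period_len):
            if f0 is None:
                out.append(0.0)
                continue
            sample = 0.0
            for h in range(n_harmonics):
                sample += math.sin(phases[h]) / n_harmonics
                phases[h] += 2.0 * math.pi * (h + 1) * f0 / sample_rate
            out.append(sample)
    return out


if __name__ == "__main__":
    signal = synth_tones([440.0, 440.0, None], period_len=100,
                         sample_rate=8000, n_harmonics=3)
    print(len(signal), max(abs(s) for s in signal) <= 1.0)
```

No spectrum of the acoustic signal X is computed here, which is the source of the lower processing load that the text mentions.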

(2) In each of the embodiments described above, as explained with reference to FIG. 3, the sequence of points p0 on the candidate trajectories P0 at which the pitch likelihood L(k,m) is maximal within a range R containing one point pA on the designated trajectory PA specified by the user is set as the selected trajectory PB, but the method by which the trajectory setting unit 28 sets the selected trajectory PB is not limited to this example. For example, as shown in FIG. 8, a distribution D of weights w (w[1], w[2]) may be set in the direction of the frequency axis AF for each point pA on the designated trajectory PA. The trajectory setting unit 28 can then set as the selected trajectory PB the sequence of points p0 on the candidate trajectories P0 that maximize the value obtained by applying (for example, multiplying) the weight w of the distribution D at each point p0 to the pitch likelihood L(k,m) of that point p0 on each candidate trajectory P0 in the frequency-time region 52.

For example, consider the candidate trajectories P0[1] and P0[2] illustrated in FIG. 8. The value w[1]·L(k1,m), obtained by multiplying the pitch likelihood L(k1,m) of the point p0 on the candidate trajectory P0[1] by the weight w[1] at that point p0, is compared with the value w[2]·L(k2,m), obtained by multiplying the pitch likelihood L(k2,m) of the point p0 on the candidate trajectory P0[2] by the weight w[2] at that point p0, and the point p0 corresponding to the larger value (the point p0 on either the candidate trajectory P0[1] or the candidate trajectory P0[2]) is adopted as a point on the selected trajectory PB. The distribution D of the weights w is chosen so that the weight w is largest at the point pA on the designated trajectory PA and becomes smaller with increasing distance from the point pA. A known probability distribution such as a normal distribution can suitably be used as the weight distribution D. With this configuration, even a candidate trajectory P0 whose pitch likelihood L(k,m) is low can be selected as the selected trajectory PB if it is close to the point pA on the designated trajectory PA, which has the advantage that a selected trajectory PB that fully reflects the user's intention (the designated trajectory PA) can be set.
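The weighted comparison w[1]·L(k1,m) versus w[2]·L(k2,m) can be sketched for one point pA as follows. This is a minimal illustration assuming an unnormalized Gaussian as the weight distribution D; the function name, the (frequency, likelihood) pair representation, and the spread parameter `sigma` are assumptions introduced here.

```python
import math


def select_point(pa_freq, candidates, sigma):
    """Choose the point p0 for one point pA on the designated trajectory PA.

    pa_freq: frequency of the point pA.
    candidates: (frequency, pitch_likelihood) pairs, one per candidate
    trajectory P0 at the same time as pA (assumed representation).
    sigma: spread of the Gaussian weight distribution D (assumption).
    Returns the pair that maximizes weight * likelihood.
    """
    def weight(freq):
        # Largest at pA, decreasing with distance from pA along the frequency axis.
        return math.exp(-0.5 * ((freq - pa_freq) / sigma) ** 2)

    return max(candidates, key=lambda c: weight(c[0]) * c[1])


if __name__ == "__main__":
    points = [(445.0, 0.4), (880.0, 0.9)]
    # Narrow distribution: the nearby, lower-likelihood candidate is chosen.
    print(select_point(440.0, points, sigma=50.0))
    # Very wide distribution: the weights are nearly equal, so likelihood decides.
    print(select_point(440.0, points, sigma=1e6))
```

A smaller `sigma` gives the user's drawing more influence relative to the pitch likelihood, and a larger one gives it less.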

(3) A configuration in which a separated signal Y is generated individually for each of the plurality of selected trajectories PB set in the frequency-time region 52 and stored in the storage device 12 may be adopted. For example, the acoustic processing unit 30 generates a separated signal Y for each selected trajectory PB by applying the separation filter corresponding to that selected trajectory PB to the section of the acoustic signal X that corresponds to it, and stores the separated signal Y in the storage device 12.

The acoustic processing unit 30 then supplies the separation signal Y of each selection trajectory PB to the sound emitting device 18 for playback in accordance with the user's instructions on the frequency-time region 52. For example, when the user operates the input device 16 to move a designated point in the frequency-time region 52 (for example, the position of the mouse pointer, or the touch position when a touch panel is used as the input device 16) along a particular selection trajectory PB, the acoustic processing unit 30 supplies the separation signal Y corresponding to that selection trajectory PB to the sound emitting device 18 at a playback speed that depends on the moving speed of the designated point. Any known technique may be used to control the playback speed, but a technique that changes the playback speed while keeping the pitch of the reproduced sound at the pitch in the acoustic signal X (for example, a phase vocoder) is preferable. This configuration has the advantage that an acoustic component of the acoustic signal X desired by the user (for example, the sound of a particular instrument) can be played back at an arbitrary speed.

It is also possible to control the playback direction (forward or reverse) of the separation signal Y according to the moving direction of the designated point. For example, the acoustic processing unit 30 plays the separation signal Y in the forward direction when the designated point is moved toward the downstream side of the time axis AT (the direction in which time advances), and plays the separation signal Y in the reverse direction when the designated point is moved toward the upstream side of the time axis AT (the direction in which time goes back).
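One way to picture the speed and direction control is a signed playback rate derived from how far the designated point moves along the time axis AT per unit of real time. The sketch below is an assumption-laden illustration: the function name, the units, and the clamp `max_rate` are hypothetical, and the pitch-preserving time stretch itself (for example, a phase vocoder) is not implemented here.

```python
def playback_rate(t_prev, t_now, dt, max_rate=4.0):
    """Map motion of the designated point to a signed playback rate.

    t_prev, t_now: positions of the designated point on the time axis AT,
                   expressed in seconds of signal time.
    dt:            wall-clock seconds elapsed between the two positions.
    Returns a rate where 1.0 is normal forward playback, a negative value
    means reverse playback, and 0.0 means the point is not moving.
    Assumption: the rate is clamped to +/- max_rate to keep playback usable.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    rate = (t_now - t_prev) / dt
    return max(-max_rate, min(max_rate, rate))

forward = playback_rate(1.0, 1.5, 0.5)   # moved 0.5 s of signal in 0.5 s
reverse = playback_rate(2.0, 1.0, 0.5)   # moved 1.0 s backwards in 0.5 s
```

The sign of the result selects forward or reverse playback, and its magnitude would be handed to whatever time-stretching technique is used.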

(4) Each of the above embodiments illustrates an input device 16 that is separate from the display device 14, but a touch panel formed integrally with the display device 14 may also be used as the input device 16. For example, the user can designate the instruction trajectory PA by a drag operation (tracing the display surface with a finger) on the display surface (operation surface) of the display device 14. However, when the drag operation on the display surface is assigned to scrolling of the frequency-time region 52, the drag operation that designates the instruction trajectory PA must be distinguished from the drag operation that scrolls the frequency-time region 52. For example, when a predetermined tap operation (for example, a double tap or a long tap) is applied to the frequency-time region 52, a transition is made to a reception state in which designation of the instruction trajectory PA is accepted, and the instruction reception unit 26 accepts designation of the instruction trajectory PA according to the user's drag operation. In the reception state, a configuration that changes the display mode of each candidate trajectory P0 (for example, makes each candidate trajectory P0 blink) is preferable.
It is also possible to permit setting of the selection trajectory PB (designation of the instruction trajectory PA) only within the band of the frequency axis AF that lies between two fingers in contact with the display surface. A configuration in which the display control unit 24 changes the display magnification of the frequency-time region 52 (zoom in or zoom out) in response to a pinch-in or pinch-out operation that changes the distance between two fingers in contact with the display surface is also preferable.
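The distinction between a scrolling drag and a drag that designates the instruction trajectory PA can be sketched as a small state machine. The class and method names are hypothetical, and the rule that lifting the finger ends the reception state is an assumption; the text only specifies how the reception state is entered.

```python
class DragRouter:
    """Routes drag gestures either to scrolling or to tracing PA."""

    ENTER_TAPS = ("double_tap", "long_tap")  # taps named in the text

    def __init__(self):
        self.accepting = False  # True while in the reception state
        self.trajectory = []    # (time, frequency) points of PA being traced

    def on_tap(self, kind):
        # A predetermined tap on the frequency-time region enters the
        # reception state; other taps are ignored by this sketch.
        if kind in self.ENTER_TAPS:
            self.accepting = True

    def on_drag(self, point):
        # In the reception state a drag extends PA; otherwise it scrolls.
        if self.accepting:
            self.trajectory.append(point)
            return "trace"
        return "scroll"

    def on_release(self):
        # Assumption: releasing the finger completes PA and leaves the
        # reception state, so that the next drag scrolls again.
        done, self.trajectory = self.trajectory, []
        self.accepting = False
        return done

router = DragRouter()
first = router.on_drag((0.0, 440.0))      # not yet accepting
router.on_tap("double_tap")
second = router.on_drag((0.1, 442.0))     # now traced as part of PA
traced = router.on_release()
```

While `accepting` is true, a display layer could also switch the candidate trajectories to a different display mode, such as blinking.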

It is also possible to put a selection trajectory PB set in the frequency-time region 52 into a selected state by a predetermined touch operation on that selection trajectory PB (for example, a double tap or a long tap), and to instruct processing related to the selection trajectory PB in the selected state by touch operations on the display surface. For example, the following configurations may be adopted: a configuration in which the acoustic component of the selection trajectory PB is played back when a tap operation (tapping the display surface with a finger) is applied to the selection trajectory PB in the selected state; a configuration in which the selection trajectory PB in the selected state is stretched or shrunk in the direction of the time axis AT according to the distance between two fingers in contact with the display surface; or a configuration in which the selection trajectory PB is deleted when a flick operation (flicking a finger in contact with the display surface) is applied to the selection trajectory PB in the selected state. The editing applied to the selection trajectory PB in the selected state may also differ according to the type of touch operation on the display surface; for example, the selection trajectory PB is moved when a two-finger drag operation is applied to it, and is duplicated when a one-finger drag operation is applied to it.
The selected state of the selection trajectory PB is released when a predetermined tap operation (for example, a tap at a position away from the selection trajectory PB in the selected state) is applied to the display surface.

A configuration in which any selection trajectory PB set in the frequency-time region 52 is selected in response to a touch operation on the display surface is also preferable. For example, a configuration that selects the selection trajectories PB inside a rectangular area (rubber band) whose diagonal corners are two fingers in contact with the display surface, or a configuration that selects the selection trajectories PB inside a closed area passing through three or more contact points on the display surface, may be adopted.
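The rubber-band case can be sketched as a simple containment test. The text does not say whether a trajectory must lie fully inside the rectangle or merely overlap it; the sketch below assumes full containment, and its function names and the (time, frequency) point layout are hypothetical.

```python
def in_rubber_band(trajectory, corner_a, corner_b):
    """True if every (time, freq) point of the trajectory lies inside the
    rectangle whose diagonal corners are the two finger contacts.

    Assumption: full containment is required; an empty trajectory is
    never selected.
    """
    t_lo, t_hi = sorted((corner_a[0], corner_b[0]))
    f_lo, f_hi = sorted((corner_a[1], corner_b[1]))
    return bool(trajectory) and all(
        t_lo <= t <= t_hi and f_lo <= f <= f_hi for t, f in trajectory
    )

def select_in_rubber_band(trajectories, corner_a, corner_b):
    """Return the indices of the selection trajectories PB inside the band."""
    return [i for i, tr in enumerate(trajectories)
            if in_rubber_band(tr, corner_a, corner_b)]

trajectories = [
    [(1.0, 220.0), (1.5, 225.0)],   # fully inside the band
    [(1.0, 220.0), (3.5, 230.0)],   # extends past the band in time
    [(2.0, 880.0), (2.5, 880.0)],   # outside the band in frequency
]
# The two contacts may be given in any order along either axis.
selected = select_in_rubber_band(trajectories, (3.0, 200.0), (0.5, 300.0))
```

Selecting by overlap instead of containment would only require replacing `all` with `any`.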

(5) Each of the above embodiments illustrates a configuration that emphasizes the acoustic component of the acoustic signal X corresponding to the selection trajectory PB, but the separation signal Y may also be generated by suppressing that acoustic component. For example, the acoustic processing unit 30 applies to the acoustic signal X a separation filter in which, among a plurality of coefficient values corresponding to different frequencies, the coefficient values corresponding to the pitch indicated by the selection trajectory PB and to its harmonic frequencies are set to 0 and the remaining coefficient values are set to 1.
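A suppression filter of this kind can be sketched, for one analysis frame, as a list of per-bin coefficients. The mapping from frequency to FFT bins, the parameter names, and the optional widening of each notch by `half_width_bins` are assumptions for illustration; the specification only states which coefficients are 0 and which are 1.

```python
def suppression_mask(f0_hz, sample_rate, n_fft, half_width_bins=0):
    """Per-bin coefficients for one frame: 0 at the pitch f0 and at each of
    its harmonics up to the Nyquist frequency, 1 at every other bin.

    Assumption: bins are those of a real FFT of length n_fft, so bin k
    corresponds to the frequency k * sample_rate / n_fft.
    """
    if f0_hz <= 0:
        raise ValueError("f0_hz must be positive")
    n_bins = n_fft // 2 + 1
    bin_hz = sample_rate / n_fft
    mask = [1.0] * n_bins
    harmonic = 1
    while harmonic * f0_hz <= sample_rate / 2:
        center = round(harmonic * f0_hz / bin_hz)
        lo = center - half_width_bins
        hi = center + half_width_bins
        for k in range(lo, hi + 1):
            if 0 <= k < n_bins:
                mask[k] = 0.0
        harmonic += 1
    return mask

# Tiny example: 1000 Hz bins, pitch at 2000 Hz, harmonics at 2000 and 4000 Hz.
mask = suppression_mask(f0_hz=2000.0, sample_rate=8000, n_fft=8)
```

Multiplying each frame's spectrum of the acoustic signal X by such a mask, with `f0_hz` taken from the selection trajectory PB at that frame, would attenuate the selected component while leaving the rest unchanged.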

(6) In each of the above embodiments, the pitch likelihood L(k,m) calculated from the acoustic signal X is displayed in the frequency-time region 52, but it is also possible, for example, to display in the frequency-time region 52, as a candidate trajectory P0, the temporal change of a pitch detected from the acoustic signal X using a known pitch detection technique (single-pitch detection or multi-pitch detection). As understood from the above description, the candidate trajectory P0 is comprehensively defined as a trajectory expressing the temporal change of a pitch identified (typically estimated or detected) from the acoustic signal X.

(7) Each of the above embodiments illustrates an acoustic analysis apparatus 100 that includes the acoustic processing unit 30, but the acoustic processing unit 30 may be omitted when attention is paid only to the function of setting, for the candidate trajectories P0 in the frequency-time region 52, a selection trajectory PB according to the user's instruction (the instruction trajectory PA). The analysis processing unit 22 is likewise omitted in a configuration in which the pitch likelihood L(k,m) is supplied from an external device.

DESCRIPTION OF SYMBOLS 100 ... acoustic analysis apparatus, 10 ... arithmetic processing device, 12 ... storage device, 14 ... display device, 16 ... input device, 18 ... sound emitting device, 22 ... analysis processing unit, 24 ... display control unit, 26 ... instruction reception unit, 28 ... trajectory setting unit, 30 ... acoustic processing unit, 32 ... trajectory classification unit, 50 ... analysis image, 52 ... frequency-time region, P0 ... candidate trajectory, PA ... instruction trajectory, PB ... selection trajectory.

Claims

An acoustic analysis apparatus comprising:
display control means for displaying, on a display device, an analysis image in which a plurality of candidate trajectories, each expressing a temporal change of a pitch identified from an acoustic signal, are arranged in a frequency-time region;
instruction accepting means for accepting, from a user, designation of an instruction trajectory in the frequency-time region; and
trajectory setting means for displaying a selection trajectory along the candidate trajectory at a position corresponding to the instruction trajectory in the frequency-time region,
wherein, when an end point of the instruction trajectory is designated within a peripheral region of an end point of the candidate trajectory, the trajectory setting means sets the selection trajectory along the candidate trajectory with the end point of the candidate trajectory as an end point.

The acoustic analysis apparatus according to claim 1, wherein the display control means displays, in mutually different display modes, a stationary part of the candidate trajectories that is parallel to the time axis and a varying part other than the stationary part.

An acoustic analysis apparatus comprising:
display control means for displaying, on a display device, an analysis image in which a plurality of candidate trajectories, each expressing a temporal change of a pitch identified from an acoustic signal, are arranged in a frequency-time region;
instruction accepting means for accepting, from a user, designation of an instruction trajectory in the frequency-time region; and
trajectory setting means for displaying a selection trajectory along the candidate trajectory at a position corresponding to the instruction trajectory in the frequency-time region,
wherein the display control means displays, in mutually different display modes, a stationary part of the candidate trajectories that is parallel to the time axis and a varying part other than the stationary part.

The acoustic analysis apparatus according to any one of claims 1 to 3, wherein, when the user instructs movement of one existing selection trajectory in the frequency axis direction, the trajectory setting means sets a selection trajectory along a candidate trajectory that is in a harmonic (overtone) relationship with the one selection trajectory.

An acoustic analysis apparatus comprising:
display control means for displaying, on a display device, an analysis image in which a plurality of candidate trajectories, each expressing a temporal change of a pitch identified from an acoustic signal, are arranged in a frequency-time region;
instruction accepting means for accepting, from a user, designation of an instruction trajectory in the frequency-time region; and
trajectory setting means for displaying a selection trajectory along the candidate trajectory at a position corresponding to the instruction trajectory in the frequency-time region,
wherein, when the user instructs movement of one existing selection trajectory in the frequency axis direction, the trajectory setting means sets a selection trajectory along a candidate trajectory that is in a harmonic (overtone) relationship with the one selection trajectory.

The acoustic analysis apparatus according to claim 1, further comprising an acoustic processing unit that generates a separation signal having a pitch corresponding to the selection trajectory.

An acoustic analysis method in which a computer system:
displays, on a display device, an analysis image in which a plurality of candidate trajectories, each expressing a temporal change of a pitch identified from an acoustic signal, are arranged in a frequency-time region;
accepts, from a user, designation of an instruction trajectory in the frequency-time region; and
displays a selection trajectory along the candidate trajectory at a position corresponding to the instruction trajectory in the frequency-time region,
wherein, in displaying the selection trajectory, when an end point of the instruction trajectory is designated within a peripheral region of an end point of the candidate trajectory, the selection trajectory is set along the candidate trajectory with the end point of the candidate trajectory as an end point.

An acoustic analysis method in which a computer system:
displays, on a display device, an analysis image in which a plurality of candidate trajectories, each expressing a temporal change of a pitch identified from an acoustic signal, are arranged in a frequency-time region;
accepts, from a user, designation of an instruction trajectory in the frequency-time region; and
displays a selection trajectory along the candidate trajectory at a position corresponding to the instruction trajectory in the frequency-time region,
wherein, in displaying the analysis image, a stationary part of the candidate trajectories that is parallel to the time axis and a varying part other than the stationary part are displayed in mutually different display modes.

An acoustic analysis method in which a computer system:
displays, on a display device, an analysis image in which a plurality of candidate trajectories, each expressing a temporal change of a pitch identified from an acoustic signal, are arranged in a frequency-time region;
accepts, from a user, designation of an instruction trajectory in the frequency-time region; and
displays a selection trajectory along the candidate trajectory at a position corresponding to the instruction trajectory in the frequency-time region,
wherein, in displaying the selection trajectory, when the user instructs movement of one existing selection trajectory in the frequency axis direction, a selection trajectory is set along a candidate trajectory that is in a harmonic (overtone) relationship with the one selection trajectory.