JP5733044B2

JP5733044B2 - Masking analysis device, masker sound selection device, masking device and program

Info

Publication number: JP5733044B2
Application number: JP2011132631A
Authority: JP
Inventors: 三樹夫東山; 佳史原
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2011-06-14
Filing date: 2011-06-14
Publication date: 2015-06-10
Anticipated expiration: 2031-06-14
Also published as: JP2013003271A

Description

The present invention relates to a technique for evaluating the effect of masking with various masker sounds.

Sound masking techniques have been proposed that hinder leakage of a target sound (maskee), such as a highly confidential conversation, by superimposing a masker sound (masker) on it. Besides various kinds of noise such as white noise, sound obtained by processing the target sound is also used as a masker sound. For example, Patent Document 1 and Patent Document 2 disclose techniques that generate a masker sound by dividing the target sound into sections on the time axis, reversing the time waveform of each section, and changing the order of the sections.
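The section-wise processing attributed to Patent Documents 1 and 2 can be sketched roughly as follows. This is a minimal illustration only: the function name `make_scrambled_masker`, the fixed section length, and the random reordering are assumptions, not details taken from those documents.

```python
import random


def make_scrambled_masker(target, section_len, seed=0):
    """Split the target into sections, reverse each in time, then reorder them.

    target      -- list of samples of the target sound
    section_len -- samples per section (hypothetical parameter)
    seed        -- seed for the (assumed) random reordering
    """
    sections = [target[i:i + section_len]
                for i in range(0, len(target), section_len)]
    # Reverse the time waveform inside every section.
    reversed_sections = [sec[::-1] for sec in sections]
    # Change the order of the sections (random shuffle is an assumption).
    rng = random.Random(seed)
    rng.shuffle(reversed_sections)
    return [x for sec in reversed_sections for x in sec]
```

The output has the same samples and length as the input, only rearranged, so its long-term spectrum stays close to that of the target sound.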

Quantitative evaluation of the masking effect is important for generating and selecting masker sounds that effectively prevent speech leakage. A typical evaluation method is subjective evaluation, which measures the proportion of the target sound that subjects listening to the masked speech can understand (speech intelligibility), but highly accurate evaluation of this kind is very laborious. The techniques of Non-Patent Document 1 and Non-Patent Document 2 therefore adopt the correlation value between the narrowband envelopes of the speech before and after masking (hereinafter "narrowband envelope correlation") as a quantitative index of the masking effect. A narrowband envelope is the envelope of the speech waveform in each band corresponding to the critical bands of human hearing (for example, 1/4-octave bands).

Patent Document 1: JP 2008-233671 A
Patent Document 2: JP 2010-217883 A
Non-Patent Document 1: Houtgast T et al. "Predicting speech intelligibility in rooms from the Modulation Transfer Function. I. General room acoustics", Acustica, 46: 60-72, 1980
Non-Patent Document 2: Drullman R. "Temporal envelope and fine structure cues for speech intelligibility", J. Acoust. Soc. Am. 97: 585-592, 1995

Sound masking works through energy masking and information masking. Energy masking hinders listening to the target sound by superimposing on it, at relatively high energy, a masker sound generated independently of the target sound. Information masking, as in the techniques of Patent Document 1 and Patent Document 2 cited above, hinders listening to the target sound by superimposing on it a masker sound (disturbing sound) whose acoustic characteristics resemble those of the target sound. A typical masker sound effective for energy masking is white noise; a typical masker sound effective for information masking is reversed speech, obtained by reversing the speech waveform of the target sound's speaker along the time axis.

FIG. 11 is a graph showing the relationship between calculated values of the narrowband envelope correlation and measured values of speech intelligibility for several cases that differ in the energy ratio of the target sound to the masker sound (hereinafter "T/M ratio"). FIG. 11 plots separately the case where white noise, which is effective for energy masking, is used as the masker sound and the case where reversed speech, which is effective for information masking, is used as the masker sound.

When white noise is used as the masker sound, as shown by line Z1 in FIG. 11, speech intelligibility responds sensitively to changes in the narrowband envelope correlation, and a clear tendency is observed for intelligibility to rise as the narrowband envelope correlation increases. When reversed speech is used as the masker sound, however, as shown by line Z2 in FIG. 11, speech intelligibility does not change clearly with the narrowband envelope correlation, particularly where the narrowband envelope correlation lies between 0.3 and 0.8. That is, the narrowband envelope correlation disclosed in Non-Patent Document 1 and Non-Patent Document 2 is suitable as an evaluation index for energy masking but not necessarily as an evaluation index for information masking. In view of these circumstances, an object of the present invention is to evaluate the effect of information masking appropriately.

The means adopted by the present invention to solve the above problem are described below. To facilitate understanding, the following description notes in parentheses the correspondence between elements of the invention and elements of the embodiments described later; this is not intended to limit the scope of the invention to the illustrated embodiments.

The masking analysis device of the present invention analyzes masking of a target sound by a masker sound and comprises: autocorrelation calculating means (for example, autocorrelation calculation unit 22) that calculates, for each frame on the time axis, an autocorrelation sequence (for example, autocorrelation sequence Ai[m]) of a line spectrum sequence (for example, line spectrum sequence Li[m]) corresponding to the peaks of the spectrum of an acoustic signal, for each of a first acoustic signal (for example, acoustic signal s1(t)) representing the target sound and a second acoustic signal (for example, acoustic signal s2(t)) representing a mixture of the target sound and the masker sound; cross-correlation calculating means (cross-correlation calculation unit 24) that calculates, for each pair of mutually corresponding frames of the first and second acoustic signals, a cross-correlation coefficient (for example, cross-correlation coefficient ρ[m]) between the autocorrelation sequence of the first acoustic signal and the autocorrelation sequence of the second acoustic signal; and index calculating means (for example, index calculation unit 26) that calculates a representative value of the cross-correlation coefficients calculated for the frames as a masking effect index (for example, effect index α). In this configuration, the effect index is calculated from the cross-correlation coefficients between mutually corresponding frames of the autocorrelation sequences of the first and second acoustic signals, so the effect of information masking can be evaluated more appropriately than when the narrowband envelope correlation is used.
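The last two stages of this configuration can be sketched as follows, taking the per-frame autocorrelation sequences as given inputs. The sketch assumes that the cross-correlation coefficient ρ[m] is the ordinary Pearson correlation coefficient and that the representative value is the arithmetic mean; the function names are hypothetical.

```python
import math


def correlation_coefficient(a, b):
    """Pearson correlation of two equal-length sequences (assumed form of rho[m])."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # assumption: a constant sequence counts as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def effect_index(A1, A2):
    """Return (alpha, rho) for autocorrelation sequences A1[m], A2[m], m = 1..M.

    A1, A2 -- lists of M frames, each frame a list of correlation values
    rho    -- per-frame coefficients between corresponding frames
    alpha  -- arithmetic mean of rho (one possible representative value)
    """
    rho = [correlation_coefficient(a1, a2) for a1, a2 in zip(A1, A2)]
    return sum(rho) / len(rho), rho
```

When the second signal leaves the frame-by-frame structure untouched, every ρ[m] is 1 and α is 1; the more the masker disturbs that structure, the further α falls.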

The kind of representative value calculated from the cross-correlation coefficients as the effect index is arbitrary. Suitable configurations include calculating the mean (for example, the arithmetic mean) of the cross-correlation coefficients as the effect index, or calculating a predetermined percentile (for example, the 75th percentile) of the cross-correlation coefficients as the effect index. How the effect index calculated by the index calculating means is used is likewise arbitrary in the present invention; for example, a configuration comprising display control means (for example, display control unit 28) that causes a display device to display the effect index is suitable.
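The two representative values named above can be computed with the Python standard library as in the following sketch. The function name `representative_value` and the `method` keywords are hypothetical, and the inclusive quantile interpolation is an assumption, since the text does not specify how the percentile is interpolated.

```python
import statistics


def representative_value(rho, method="mean"):
    """Reduce per-frame cross-correlation coefficients to one effect index."""
    if method == "mean":
        return statistics.fmean(rho)  # arithmetic mean
    if method == "p75":
        # quantiles(n=4) returns the three quartile cut points;
        # the last one is the 75th percentile.
        return statistics.quantiles(rho, n=4, method="inclusive")[2]
    raise ValueError("unknown method: " + method)
```

A percentile is less sensitive than the mean to a few frames with unusually low coefficients, which may be one reason to prefer it.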

In a preferred aspect of the present invention, the autocorrelation calculating means calculates, for each frame, the autocorrelation sequence of the first acoustic signal and the autocorrelation sequence of each of a plurality of second acoustic signals that differ in at least one of the type of masker sound and the energy ratio between the target sound and the masker sound (T/M ratio); the cross-correlation calculating means calculates, for each of the second acoustic signals and for each frame, the cross-correlation coefficient between the autocorrelation sequence of the first acoustic signal and the autocorrelation sequence of that second acoustic signal; and the index calculating means calculates, for each of the second acoustic signals, an effect index from the cross-correlation coefficients calculated for that second acoustic signal. In this aspect, effect indices are calculated for a plurality of masker sounds that differ in type or sound pressure, so comparing the effect indices of the masker sounds makes it possible to select the masker sound that is optimal in terms of the effectiveness of information masking.

A masking analysis device according to a preferred aspect of the present invention comprises display control means (for example, display control unit 28) that causes a display device to display, for each of the first and second acoustic signals, at least one of a correlation transition image (for example, correlation transition image 62), which shows the time series of the autocorrelation sequences in a region with a frequency axis and a time axis, and a correlation distribution image (for example, correlation distribution image 64), which shows the distribution along the frequency axis of the values obtained by summing the correlation values of the autocorrelation sequences over a plurality of frames for each frequency. In this aspect, by comparing the correlation transition images of the first and second acoustic signals, the user can grasp intuitively the degree to which the time transition of the harmonic structure changes before and after masking (that is, the degree of information masking). By comparing the correlation distribution images of the first and second acoustic signals, the user can grasp intuitively the long-term change in the harmonic structure over a plurality of frames.
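The data behind the correlation distribution image can be sketched as a per-frequency sum over frames. The sketch assumes the autocorrelation sequences are held as a list of M frames with K values each; the function name is hypothetical, and drawing the image itself is left out.

```python
def correlation_distribution(A):
    """Sum the correlation values over all frames for each frequency.

    A -- list of M frames, each a list of K correlation values
    Returns a list of K totals, one per position on the frequency axis.
    """
    return [sum(column) for column in zip(*A)]
```

The correlation transition image corresponds to the M-by-K array `A` itself, drawn with time on one axis and frequency on the other.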

The present invention is also realized as a masker sound selection device that selects one of a plurality of types of masker sound using the masking analysis device according to the above aspects. The masker sound selection device of the present invention comprises: autocorrelation calculating means that calculates, for each frame on the time axis, the autocorrelation sequence of the line spectrum sequence corresponding to the peaks of the spectrum of an acoustic signal, for a first acoustic signal representing the target sound and for each of a plurality of second acoustic signals representing mixtures of the target sound with masker sounds of different types; cross-correlation calculating means that calculates, for each of the second acoustic signals and for each pair of mutually corresponding frames, the cross-correlation coefficient between the autocorrelation sequence of that second acoustic signal and the autocorrelation sequence of the first acoustic signal; index calculating means that calculates, for each of the second acoustic signals, a representative value of the cross-correlation coefficients calculated for the frames of that second acoustic signal as a masking effect index; and selecting means (for example, selection unit 40) that selects one of the plurality of types of masker sound according to the effect indices calculated by the index calculating means. This configuration achieves the same operation and effects as the masking analysis device of the present invention.
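A selection step of this kind might look like the following sketch. The passage states only that selection depends on the effect indices; picking the masker with the smallest α (least similarity between the sound before and after masking) is an assumption made here, and the function name and example values are hypothetical.

```python
def select_masker(effect_indices):
    """Pick a masker sound from a mapping of masker name -> effect index alpha.

    Assumption: a smaller alpha means the masker changes the harmonic
    structure more, so the masker with the smallest alpha is chosen.
    """
    if not effect_indices:
        raise ValueError("no candidate masker sounds")
    return min(effect_indices, key=effect_indices.get)


# Hypothetical effect indices for two candidate masker sounds.
candidates = {"white_noise": 0.62, "reversed_speech": 0.35}
chosen = select_masker(candidates)
```

A different rule, such as choosing the quietest masker whose α stays under a threshold, would fit the same interface.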

The present invention is also realized as a masking device (for example, masking device 200) that masks a target sound using one of a plurality of types of masker sound. The masking device of the present invention comprises: autocorrelation calculating means that calculates, for each frame on the time axis, the autocorrelation sequence of the line spectrum sequence corresponding to the peaks of the spectrum of an acoustic signal, for a first acoustic signal representing the target sound and for each of a plurality of second acoustic signals representing mixtures of the target sound with masker sounds of different types; cross-correlation calculating means that calculates, for each of the second acoustic signals and for each pair of mutually corresponding frames, the cross-correlation coefficient between the autocorrelation sequence of that second acoustic signal and the autocorrelation sequence of the first acoustic signal; index calculating means that calculates, for each of the second acoustic signals, a representative value of the cross-correlation coefficients calculated for the frames of that second acoustic signal as a masking effect index; and selecting means (for example, selection unit 40) that selects one of the plurality of types of masker sound according to the effect indices calculated by the index calculating means and emits it from a sound emitting device. This configuration achieves the same operation and effects as the masking analysis device of the present invention.

The masking analysis device according to each of the above aspects may be realized by hardware (electronic circuitry) such as a DSP (Digital Signal Processor) dedicated to speech synthesis, or by cooperation between a general-purpose arithmetic processing device such as a CPU (Central Processing Unit) and a program. To analyze masking of a target sound by a masker sound, the program of the present invention causes a computer to execute: autocorrelation calculation processing that calculates, for each frame, the autocorrelation sequence of the line spectrum sequence corresponding to the peaks of the spectrum of an acoustic signal, for each of a first acoustic signal representing the target sound and a second acoustic signal representing a mixture of the target sound and the masker sound; cross-correlation calculation processing that calculates, for each pair of mutually corresponding frames of the first and second acoustic signals, the cross-correlation coefficient between the autocorrelation sequence of the first acoustic signal and the autocorrelation sequence of the second acoustic signal; and index calculation processing that calculates a representative value of the cross-correlation coefficients calculated for the frames as an effect index of the masking of the target sound by the masker sound. This program achieves the same operation and effects as the masking analysis device of the present invention. The program may be provided to users stored on a computer-readable recording medium and installed on a computer, or provided from a server device by distribution over a communication network and installed on a computer.

FIG. 1 is a block diagram of a masking analysis device according to a first embodiment of the present invention.
FIG. 2 is a block diagram of the autocorrelation calculation unit.
FIG. 3 is an explanatory diagram of the operation of the masking analysis device.
FIG. 4 is a flowchart of the operation that generates a line spectrum sequence.
FIG. 5 is a graph showing the relationship between the effect index of the first embodiment and measured values of speech intelligibility.
FIG. 6 is a block diagram of the masking analysis device in a second embodiment.
FIG. 7 is a graph showing the relationship between the effect index of a third embodiment and measured values of speech intelligibility.
FIG. 8 is a schematic diagram showing a display example in the third embodiment.
FIG. 9 is a schematic diagram showing a display example in a fourth embodiment.
FIG. 10 is a block diagram of a masking device according to a fifth embodiment.
FIG. 11 is a graph showing the relationship between calculated values of the narrowband envelope correlation and measured values of speech intelligibility.

<First Embodiment>
FIG. 1 is a block diagram of a masking analysis device 100 according to a first embodiment of the present invention. The masking analysis device 100 is an acoustic processing device that analyzes the effect of masking a target sound VT with a masker sound VM and, as shown in FIG. 1, is realized by a computer system including an arithmetic processing device 12, a storage device 14, and a display device 16. The display device 16 consists of, for example, a liquid crystal display panel and displays images as instructed by the arithmetic processing device 12.

The storage device 14 stores a program PGM executed by the arithmetic processing device 12 and various data used by the arithmetic processing device 12. A known recording medium such as a semiconductor recording medium or a magnetic recording medium, or a combination of several types of recording media, may be employed as the storage device 14.

The storage device 14 stores an acoustic signal s1(t) and an acoustic signal s2(t). The acoustic signal s1(t) is a speech signal representing the time waveform of the target sound VT to be masked. The acoustic signal s2(t) is a signal representing the time waveform of the sound obtained by superimposing (adding) the masker sound VM on the target sound VT represented by s1(t), that is, the signal after masking; s1(t) accordingly corresponds to the speech before masking. For example, acoustic signals s1(t) and s2(t) recorded in advance with a sound collection device are stored in the storage device 14. It is also possible to acquire the speech signal picked up by a sound collection device sequentially (for example, in sections of a predetermined duration) as s1(t) or s2(t) and to process it in substantially real time.
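For offline experiments, a second signal s2(t) at a chosen T/M ratio could be synthesized rather than recorded, roughly as below. This is a sketch under assumptions: the T/M ratio is expressed in decibels, energy is the sum of squared samples over the whole signal, and the function name is hypothetical.

```python
import math


def mix_at_tm_ratio(target, masker, tm_ratio_db):
    """Return s2 = target + gain * masker with the requested T/M energy ratio.

    tm_ratio_db -- 10*log10(target energy / masker energy), assumed unit
    """
    energy_t = sum(x * x for x in target)
    energy_m = sum(x * x for x in masker)
    if energy_m == 0.0:
        raise ValueError("masker has zero energy")
    # Choose the gain so that energy_t / (gain**2 * energy_m) equals the ratio.
    gain = math.sqrt(energy_t / (energy_m * 10.0 ** (tm_ratio_db / 10.0)))
    scaled = [gain * x for x in masker]
    mixed = [t + m for t, m in zip(target, scaled)]
    return mixed, scaled
```

Calling this for several ratios and several masker types yields the family of second acoustic signals that the preferred aspect compares.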

By executing the program PGM stored in the storage device 14, the arithmetic processing device 12 realizes a plurality of functions (autocorrelation calculation unit 22, cross-correlation calculation unit 24, index calculation unit 26, and display control unit 28) for calculating and outputting an index value α (hereinafter "effect index") indicating the effect of masking by the masker sound VM. The effect index α is a variable suited to indicating the effectiveness of the masker sound VM for information masking and is calculated, in outline, by comparing the acoustic signals s1(t) and s2(t) before and after masking. A configuration in which a dedicated electronic circuit (DSP) realizes some of the functions of the arithmetic processing device 12, or a configuration in which the functions of the arithmetic processing device 12 are distributed over a plurality of integrated circuits, may also be adopted.

The information masking effect is pronounced when reversed speech, obtained by reversing in the time-axis direction the speech waveform of the same speaker as the target sound VT, is applied as the masker sound VM. Because the reversed speech and the target sound VT share the same speaker, the long-term harmonic structure of the speech (the series of a fundamental component and multiple overtone components) hardly changes between before and after masking with reversed speech as the masker sound VM. Given this tendency, the action of information masking is inferred to be related to the time transition of the harmonic structure differing before and after masking. That is, the more the time transition of the harmonic structure changes across masking, the greater the information masking effect. Based on this finding, the present embodiment calculates the effect index α by comparing the time transition of the harmonic structure of acoustic signal s1(t) with that of acoustic signal s2(t).

The autocorrelation calculation unit 22 in FIG. 1 calculates, for each of M frames of a predetermined duration, the autocorrelation sequence A1[m] (A1[1] to A1[M]) of acoustic signal s1(t) and the autocorrelation sequence A2[m] (A2[1] to A2[M]) of acoustic signal s2(t) (m = 1 to M). The autocorrelation sequence A1[m] is a numerical sequence reflecting the harmonic structure of s1(t) in the m-th frame, and the autocorrelation sequence A2[m] is a numerical sequence reflecting the harmonic structure of s2(t) in the m-th frame. The autocorrelation calculation unit 22 performs the same processing on s1(t) and on s2(t). In the following description, therefore, each of s1(t) and s2(t) is written for convenience as acoustic signal si(t) using the subscript i (i = 1, 2), and matters common to both signals are described together.

FIG. 2 is a detailed block diagram of the autocorrelation calculation unit 22. As shown in FIG. 2, the autocorrelation calculation unit 22 includes a section setting unit 32, a frequency analysis unit 34, and a correlation analysis unit 36. The section setting unit 32 multiplies acoustic signal si(t) by a predetermined time window to divide it, as shown in FIG. 3, into M section signals qi[m] (qi[1] to qi[M]) corresponding to different frames. Each frame is set to a duration of, for example, about 20 to 30 milliseconds, and the frames overlap one another on the time axis. The duration of each frame may also be controlled variably according to, for example, the fundamental frequency of si(t).
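The framing performed by the section setting unit 32 can be sketched as follows. The Hann window and the fixed hop size are assumptions (the text says only "a predetermined time window" and that frames overlap), and the function name is hypothetical.

```python
import math


def split_into_frames(signal, frame_len, hop):
    """Cut a signal into overlapping, windowed section signals q[m].

    frame_len -- samples per frame (at least 2)
    hop       -- samples between frame starts; hop < frame_len gives overlap
    """
    # Hann window (assumed choice of time window).
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n]
                       for n in range(frame_len)])
    return frames
```

At a hypothetical sampling rate of 16 kHz, a 25-millisecond frame would be `frame_len=400`, and `hop=200` would give 50 percent overlap.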

The frequency analysis unit 34 in FIG. 2 calculates, for each of the M frames, the line spectrum sequence Li[m] (Li[1] to Li[M]) corresponding to the peaks of the spectrum Qi[m] of the section signal qi[m]. As shown in FIG. 2, the line spectrum sequence Li[m] is a series of spectral lines, each normalized to a predetermined intensity (1), placed at the LN frequencies Fp at which the amplitude (absolute value) of the spectrum Qi[m] of the section signal qi[m] peaks.

FIG. 4 is a flowchart of the processing by which the frequency analysis unit 34 generates the line spectrum sequence Li[m] for the m-th frame (section signal qi[m]) of acoustic signal si(t). The processing of FIG. 4 is executed for each of the M section signals qi[1] to qi[M] of each acoustic signal si(t).

The frequency analysis unit 34 initializes to 1 a variable x that designates one spectral line (SA1) and determines whether x is below a predetermined value LN (SA2). Immediately after the processing of FIG. 4 starts, x is below LN. When x is below LN, the frequency analysis unit 34 calculates the spectrum (complex spectrum) Qi[m] of the section signal qi[m] (SA3). Any known frequency analysis, such as the discrete Fourier transform, may be used to calculate Qi[m].

The frequency analysis unit 34 identifies and stores the frequency Fp of the single peak with the largest amplitude in the amplitude spectrum |Qi[m]| of the spectrum Qi[m] calculated in step SA3 (SA4), and generates a spectrum Ri[m] in which the intensity at every frequency of Qi[m] other than the frequency Fp identified in step SA4 is set to zero (SA5). The frequency analysis unit 34 then converts the spectrum Ri[m] into a time-domain acoustic signal ri[m], for example by an inverse Fourier transform (SA6), and subtracts ri[m] from the current section signal qi[m] (SA7).

The frequency analysis unit 34 adds 1 to the variable x and returns to step SA2 (SA8). If the incremented variable x is still less than the predetermined value LN (SA2: YES), steps SA3 to SA8 are repeated for the section signal qi[m] as processed in the immediately preceding step SA7. That is, until the total number of frequencies Fp identified for the section signal qi[m] reaches the predetermined value LN, the process of identifying the peak frequency Fp of the amplitude of the spectrum Qi[m] is repeated while the acoustic component at each identified frequency Fp is successively removed from the section signal qi[m].

When the total number of frequencies Fp reaches the predetermined value LN (SA2: NO), the frequency analysis unit 34 generates the line spectrum sequence Li[m] by placing a spectral line normalized to intensity 1 at each of the LN frequencies Fp identified in step SA4 for the section signal qi[m], among K frequencies (frequency bands) set discretely on the frequency axis (SA9). The intensity of each of the K frequencies other than the LN frequencies Fp is set to zero. This is how the line spectrum sequence Li[m] is calculated. The calculation of the line spectrum sequence Li[m] is also disclosed in, for example, Y. Hara, M. Matsumoto, and K. Miyoshi, "Method for estimating pitch independently from power spectrum envelope for speech and music signal", J. Temporal Design in Architecture and the Environment 9(1) 121-124 (2009).
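The peak-picking loop of steps SA3 to SA9 can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the frequency analysis unit 34: the function names `dft` and `line_spectrum` are hypothetical, a naive DFT stands in for "any known frequency analysis", K is assumed to be the N/2 + 1 non-negative frequency bins of an N-sample frame, the conjugate mirror bin is removed together with the peak bin so that the subtracted component is real, and the loop simply picks `num_lines` peaks rather than reproducing the flowchart's counter x.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def line_spectrum(frame, num_lines):
    """Return K = N//2 + 1 values: 1.0 at each picked peak bin, 0.0 elsewhere."""
    n = len(frame)
    k_bins = n // 2 + 1              # assumption: K = non-negative frequency bins
    residual = list(frame)
    lines = [0.0] * k_bins
    for _ in range(num_lines):
        spectrum = dft(residual)                                   # SA3
        peak = max(range(k_bins), key=lambda k: abs(spectrum[k]))  # SA4
        lines[peak] = 1.0                                          # SA9: unit-height line
        mirror = (n - peak) % n      # conjugate bin of a real signal (assumption)
        for t in range(n):                                         # SA5 to SA7
            component = spectrum[peak] * cmath.exp(2j * math.pi * peak * t / n)
            if mirror != peak:
                component += spectrum[mirror] * cmath.exp(2j * math.pi * mirror * t / n)
            residual[t] -= component.real / n   # inverse DFT of the kept bins, subtracted
    return lines
```

For a frame made of two sinusoids at bins 3 and 7, calling `line_spectrum(frame, 2)` marks exactly those two bins, the stronger one being removed first.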

As shown in FIG. 3, the correlation analysis unit 36 of FIG. 2 calculates an autocorrelation sequence Ai[m] (Ai[1] to Ai[M]) of the line spectrum sequence Li[m] that the frequency analysis unit 34 generates for each frame of each acoustic signal si(t). The autocorrelation sequence (autocorrelation function) Ai[m] is a series (K-dimensional vector) of autocorrelation coefficients pi[m,k] (pi[m,1] to pi[m,K]) corresponding to the K frequencies on the frequency axis.

Because the line spectrum sequence Li[m] generated by the frequency analysis unit 34 consists of spectral lines placed at the frequencies Fp at which the amplitude of the section signal qi[m] peaks, the autocorrelation sequence Ai[m] of the line spectrum sequence Li[m] approximates a spectrum that emphasizes the harmonic structure of the acoustic signal si(t) in each frame. That is, peaks appear in the series of autocorrelation coefficients pi[m,1] to pi[m,K] of the autocorrelation sequence Ai[m] at intervals corresponding to the fundamental frequency of the acoustic signal si(t). The above processing is executed frame by frame (for each section signal qi[m]) for each of the acoustic signals s1(t) and s2(t), which yields M autocorrelation sequences A1[1] to A1[M] corresponding to the frames of s1(t) and M autocorrelation sequences A2[1] to A2[M] corresponding to the frames of s2(t).
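The way harmonic spacing shows up in the autocorrelation sequence can be illustrated with a small sketch. The text does not give the exact autocorrelation formula, so the unnormalized linear autocorrelation over frequency lag used here, and the name `autocorrelation`, are assumptions made for illustration only.

```python
def autocorrelation(lines):
    """Unnormalized autocorrelation of a line spectrum over frequency lag 0..K-1."""
    k_bins = len(lines)
    return [sum(lines[j] * lines[j + lag] for j in range(k_bins - lag))
            for lag in range(k_bins)]


# Hypothetical line spectrum with harmonics every 4 bins (fundamental at bin 4).
harmonic_lines = [0.0] * 20
for k in (4, 8, 12, 16):
    harmonic_lines[k] = 1.0

# Non-zero values appear only at lags that are multiples of the 4-bin spacing.
coefficients = autocorrelation(harmonic_lines)
```

With this input the coefficients are non-zero only at lags 0, 4, 8, and 12, that is, at multiples of the spacing between harmonics, which is the behavior the paragraph above describes.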

As shown in FIG. 3, the cross-correlation calculation unit 24 of FIG. 1 calculates, for each of the M frames, a cross-correlation coefficient ρ[m] (ρ[1] to ρ[M]) between the autocorrelation sequence A1[m] of the acoustic signal s1(t) and the autocorrelation sequence A2[m] of the acoustic signal s2(t), taking frames of s1(t) and s2(t) that correspond to each other on the time axis. The cross-correlation coefficient ρ[m] is a scalar indicating the degree of similarity between the autocorrelation sequence A1[m] of the m-th frame of s1(t) and the autocorrelation sequence A2[m] of the m-th frame of s2(t) (that is, the similarity of the time transition of the harmonic structure between s1(t) and s2(t)). The cross-correlation calculation unit 24 calculates the cross-correlation coefficient ρ[m] by, for example, equation (1).

The operator E{ } in equation (1) denotes an average (typically the arithmetic mean) over the K frequencies on the frequency axis. The symbol δi[m,k] (δ1[m,k], δ2[m,k]) in equation (1) denotes the deviation calculated by equation (2). The symbol μi (μ1, μ2) in equation (2) is the average of the autocorrelation coefficients pi[m,k] (the series pi[m,1] to pi[m,K]) over the K frequencies in the m-th frame.

The symbol Pi[m] (P1[m], P2[m]) in equation (1) is, as defined by equation (3), the mean of the squares of the K deviations δi[m,k] (δi[m,1] to δi[m,K]) for the m-th frame, and is calculated individually for each frame by equation (3). Accordingly, the lower the correlation between the autocorrelation sequence A1[m] and the autocorrelation sequence A2[m], the smaller the cross-correlation coefficient ρ[m] of equation (1).
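Equations (1) to (3) are not reproduced in this text. The following is a reconstruction from the verbal definitions above, not the published equations. Equations (2) and (3) follow directly from the description. The form of equation (1) is an assumption: it is taken to be the usual normalized cross-correlation, which is consistent with the stated roles of E{ }, δi[m,k], and Pi[m].

```latex
\begin{align}
\rho[m] &= \frac{E\{\delta_1[m,k]\,\delta_2[m,k]\}}{\sqrt{P_1[m]\,P_2[m]}} \tag{1}\\
\delta_i[m,k] &= p_i[m,k] - \mu_i \tag{2}\\
P_i[m] &= E\{\delta_i[m,k]^2\} \tag{3}
\end{align}
```

Under this reading, ρ[m] lies between -1 and 1 and equals 1 when the two autocorrelation sequences differ only by a positive scale factor and an offset.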

The index calculation unit 26 of FIG. 1 calculates an effect index α of the masking by the masker sound VM from the M cross-correlation coefficients ρ[1] to ρ[M] calculated by the cross-correlation calculation unit 24. The index calculation unit 26 of the first embodiment calculates the average (for example, the arithmetic mean) of the M cross-correlation coefficients ρ[1] to ρ[M] as the effect index α. Roughly speaking, therefore, the lower the correlation of the time transition of the harmonic structure between the acoustic signals s1(t) and s2(t) (before and after masking), that is, the more effective the information masking, the smaller the effect index α tends to be. The display control unit 28 causes the display device 16 to display the effect index α calculated by the index calculation unit 26.
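The per-frame cross-correlation and its average over frames can be sketched as below. This is a sketch under assumptions, not the implementation of units 24 and 26: the function names are hypothetical, the normalized form assumed for equation (1) is the standard correlation coefficient, and returning 0.0 for a frame whose autocorrelation sequence is constant is a choice made here to avoid division by zero.

```python
import math


def cross_correlation(a1, a2):
    """Normalized cross-correlation of two equal-length autocorrelation sequences."""
    k = len(a1)
    mu1 = sum(a1) / k
    mu2 = sum(a2) / k
    d1 = [p - mu1 for p in a1]                      # deviations, equation (2)
    d2 = [p - mu2 for p in a2]
    p1 = sum(x * x for x in d1) / k                 # mean squared deviation, equation (3)
    p2 = sum(y * y for y in d2) / k
    if p1 == 0.0 or p2 == 0.0:
        return 0.0                                  # assumption: flat sequence -> no correlation
    covariance = sum(x * y for x, y in zip(d1, d2)) / k
    return covariance / math.sqrt(p1 * p2)          # assumed form of equation (1)


def effect_index_mean(sequences_1, sequences_2):
    """Arithmetic mean of the per-frame cross-correlation coefficients."""
    rhos = [cross_correlation(a, b) for a, b in zip(sequences_1, sequences_2)]
    return sum(rhos) / len(rhos)
```

Here `sequences_1` and `sequences_2` stand for A1[1] to A1[M] and A2[1] to A2[M], each given as a list of M lists of K coefficients.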

FIG. 5 is a graph showing the relationship between the effect index α calculated in the first embodiment and measured speech intelligibility for several cases in which the energy ratio of the target sound VT to the masker sound VM (hereinafter the "T/M ratio") is varied. As in FIG. 11, FIG. 5 shows both the case where white noise is used as the masker sound VM and the case where reversed speech is used as the masker sound VM. The data points are connected in ascending order of T/M ratio.

As can be seen from FIG. 5, when white noise, which is effective for energy masking, is used as the masker sound VM, speech intelligibility changes only gradually with the effect index α, as indicated by line Z1 in FIG. 5. When reversed speech is used as the masker sound VM, on the other hand, speech intelligibility changes sensitively with the effect index α over the entire numerical range of α, as indicated by line Z2 in FIG. 5. That is, for reversed speech, which is effective for information masking, there is a clear tendency for speech intelligibility to be higher as the effect index α is larger. From these tendencies it is understood that the effect index α of the first embodiment is better suited as a quantitative index of information masking than the narrowband envelope correlation of Non-Patent Documents 1 and 2. According to the first embodiment, therefore, the user can properly evaluate the effect of information masking with the masker sound VM by referring to the effect index α displayed on the display device 16.

<Second Embodiment>
A second embodiment of the present invention is described below. In each of the embodiments illustrated below, elements whose operation and function are equivalent to those of the first embodiment are given the reference signs used in the description of the first embodiment, and their detailed description is omitted as appropriate.

FIG. 6 is a block diagram of the masking analysis apparatus 100 according to the second embodiment. As shown in FIG. 6, the storage device 14 of the second embodiment stores, in addition to the acoustic signal s1(t) representing the target sound VT (the sound before masking), two acoustic signals s2(t) (s2(t)_A and s2(t)_B), each representing a mixture of the target sound VT and a masker sound VM (the sound after masking). The masker sound VM_A of the acoustic signal s2(t)_A and the masker sound VM_B of the acoustic signal s2(t)_B differ in type (generation method). For example, the masker sound VM_A of s2(t)_A is reversed speech and the masker sound VM_B of s2(t)_B is white noise.

In the second embodiment, an effect index αA between the acoustic signals s1(t) and s2(t)_A and an effect index αB between the acoustic signals s1(t) and s2(t)_B are calculated individually. Specifically, the autocorrelation calculation unit 22 calculates the autocorrelation sequences Ai[m] for each of s1(t), s2(t)_A, and s2(t)_B, and the cross-correlation calculation unit 24 calculates the cross-correlation coefficients ρA[1] to ρA[M] between s1(t) and s2(t)_A and the cross-correlation coefficients ρB[1] to ρB[M] between s1(t) and s2(t)_B. The index calculation unit 26 calculates the effect index αA of the masker sound VM_A from the M cross-correlation coefficients ρA[1] to ρA[M] and the effect index αB of the masker sound VM_B from the M cross-correlation coefficients ρB[1] to ρB[M]. The display control unit 28 causes the display device 16 to display the effect indices αA and αB.

The second embodiment achieves the same effects as the first embodiment. In addition, because the effect index αA of the masker sound VM_A and the effect index αB of the masker sound VM_B are calculated and displayed individually, the user can easily confirm which of the masker sounds VM_A and VM_B is more effective for information masking (the masker sound with the smaller effect index α).

<Third Embodiment>
As in the second embodiment, the storage device 14 of the third embodiment stores the acoustic signals s1(t), s2(t)_A, and s2(t)_B. The cross-correlation calculation unit 24 calculates, by the same method as in the first embodiment, the M cross-correlation coefficients ρA[1] to ρA[M] between s1(t) and s2(t)_A and the M cross-correlation coefficients ρB[1] to ρB[M] between s1(t) and s2(t)_B.

The index calculation unit 26 of the third embodiment calculates, as the effect index αA of the masker sound VM_A, the σ-th percentile of the M cross-correlation coefficients ρA[1] to ρA[M] calculated by the cross-correlation calculation unit 24 (in the example below, the 75th percentile, with the variable σ set to 75 (%)). That is, when the M cross-correlation coefficients ρA[1] to ρA[M] are arranged in ascending order, the coefficient ρA[m] whose rank corresponds to σ% of M is taken as the effect index αA. Similarly, the index calculation unit 26 calculates the σ-th percentile of the M cross-correlation coefficients ρB[1] to ρB[M] as the effect index αB of the masker sound VM_B. The display control unit 28 causes the display device 16 to display the effect indices αA and αB calculated by the index calculation unit 26.
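A minimal sketch of the percentile-based effect index follows. The text fixes only that the coefficient at the rank corresponding to σ% of M in ascending order is used; rounding that rank up with `math.ceil` (a nearest-rank rule) and the name `effect_index_percentile` are assumptions for illustration.

```python
import math


def effect_index_percentile(rhos, sigma=75.0):
    """Return the coefficient at rank ceil(M * sigma / 100) in ascending order."""
    ordered = sorted(rhos)
    rank = max(1, math.ceil(len(ordered) * sigma / 100.0))  # 1-based rank, assumed rounding
    return ordered[rank - 1]


# Hypothetical per-frame cross-correlation coefficients (M = 4).
example_rhos = [0.9, 0.1, 0.5, 0.3]
alpha_75 = effect_index_percentile(example_rhos, 75.0)
```

With M = 4 and σ = 75 the rank is 3, so the third smallest coefficient is selected.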

FIG. 7 is a graph showing, in the same manner as FIG. 5, the relationship between the effect index α (αA, αB) calculated in the third embodiment and measured speech intelligibility. Like the effect index α of the first embodiment, the effect index α of the third embodiment (the σ-th percentile) shows a tendency for speech intelligibility to change sensitively with the effect index α when reversed speech, which is effective for information masking, is used as the masker sound VM. The third embodiment therefore has the same advantage as the first embodiment: the effect of information masking can be properly evaluated.

The display control unit 28 of the third embodiment calculates the count in each predetermined class for the M cross-correlation coefficients ρA[1] to ρA[M] calculated by the cross-correlation calculation unit 24, calculates the counts for the M cross-correlation coefficients ρB[1] to ρB[M] in the same way, and, as shown in FIG. 8, causes the display device 16 to display a frequency distribution 50A of the cross-correlation coefficients ρA[1] to ρA[M] and a frequency distribution 50B of the cross-correlation coefficients ρB[1] to ρB[M].

By comparing the frequency distributions 50A and 50B displayed on the display device 16, the user can intuitively compare the effect of information masking by the masker sound VM_A with that by the masker sound VM_B. For example, the lower the correlation of the autocorrelation sequences Ai[m] between the acoustic signals s1(t) and s2(t) (that is, the more effective the information masking by the masker sound VM), the more the cross-correlation coefficients ρ[m] tend to concentrate in the range of small values. Comparing the frequency distributions 50A and 50B of FIG. 8, the tendency for counts to concentrate in the range of small values is more pronounced in the frequency distribution 50A. The user can therefore judge intuitively that the masker sound VM_A (reversed speech) corresponding to the frequency distribution 50A is more effective for information masking than the masker sound VM_B (white noise).

As shown in FIG. 8, the display control unit 28 of the third embodiment also causes the display device 16 to display a cumulative frequency distribution 52A of the cross-correlation coefficients ρA[1] to ρA[M] and a cumulative frequency distribution 52B of the cross-correlation coefficients ρB[1] to ρB[M], and places a straight line 54, which indicates the count corresponding to σ% of M, over the cumulative frequency distributions 52A and 52B. The class value at the intersection CA of the cumulative frequency distribution 52A and the straight line 54 corresponds to the effect index αA (that is, the σ-th percentile of the cross-correlation coefficients ρA[m]), and the class value at the intersection CB of the cumulative frequency distribution 52B and the straight line 54 corresponds to the effect index αB.
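The frequency distribution, the cumulative frequency distribution, and the class at which the cumulative count reaches σ% of M can be sketched as follows. The class width, the value range, and all function names are assumptions for illustration; the text only says that counts are taken "for each predetermined class".

```python
def frequency_distribution(rhos, num_classes=10, lo=-1.0, hi=1.0):
    """Count coefficients per equal-width class over [lo, hi] (assumed range)."""
    counts = [0] * num_classes
    width = (hi - lo) / num_classes
    for rho in rhos:
        index = int((rho - lo) / width)
        counts[min(max(index, 0), num_classes - 1)] += 1   # clamp the end points
    return counts


def cumulative_distribution(counts):
    """Running total of the per-class counts."""
    total, cumulative = 0, []
    for count in counts:
        total += count
        cumulative.append(total)
    return cumulative


def class_reaching(cumulative, sigma=75.0):
    """Index of the first class whose cumulative count reaches sigma% of M."""
    target = cumulative[-1] * sigma / 100.0
    for index, value in enumerate(cumulative):
        if value >= target:
            return index


# Hypothetical coefficients, binned into 10 classes over [0, 1].
counts = frequency_distribution([0.05, 0.15, 0.15, 0.95], 10, 0.0, 1.0)
cumulative = cumulative_distribution(counts)
crossing = class_reaching(cumulative, 75.0)
```

The returned class index plays the role of the intersection with the straight line 54: a masker whose distribution crosses the σ% level at a lower class has the smaller effect index.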

By comparing the cumulative frequency distributions 52A and 52B, the user can intuitively grasp the effect of information masking by each of the masker sounds VM_A and VM_B. For example, because the intersection CA corresponds to a smaller class value than the intersection CB, the tendency for counts to concentrate in the range of small values is more pronounced for the cross-correlation coefficients ρA[m] than for the cross-correlation coefficients ρB[m]. The user can therefore judge intuitively that the masker sound VM_A (reversed speech) corresponding to the cumulative frequency distribution 52A is more effective for information masking than the masker sound VM_B (white noise).

<Fourth Embodiment>
As in the second and third embodiments, the storage device 14 of the fourth embodiment stores the acoustic signals s1(t), s2(t)_A, and s2(t)_B. As shown in FIG. 9, the display control unit 28 causes the display device 16 to display a correlation transition image 62 and a correlation distribution image 64 for each of s1(t), s2(t)_A, and s2(t)_B.

The correlation transition image 62 represents the time series of the autocorrelation sequences Ai[m] (that is, the time transition of the harmonic structure) in a region defined by a frequency axis (vertical axis) and a time axis (horizontal axis). The value of each autocorrelation coefficient pi[m,k] of the autocorrelation sequence Ai[m] is expressed by the gradation or color of the corresponding point in the correlation transition image 62. That is, the gradation or color of the coordinate point corresponding to the m-th frame on the time axis and the k-th frequency on the frequency axis is determined according to the value of the autocorrelation coefficient pi[m,k] for the k-th frequency among the K autocorrelation coefficients pi[m,1] to pi[m,K] that make up the autocorrelation sequence Ai[m] of the m-th frame.

The correlation distribution image 64 shows, along the frequency axis (vertical axis), the sum (cumulative count) of the autocorrelation coefficients pi[1,k] to pi[M,k] over the M frames. That is, the correlation distribution image 64 contains a straight line for each of the K frequencies on the frequency axis, and the length of the line for the k-th frequency is set according to the sum of the autocorrelation coefficients pi[m,k] of that frequency over the M frames (pi[1,k] + pi[2,k] + ... + pi[M,k]).
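The data behind the two images can be sketched as plain lists: a K-by-M grid for the correlation transition image 62 and K per-frequency sums for the correlation distribution image 64. How values map to gradation, color, or line length is not specified in the text and is left out; the function names are hypothetical.

```python
def correlation_transition_grid(frames):
    """Rearrange M frames of K coefficients into K rows (frequency) by M columns (time)."""
    k_bins = len(frames[0])
    return [[frame[k] for frame in frames] for k in range(k_bins)]


def correlation_distribution(frames):
    """Sum each frequency's autocorrelation coefficient over all M frames."""
    k_bins = len(frames[0])
    return [sum(frame[k] for frame in frames) for k in range(k_bins)]


# Hypothetical autocorrelation sequences for M = 2 frames and K = 3 frequencies.
example_frames = [[1, 0, 2], [3, 1, 0]]
grid = correlation_transition_grid(example_frames)
totals = correlation_distribution(example_frames)
```

Each entry of `grid` would set the shade of one point in the image 62, and each entry of `totals` would set the length of one line in the image 64.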

The more the time series of the autocorrelation sequences Ai[m] (the time transition of the harmonic structure) changes between before and after masking, the greater the effect of information masking tends to be. Therefore, by comparing the correlation transition image 62 of each of the acoustic signals s2(t)_A and s2(t)_B with the correlation transition image 62 of the acoustic signal s1(t), the user can visually grasp the effect of information masking by each of the masker sounds VM_A and VM_B (how the time series of the autocorrelation sequences Ai[m] differ). In the example of FIG. 9, the correlation transition image 62 of s2(t)_A differs more from the correlation transition image 62 of s1(t) than does the correlation transition image 62 of s2(t)_B. That is, the degree to which the time series of the autocorrelation sequences Ai[m] changes between before and after masking is greater for s2(t)_A than for s2(t)_B. The user can therefore judge intuitively that the masker sound VM_A (reversed speech) is more effective for information masking than the masker sound VM_B (white noise). In addition, by comparing the correlation distribution image 64 of each of s2(t)_A and s2(t)_B with the correlation distribution image 64 of s1(t), the user can intuitively grasp the long-term change in the harmonic structure over the M frames for the case where the masker sound VM_A is used and the case where the masker sound VM_B is used.

<Fifth Embodiment>
FIG. 10 is a block diagram of a masking apparatus 200 according to a fifth embodiment of the present invention. The masking apparatus 200 of the fifth embodiment selects and emits one of several types of masker sound VM (VM_A, VM_B) that differ in generation method or magnitude (sound pressure), and is configured by adding a selection unit 40 and a sound emitting device 42 to the masking analysis apparatus 100 of the second or third embodiment. The storage device 14 stores a masker sound signal v(t)_A representing the waveform of the masker sound VM_A and a masker sound signal v(t)_B representing the waveform of the masker sound VM_B.

As in the second or third embodiment, the index calculation unit 26 of the fifth embodiment calculates the effect index αA of the masker sound VM_A and the effect index αB of the masker sound VM_B. The selection unit 40 selects either the masker sound VM_A or the masker sound VM_B according to the effect indices α (αA, αB) calculated by the index calculation unit 26. Specifically, the selection unit 40 selects the masker sound VM with the smaller effect index α (that is, the masker sound VM that is more effective for information masking). The selection unit 40 then obtains from the storage device 14 the masker sound signal v(t) (v(t)_A or v(t)_B) corresponding to the masker sound VM selected according to the effect index α, and supplies it to the sound emitting device 42. The sound emitting device 42 (for example, a loudspeaker) emits the masker sound VM (VM_A or VM_B) as sound waves according to the masker sound signal v(t) supplied from the selection unit 40.
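The selection rule reduces to picking the candidate with the smallest effect index. A minimal sketch, in which the function name, the dictionary layout, and the numeric values are all hypothetical:

```python
def select_masker(effect_indices):
    """Return the name of the masker sound whose effect index is smallest."""
    return min(effect_indices, key=effect_indices.get)


# Hypothetical effect indices for two candidate masker sounds.
chosen = select_masker({"VM_A": 0.21, "VM_B": 0.58})
```

Because the function takes a mapping, the same call works for any number of candidate masker sounds.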

The fifth embodiment achieves the same effects as the first embodiment. In addition, because a masker sound VM effective for information masking is automatically selected and emitted according to the effect index α, the burden on a user who wants to mask the target sound VT is reduced.

In the fifth embodiment, the display control unit 28 and the display device 16 may be omitted. A configuration in which the selection unit 40 selects a masker sound VM according to the effect index α and notifies the user of it, for example by display on the display device 16 (that is, a masker sound selection apparatus that does not require emission of the masker sound VM), may also be adopted. The above description illustrates selection between two types of masker sound VM (VM_A, VM_B), but the number of candidate masker sound types is arbitrary.

<Modifications>
Each of the above embodiments can be modified in various ways. Specific modifications are illustrated below. Two or more modes freely selected from the following examples may be combined as appropriate.

(1) In each of the above embodiments the effect index α (αA, αB) is displayed on the display device 16, but use of the effect index α is not limited to display to the user. Specifically, besides the configuration of the fifth embodiment, in which the effect index α is applied to selection of the masker sound VM, configurations that output the effect index α calculated by the index calculation unit 26 as audio, print it on paper, or transmit it to another communication terminal over a communication network may also be adopted.

(2) The second to fifth embodiments illustrate two acoustic signals s2(t) (s2(t)_A, s2(t)_B), but even when three or more acoustic signals s2(t) are prepared, the effect of information masking by the masker sound VM of each acoustic signal s2(t) can be evaluated by executing the same processing as in the above embodiments for each acoustic signal s2(t).

(3) In the second to fifth embodiments, the acoustic signals s2(t)_A and s2(t)_B differ in the type of masker sound VM, but a configuration in which s2(t)_A and s2(t)_B differ in T/M ratio may also be adopted. For example, when s2(t)_A and s2(t)_B are generated by applying the same type of masker sound VM to the target sound VT at different T/M ratios, calculating and evaluating the effect index α for each acoustic signal s2(t) as in the above embodiments makes it possible to identify the T/M ratio that is best from the viewpoint of effective information masking. That is, a configuration that calculates an effect index between the acoustic signal s1(t) and each of a plurality of acoustic signals s2(t) that differ in at least one of masker sound type and T/M ratio is preferable.
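Preparing mixtures at different T/M ratios might look like the sketch below. The text defines the T/M ratio as the energy ratio of the target sound to the masker sound but does not say how the mixture is produced or whether the ratio is expressed in decibels, so the linear energy ratio, the gain formula, and the name `mix_at_tm_ratio` are assumptions.

```python
import math


def mix_at_tm_ratio(target, masker, tm_ratio):
    """Scale the masker so target energy / masker energy equals tm_ratio, then add."""
    target_energy = sum(x * x for x in target)
    masker_energy = sum(x * x for x in masker)
    gain = math.sqrt(target_energy / (masker_energy * tm_ratio))  # amplitude gain
    return [t + gain * v for t, v in zip(target, masker)]


# Hypothetical short signals: target energy 4, masker energy 16, requested ratio 4.
mixed = mix_at_tm_ratio([1.0, -1.0, 1.0, -1.0], [2.0, 2.0, 2.0, 2.0], 4.0)
```

Repeating the call with several values of `tm_ratio` would yield the candidate signals s2(t) whose effect indices are then compared.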

DESCRIPTION OF SYMBOLS 100 ... masking analysis apparatus, 200 ... masking apparatus, 12 ... arithmetic processing device, 14 ... storage device, 16 ... display device, 22 ... autocorrelation calculation unit, 24 ... cross-correlation calculation unit, 26 ... index calculation unit, 28 ... display control unit, 32 ... section setting unit, 34 ... frequency analysis unit, 36 ... correlation analysis unit.

Claims

A masking analysis apparatus that analyzes masking of a target sound by a masker sound, comprising:
autocorrelation calculation means for calculating, for each frame on the time axis, an autocorrelation sequence of a line spectrum sequence corresponding to each peak of the spectrum of an acoustic signal, for each of a first acoustic signal representing the target sound and a second acoustic signal representing a mixed sound of the target sound and the masker sound;
cross-correlation calculation means for calculating a cross-correlation coefficient between the autocorrelation sequence of the first acoustic signal and the autocorrelation sequence of the second acoustic signal for each pair of mutually corresponding frames of the first acoustic signal and the second acoustic signal; and
index calculation means for calculating, as an effect index of the masking, a representative value of the plurality of cross-correlation coefficients calculated for the respective frames.
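The processing recited in this claim can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiments: peaks are taken as local maxima of a Hann-windowed magnitude spectrum, the line spectrum sequence keeps the magnitude at each peak bin and zero elsewhere, the autocorrelation is taken over non-negative lags along the frequency axis, and the cross-correlation coefficient is a normalized inner product of the two autocorrelation sequences. The function names, frame length, and hop size are hypothetical.

```python
import numpy as np

def line_spectrum(frame):
    """Line spectrum sequence: magnitude at local spectral maxima, zero elsewhere (assumed)."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    lines = np.zeros_like(mag)
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:])
    lines[1:-1][is_peak] = mag[1:-1][is_peak]
    return lines

def autocorr_sequence(x):
    """Autocorrelation of x over non-negative lags along the frequency axis."""
    full = np.correlate(x, x, mode="full")
    return full[len(x) - 1:]

def frame_coefficients(s1, s2, frame_len=1024, hop=512):
    """Cross-correlation coefficient for each pair of corresponding frames."""
    n = min(len(s1), len(s2))
    coeffs = []
    for start in range(0, n - frame_len + 1, hop):
        a1 = autocorr_sequence(line_spectrum(s1[start:start + frame_len]))
        a2 = autocorr_sequence(line_spectrum(s2[start:start + frame_len]))
        denom = np.linalg.norm(a1) * np.linalg.norm(a2)
        # A silent frame has no peaks; its coefficient is set to 0 here (assumption).
        coeffs.append(float(a1 @ a2 / denom) if denom > 0.0 else 0.0)
    return np.array(coeffs)

def effect_index(s1, s2):
    """Representative value of the per-frame coefficients (here, the average)."""
    return float(np.mean(frame_coefficients(s1, s2)))
```

With these definitions, comparing a signal with itself yields a coefficient of 1 in every non-silent frame, and the value decreases as the line spectrum structure of the mixed sound departs from that of the target sound.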

The masking analysis apparatus according to claim 1, wherein the index calculation means calculates an average value of the plurality of cross-correlation coefficients as the effect index.

The masking analysis apparatus according to claim 1, wherein the index calculation means calculates a predetermined percentile value of the plurality of cross-correlation coefficients as the effect index.
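The two representative values named in the preceding claims, the average value and a predetermined percentile value, can be sketched as below. The function name `representative_value`, the `mode` switch, and the default percentile of 90 are hypothetical choices for illustration; the claims do not fix a particular percentile.

```python
import numpy as np

def representative_value(coeffs, mode="mean", percentile=90.0):
    """Reduce per-frame cross-correlation coefficients to a single effect index.

    mode="mean" returns the average value; mode="percentile" returns the given
    percentile (the default of 90 is an illustrative assumption).
    """
    coeffs = np.asarray(coeffs, dtype=float)
    if coeffs.size == 0:
        raise ValueError("no cross-correlation coefficients were supplied")
    if mode == "mean":
        return float(np.mean(coeffs))
    if mode == "percentile":
        return float(np.percentile(coeffs, percentile))
    raise ValueError(f"unknown mode: {mode!r}")
```

A percentile is less sensitive than the average to a small number of frames with outlying coefficients, which may be one reason to prefer it in some conditions.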

The masking analysis apparatus, wherein the autocorrelation calculation means calculates, for each frame, the autocorrelation sequence of the first acoustic signal and the autocorrelation sequence of each of a plurality of second acoustic signals that differ in at least one of the type of masker sound and the energy ratio between the target sound and the masker sound,
the cross-correlation calculation means calculates, for each of the plurality of second acoustic signals and for each frame, a cross-correlation coefficient between the autocorrelation sequence of the first acoustic signal and the autocorrelation sequence of that second acoustic signal, and
the index calculation means calculates, for each of the plurality of second acoustic signals, the effect index according to the plurality of cross-correlation coefficients calculated for that second acoustic signal.

The masking analysis apparatus according to claim 1, further comprising display control means for causing a display device to display, for each of the first acoustic signal and the second acoustic signal, a correlation transition image that shows the time series of the autocorrelation sequence in a region in which a frequency axis and a time axis are set, and a correlation distribution image that shows the distribution along the frequency axis of values obtained by summing, for each frequency, the correlation values of the autocorrelation sequence over a plurality of frames.

A masker sound selection device for selecting one of a plurality of types of masker sounds used for masking a target sound, comprising:
autocorrelation calculation means for calculating, for each frame on the time axis, an autocorrelation sequence of a line spectrum sequence corresponding to each peak of the spectrum of an acoustic signal, for each of a first acoustic signal representing the target sound and a plurality of second acoustic signals each representing a mixed sound of the target sound and a different type of masker sound;
cross-correlation calculation means for calculating, for each of the plurality of second acoustic signals, a cross-correlation coefficient between the autocorrelation sequence of that second acoustic signal and the autocorrelation sequence of the first acoustic signal for each pair of mutually corresponding frames;
index calculation means for calculating, for each of the plurality of second acoustic signals, a representative value of the plurality of cross-correlation coefficients calculated for the respective frames of that second acoustic signal as an effect index of the masking; and
selection means for selecting one of the plurality of types of masker sounds according to the effect indices calculated by the index calculation means.
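The selection step can be sketched as a comparison of the effect indices calculated for the candidate masker sounds. The sketch assumes that a lower effect index (a lower correlation between the target sound and the mixed sound) indicates stronger masking; this direction, the function name `select_masker`, and the masker labels in the usage note are illustrative assumptions.

```python
def select_masker(effect_indices, lower_is_better=True):
    """Pick one masker type from a mapping {masker name: effect index}.

    Assumption: by default the masker with the lowest effect index is treated
    as the most effective; pass lower_is_better=False to invert the criterion.
    """
    if not effect_indices:
        raise ValueError("no candidate masker sounds were supplied")
    chooser = min if lower_is_better else max
    return chooser(effect_indices, key=effect_indices.get)
```

For example, given the hypothetical indices `{"white_noise": 0.42, "reversed_speech": 0.18}`, the function returns `"reversed_speech"` under the default criterion.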

A masking device for masking a target sound using one of a plurality of types of masker sounds, comprising:
autocorrelation calculation means for calculating, for each frame on the time axis, an autocorrelation sequence of a line spectrum sequence corresponding to each peak of the spectrum of an acoustic signal, for each of a first acoustic signal representing the target sound and a plurality of second acoustic signals each representing a mixed sound of the target sound and a different type of masker sound;
cross-correlation calculation means for calculating, for each of the plurality of second acoustic signals, a cross-correlation coefficient between the autocorrelation sequence of that second acoustic signal and the autocorrelation sequence of the first acoustic signal for each pair of mutually corresponding frames;
index calculation means for calculating, for each of the plurality of second acoustic signals, a representative value of the plurality of cross-correlation coefficients calculated for the respective frames of that second acoustic signal as an effect index of the masking; and
selection means for selecting one of the plurality of types of masker sounds according to the effect indices calculated by the index calculation means and causing a sound emitting device to emit the selected masker sound.

A program for executing, in order to analyze masking of a target sound by a masker sound:
autocorrelation calculation processing for calculating, for each frame on the time axis, an autocorrelation sequence of a line spectrum sequence corresponding to each peak of the spectrum of an acoustic signal, for each of a first acoustic signal representing the target sound and a second acoustic signal representing a mixed sound of the target sound and the masker sound;
cross-correlation calculation processing for calculating a cross-correlation coefficient between the autocorrelation sequence of the first acoustic signal and the autocorrelation sequence of the second acoustic signal for each pair of mutually corresponding frames of the first acoustic signal and the second acoustic signal; and
index calculation processing for calculating, as an effect index of the masking, a representative value of the plurality of cross-correlation coefficients calculated for the respective frames.