JP5714076B2 - Auditory saliency evaluation apparatus, auditory saliency evaluation method, and program

Info

Publication number: JP5714076B2
Application number: JP2013221237A
Authority: JP
Inventors: 俊介木谷; 茂人古川; シンイリャオ; 惇米家; 牧夫柏野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-10-24
Filing date: 2013-10-24
Publication date: 2015-05-07
Anticipated expiration: 2033-10-24
Also published as: JP2015082079A

Description

The present invention relates to an auditory saliency evaluation apparatus, an auditory saliency evaluation method, and a program that can be used to evaluate auditory saliency, which represents the degree to which a sound attracts attention.

Auditory saliency is an important characteristic representing the degree to which a sound attracts attention. The paired comparison method has been used to evaluate the auditory saliency of arbitrary sound segments (Non-Patent Document 1, in particular FIG. 2). In addition, Non-Patent Document 2 is known as a method that evaluates the saliency of a target sound by exploiting the tendency for a beat to be tapped more easily to a more conspicuous sound.

Non-Patent Document 1: Kayser et al., “Mechanisms for Allocating Auditory Attention: An Auditory Saliency Map,” Current Biology, Vol. 15, pp. 1945-1947 (2005)
Non-Patent Document 2: Chon & McAdams, “Investigation of timbre saliency, the attention-capturing quality of timbre,” J. Acoust. Soc. Am., Vol. 131 (4), p. 3433 (2012)

In the rating method, a listener assigns each sound under evaluation (target sound) a numerical value according to the strength of its subjective saliency, and saliency is evaluated from the magnitude of that value. However, the evaluation values obtained in this way depend on the composition of the set of target sounds used in the measurement. It is therefore difficult to compare values obtained with different sound sets.

In the paired comparison method, multiple pairs of target sounds are prepared from the set of target sounds of interest. For each pair, the listener judges which sound is more salient. The results are processed statistically to rank or quantify the saliency of the target sounds. Because every pair must be evaluated, the number of required pairs grows explosively as the number of target sounds increases, which makes it difficult to run the experiment within a realistic amount of time and effort. Moreover, because the resulting values are based on relative evaluation within the given set of target sounds, it is difficult to compare values obtained with different sets, as with the rating method. Non-Patent Document 2 uses a sound string in which two kinds of target sounds are repeated and evaluates the relative saliency of that pair of target sounds, so it cannot solve the problems of the paired comparison method described above.

As described above, conventional methods have the problem that the saliency evaluation value depends on the sound set used for the evaluation. An object of the present invention is therefore to provide an auditory saliency evaluation apparatus that can quantitatively evaluate the auditory saliency of an arbitrary target sound in a form that does not depend on the sound set used for the evaluation.

The auditory saliency evaluation apparatus of the present invention includes a sound presentation unit, an input information acquisition unit, and an evaluation value calculation unit.

The sound presentation unit presents a sound string composed of a plurality of first sounds and a plurality of second sounds separated by time intervals. The input information acquisition unit acquires a series of beat information from a person who taps a beat while listening to the presented sound string. The evaluation value calculation unit calculates an auditory saliency level based on the proportion of the beat information within a predetermined time interval that corresponds to one of the two sounds.

According to the auditory saliency evaluation apparatus of the present invention, the auditory saliency of an arbitrary target sound can be evaluated quantitatively in a form that does not depend on the sound set used for the evaluation.

FIG. 1 is a diagram illustrating an example configuration of the sound string presented to a listener in the auditory saliency evaluation experiment.
FIG. 2 is a diagram showing the auditory saliency level for each type of target sound obtained in the evaluation experiment.
FIG. 3 is a diagram comparing the experimental results of the conventional method (paired comparison method) and the inventive method.
FIG. 4 is a diagram showing the correlation between the experimental results of the conventional method (paired comparison method) and the inventive method.
FIG. 5 is a block diagram showing the configuration of the auditory saliency evaluation apparatus of Embodiment 1.
FIG. 6 is a flowchart showing the operation of the auditory saliency evaluation apparatus of Embodiment 1.
FIG. 7 is a block diagram showing the configuration of the auditory saliency evaluation apparatus of Embodiment 2.
FIG. 8 is a flowchart showing the operation of the auditory saliency evaluation apparatus of Embodiment 2.

Hereinafter, embodiments of the present invention will be described in detail. Components having the same function are given the same reference numerals, and redundant description is omitted.

<Principle>
The characteristic also used in Non-Patent Document 2, namely that a beat is easier to tap to a more conspicuous sound, is as follows: when two different sounds are presented alternately and repeatedly and the listener taps a beat in time with one of them, the listener generally finds it easier to tap in time with the more conspicuous (more salient) sound.

On the other hand, how easily a beat can be tapped also depends on loudness (sound pressure level). When the same sound is presented alternately at different sound pressure levels, the listener generally finds it easier to tap in time with the louder sound. The present invention quantifies the saliency of a target sound by combining these two characteristics.

<Experimental conditions>
The sound string used in the evaluation experiment will be described with reference to FIG. 1. FIG. 1 is a diagram illustrating an example configuration of the sound string presented to a listener in the auditory saliency evaluation experiment. First, a target sound (Target) 5 and a reference sound (Reference) 6 are prepared. The individual target sounds 5 shown in FIG. 1 are distinguished by appending branch numbers -1, -2, -3, ... to reference numeral 5. Likewise, the individual reference sounds 6 shown in FIG. 1 are distinguished by appending branch numbers -1, -2, -3, ... to reference numeral 6. In the example of FIG. 1, the target sound 5 and the reference sound 6 are arranged alternately and repeatedly, with an arbitrary time interval 7 between successive sounds. The individual time intervals 7 shown in FIG. 1 are distinguished by appending branch numbers -1, -2, -3, ... to reference numeral 7. In the example of FIG. 1, a time interval 7-1 is inserted after the target sound 5-1, the reference sound 6-1 follows the time interval 7-1, a time interval 7-2 is inserted after the reference sound 6-1, and the target sound 5-2 follows the time interval 7-2. A time interval 7 is thus always inserted between the alternating target sounds 5 and reference sounds 6, and together they constitute the sound string 8. In the example of FIG. 1 the reference sound 6 is presented after the target sound 5, but the order may be reversed. Furthermore, the target sound 5 and the reference sound 6 need not strictly alternate, as in reference sound 6-1, target sound 5-2, reference sound 6-2, target sound 5-3, and so on. As long as the listener can follow the flow of the sounds and tap a beat, one of the sounds may be presented consecutively, for example reference sound 6-1, reference sound 6-2, target sound 5-1, reference sound 6-3, reference sound 6-4, target sound 5-2, and so on. In short, it suffices that the target sound 5 and the reference sound 6 are not presented at random but are presented repeatedly, separated by time intervals 7.
The target sound 5 and the reference sound 6 may be given more general names. For example, they may be called the first sound and the second sound. The target sound 5 may be the first sound and the reference sound 6 the second sound, or the target sound 5 may be the second sound and the reference sound 6 the first sound. The sound string is not limited to the example of FIG. 1; it need only be composed of a plurality of first sounds and a plurality of second sounds separated by time intervals.

Multiple types of target sound 5 are prepared. The reference sound 6 is an arbitrary sound that is common to all types of target sound 5. The experimental results described below were obtained using pink noise as the reference sound 6. The reference sound 6 is not limited to pink noise and may be any sound, provided that it is common to all target sounds 5. In the evaluation experiment, the duration of the target sound 5 and the reference sound 6 (X [ms] in FIG. 1) was 300 ms and the presentation interval (Y and Z [ms] in FIG. 1) was 100 ms. However, the durations and presentation intervals of the target sound and the reference sound are not limited to these values and can be set freely within a predetermined range. The time intervals Y [ms] and Z [ms] in FIG. 1 may be equal or different, but both must be greater than 0. That is, Y and Z need only be long enough that the reference sound and the target sound are heard as separate sounds, and that it is possible to identify which sound each tap input by the listener through the input information acquisition unit 14 (described later) was made to.
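The timing of such a sound string can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `Event` and `build_sequence` are hypothetical, the strictly alternating order of FIG. 1 is assumed, and Y is taken to be the gap after the first sound of each pair and Z the gap after the second, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    label: str        # "target" or "reference"
    onset_ms: float   # time at which the sound starts
    offset_ms: float  # time at which the sound ends


def build_sequence(n_pairs, x_ms=300.0, y_ms=100.0, z_ms=100.0, target_first=True):
    """Return the onset/offset schedule of an alternating sound string.

    x_ms: duration of every sound (X in FIG. 1)
    y_ms: silent gap after the first sound of each pair (assumed to be Y)
    z_ms: silent gap after the second sound of each pair (assumed to be Z)
    """
    if x_ms <= 0 or y_ms <= 0 or z_ms <= 0:
        raise ValueError("durations and gaps must be greater than 0")
    order = ("target", "reference") if target_first else ("reference", "target")
    events = []
    t = 0.0
    for _ in range(n_pairs):
        for label, gap in zip(order, (y_ms, z_ms)):
            events.append(Event(label, t, t + x_ms))
            t += x_ms + gap  # next sound starts after the silent gap
    return events


events = build_sequence(n_pairs=2)
```

With the experimental values (X = 300 ms, Y = Z = 100 ms), successive onsets are 400 ms apart. Passing `target_first=False` gives the reversed order that the text also allows.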

The durations of the target sound 5 and the reference sound 6 (X [ms] in FIG. 1) need not be exactly equal, but they should preferably be close enough that the two sounds are heard as having roughly the same length. If the sounds differ in length, the listener's attention may be drawn to that difference when tapping, and the resulting index may not accurately reflect the saliency of the sounds themselves.

The listener listens to the repeatedly presented sound string 8 and taps a beat by pressing a button in time with whichever sound is easier to tap to. The sound that is tapped can be regarded as the more conspicuous (more salient) sound. On the other hand, as noted above, a beat is easier to tap to a sound with a higher sound pressure level, so raising or lowering the sound pressure level of the reference sound 6 can be expected to switch which sound is tapped. Accordingly, when the target sound 5 is tapped, the sound pressure level of the reference sound 6 is raised, and when the reference sound 6 is tapped, the sound pressure level of the reference sound 6 is lowered. Repeating this operation eventually yields the sound pressure level of the reference sound 6 at which the target sound 5 and the reference sound 6 are tapped in equal proportions (50% each). It is not essential to use equal tapping probabilities (50%) as the criterion; a probability that one of the sounds is tapped (for example, a 60% or 70% probability that the target sound is tapped) may be set as the criterion instead.

For a target sound 5 with high saliency, the sound pressure level of the reference sound 6 needed to reach the above criterion is expected to be relatively high. Likewise, for a target sound 5 with low saliency, the sound pressure level of the reference sound 6 needed to reach the criterion is expected to be relatively low. The sound pressure level of the reference sound at which the target sound 5 and the reference sound 6 are tapped in a predetermined proportion (for example, 50% each) can therefore be used as an index of saliency. This sound pressure level is called the auditory saliency level. The auditory saliency level is determined independently for each type of target sound 5. In other words, using the sound pressure level as the auditory saliency level makes it possible to express saliency as an index that does not depend on the set of target sounds 5. As an alternative to using the sound pressure level as the index, the experiment can be run with the sound pressure level of the reference sound 6 fixed, and the probability that the target sound 5 is selected can be used as the index of saliency.

<Experimental results>
The results of the evaluation experiment will be described below with reference to FIG. 2. FIG. 2 is a diagram showing the auditory saliency level for each type of target sound 5 obtained in the evaluation experiment. The vertical axis of the graph in FIG. 2 represents the sound pressure level of the reference sound 6 (the auditory saliency level), and the numbers on the horizontal axis are the numbers of the target sounds 5 used in the experiment. FIG. 2 shows the experimental results for ten types of target sound 5: No. 1: BEEP, No. 2: BIRD, No. 3: CHIRP, No. 4: CRYING, No. 5: DOG, No. 6: LAUGHTER, No. 7: PHONE, No. 8: SCRATCH, No. 9: TONE, and No. 10: WHITE NOISE. No. 11, PINK NOISE, is the reference sound 6.

<Comparison between the conventional method and the inventive method>
The conventional method and the inventive method are compared below with reference to FIGS. 3 and 4. FIG. 3 is a diagram comparing the experimental results of the conventional method (paired comparison method) and the inventive method. In the paired comparison method, which is the conventional method, pairs were formed from the ten types of sound used here as target sounds 5, listeners judged which sound of each pair was more salient, and a saliency scale was derived from the resulting data by Thurstone's method of paired comparisons. FIG. 4 shows the correlation between the experimental results of the conventional method (paired comparison method) and the inventive method. As shown in FIG. 4, the correlation coefficient between the inventive method and the conventional method is 0.79, indicating that the two are correlated. In other words, when the same sound set is used, the auditory saliency levels obtained by the inventive method are highly correlated with the evaluation values obtained by the conventional method.
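For readers unfamiliar with this kind of comparison, the following Python sketch shows one way a saliency scale could be derived from paired-comparison counts and then correlated with auditory saliency levels. It assumes Thurstone's Case V model, which the text does not specify, and the counts and levels are made-up illustrative numbers, not data from the experiment. The names `thurstone_case_v` and `pearson_r` are hypothetical.

```python
from statistics import NormalDist, mean


def thurstone_case_v(wins):
    """Scale values from a paired-comparison count matrix (Case V assumed).

    wins[i][j] is the number of times sound i was judged more salient
    than sound j. Returns one scale value per sound.
    """
    inv_cdf = NormalDist().inv_cdf
    scale = []
    for i in range(len(wins)):
        z_scores = []
        for j in range(len(wins)):
            if i == j:
                continue
            p = wins[i][j] / (wins[i][j] + wins[j][i])
            p = min(max(p, 0.01), 0.99)  # keep z finite for unanimous pairs
            z_scores.append(inv_cdf(p))
        scale.append(mean(z_scores))
    return scale


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Made-up counts for three sounds, 10 judgments per pair (not patent data).
wins = [[0, 8, 9],
        [2, 0, 7],
        [1, 3, 0]]
scale = thurstone_case_v(wins)
levels_db = [72.0, 66.0, 61.0]  # hypothetical auditory saliency levels
r = pearson_r(scale, levels_db)
```

Note that the count matrix needs one entry per pair of sounds, whereas the level-based index is measured once per target sound against the common reference sound.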

The auditory saliency evaluation apparatus of Embodiment 1 will be described below with reference to FIGS. 5 and 6. FIG. 5 is a block diagram showing the configuration of the auditory saliency evaluation apparatus 1 of this embodiment. FIG. 6 is a flowchart showing the operation of the auditory saliency evaluation apparatus 1 of this embodiment. As shown in FIG. 5, the auditory saliency evaluation apparatus 1 of this embodiment includes a storage unit 11, a control unit 12, a sound presentation unit 13, an input information acquisition unit 14, and an evaluation value calculation unit 15. The operation of each component is described below.

<Storage unit 11>
The storage unit 11 stores the target sound 5 and the reference sound 6. The reference sound 6 is the sound used for comparison, and the target sound 5 is the sound to be evaluated. For example, the reference sound 6 is pink noise or the like, and the same sound, prepared in advance, is used regardless of the type of target sound 5. The target sound 5 and the reference sound 6 need not be stored in the storage unit 11 in advance and may, for example, be input from outside. In that case, the storage unit 11 can be omitted.

<Control unit 12>
The control unit 12 generates, over a predetermined time, a sound string composed of a plurality of first sounds and a plurality of second sounds separated by time intervals. In doing so, the control unit 12 controls the number of repetitions of the first sound and the second sound (S12). More specifically, during a predetermined time interval, the control unit 12 generates the sound string 8 to be presented by the sound presentation unit 13, that is, a sound string consisting of a regular combination of a plurality of target sounds 5 of a predetermined duration, a plurality of reference sounds 6 of a predetermined duration, and the time intervals 7 inserted between them. In doing so, the control unit 12 controls the number of times the target sound 5 and the reference sound 6 are generated (S12). In this example, the target sound 5 and the reference sound 6 are stored in the storage unit 11 in advance. The control unit 12 reads the target sound 5 and the reference sound 6 from the storage unit 11 and combines them, inserting the time intervals 7, to generate the sound string 8.

<Sound presentation unit 13>
The sound presentation unit 13 presents the sound string 8 generated by the control unit 12 (S13). In other words, the sound presentation unit 13 presents a sound string composed of a plurality of first sounds and a plurality of second sounds separated by time intervals (S13). The sound presentation unit 13 consists of, for example, an amplifier and a loudspeaker, or an amplifier and earphones. In this way, the sound string 8 generated by the control unit 12, which includes the target sound 5 and the reference sound 6, is presented to the person who is the listener.

<Input information acquisition unit 14>
The input information acquisition unit 14 acquires a series of beat information from a person (listener) who taps a beat while listening to the presented sound string (S14). More specifically, the input information acquisition unit 14 acquires, as the series of beat information, the times at which the listener tapped while listening to the sound string 8 presented repeatedly by the sound presentation unit 13 (S14). For example, the listener presses (taps) a button in time with the sound to which a beat is easier to tap. In this case, the input information acquisition unit 14 acquires the time series of the times at which the button was pressed (tapped) as the series of beat information (also called input information).

<Evaluation value calculation unit 15>
The evaluation value calculation unit 15 calculates the auditory saliency level based on the proportion of the beat information within a predetermined time interval that corresponds to one of the two sounds (the first and second sounds) (S15). Specifically, the evaluation value calculation unit 15 detects whether each item in the series of beat information acquired by the input information acquisition unit 14 corresponds to the target sound 5 or to the reference sound 6. It then obtains the proportion of the beat information corresponding to the target sound 5 within the entire series of beat information in the predetermined time interval, and outputs this proportion as the auditory saliency level (the evaluation value of auditory saliency). A higher auditory saliency level indicates that the target sound 5 is more salient, and a lower auditory saliency level indicates that the target sound 5 is less salient. In other words, the saliency of the reference sound 6 is lower as the auditory saliency level increases and higher as the auditory saliency level decreases.

Instead of the proportion of the input information (beat information) within the predetermined time interval that corresponds to the target sound 5, the proportion of the input information (beat information) within the predetermined time interval that corresponds to the reference sound 6 may be used as the auditory saliency level. In that case, however, the saliency of the target sound 5 is lower as the auditory saliency level increases and higher as the auditory saliency level decreases. In other words, a higher auditory saliency level indicates that the reference sound 6 is more salient, and a lower auditory saliency level indicates that the reference sound 6 is less salient. In either case, an intermediate auditory saliency level (around 50%) means that the reference sound and the target sound are about equally salient (that is, there is no difference in saliency).
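The proportion-based evaluation value described above can be sketched in Python as follows. The text does not say how a tap is matched to a sound, so this sketch assumes, purely for illustration, that each tap belongs to the sound whose onset is nearest in time. The names `classify_tap` and `target_tap_ratio` are hypothetical.

```python
def classify_tap(tap_ms, events):
    """Return the label of the sound whose onset is closest to the tap.

    events: list of (label, onset_ms) tuples, label "target" or "reference".
    Nearest-onset matching is an assumption, not taken from the patent.
    """
    label, _onset = min(events, key=lambda e: abs(e[1] - tap_ms))
    return label


def target_tap_ratio(tap_times_ms, events, start_ms, end_ms):
    """Proportion of taps in [start_ms, end_ms) that fall on the target sound."""
    taps = [t for t in tap_times_ms if start_ms <= t < end_ms]
    if not taps:
        raise ValueError("no taps inside the evaluation interval")
    n_target = sum(1 for t in taps if classify_tap(t, events) == "target")
    return n_target / len(taps)


# Alternating string with 300 ms sounds and 100 ms gaps (onsets 400 ms apart).
events = [("target", 0.0), ("reference", 400.0), ("target", 800.0),
          ("reference", 1200.0), ("target", 1600.0), ("reference", 2000.0)]
taps = [20.0, 790.0, 1230.0, 1610.0]
ratio = target_tap_ratio(taps, events, start_ms=0.0, end_ms=2400.0)
```

Here three of the four taps are nearest to a target onset, so `ratio` is 0.75. The reference-based variant is simply `1 - ratio` when every tap is assigned to one of the two sounds.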

Any sounds can be used as the target sound 5 and the reference sound 6 in the present invention. For example, they may be two sounds that differ in frequency, period, or sound pressure, or they may be exactly the same sound (same frequency and same sound pressure). When the target sound 5 and the reference sound 6 are the same sound, the listener cannot distinguish between them, so the target sound 5 and the reference sound 6 are expected to be tapped in about the same proportion. In this case, the resulting auditory saliency level is intermediate (around 50%). An auditory saliency level of around 50% therefore means that there is almost no difference in saliency between the target sound and the reference sound.

The sound pressures of the target sound 5 and the reference sound 6 are not restricted in principle, but sound pressures too low to be heard and sound pressures high enough to cause hearing loss should be avoided, and the sound pressures should preferably be kept within a range that can be listened to comfortably.

The auditory saliency evaluation apparatus of Embodiment 2, which modifies part of Embodiment 1, will be described below with reference to FIGS. 7 and 8. FIG. 7 is a block diagram showing the configuration of the auditory saliency evaluation apparatus 2 of this embodiment. FIG. 8 is a flowchart showing the operation of the auditory saliency evaluation apparatus 2 of this embodiment. As shown in FIG. 7, the auditory saliency evaluation apparatus 2 of this embodiment includes a storage unit 11, a control unit 22, a sound presentation unit 13, an input information acquisition unit 14, and an evaluation value calculation unit 25. The only differences from Embodiment 1 are that the control unit 12 of Embodiment 1 is replaced by the control unit 22, and the evaluation value calculation unit 15 of Embodiment 1 is replaced by the evaluation value calculation unit 25. Only the components that differ from Embodiment 1 are described below.

<Control unit 22>
Like the control unit 12 of Embodiment 1, the control unit 22 of this embodiment generates the sound string 8 and controls the number of repetitions; in addition, it controls the sound pressure level. Specifically, when the proportion of the beat information within a predetermined time interval that corresponds to one of the two sounds (the first and second sounds) is greater than a predetermined ratio, the control unit 22 lowers the sound pressure level of that sound (or raises the sound pressure level of the other sound so that the level of the one sound becomes relatively lower). When that proportion is smaller than the predetermined ratio, the control unit 22 raises the sound pressure level of that sound (or lowers the sound pressure level of the other sound so that the level of the one sound becomes relatively higher) (S22). For example, when "information indicating that the target sound 5 was tapped more often" (described later) is input from the evaluation value calculation unit 25, the control unit 22 sets the sound pressure level of the reference sound 6 higher than its current level, and when "information indicating that the reference sound 6 was tapped more often" (described later) is input, the control unit 22 sets the sound pressure level of the reference sound 6 lower than its current level. During a predetermined time interval, the sound presentation unit 13 presents the sound string 8 generated by the control unit 22 with the sound pressure level thus controlled (S13).

<Evaluation value calculation unit 25>
When the proportion of beat information corresponding to one sound in the series of beat information within the predetermined time interval is equal to the predetermined ratio, the evaluation value calculation unit 25 outputs a value corresponding to the sound pressure level of that sound as the auditory saliency level (S25).
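Combined with the level control of S22, this output rule forms a closed loop: present, collect taps, adjust the level, and stop once the proportion equals the predetermined ratio. The sketch below simulates that loop. The listener model (`simulated_target_share`, a logistic curve rounded to 10 taps per interval), the 2 dB step, and the cap on the number of intervals are all assumptions made for illustration; none of them comes from the source.

```python
import math


def simulated_target_share(reference_level_db, equal_saliency_db=60.0,
                           taps=10, slope_db=4.0):
    """Hypothetical listener: share of taps on the target sound in one interval.

    The listener follows the target more often when the reference sound is
    quieter than equal_saliency_db. This model is an assumption, not data.
    """
    p_target = 1.0 / (1.0 + math.exp(-(equal_saliency_db - reference_level_db) / slope_db))
    return round(taps * p_target) / taps


def measure_saliency_level(start_level_db=50.0, predetermined_ratio=0.5,
                           step_db=2.0, max_intervals=50):
    """Adjust the reference level until the target share equals the ratio."""
    level_db = start_level_db
    for _ in range(max_intervals):
        share = simulated_target_share(level_db)  # presentation (S13) and taps
        if share == predetermined_ratio:
            return level_db                       # S25: output the level
        if share > predetermined_ratio:
            level_db += step_db                   # S22: target tapped more
        else:
            level_db -= step_db                   # S22: reference tapped more
    return None  # no interval reached the predetermined ratio
```

With these assumed defaults the loop stops at a reference level of 60.0 dB, the point where the simulated listener splits taps evenly. With real tapping data an exact match to the predetermined ratio may be rare, so a practical version would probably need a tolerance; that is a design consideration the source does not address.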

More specifically, the evaluation value calculation unit 25 first detects whether each item of input information received from the input information acquisition unit 14 corresponds to the target sound 5 or to the reference sound 6. It then obtains the proportion of input information corresponding to the target sound 5 in the series of input information within the predetermined time interval. Up to this point, the processing is the same as in the first embodiment. Next, when the obtained proportion is smaller than the predetermined ratio, the evaluation value calculation unit 25 outputs the “information indicating that the reference sound 6 was tapped more often” to the control unit 22. When the obtained proportion is larger than the predetermined ratio, it outputs the “information indicating that the target sound 5 was tapped more often” to the control unit 22. When the obtained proportion is equal to the predetermined ratio, the evaluation value calculation unit 25 acquires the current sound pressure level of the reference sound 6 from the control unit 22 and outputs that sound pressure level as the auditory saliency level (the evaluation value of auditory saliency) (S25).
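The detection and proportion steps above might look like the following sketch. The rule for assigning a tap to a sound is defined in the first embodiment and is not reproduced in this section, so the nearest-onset rule used here is an assumption, as are the function names and the feedback strings.

```python
from bisect import bisect_left
from typing import Sequence


def nearest_distance(sorted_onsets: Sequence[float], t: float) -> float:
    """Distance from time t to the closest onset in a sorted, non-empty list."""
    i = bisect_left(sorted_onsets, t)
    candidates = sorted_onsets[max(i - 1, 0):i + 1]
    return min(abs(t - onset) for onset in candidates)


def target_share(tap_times: Sequence[float],
                 target_onsets: Sequence[float],
                 reference_onsets: Sequence[float]) -> float:
    """Share of taps lying closer to a target onset than to a reference onset.

    Assumed rule: a tap equidistant from both sounds counts as a reference tap.
    """
    if not tap_times:
        raise ValueError("no taps recorded in the interval")
    target_taps = sum(
        1 for t in tap_times
        if nearest_distance(target_onsets, t) < nearest_distance(reference_onsets, t)
    )
    return target_taps / len(tap_times)


def feedback_for(share: float, predetermined_ratio: float) -> str:
    """Map the target share to the information passed to the control unit 22."""
    if share < predetermined_ratio:
        return "reference_tapped_more"
    if share > predetermined_ratio:
        return "target_tapped_more"
    return "balanced"
```

For example, with target onsets at 0.0, 1.0, 2.0, 3.0 s and reference onsets at 0.5, 1.5, 2.5, 3.5 s, taps at 0.02, 1.03, 1.48 and 2.51 s give a target share of 0.5, which `feedback_for` reports as balanced for a predetermined ratio of 0.5.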

Note that the proportion of input information corresponding to the reference sound 6 in the series of input information within the predetermined time interval may be used instead of the proportion corresponding to the target sound 5. In this case, when the obtained proportion is larger than the predetermined ratio, the evaluation value calculation unit 25 outputs the “information indicating that the reference sound 6 was tapped more often” to the control unit 22. When the obtained proportion is smaller than the predetermined ratio, it outputs the “information indicating that the target sound 5 was tapped more often” to the control unit 22. When the obtained proportion is equal to the predetermined ratio, the evaluation value calculation unit 25 acquires the current sound pressure level of the reference sound 6 from the control unit 22 and outputs it as the auditory saliency level (the evaluation value of auditory saliency) (S25).

Alternatively, instead of the proportion of input information corresponding to the reference sound 6 in the series of input information within the predetermined time interval, the evaluation value calculation unit 25 may use the ratio between the number of items of input information corresponding to the reference sound 6 and the number of items corresponding to the target sound 5.
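The three statistics mentioned above carry the same information, which can be made explicit with a short derivation. The symbols are introduced here for illustration and are not used in the source: $N_T$ and $N_R$ are the numbers of items of input information corresponding to the target sound 5 and the reference sound 6 in the interval, and $\theta$ is the predetermined ratio applied to the target proportion.

```latex
\[
p_T = \frac{N_T}{N_T + N_R}, \qquad
p_R = \frac{N_R}{N_T + N_R} = 1 - p_T, \qquad
r = \frac{N_R}{N_T} = \frac{p_R}{1 - p_R} \quad (N_T > 0).
\]
\[
p_T < \theta
\;\iff\; p_R > 1 - \theta
\;\iff\; r > \frac{1 - \theta}{\theta}
\qquad (0 < \theta < 1).
\]
```

Because $r$ increases monotonically with $p_R$, each statistic leads to the same decision once the threshold is converted accordingly. For $\theta = 0.5$ the thresholds are $0.5$, $0.5$ and $1$, so no conversion is needed for the two proportions. The count ratio is undefined when no tap corresponds to the target sound ($N_T = 0$), a case the source does not discuss.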

<Comparison with conventional methods>
The techniques of Reference Non-Patent Documents 1 to 3 are known as techniques that perform evaluation by having a listener tap along with a sound. In Reference Non-Patent Document 1, a reference sound and a target sound are presented to the left and right ears, respectively, and the listener is instructed to tap along with the target sound. The sound pressure level at which tapping to the target sound is disrupted by the reference sound is then determined. The same experiment is repeated with the sounds presented to the left and right ears swapped, and the aim is to measure left/right ear dominance from the difference between the sound pressure levels of the interfering sound for the two ears.

Reference Non-Patent Document 2 presents a reference sound and a target sound as in the present invention, but instructs the listener to tap along with the target sound. The presentation timing of the reference sound is then gradually brought closer to that of the target sound, and the presentation timing of the interfering sound at which tapping to the target sound is disrupted by the reference sound is determined. The aim is to measure simultaneity for each amount of delay (time) in auditory feedback. The above techniques have in common that the listener “taps along with one of two types of sound, a reference sound and a target sound”, but their purposes are entirely different.

To measure left/right ear dominance, Reference Non-Patent Document 1 must present a different sound to each ear; if both the target sound and the reference sound were made audible in both ears, as in the present invention, the aim of measuring left/right ear dominance could not be achieved. Reference Non-Patent Documents 2 and 3 determine the presentation timing of the interfering sound at which tapping to the target sound is disrupted by the reference sound, that is, a time difference. Because there is no clear correlation between such a time difference and auditory saliency, it is difficult to obtain an index of auditory saliency from the time difference.

In addition, each of these techniques instructs the listener to tap along with the target sound, which introduces a bias toward aligning taps with the target sound, and this bias may blur the saliency index. In contrast, the present invention, which aims to measure auditory saliency, has the listener tap along with whichever of the reference sound and the target sound is easier to tap to. Saliency can therefore be measured without giving the listener any instruction about which sound to follow (it suffices that the listener taps along with one of them). Moreover, because no such instruction is given, a more objective index value can be obtained.
(Reference Non-Patent Document 1) Tsunoda (1975) “Functional Differences Between Right- and Left-Cerebral Hemispheres Detected by the Key-Tapping Method”
(Reference Non-Patent Document 2) Aschersleben & Prinz (1997) “Delayed Auditory Feedback in Synchronization”
(Reference Non-Patent Document 3) Finney & Warren (2002) “Delayed auditory feedback and rhythmic tapping: Evidence for a critical interval shift”

Claims

An auditory saliency evaluation apparatus comprising:
a sound presenting unit that presents a sound string composed of a plurality of first sounds and a plurality of second sounds separated by time intervals;
an input information acquisition unit that acquires a series of beat information from a person who beats time while listening to the presented sound string;
an evaluation value calculation unit that calculates an auditory saliency level based on the ratio of beat information corresponding to one of the two sounds in the series of beat information within a predetermined time interval; and
a control unit that decreases the sound pressure level of the one sound when the ratio of beat information corresponding to the one sound in the series of beat information within the predetermined time interval is greater than a predetermined ratio, and increases the sound pressure level of the one sound when that ratio is smaller than the predetermined ratio,
wherein the sound presenting unit repeatedly presents the one sound at the sound pressure level controlled by the control unit and the other sound at a predetermined sound pressure level until the ratio of beat information corresponding to the one sound in the series of beat information within the predetermined time interval becomes equal to the predetermined ratio, and
the evaluation value calculation unit outputs, as the auditory saliency level, a value corresponding to the sound pressure level of the one sound when the ratio of beat information corresponding to the one sound in the series of beat information within the predetermined time interval is equal to the predetermined ratio.

An auditory saliency evaluation apparatus comprising:
a sound presenting unit that, using a plurality of types of evaluation target sounds prepared in advance and a reference sound prepared in advance that differs from all of the evaluation target sounds, presents, for each of the plurality of types of evaluation target sounds, a sound string composed of a plurality of the evaluation target sounds and a plurality of the reference sounds separated by time intervals;
an input information acquisition unit that acquires a series of beat information from a person who beats time while listening to the presented sound string; and
an evaluation value calculation unit that calculates, for each of the plurality of types of evaluation target sounds and based on the ratio of beat information corresponding to the reference sound in the series of beat information within a predetermined time interval, an auditory saliency level that is larger the higher the saliency of the evaluation target sound and smaller the lower the saliency of the evaluation target sound.

The auditory saliency evaluation apparatus according to claim 2,
wherein the auditory saliency level is a value that increases monotonically with the ratio of beat information corresponding to the evaluation target sound in the series of beat information within the predetermined time interval, and a larger value indicates that the evaluation target sound has higher auditory saliency than the reference sound.

The auditory saliency evaluation apparatus according to claim 2, further comprising
a control unit that decreases the sound pressure level of the reference sound when the ratio of beat information corresponding to the reference sound in the series of beat information within the predetermined time interval is greater than a predetermined ratio, and increases the sound pressure level of the reference sound when that ratio is smaller than the predetermined ratio,
wherein the sound presenting unit repeatedly presents the reference sound at the sound pressure level controlled by the control unit and the evaluation target sound at a predetermined sound pressure level until the ratio of beat information corresponding to the reference sound in the series of beat information within the predetermined time interval becomes equal to the predetermined ratio, and
the evaluation value calculation unit outputs, as the auditory saliency level, a value corresponding to the sound pressure level of the reference sound when the ratio of beat information corresponding to the reference sound in the series of beat information within the predetermined time interval is equal to the predetermined ratio.

An auditory saliency evaluation method comprising:
a sound presenting step of presenting a sound string composed of a plurality of first sounds and a plurality of second sounds separated by time intervals;
an input information acquisition step of acquiring a series of beat information from a person who beats time while listening to the presented sound string;
an evaluation value calculating step of calculating an auditory saliency level based on the ratio of beat information corresponding to one of the two sounds in the series of beat information within a predetermined time interval; and
a control step of decreasing the sound pressure level of the one sound when the ratio of beat information corresponding to the one sound in the series of beat information within the predetermined time interval is greater than a predetermined ratio, and increasing the sound pressure level of the one sound when that ratio is smaller than the predetermined ratio,
wherein, in the sound presenting step, the one sound at the sound pressure level controlled in the control step and the other sound at a predetermined sound pressure level are repeatedly presented until the ratio of beat information corresponding to the one sound in the series of beat information within the predetermined time interval becomes equal to the predetermined ratio, and
in the evaluation value calculating step, a value corresponding to the sound pressure level of the one sound is output as the auditory saliency level when the ratio of beat information corresponding to the one sound in the series of beat information within the predetermined time interval is equal to the predetermined ratio.

An auditory saliency evaluation method comprising:
a sound presenting step of presenting, using a plurality of types of evaluation target sounds prepared in advance and a reference sound prepared in advance that differs from all of the evaluation target sounds, and for each of the plurality of types of evaluation target sounds, a sound string composed of a plurality of the evaluation target sounds and a plurality of the reference sounds separated by time intervals;
an input information acquisition step of acquiring a series of beat information from a person who beats time while listening to the presented sound string; and
an evaluation value calculating step of calculating, for each of the plurality of types of evaluation target sounds and based on the ratio of beat information corresponding to the reference sound in the series of beat information within a predetermined time interval, an auditory saliency level that is larger the higher the saliency of the evaluation target sound and smaller the lower the saliency of the evaluation target sound.

The auditory saliency evaluation method according to claim 6,
wherein the auditory saliency level is a value that increases monotonically with the ratio of beat information corresponding to the evaluation target sound in the series of beat information within the predetermined time interval, and a larger value indicates that the evaluation target sound has higher auditory saliency than the reference sound.

The auditory saliency evaluation method according to claim 6, further comprising
a control step of decreasing the sound pressure level of the reference sound when the ratio of beat information corresponding to the reference sound in the series of beat information within the predetermined time interval is greater than a predetermined ratio, and increasing the sound pressure level of the reference sound when that ratio is smaller than the predetermined ratio,
wherein, in the sound presenting step, the reference sound at the sound pressure level controlled in the control step and the evaluation target sound at a predetermined sound pressure level are repeatedly presented until the ratio of beat information corresponding to the reference sound in the series of beat information within the predetermined time interval becomes equal to the predetermined ratio, and
in the evaluation value calculating step, a value corresponding to the sound pressure level of the reference sound is output as the auditory saliency level when the ratio of beat information corresponding to the reference sound in the series of beat information within the predetermined time interval is equal to the predetermined ratio.

A program for causing a computer to function as the auditory saliency evaluation apparatus according to any one of claims 1 to 4.