JP5644122B2

JP5644122B2 - Masker sound generator

Info

Publication number: JP5644122B2
Application number: JP2010014876A
Authority: JP
Inventors: 舞小池; 寧清水; 雅人秦; 高史山川
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2010-01-26
Filing date: 2010-01-26
Publication date: 2014-12-24
Anticipated expiration: 2030-01-26
Also published as: JP2011154141A

Description

The present invention relates to a technique for generating a masker sound that prevents leaked sound from being overheard.

Various techniques have been proposed that use the masking effect to prevent leaked sound from being overheard. The masking effect occurs when two kinds of sound signals propagate through the same space. Depending on how their acoustic characteristics (frequency components, time waveforms, and so on) relate to each other, a person in that space finds the signals hard to notice. Many techniques of this kind emit a masker sound into an area that adjoins the speaker's area across a wall or partition, so that the speaker's voice is hard to make out there. The masker sound may be noise with a broadband spectrum. However, a sound whose characteristics resemble those of the sound to be obscured (hereinafter, the target sound) is known to give a higher masking effect.

Patent Document 1 discloses a masking system that, each time, selects and emits the masker sound giving the highest masking effect from several kinds of masker sounds. The system prevents speech from leaking between acoustic spaces 20A and 20B, two areas that adjoin each other across a wall. Voices of people of different ages, languages, genders, and so on are recorded in advance. Each recorded voice signal is scrambled by rearranging its frame order. The scrambled sound signal is stored in memory together with acoustic characteristic information that describes that voice's formants, power spectrum, and so on. During operation, the system analyzes the voice of a speaker in acoustic space 20A to obtain its acoustic characteristic information. It then reads out the scrambled sound signal stored with the closest matching information and emits it as a masker sound into acoustic space 20B. Because this masker sound closely resembles the voice of the speaker in acoustic space 20A, a high masking effect is obtained in acoustic space 20B.

Patent Document 1: JP 2008-233672 A

To maintain the masking effect in acoustic space 20B, this system must keep emitting the scrambled sound signal selected from the several kinds as a masker sound into acoustic space 20B, over and over. However, when the same masker sound is repeated for a long time, people in acoustic space 20B notice the repetition and feel that something is unnatural.
The present invention was devised against this background. Its object is to make the periodicity of the emitted masker sound inconspicuous. A high masking effect can then be obtained in the area where the masker sound is emitted without causing discomfort to the people there.

A masker sound generation device according to a preferred aspect of the present invention comprises the following units:
(a) a buffer that stores a masker sound signal;
(b) a sound emission control unit that repeatedly reads out and outputs the masker sound signal stored in the buffer;
(c) an acquisition unit that acquires a sound signal; and
(d) a generation unit that processes the sound signal acquired by the acquisition unit. Each time a fixed period elapses, it runs a rearrangement process that changes the arrangement order of the signal being processed and overwrites the buffer with the resulting masker sound signal. It changes the rearrangement method every time it runs this process.
In another preferred aspect, the masker sound generation device comprises an acquisition unit that acquires a sound signal of a predetermined time length, and a generation unit that processes the acquired sound signal. The generation unit repeatedly randomizes the arrangement order of the signal according to a random number sequence and outputs the resulting masker sound signal. Each time a fixed period longer than the predetermined time length elapses, it switches to a different random number sequence, which changes the rearrangement method.

According to this invention, the masker sound signal sounds different each time the generation unit changes the rearrangement method. People in the area where the sound is emitted therefore feel less discomfort than they would if the same masker sound signal were emitted continuously.

The generation unit may also divide the acquired sound signal into multiple sections of a fixed time length. It then repeats a rearrangement process that reorders these sections, as its way of changing the arrangement order, and changes how the sections are reordered at each repetition.

With this device, the arrangement order is changed in units of fixed-length segments of the sound signal rather than one sample at a time. This yields a masker sound signal with a higher masking effect than sample-by-sample reordering.

Alternatively, the acquisition unit may acquire multiple kinds of sound signals to generate one masker sound signal. For each kind, the generation unit divides the signal into sections of a fixed time length and repeats a rearrangement process that reorders those sections, as its way of changing the arrangement order. A different section rearrangement method is used for each kind of sound signal.

With this device, multiple kinds of sound signals are mixed to form the masker sound signal. A high masking effect can therefore be obtained even when there are several sounds to be masked.

The generation unit may also perform an intra-section reversal process, which reverses the order of the sound signal within each section produced by the division. The masker sound signal is then generated from signals that have undergone both this reversal and the section rearrangement process.

For at least some of the multiple kinds of sound signals, the generation unit may also apply an acoustic effect to the signal after its sections have been rearranged. The effect-applied signal is then used as an input to the mixing.

A masker sound generation device according to another preferred aspect of the present invention comprises an acquisition unit that acquires a sound signal, and a generation unit that repeats the following process: change the arrangement order of the acquired sound signal, apply an acoustic effect to the reordered signal, and output the effect-applied signal as a masker sound signal. The generation unit repeatedly changes the method of applying the acoustic effect.

With this device, the masker sound signal sounds different each time the generation unit changes the method of applying the acoustic effect. People in the area where the sound is emitted therefore feel less discomfort than they would if the same masker sound signal were emitted continuously.

A masker sound generation device according to another preferred aspect comprises an acquisition unit that acquires multiple kinds of sound signals, and a generation unit that repeats the following process: change the arrangement order of the acquired sound signals, apply an acoustic effect to the reordered signals of at least some kinds, and mix the effect-applied signals into a masker sound signal for output. The generation unit repeatedly changes the mixing method.

With this device, the masker sound signal sounds different each time the generation unit changes the mixing method. People in the area where the sound is emitted therefore feel less discomfort than they would if the same masker sound signal were emitted continuously.

A masker sound generation device according to another preferred aspect comprises a generation unit that works as follows. It changes the arrangement order of the multiple kinds of sound signals acquired by the acquisition unit. It applies an acoustic effect to the reordered signals of at least some kinds and mixes the effect-applied signals. The mixed signal then becomes the target of a repeated process: the unit changes the arrangement order of that target signal and outputs the result as a masker sound signal. The generation unit repeatedly changes the method used to change the arrangement order.

With this device as well, the masker sound signal sounds different each time the generation unit changes the method used to change the arrangement order. People in the area where the sound is emitted therefore feel less discomfort than they would if the same masker sound signal were emitted continuously.

A program according to another preferred aspect causes a computer to implement two units: an acquisition unit that acquires a sound signal, and a generation unit. The generation unit repeatedly outputs, as a masker sound signal, a signal obtained by changing the arrangement order of the acquired sound signal. It also repeatedly changes the method used to change the arrangement order.

FIG. 1 is a block diagram showing the configuration of the masker sound generation device according to the first to fifth embodiments of the present invention.
FIG. 2 is a diagram showing an example of how the masker sound generation device is installed.
FIG. 3 is a diagram showing the data structure of the sound database stored in the masker sound generation device.
FIG. 4 is a flowchart showing the operation of the masker sound generation device according to the first embodiment of the present invention.
FIG. 5 is a diagram showing how the masker sound generation device processes sound signals.
FIG. 6 is a flowchart showing the operation of the masker sound generation device according to the second embodiment of the present invention.
FIG. 7 is a diagram showing how the masker sound generation device processes sound signals.
FIG. 8 is a flowchart showing the operation of the masker sound generation device according to the third embodiment of the present invention.
FIG. 9 is a flowchart showing the operation of the masker sound generation device according to the fourth embodiment of the present invention.
FIG. 10 is a flowchart showing the operation of the masker sound generation device according to the fifth embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings.
<First Embodiment>
FIG. 1 is a block diagram showing the configuration of a masker sound generation device 10 according to the first embodiment of the present invention. FIG. 2 shows an example of how the masker sound generation device 10 is installed. As shown in FIG. 2, the masker sound generation device 10 is installed in a region A that is separated from the outside by a partition 50. Region A is equipped with a human sensor 30 that detects a speaker entering and leaving region A. The device is active from the time the human sensor 30 detects that a speaker has entered region A until it detects that the speaker has left. During that time, speech that travels from region A over the partition 50 into the outer region B is treated as the target sound T. A loudspeaker 31 in region B emits a masker sound signal M that makes the target sound T hard to hear.

As shown in FIG. 1, the masker sound generation device 10 comprises a hard disk 11, a control unit 12, a buffer 13, a sound emission control unit 14, a D/A conversion unit 15, and an amplifier 16. The hard disk 11 stores a sound database 21. The sound database 21 is a collection of records. Each record corresponds to a voice of time length T1 (for example, T1 = 30 seconds) recorded from a person, and the recorded people have a variety of voice characteristics. As shown in FIG. 3, each record in the sound database 21 has two fields: a "voice" field holding the sound signal S of that voice (time length T1), and an "attribute" field holding attribute information about the voice. The attribute information indicates, for example, the gender of the recorded person combined with the pitch of the voice (high, middle, or low). There are six kinds of attribute information: "male, high", "male, middle", "male, low", "female, high", "female, middle", and "female, low".
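The record layout of FIG. 3 can be modelled as a small data structure. This is a minimal sketch, assuming the voice is held as a list of PCM samples. `Attribute`, `VoiceRecord`, and `make_record` are hypothetical names, and the length check is an illustrative assumption that is not stated in the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Attribute(Enum):
    """The six gender/pitch combinations listed in the description."""
    MALE_HIGH = ("male", "high")
    MALE_MIDDLE = ("male", "middle")
    MALE_LOW = ("male", "low")
    FEMALE_HIGH = ("female", "high")
    FEMALE_MIDDLE = ("female", "middle")
    FEMALE_LOW = ("female", "low")


@dataclass
class VoiceRecord:
    """One record of the (hypothetical) sound database 21."""
    voice: List[float]      # "voice" field: sound signal S, T1 seconds of samples
    attribute: Attribute    # "attribute" field: gender / voice pitch


def make_record(samples: List[float], sample_rate: int, t1_seconds: float,
                attribute: Attribute) -> VoiceRecord:
    """Build a record, checking that the clip lasts exactly T1 (an assumption)."""
    expected = int(round(sample_rate * t1_seconds))
    if len(samples) != expected:
        raise ValueError(f"expected {expected} samples, got {len(samples)}")
    return VoiceRecord(voice=list(samples), attribute=attribute)
```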

The control unit 12 has a CPU 22, a RAM 23, and a ROM 24. The CPU 22 executes a masker sound generation program 25 stored in the ROM 24, using the RAM 23 as a work area. The masker sound generation program 25 causes the CPU 22 to execute two processes: an acquisition process and a generation process.
- The acquisition process acquires multiple kinds of sound signals S from the sound database 21 and stores them in the RAM 23.
- The generation process changes the arrangement order of the sound signals S stored in the RAM 23 and uses the result as a masker sound signal M. It repeatedly outputs M to the buffer 13 and repeatedly changes the method used to change the arrangement order.
Both processes are described in detail later. The sound emission control unit 14 is a circuit that repeatedly reads the latest masker sound signal M written in the buffer 13 and outputs it to the D/A conversion unit 15. The D/A conversion unit 15 converts the masker sound signal M received via the sound emission control unit 14 into an analog signal and outputs it to the amplifier 16. The amplifier 16 amplifies this analog signal and outputs it as sound from the loudspeaker 31.
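The buffer handshake can be simulated in software: the generation process overwrites buffer 13, and the sound emission control unit 14 keeps looping over whatever signal is latest. This sketch is not the device's actual circuit. `MaskerBuffer` and `emit_blocks` are hypothetical names. Emitting silence before the first write and continuing the read position across overwrites are assumptions.

```python
import threading
from typing import List, Optional


class MaskerBuffer:
    """Hypothetical model of buffer 13: overwritten whole by the generator."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._signal: Optional[List[float]] = None

    def overwrite(self, signal: List[float]) -> None:
        # Replace the masker sound signal M in one step (cf. step S170).
        with self._lock:
            self._signal = list(signal)

    def latest(self) -> Optional[List[float]]:
        with self._lock:
            return self._signal


def emit_blocks(buf: MaskerBuffer, block_size: int, n_blocks: int) -> List[List[float]]:
    """Simulate unit 14: read the latest signal repeatedly, wrapping at its end,
    and hand out fixed-size blocks (in hardware these go to D/A unit 15)."""
    out: List[List[float]] = []
    pos = 0
    for _ in range(n_blocks):
        sig = buf.latest()
        if not sig:
            out.append([0.0] * block_size)  # nothing written yet: output silence
            continue
        block = []
        for _ in range(block_size):
            block.append(sig[pos % len(sig)])  # loop playback of the latest M
            pos += 1
        out.append(block)
    return out
```

Because the reader always fetches `latest()`, a new masker signal takes effect at the next block without the playback loop having to stop.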

Next, the operation of this embodiment is described. The human sensor 30 sends a detection signal S_IN when a speaker enters region A. When the CPU 22 of the masker sound generation device 10 receives S_IN, it executes the acquisition process and the generation process. In the acquisition process, the CPU 22 selects from the sound database 21 one sound signal S for each of the six attributes "male, high", "male, middle", "male, low", "female, high", "female, middle", and "female, low". It then stores these six sound signals S in the RAM 23. For convenience, these six sound signals are referred to below as Sa, Sb, Sc, Sd, Se, and Sf.
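The acquisition step amounts to "one signal per attribute". The text does not say which record is chosen when several share an attribute, so this sketch assumes a random pick from a seeded generator. Records are assumed to be `(attribute, samples)` pairs, and `acquire` is a hypothetical name.

```python
import random
from typing import Dict, List, Sequence, Tuple

# The six attribute labels from the description: (gender, voice pitch).
ATTRIBUTES = [
    ("male", "high"), ("male", "middle"), ("male", "low"),
    ("female", "high"), ("female", "middle"), ("female", "low"),
]

Record = Tuple[Tuple[str, str], List[float]]  # (attribute, sound signal S)


def acquire(database: Sequence[Record],
            rng: random.Random) -> Dict[Tuple[str, str], List[float]]:
    """Pick one sound signal for each of the six attributes.

    Choosing randomly among records that share an attribute is an assumption.
    """
    selected: Dict[Tuple[str, str], List[float]] = {}
    for attr in ATTRIBUTES:
        candidates = [sig for a, sig in database if a == attr]
        if not candidates:
            raise LookupError(f"no recording with attribute {attr}")
        selected[attr] = list(rng.choice(candidates))
    return selected
```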

In the generation process, the CPU 22 first performs steps S100 to S120 of FIG. 4 once on the six sound signals Sa, Sb, Sc, Sd, Se, and Sf. It then repeats the loop of steps S130 to S190 of FIG. 4 every time length T2 (for example, T2 = 1 minute). Steps S100 to S190 are described in detail below.

First, as shown in FIG. 5(A), the CPU 22 divides each of the six sound signals Sa, Sb, Sc, Sd, Se, and Sf into N frames F_i (i = 1 to N, where N = T1/T3) (S100). Each frame has time length T3 (for example, T3 = 100 milliseconds), so the example values T1 = 30 s and T3 = 100 ms give N = 300. To keep the drawing simple, FIG. 5(A) shows the case N = 15.
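Step S100 is plain fixed-length framing. This sketch assumes the signal is a list of samples whose length is an exact multiple of the frame length, which holds for the example values: at an assumed 1 kHz sample rate, 30 s gives 30,000 samples split into 300 frames of 100 samples.

```python
from typing import List


def split_into_frames(signal: List[float], sample_rate: int,
                      t3_seconds: float) -> List[List[float]]:
    """Divide a sound signal into N consecutive frames F_i of length T3."""
    frame_len = int(round(sample_rate * t3_seconds))
    if frame_len <= 0 or len(signal) % frame_len != 0:
        raise ValueError("signal length must be a positive multiple of the frame length")
    return [signal[k:k + frame_len] for k in range(0, len(signal), frame_len)]
```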

Next, as shown in FIG. 5(B), the CPU 22 performs the intra-frame reversal process (S110). This process reverses the order of the sample data within each frame F_i of the sound signals Sa, Sb, Sc, Sd, Se, and Sf, producing the sound signals Sa_R, Sb_R, Sc_R, Sd_R, Se_R, and Sf_R.
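Step S110 reverses the samples inside each frame while leaving the order of the frames themselves untouched. A minimal sketch on frames held as lists (`reverse_within_frames` is a hypothetical name):

```python
from typing import List


def reverse_within_frames(frames: List[List[float]]) -> List[List[float]]:
    """Reverse the sample order inside every frame F_i; the frame order is kept."""
    return [frame[::-1] for frame in frames]
```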

After the intra-frame reversal process, the CPU 22 multiplies its results, the sound signals Sa_R, Sb_R, Sc_R, Sd_R, Se_R, and Sf_R, by a window function ω (S120), as shown in FIG. 5(C). The window function ω shapes the waveform so that the divided frames F_i join smoothly.
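The text does not name the window function ω. This sketch assumes a trapezoidal window: a short linear fade-in and fade-out at the frame edges, with a flat top. Frame boundaries then taper towards zero and rearranged frames join without abrupt steps. Both the window shape and the `fade` parameter are illustrative assumptions.

```python
from typing import List


def trapezoid_window(length: int, fade: int) -> List[float]:
    """Hypothetical ω: linear ramps of `fade` samples at each end, flat top of 1.0."""
    if not 0 <= 2 * fade <= length:
        raise ValueError("fade must be at most half the frame length")
    w = [1.0] * length
    for k in range(fade):
        g = (k + 1) / (fade + 1)  # ramp values strictly between 0 and 1
        w[k] = g
        w[length - 1 - k] = g
    return w


def apply_window(frames: List[List[float]], fade: int) -> List[List[float]]:
    """Step S120: multiply every frame sample-by-sample by the window."""
    return [[x * wk for x, wk in zip(frame, trapezoid_window(len(frame), fade))]
            for frame in frames]
```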

Next, as shown in FIG. 5(D), the CPU 22 performs the frame rearrangement process (S130) on the windowed sound signals Sa_W, Sb_W, Sc_W, Sd_W, Se_W, and Sf_W. In this process, the CPU 22 randomly rearranges the order of the frames F_i (i = 1 to 15) of each windowed signal, producing the sound signals Sa_S, Sb_S, Sc_S, Sd_S, Se_S, and Sf_S.

The frame rearrangement process is described below in concrete terms, using the sound signal Sa_W as an example. When Sa_W has been divided into N = 15 frames, the CPU 22 generates a random number sequence made up of the numbers 1 to 15 and reads the 15 numbers in order from the start. Each number gives the new position of the corresponding original frame. For example, if the first number is 8, the 1st frame before rearrangement becomes the 8th frame after rearrangement; if the second number is 4, the 2nd frame becomes the 4th; and so on. The 1st to 15th frames rearranged in this way form the sound signal Sa_S. To vary the rearrangement method, this embodiment prepares several random number sequences whose numbers appear in different orders (when N = 15, each consists of 15 numbers). The sequence used for rearrangement is changed each time the frame rearrangement process is performed.
The CPU 22 performs the frame rearrangement process on the sound signals Sb_W, Sc_W, Sd_W, Se_W, and Sf_W in the same way.
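The mapping above is a permutation applied by destination: original frame i goes to position `sequence[i]`. This sketch takes "random number sequence" to mean a random permutation of 1..N and prepares several distinct ones up front. `rearrange_frames` and `make_sequences` are hypothetical names, and seeding the generator is an assumption made for reproducibility.

```python
import random
from typing import List, Sequence


def rearrange_frames(frames: List[List[float]], sequence: Sequence[int]) -> List[List[float]]:
    """Step S130: put original frame i at 1-based position sequence[i].

    A sequence starting 8, 4, ... moves frame 1 to slot 8 and frame 2 to slot 4.
    """
    n = len(frames)
    if sorted(sequence) != list(range(1, n + 1)):
        raise ValueError("sequence must be a permutation of 1..N")
    out: List[List[float]] = [[] for _ in range(n)]
    for i, target in enumerate(sequence):
        out[target - 1] = frames[i]
    return out


def make_sequences(n: int, count: int, seed: int) -> List[List[int]]:
    """Prepare `count` distinct random permutations of 1..n (requires count <= n!)."""
    rng = random.Random(seed)
    seqs: List[List[int]] = []
    while len(seqs) < count:
        seq = list(range(1, n + 1))
        rng.shuffle(seq)
        if seq not in seqs:
            seqs.append(seq)
    return seqs
```

To switch methods on each pass through S130, the k-th pass could use `seqs[k % len(seqs)]`.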

After the frame rearrangement process, the CPU 22 performs an acoustic effect application process on the sound signals Sa_S, Sb_S, Sc_S, Sd_S, Se_S, and Sf_S (S140). This process applies a predetermined acoustic effect (for example, reverb) to each of these signals, producing the sound signals Sa_S', Sb_S', Sc_S', Sd_S', Se_S', and Sf_S'. Next, the CPU 22 performs a mixing process (S150). It mixes the effect-applied signals Sa_S', Sb_S', Sc_S', Sd_S', Se_S', and Sf_S' at a predetermined mixing ratio (for example, 1:1:1:1:1:1), and the mixed signal becomes the masker sound signal M. The CPU 22 then performs a speech rate conversion process (S160). This process stretches the time axis of the masker sound signal M, which has time length T1 after mixing, to a longer time length T1' (T1' > T1).
More specifically, the speech rate conversion process works on the steady portion of the masker sound signal M, that is, the frames F_i (i = 1 to 15) that are not part of a rising or falling section of the waveform. The CPU 22 duplicates steady-portion frames F_i as many times as needed to make up the difference between T1 and T1'. Each duplicated frame F_i' is inserted between frames F_i and F_{i+1} of the steady portion.
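Steps S150 and S160 can be sketched together on frame lists. Two assumptions are made here. First, normalising the mixed signal by the sum of the ratios is an assumption (the text gives only the ratio). Second, the text does not define how the rising and falling parts are detected. This sketch treats a frame as "steady" when its RMS level is within a relative tolerance of both neighbours, which is a hypothetical criterion. Copies are inserted directly after their original frame, cycling over the steady frames.

```python
import math
from typing import Dict, List, Sequence


def mix(signals: Sequence[List[float]], ratios: Sequence[float]) -> List[float]:
    """Step S150: weighted sum of equal-length signals, normalised by the ratio total."""
    total = sum(ratios)
    length = len(signals[0])
    return [sum(r * s[k] for r, s in zip(ratios, signals)) / total for k in range(length)]


def rms(frame: List[float]) -> float:
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def stretch_by_duplication(frames: List[List[float]], extra: int,
                           tol: float = 0.5) -> List[List[float]]:
    """Step S160 sketch: lengthen the signal by `extra` duplicated frames.

    A frame counts as steady here if its RMS differs from both neighbours by
    at most `tol` times its own RMS (hypothetical stand-in for "not a rising
    or falling part"). First and last frames are never treated as steady.
    """
    steady = [i for i in range(1, len(frames) - 1)
              if all(abs(rms(frames[i]) - rms(frames[j])) <= tol * max(rms(frames[i]), 1e-12)
                     for j in (i - 1, i + 1))]
    if extra > 0 and not steady:
        raise ValueError("no steady frames to duplicate")
    copies: Dict[int, int] = {i: 0 for i in steady}
    for k in range(extra):
        copies[steady[k % len(steady)]] += 1
    out: List[List[float]] = []
    for i, frame in enumerate(frames):
        out.append(frame)
        # Insert copies F_i' between F_i and F_{i+1}.
        out.extend(list(frame) for _ in range(copies.get(i, 0)))
    return out
```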

The CPU 22 outputs the speech-rate-converted masker sound signal M and overwrites the buffer 13 with it (S170). The human sensor 30 sends a detection signal S_OUT when the speaker leaves region A. If S_OUT has not been received (S180: No) and time length T2 (T2 = 1 minute) has elapsed since step S130 was executed (S190: Yes), the CPU 22 returns to step S130 and repeats the subsequent processing. If S_OUT is received (S180: Yes), the CPU 22 instructs the sound emission control unit 14 to stop reading the masker sound signal M and ends the process.
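The control flow of steps S130 to S190 can be written as a loop around injected hooks, which keeps it testable with a fake clock. All callables are hypothetical stand-ins. `make_masker(k)` represents S130 to S160 using the k-th rearrangement method. Polling S180 and S190 until T2 has elapsed reflects our reading of the flowchart. A real implementation would sleep between polls rather than spin.

```python
from typing import Callable, List


def generation_loop(make_masker: Callable[[int], List[float]],
                    overwrite_buffer: Callable[[List[float]], None],
                    speaker_left: Callable[[], bool],
                    stop_playback: Callable[[], None],
                    now: Callable[[], float],
                    t2: float) -> int:
    """Run S130-S190 until S_OUT arrives; return how many maskers were written."""
    k = 0
    while True:
        started = now()                     # time at which S130 begins
        overwrite_buffer(make_masker(k))    # S130-S170 with method k
        k += 1
        while True:
            if speaker_left():              # S180: S_OUT received?
                stop_playback()             # tell unit 14 to stop reading M
                return k
            if now() - started >= t2:       # S190: has T2 elapsed?
                break                       # back to S130 with the next method
```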

The embodiment described above provides the following effects.
First, the masker sound signal M is generated from six sound signals, Sa, Sb, Sc, Sd, Se, and Sf. A high masking effect can therefore be produced in region B even when region A contains several speakers with different voice characteristics.

Second, the frame rearrangement process is repeated on the sound signals Sa_W, Sb_W, Sc_W, Sd_W, Se_W, and Sf_W every time length T2. The resulting signals Sa_S, Sb_S, Sc_S, Sd_S, Se_S, and Sf_S, whose frames F_i (i = 1 to 15) have been randomly reordered, form the masker sound signal M emitted into region B. In addition, the frame rearrangement method changes each time the process reaches step S130, so the masker sound signal M emitted into region B sounds different every T2. People in region B are therefore less likely to feel discomfort than they would if a masker sound signal M with one fixed frame order were emitted into region B for a long time.

Third, in this embodiment, the sound signals Sa_S', Sb_S', Sc_S', Sd_S', Se_S', and Sf_S' are mixed to obtain the masker sound signal M. The time axis of M is then expanded before it is emitted into area B. Processing that changes the arrangement of a speech signal (steps S110 and S130) normally gives the result acoustic features resembling the voice of a person speaking quickly. This embodiment softens the impression that such rapid speech is being heard. It also removes the need to carefully select, and store in the sound database 21, only sound signals that are unlikely to sound rapid once rearranged.
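This section does not say which speech speed conversion algorithm expands the time axis. The sketch below substitutes plain windowed overlap-add (OLA). Frames are read every `hop_in` samples and written every `hop_in * rate` samples, so the output lasts roughly `rate` times longer while the pitch of each frame is unchanged. Plain OLA does not align waveforms between frames. Methods such as WSOLA or PSOLA give cleaner speech, so treat this only as an illustration of the idea. The frame length and hop values are arbitrary assumptions.

```python
import math
from typing import List


def ola_time_stretch(x: List[float], rate: float,
                     frame_len: int = 256, hop_in: int = 64) -> List[float]:
    """Lengthen x by roughly `rate` (>1 slows it down) using windowed overlap-add.

    The output is divided by the summed window, so a constant input stays constant.
    """
    hop_out = max(1, int(round(hop_in * rate)))
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_frames = max(1, (len(x) - frame_len) // hop_in + 1)
    out_len = (n_frames - 1) * hop_out + frame_len
    y = [0.0] * out_len
    norm = [0.0] * out_len
    for k in range(n_frames):
        src, dst = k * hop_in, k * hop_out
        for n in range(frame_len):
            sample = x[src + n] if src + n < len(x) else 0.0  # zero-pad short input
            y[dst + n] += window[n] * sample
            norm[dst + n] += window[n]
    return [v / w if w > 1e-8 else 0.0 for v, w in zip(y, norm)]
```

With `rate=1.25` and a 10,000-sample input, this yields 12,416 samples, about 1.24 times the original length.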

Fourth, in this embodiment, the six types of sound signals Sa_S', Sb_S', Sc_S', Sd_S', Se_S', and Sf_S' are mixed and then given an acoustic effect. Speech (the target sound T) acquires a spatial acoustic effect, reverberation, as it propagates into area B. The masker sound signal M treated in this way is acoustically similar to that speech. Therefore, a high masking effect can be obtained in the area where the masker sound is emitted without making the people there feel that something is unnatural.

<Second Embodiment>
Next, a second embodiment of the present invention will be described. As shown in FIG. 6, the generation process in this embodiment begins with the frame rearrangement process of step S130. The CPU 22 then takes as processing targets the resulting sound signals Sa_S, Sb_S, Sc_S, Sd_S, Se_S, and Sf_S, whose frames F_i (i = 1 to 15) have been rearranged. It repeats the loop of steps S140 to S190 every time length T2. Each time the acoustic effect application process of step S140 runs in this loop, the CPU 22 randomly changes the reverb depth (the level ratio of direct sound to reverberant sound). More specifically, as shown in FIG. 7, the acoustic effect application process starts by generating a reverberant sound signal RSa_S from the sound signal Sa_S. To do this, the CPU 22 delays Sa_S to obtain delayed sound signals DSa_S-n (n = 1, 2, ...) and takes their sum as RSa_S. Next, it generates a random number, multiplies RSa_S by this number, and adds the product to Sa_S. The result is the sound signal Sa_S', to which the acoustic effect has been applied. The CPU 22 treats each of the sound signals Sb_S, Sc_S, Sd_S, Se_S, and Sf_S in the same way. It multiplies the corresponding reverberant signal RSb_S, RSc_S, RSd_S, RSe_S, or RSf_S by an individually generated random number and adds the product to that sound signal. This yields Sb_S', Sc_S', Sd_S', Se_S', and Sf_S'.
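The random-depth reverb described above can be sketched as "sum the delayed copies, scale the sum by a random number, add it back". The function name, the per-tap gains, and the range of the random number (uniform in [0, 1)) are illustrative assumptions. The text specifies only delayed copies DSa_S-n summed into RSa_S and a random multiplier.

```python
import random
from typing import List, Sequence, Tuple


def reverb_random_depth(x: Sequence[float], delays: Sequence[int],
                        gains: Sequence[float],
                        rng: random.Random) -> Tuple[List[float], float]:
    """Return (x + r * RS, r), where RS is the sum of delayed copies of x.

    `delays` are in samples (>= 1); `gains` scale each delayed copy DS-n.
    """
    rs = [0.0] * len(x)                        # reverberant signal RS
    for d, g in zip(delays, gains):            # DS-n: copy of x delayed by d samples
        for i in range(d, len(x)):
            rs[i] += g * x[i - d]
    depth = rng.random()                       # new random depth on every call
    return [xi + depth * ri for xi, ri in zip(x, rs)], depth
```

Calling this once per sound signal, and again in every T2 loop, gives each signal its own independently drawn depth.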

In this embodiment, the content of the acoustic effect application process (S140) changes every time length T2. The way the masker sound signal M emitted into area B sounds therefore also changes every time length T2. This makes it less likely that people in area B will find the sound unnatural.

<Third Embodiment>
Next, a third embodiment of the present invention will be described. As shown in FIG. 8, the generation process in this embodiment begins with the acoustic effect application process of step S140. The CPU 22 then takes as processing targets the resulting sound signals Sa_S', Sb_S', Sc_S', Sd_S', Se_S', and Sf_S', to which the acoustic effect has been applied. It repeats the loop of steps S150 to S190 every time length T2. Each time the mixing process of step S150 runs in this loop, the CPU 22 randomly changes the mixing ratios of Sa_S', Sb_S', Sc_S', Sd_S', Se_S', and Sf_S'. More specifically, in the mixing process the CPU 22 generates six random numbers (excluding 0). It uses them as the respective mixing ratios of Sa_S', Sb_S', Sc_S', Sd_S', Se_S', and Sf_S'.
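The randomized mixing step can be sketched as follows. One nonzero random ratio is drawn per signal, and the mix is the ratio-weighted sum of the signals. The function name is hypothetical. Drawing ratios uniformly from (0, 1) and truncating to the shortest signal are assumptions. The source requires only six random, nonzero ratios.

```python
import random
from typing import List, Sequence, Tuple


def mix_random_ratios(signals: Sequence[Sequence[float]],
                      rng: random.Random) -> Tuple[List[float], List[float]]:
    """Mix signals with one fresh, nonzero random ratio per signal."""
    ratios = []
    for _ in signals:
        r = rng.random()
        while r == 0.0:                        # the source excludes 0 as a ratio
            r = rng.random()
        ratios.append(r)
    length = min(len(s) for s in signals)      # assumption: truncate to shortest
    mixed = [sum(r * s[i] for r, s in zip(ratios, signals)) for i in range(length)]
    return mixed, ratios
```

Calling this once per T2 loop gives a different balance of the six voices in each period.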

In this embodiment, the content of the mixing process (step S150) changes every time length T2. The way the masker sound signal M emitted into area B sounds therefore also changes every time length T2. This makes it less likely that people in area B will find the sound unnatural.

<Fourth Embodiment>
Next, a fourth embodiment of the present invention will be described. As shown in FIG. 9, the generation process in this embodiment begins with the mixing process of step S150. The CPU 22 then repeats the loop of steps S160 to S200 every time length T2. Steps S160 to S190 of this loop are the same as steps S160 to S190 of the first embodiment. That is, suppose the CPU 22 has not received from the human sensor 30 the detection signal S_OUT, which indicates that the speaker has left area A (S180: No). If the time length T2 has elapsed (S190: Yes), the process proceeds to step S200.

In step S200, the CPU 22 performs the frame rearrangement process on the masker sound signal M produced by the mixing process of step S150. In this process, the CPU 22 again divides M into frames F_i (i = 1 to 15). It then generates a masker sound signal M in which these frames are randomly reordered. After executing step S200, the CPU 22 returns to step S160 and applies the speech speed conversion process to the newly generated M. It then proceeds to step S170 and overwrites the buffer 13 with that M.
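Step S200 can be sketched as re-splitting the mixed signal M into 15 frames and reordering them, keeping the previous order so the next one can differ. Redrawing until the new order differs from the previous one is a hypothetical way to guarantee that the rearrangement really changes each period. The function name and the equal-length framing are also assumptions.

```python
import random
from typing import List, Optional, Tuple


def reshuffle_masker(m: List[float], prev_order: Optional[List[int]],
                     rng: random.Random, n_frames: int = 15) -> Tuple[List[float], List[int]]:
    """Re-split M into n_frames frames and return (reordered M, frame order used)."""
    if n_frames < 2:
        raise ValueError("need at least two frames to change the order")
    frame_len = len(m) // n_frames
    frames = [m[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]
    order = list(range(n_frames))
    rng.shuffle(order)
    while order == prev_order:                 # ensure the method actually changes
        rng.shuffle(order)
    return [x for k in order for x in frames[k]], order
```

In use, the mixed M from step S150 is kept, and this function is called once per period T2 with the order returned by the previous call.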

In this embodiment, the frame rearrangement method changes every time length T2. The way the masker sound signal M emitted into area B sounds therefore also changes every time length T2. This makes it less likely that people in area B will find the sound unnatural.

<Fifth Embodiment>
Next, a fifth embodiment of the present invention will be described. As shown in FIG. 10, the generation process in this embodiment begins with the speech speed conversion process of step S160. The CPU 22 then repeats the loop of steps S170 to S200 every time length T2. In the frame rearrangement process of step S200 in this loop, the CPU 22 processes the masker sound signal M whose time axis was expanded by the speech speed conversion process of step S160. This step S200 is otherwise the same as in the fourth embodiment.

In this embodiment as well, the frame rearrangement method changes every time length T2. The way the masker sound signal M emitted into area B sounds therefore also changes every time length T2. This makes it less likely that people in area B will find the sound unnatural.

The first to fifth embodiments of the present invention have been described above, but other embodiments are possible, for example as follows.
(1) The masker sound generation device 10 of the first to fifth embodiments may include selection support means. This means presents several options for each of several types of attributes, such as gender and voice pitch, and accepts the selection of an option for at least one attribute type. The CPU 22 may then read from the sound database 21 one or more types of sound signals S recorded from persons who have the selected attribute. It generates the masker sound signal M using the read sound signals S as material.

This embodiment can be realized, for example, as follows. First, the sound database 21 stores the following pairs in association with each other:
- a mix of high-, middle-, and low-pitched male voices, with the attribute information "male";
- a mix of high-, middle-, and low-pitched female voices, with "female";
- a mix of high-pitched male and female voices, with "high";
- a mix of middle-pitched male and female voices, with "middle";
- a mix of low-pitched male and female voices, with "low".

Suppose one of the gender options (male, female) is selected through the operation support means. The CPU 22 then reads from the sound database 21 the sound signal S paired with the selected attribute information, "male" or "female", and uses it as material to generate the masker sound signal M. Likewise, suppose one of the voice pitch options (high, middle, low) is selected through the operation support means. The CPU 22 then reads the sound signal S paired with the selected attribute information, "high", "middle", or "low", and generates M from it.

With this embodiment, a user may specify options for only some of the attribute types that describe the user. Even so, the device can generate a masker sound signal M that produces a high masking effect against that user's voice. The sound database 21 may also store sound signals S associated with other types of attribute information (for example, language or age). The signals selected through the operating means may then serve as material for M.
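The attribute-keyed lookup can be sketched with a dictionary standing in for the sound database 21. Each key pairs an attribute type with one option, and each value is the pre-mixed material signal stored for it. The function name, key layout, and placeholder sample values are all hypothetical. The sketch shows only that one selected option suffices to retrieve material.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for sound database 21: (attribute, option) -> pre-mixed signal S.
SoundDB = Dict[Tuple[str, str], List[float]]


def select_material(db: SoundDB, attribute: str, option: str) -> List[float]:
    """Return the sound signal S paired with the chosen attribute option."""
    try:
        return db[(attribute, option)]
    except KeyError:
        raise ValueError(f"no material stored for {attribute}={option}") from None
```

For instance, a user who specifies only "low" for voice pitch, and nothing for gender, still retrieves the mixed low-pitched male and female material.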

(2) In the acoustic effect application process of the first to fifth embodiments, acoustic effects other than reverb, such as delay, harmony, or distortion, may be applied to the sound signals Sa_S, Sb_S, Sc_S, Sd_S, Se_S, and Sf_S.

(3) In the first to fifth embodiments, the order of steps S110 and S120 may be reversed. In that case, the frames F_i of each of the sound signals Sa, Sb, Sc, Sd, Se, and Sf are first multiplied by the window function ω. The order of the sample data within each frame F_i is then reversed.
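Whether swapping S110 and S120 changes the result depends on the window's shape. Reversing a windowed frame gives reverse(ω·x) = reverse(ω)·reverse(x). So if ω is symmetric (ω[n] = ω[N-1-n]), both orders produce the same samples, and the variation makes a difference only for an asymmetric window. This section does not state the shape of ω. The Hann window and the ramp below are assumptions chosen to show both cases.

```python
import math
from typing import List, Sequence


def hann(n: int) -> List[float]:
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def reverse_then_window(frame: Sequence[float], w: Sequence[float]) -> List[float]:
    """S110 (in-frame reversal) followed by S120 (window multiplication)."""
    return [a * b for a, b in zip(frame[::-1], w)]


def window_then_reverse(frame: Sequence[float], w: Sequence[float]) -> List[float]:
    """S120 followed by S110, as in variation (3)."""
    return [a * b for a, b in zip(frame, w)][::-1]
```

With a symmetric Hann window the two functions agree to within floating-point rounding. With a rising ramp as the window they do not.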

(4) In the second embodiment, each repetition of the acoustic effect application process may change which of the six sound signals Sa_S, Sb_S, Sc_S, Sd_S, Se_S, and Sf_S receive the acoustic effect and which do not. Each repetition may also change the type of acoustic effect applied to each of the six sound signals. Further, each repetition may change which frames F_i (i = 1 to 15) of each of the six sound signals receive the acoustic effect and which do not.

(5) In the first embodiment, the frame rearrangement process on each of the sound signals Sa_W, Sb_W, Sc_W, Sd_W, Se_W, and Sf_W was repeated every time length T2. However, it may instead be repeated at different time lengths T2_a, T2_b, T2_c, T2_d, T2_e, and T2_f, one specific to each of the sound signals Sa, Sb, Sc, Sd, Se, and Sf. In this case, the time lengths T2_a to T2_f are preferably relatively prime to one another, for example in a ratio such as 1:3:5. This substantially lengthens the period over which the sound of the masker sound M emitted into area B changes. It makes it even less likely that people in area B will find the sound unnatural. The same per-signal time lengths T2_a, T2_b, T2_c, T2_d, T2_e, and T2_f may also govern the following repeated steps:
- the acoustic effect application process of step S140 in the second embodiment;
- the mixing process of step S150 in the third embodiment;
- the frame rearrangement process of step S200 in the fourth and fifth embodiments.
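One way to quantify why relatively prime periods help is through the least common multiple (LCM) of the per-signal update periods. That is the interval after which every signal is again at the start of its own cycle, so the combined schedule of change instants repeats. The function name and the choice of integer units (for example, minutes) are assumptions. The random content of each update is not modeled.

```python
from functools import reduce
from math import gcd
from typing import Iterable


def joint_period(periods: Iterable[int]) -> int:
    """Least common multiple of the per-signal update periods (integer units)."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods, 1)
```

With periods in the ratio 1:3:5 the schedule repeats only every 15 units. Six periods of 2, 4, 6, 8, 10, and 12 units share factors and realign every 120 units. The pairwise relatively prime set 1, 2, 3, 5, 7, and 11 realigns only every 2310 units.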

(6) In the first to fifth embodiments, the loop repetition time T2 was longer than the time length T1 of the speech used as material for the masker sound signal M (T2 = 1 minute, T1 = 30 seconds). However, T2 may be equal to T1. It may also be equal to T1', the length of M after the speech speed conversion process. Alternatively, the loop repetition time T2 may be determined randomly using a random number.

(7) In the first to fifth embodiments, the acoustic effect application process (S140) was performed on all six sound signals Sa_S, Sb_S, Sc_S, Sd_S, Se_S, and Sf_S. However, it may be performed on only some of them.

(8) In the first to fifth embodiments, four processes were performed on all six sound signals Sa_S, Sb_S, Sc_S, Sd_S, Se_S, and Sf_S: the in-frame reversal process (S110), the window function multiplication (S120), the frame rearrangement process (S130), and the acoustic effect application process (S140). The results Sa_S', Sb_S', Sc_S', Sd_S', Se_S', and Sf_S' were then mixed to form the masker sound signal M. However, steps S110 to S140 may be performed on only some of the six signals, for example Sa_S, Sb_S, Sc_S, and Sd_S, while none of these steps is performed on the remaining signals Se_S and Sf_S. M is then obtained by mixing the processed results Sa_S', Sb_S', Sc_S', and Sd_S' with the unprocessed Se_S and Sf_S. In this case, for some or all of Sa_S, Sb_S, Sc_S, and Sd_S, the mixing input may be the result at an earlier point: after the in-frame reversal process (S110), after the window function multiplication (S120), or after the frame rearrangement process (S130).

(9) In the first to fifth embodiments, the frame rearrangement process (S130) was performed after the in-frame reversal process (S110). However, the in-frame reversal process may instead be performed after the frame rearrangement process.

(10) In the first to fifth embodiments, the six sound signals Sa_S, Sb_S, Sc_S, Sd_S, Se_S, and Sf_S may instead be mixed first. Steps S110 to S140 are then performed on the mixed signal, and the result is used as the masker sound signal M.

(11) In the first to fifth embodiments, the following happened each time the human sensor 30 detected a speaker entering area A. The sound signals Sa_S, Sb_S, Sc_S, Sd_S, Se_S, and Sf_S were read from the sound database 21, and steps S100 to S190 were performed on them. The resulting masker sound signal M was then emitted into area B. However, the masker sound signal M obtained by steps S100 to S190 may instead be stored in memory. Thereafter, each time the human sensor 30 detects a speaker entering, the stored M is read out and emitted repeatedly into area B. In this case, the sound signals Sa_S, Sb_S, Sc_S, Sd_S, Se_S, and Sf_S of time length T1 (T1 = 30 seconds) may serve as material. The series of processes of FIG. 4, FIG. 6, FIG. 8, FIG. 9, or FIG. 10 is repeated several times on this material. This produces a masker sound signal M of a time length T4 sufficiently longer than T1 (for example, T4 = 10 minutes), which may be stored in memory and used.
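Building the long stored masker amounts to running the generation pass repeatedly and concatenating the results until the target length T4 is reached. In this sketch, `generate_block` is a hypothetical stand-in for one pass of the processing in FIG. 4, 6, 8, 9, or 10. Lengths are counted in samples, and the final block is trimmed to the exact target length.

```python
from typing import Callable, List


def build_long_masker(generate_block: Callable[[], List[float]],
                      target_len: int) -> List[float]:
    """Concatenate freshly generated blocks until target_len samples exist."""
    out: List[float] = []
    while len(out) < target_len:
        block = generate_block()               # one full generation pass
        if not block:
            raise ValueError("generate_block returned an empty block")
        out.extend(block)
    return out[:target_len]
```

Because each pass draws new random orders, the concatenated blocks differ from one another. The stored T4-long signal therefore does not simply loop the same T1-long material.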

(12) In the first to fifth embodiments, the present invention was applied to preventing speech in area A, enclosed by the partition 50, from being overheard in the outside area B. However, the invention may also apply to two areas A' and B' that have no partition 50 or the like between them. In that case, it makes sound produced in one area (A' or B') hard to hear in the other (B' or A'). The masker sound generation device 10 may also be installed in a room separated from the outside by four walls and a ceiling. The masker sound signal M it generates is then emitted toward the area outside the walls. The invention may further apply to communication devices that enable calls between people in different spaces (for example, a mobile phone, an IP phone, or an intercom). Here, it makes each party's voice hard for nearby people to hear. This can be realized, for example, by building the masker sound generation device 10 of the first to fifth embodiments into the communication device. The device 10 then emits the masker sound signal M it generates around the speaker. In this case, the speaker may be asked to wear an earphone, or the directivity of the communication device's loudspeaker may be controlled. This preferably keeps the masker sound signal M from being transmitted to the other party and confusing the conversation.

(13) In the first to fifth embodiments, a microphone may be installed in area A. In this case, in the acquisition process the CPU 22 acquires the sound signal picked up by this microphone. In the generation process it generates the masker sound signal M from the acquired signal.

(14) In the first to fifth embodiments, the human sensor 30 may be an acoustic sensor, for example a microphone that detects sound waves or a vibration pickup that detects vibration. It may instead be a biosensor, for example a thermal sensor that detects body heat or an infrared sensor that detects infrared radiation from a living body. In addition, a sound pickup and detection device may be installed in area A that combines the function of the microphone described in (13) with that of the human sensor 30. When this device detects a speaker entering area A, the sound signals it picks up afterwards may be used as material for generating the masker sound signal M.

(16) In the first to fifth embodiments, the hard disk 11 may be external to the masker sound generation device 10. In this case, the sound signals Sa, Sb, Sc, Sd, Se, and Sf may be acquired over a network from the sound database 21 in an external storage device. They are then used as material for generating the masker sound signal M. All or some of the buffer 13, the sound emission control unit 14, the D/A conversion unit 15, and the amplifier 16 may also be external to the device 10. In this case, for example, the masker sound signal M generated from Sa, Sb, Sc, Sd, Se, and Sf may be output through an appropriate interface to an external storage device serving as the buffer 13.

(18) In the first to fifth embodiments, the CPU 22 of the masker sound generation device 10 executed the acquisition process and the generation process when it received the detection signal S_IN, which indicates that a speaker has entered area A. However, on receiving S_IN, the CPU 22 may instead skip the acquisition and generation processes. It then outputs from the speaker 31 a masker sound signal M stored in advance on the hard disk 11 or in another memory.

(19) In the frame rearrangement process of the first to fifth embodiments, the frames were rearranged using a random number sequence made up of the mutually distinct numbers 1 to N. However, a random number sequence in which the same number can appear more than once may also be used. The random number sequence then determines which frames are picked from the pre-rearrangement sequence. If the first random number is 8, the 8th frame before rearrangement becomes the 1st frame after rearrangement. If the second random number is 4, the 4th frame before rearrangement becomes the 2nd frame after rearrangement, and so on.
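Both kinds of random sequence can drive the same selection rule: output frame j is the input frame numbered by the j-th random number. In this sketch the function names are hypothetical, and indices are 1-based to match the text. With repeats allowed, some frames may appear twice and others not at all. A permutation of 1 to N reproduces the ordinary rearrangement.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def pick_frames(frames: Sequence[T], indices: Sequence[int]) -> List[T]:
    """Output frame j is input frame indices[j] (1-based, as in the text)."""
    return [frames[k - 1] for k in indices]


def random_indices(n: int, rng: random.Random, allow_repeats: bool) -> List[int]:
    """Random sequence of frame numbers in 1..n, with or without repeats."""
    if allow_repeats:
        return [rng.randint(1, n) for _ in range(n)]   # numbers may recur
    order = list(range(1, n + 1))
    rng.shuffle(order)                                 # distinct 1..N
    return order
```

With the sequence 8, 4, ... from the text, `pick_frames` places the original 8th frame first and the original 4th frame second.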

(20) In the second embodiment, the CPU 22 changed the reverb depth (the ratio of direct sound to reverberant sound) each time the acoustic effect application process was performed. However, the length of the reverberant sound (decay time) may be changed instead. In this case, the CPU 22 delays the sound signal Sa_S to obtain delayed sound signals DSa_S-n (n = 1, 2, ...). Each time the acoustic effect application process is performed, it may change the decay time by changing the strength of these delayed signals. Alternatively, it may change the decay time by changing their delay times.
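One common way to link tap strengths to a decay time is the 60 dB convention (RT60). A tap delayed by t seconds gets the gain g = 10^(-3t / decay), so the echo level has fallen by 60 dB at t = decay. This mapping is an assumption; the source does not say how strengths relate to decay time. The sketch covers the first option, changing tap strengths. The second option would keep the gains and move the delay times instead.

```python
from typing import List, Sequence


def tap_gains(delays_s: Sequence[float], decay_s: float) -> List[float]:
    """Gains for delayed copies DS-n so the echo level falls 60 dB after decay_s.

    20 * log10(g) = -60 * t / decay_s  =>  g = 10 ** (-3 * t / decay_s)
    """
    if decay_s <= 0:
        raise ValueError("decay_s must be positive")
    return [10 ** (-3.0 * t / decay_s) for t in delays_s]
```

Drawing a new `decay_s` on each pass of the acoustic effect application process, and recomputing the gains, yields a different reverberation length every time.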

Description of reference numerals: 10 ... masker sound generation device, 11 ... hard disk, 12 ... control unit, 13 ... buffer, 14 ... sound emission control unit, 15 ... D/A conversion unit, 16 ... amplifier, 21 ... sound database, 22 ... CPU, 23 ... RAM, 24 ... ROM, 30 ... human sensor, 31 ... speaker.

Claims

1. A masker sound generation device comprising:
a buffer that stores a masker sound signal;
a sound emission control unit that repeatedly reads out and outputs the masker sound signal stored in the buffer;
acquisition means for acquiring a sound signal; and
generation means that takes the sound signal acquired by the acquisition means as a processing target, wherein the generation means:
- executes, each time a predetermined time elapses, a rearrangement process that changes the order of the signal within the processing target;
- overwrites the buffer with the resulting masker sound signal; and
- changes the method of the rearrangement process each time the rearrangement process is executed.

2. A masker sound generation device comprising:
acquisition means for acquiring a sound signal of a predetermined time length; and
generation means that takes the sound signal acquired by the acquisition means as a processing target, wherein the generation means:
- executes a rearrangement process that randomly changes the order of the signal within the processing target according to a random number sequence;
- outputs the resulting masker sound signal; and
- each time a predetermined time longer than the predetermined time length elapses, repeatedly changes the rearrangement method by using a random number sequence of a type different from the previous random number sequence.

3. The masker sound generation device according to claim 1, wherein the generation means takes as processing targets the respective signals obtained from a plurality of types of sound signals. It performs the rearrangement process on each of these processing targets, changing the rearrangement method applied to each of them each time the predetermined time elapses. It generates the masker sound signal by mixing the processing targets after the rearrangement process.

4. The masker sound generation device according to claim 1, wherein the process by which the generation means obtains the processing target from the sound signal includes at least a division process that divides the sound signal into a plurality of sections.

5. The masker sound generation device according to claim 4, wherein the process by which the generation means obtains the processing target from the sound signal includes an in-section reversal process. This process reverses the order of the sound signal within each of the plurality of sections obtained by the division process.

6. The masker sound generation device according to claim 4 or 5, wherein the process by which the generation means obtains the processing target from the sound signal includes a window function multiplication process. This process multiplies each of the plurality of sections obtained by the division process by a window function.