JP5741175B2

JP5741175B2 - Concealed data generating device, concealed data generating method, concealing device, concealing method and program

Info

Publication number: JP5741175B2
Application number: JP2011093584A
Authority: JP
Inventors: 茂出木 敏雄 (Toshio Modegi)
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2011-04-20
Filing date: 2011-04-20
Publication date: 2015-07-01
Anticipated expiration: 2031-04-20
Also published as: JP2012226113A

Description

The present invention relates to a concealed data generation device and the like that generate music data for concealing conversational speech.

Conversations held at medical institutions (such as reception counters at dispensing pharmacies), consultation counters at financial institutions and insurance companies, interview rooms at law firms, counters at mobile phone shops, and restaurants used for dining often contain personal information or confidential corporate information that should not be overheard by third parties. Nevertheless, many facilities make do with nothing more than simple partitions. This is because, given the space and cost constraints of offices and shops, it is not always easy to install fixtures with sound insulation, as in a karaoke box, or to carry out interior construction work. There is therefore a demand for a technique that conceals conversational speech with almost no modification to existing facilities.

One technique for concealing sound is active noise control (ANC; see Patent Document 1), which cancels sound electrically. However, because it is limited to stationary noise, it cannot be applied to sounds that vary markedly over time, such as speech.

Another technique for concealing sound uses background music (BGM). BGM is often played in shopping centers, at cocktail parties, in restaurants, and so on, with the intention of softening crowd noise by exploiting the auditory masking effect in human hearing. However, humans also exhibit a characteristic that works in exactly the opposite direction, known as the cocktail party effect. The cocktail party effect is selective listening: even when many people are chatting at once, as at a cocktail party, a listener can naturally pick out the conversation of a person they are interested in.
Because of the cocktail party effect, humans tend to interpolate speech that is partially masked by a louder sound source (such as BGM) in order to follow the speech they are interested in, so ordinary BGM cannot be expected to conceal speech completely. Two techniques have been proposed to solve this problem: (1) energy masking and (2) information masking.

(1) Energy masking is described in, for example, Patent Document 2. Patent Document 2 describes playing white noise (noise whose power tends to be substantially uniform regardless of frequency, at least in the audible range) or the like as a masking sound, and masking speech and other sounds by means of the auditory masking effect.

(2) Information masking is described in, for example, Patent Documents 3 and 4. Patent Document 3 describes receiving a sound signal from a microphone installed in one acoustic space, scrambling the received signal to generate a masking sound, and emitting it into another acoustic space (a space into which the speech signal should not leak). Patent Document 4 describes analyzing conversational speech recorded in real time, processing that speech to generate a masking sound, and outputting it.

However, with the technique described in Patent Document 2, a masking sound with high sound pressure plays constantly, and it has been pointed out that this makes it hard to hear the casual conversation of people in the waiting room as well as the conversation taking place during a consultation.
With the techniques described in Patent Documents 3 and 4, it has been pointed out that the masking sound is unpleasant to people. A further problem is cost, since a microphone for recording, a high-speed signal processing device, and the like are required. Mixing in BGM to soften the unpleasant masking sound is conceivable, but this raises another problem: the sound pressure increases and the result becomes bothersome.

The present inventor therefore invented a concealed data generation device and the like that can inexpensively generate concealed data that is pleasant to humans and has a high concealing effect (see Patent Document 5). The present inventor also invented a concealed data generation device and the like that allows the masking effect to act evenly at every playback position of the concealed data without manual effort (see Patent Document 6).
For setting a filter function that emphasizes the masking effect of a BGM signal on speech, Patent Documents 5 and 6 propose setting the filter function based on the value obtained by dividing the maximum-value spectrum of a representative speech signal by the average-value spectrum of the BGM music signal to be used.

Patent Document 1: Japanese Patent No. 2544899
Patent Document 2: JP 2010-031501 A
Patent Document 3: Japanese Patent No. 4245060
Patent Document 4: Japanese Patent No. 4336552
Patent Document 5: Japanese Patent Application No. 2010-192133
Patent Document 6: Japanese Patent Application No. 2011-000929

In the techniques of Patent Documents 5 and 6, however, the filter function tends to be set so as to emphasize frequency components from 5 kHz to 10 kHz, which contain many human speech signal components. The 5 kHz to 10 kHz band is a region in which the human auditory system is relatively insensitive. When music filtered with such a filter function is played, the playback volume is set with reference to the band below 4 kHz, where the human auditory system is highly sensitive. As a result, the volume of the 5 kHz to 10 kHz band becomes markedly large, and the timbre may change unnaturally and become harsh.
The equal-loudness contours that express the sensitivity characteristics of the human auditory system were standardized as ISO 226 on the basis of measurement data by Fletcher and Munson and others. The ISO 226 standard has since been revised so that the low-frequency band at or below 1 kHz better matches the sensitivity characteristics of the human auditory system.

The present invention has been made in view of the problems described above. Its object is to provide a concealed data generation device and the like that enhance the masking effect on speech signals while keeping the timbre of the reproduced music equivalent to the original, and that achieve a prescribed masking effect even when the music is played at reduced volume.

To achieve the above object, a first invention is a concealed data generation device that generates concealed data, which is music data for concealing conversational speech, the device comprising: frequency analysis means that performs frequency analysis on each of speech data and music data stored in advance, calculates a speech maximum-value spectrum Vv(j) (where j is frequency), which is the maximum of the spectrum of the speech data along the time axis, and calculates a music average-value spectrum Vm(j), which is the spectrum of the music data averaged along the time axis; filter function creation means that calculates a division-value spectrum Div(j) by dividing, for each corresponding frequency j, a value based on the speech maximum-value spectrum Vv(j) by a value based on the music average-value spectrum Vm(j), and then creates a filter function F(j) by multiplying each value of the division-value spectrum Div(j), for each corresponding frequency j, by a value based on an auditory sensitivity correction curve L(j) that defines a weight of human auditory sensitivity; and filtering means that generates the concealed data by dividing the music data into frames f of a predetermined length, Fourier transforming each frame f, multiplying by the filter function F(j), and applying an inverse Fourier transform.
According to the first invention, the masking effect on speech signals is enhanced while the timbre of the reproduced music is kept equivalent to the original, and a prescribed masking effect can be obtained even when the music is played at reduced volume.
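The processing chain of the first invention can be illustrated with a short Python sketch that computes Vv(j), Vm(j), Div(j), and F(j) and then applies F(j) frame by frame. This is a minimal sketch under stated assumptions, not the implementation of the invention: it uses NumPy, takes non-overlapping unwindowed frames, uses magnitude spectra as the "values based on" each spectrum, and treats `weight` as a caller-supplied stand-in for L(j). The function names, the frame length `n`, and the guard value `eps` are hypothetical.

```python
import numpy as np

def frame_spectra(x, n):
    """Split x into non-overlapping frames of n samples (an assumption)
    and return the magnitude spectrum of each frame."""
    n_frames = len(x) // n
    frames = x[: n_frames * n].reshape(n_frames, n)
    return np.abs(np.fft.rfft(frames, axis=1))

def make_filter(voice, music, weight, n, eps=1e-12):
    """Return F(j) = Div(j) * L(j), with Div(j) = Vv(j) / Vm(j)."""
    vv = frame_spectra(voice, n).max(axis=0)   # Vv(j): maximum over time
    vm = frame_spectra(music, n).mean(axis=0)  # Vm(j): average over time
    div = vv / np.maximum(vm, eps)             # eps guards against division by zero
    return div * weight                        # weight stands in for L(j)

def apply_filter(music, f, n):
    """Fourier transform each frame, multiply by F(j), inverse transform."""
    n_frames = len(music) // n
    frames = music[: n_frames * n].reshape(n_frames, n)
    spec = np.fft.rfft(frames, axis=1) * f
    return np.fft.irfft(spec, n=n, axis=1).reshape(-1)
```

With `weight` set to all ones the sketch reduces to the plain division-based filter; supplying a curve that is small where hearing is insensitive is what keeps the high band from being over-emphasized.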

The auditory sensitivity correction curve L(j) used by the filter function creation means in the first invention is defined based on, for example, the 40-phon equal-loudness contour.
Forty phons is the average loudness level at which ordinary speech and music are heard, so an appropriate filter function can be created.

It is also desirable that the filtering means in the first invention, for each frame f, obtains the maximum scalar value of the complex spectrum multiplied by the filter function F(j) within a predetermined frequency range, applies a correction that multiplies each element of the complex spectrum by a predetermined scale value of 1 or more, to the extent that the scalar value of that element does not exceed the maximum scalar value, and then performs the inverse Fourier transform.
This makes it possible to bring the low-frequency region slightly closer to flat, like white noise, while preserving the discrete spectral characteristics, and thus to further enhance the masking effect while preserving the timbre of the music.
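The scale correction described above can be sketched in Python as follows. This is one possible reading, offered as an assumption rather than as the method of the invention: each element's magnitude is multiplied by `scale` but capped at the largest magnitude found in bins `lo` to `hi - 1`, elements already above that cap are left unchanged, and the phase is preserved. The function name and parameters are hypothetical.

```python
import numpy as np

def boost_below_peak(spec, scale, lo, hi):
    """Scale each complex element by up to `scale` (>= 1) without letting
    its magnitude exceed the peak magnitude found in bins lo..hi-1."""
    mag = np.abs(spec)
    vmax = mag[lo:hi].max()                      # maximum scalar value in the range
    # Target magnitude: scaled value, capped at vmax; never below the original.
    new_mag = np.minimum(scale * mag, np.maximum(mag, vmax))
    # Gain per element; bins with zero magnitude keep gain 1.
    gain = np.divide(new_mag, mag, out=np.ones_like(mag), where=mag > 0)
    return spec * gain                           # real gain, so phase is unchanged
```

Because the peaks stay where they are and only the weaker components are raised, the spectrum becomes flatter without losing its discrete peak structure.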

It is also desirable that the filter function creation means in the first invention calculates a replaced speech maximum-value spectrum by replacing the speech maximum-value spectrum Vv(jc) (where jc is a particular frequency) with the maximum value within a range on the higher-frequency side of jc, calculates a replaced music average-value spectrum by replacing the music average-value spectrum Vm(jc) with the average value within a range before and after jc, and uses the value obtained by dividing the replaced speech maximum-value spectrum by the replaced music average-value spectrum as the division-value spectrum Div(j).
Because masking tends to act toward the higher-pitched (higher-frequency) side, replacing the speech maximum-value spectrum Vv(j) with the maximum value within a range on the higher-frequency side of j amounts to a correction that shifts the speech spectrum nonlinearly toward lower frequencies, which in turn enhances the masking effect.
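The two replacements can be sketched in Python as below. The sketch rests on assumptions not stated in the text: the range widths (`width_v`, `width_m`) are free parameters counted in frequency bins, the higher-side range includes bin jc itself, and ranges are truncated at the ends of the spectrum. All names are hypothetical.

```python
import numpy as np

def replace_with_upper_max(vv, width):
    """Replace Vv(jc) with the maximum over bins jc..jc+width."""
    n = len(vv)
    return np.array([vv[j : min(j + width + 1, n)].max() for j in range(n)])

def replace_with_local_mean(vm, width):
    """Replace Vm(jc) with the mean over bins jc-width..jc+width."""
    n = len(vm)
    return np.array(
        [vm[max(j - width, 0) : min(j + width + 1, n)].mean() for j in range(n)]
    )

def division_spectrum(vv, vm, width_v, width_m, eps=1e-12):
    """Div(j) from the replaced spectra; eps guards against division by zero."""
    num = replace_with_upper_max(vv, width_v)
    den = np.maximum(replace_with_local_mean(vm, width_m), eps)
    return num / den
```

For example, with `vv = [1, 5, 2, 0, 3]` and `width = 2`, the peak at bin 1 also appears at bin 0 after replacement, which is the shift toward lower frequencies described above.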

It is also desirable that the filter function creation means in the first invention smooths the filter function F(j) by multiplying each value of F(j) by a value based on the auditory sensitivity correction curve L(j) and then replacing it with the average value within a range before and after frequency j.
This makes the filter function smooth, so that the concealed data finally generated is music data that is pleasant to humans.
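The smoothing step can be sketched as a moving average over neighboring frequency bins. The window half-width `width` and the edge handling (averaging only over the bins that exist) are assumptions, as are the function and parameter names; the input is assumed to have at least `2 * width + 1` bins.

```python
import numpy as np

def smooth_filter(f, weight, width):
    """Multiply F(j) by the L(j) stand-in `weight`, then replace each value
    with the mean over bins j-width..j+width (truncated at the edges)."""
    g = f * weight
    kernel = np.ones(2 * width + 1)
    total = np.convolve(g, kernel, mode="same")                 # windowed sums
    count = np.convolve(np.ones_like(g), kernel, mode="same")   # bins per window
    return total / count
```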

It is also desirable that the frequency analysis means in the first invention calculates, for each frame f, a music average-value spectrum Vm(f, j) averaged along the time axis over the M frames before and after frame f of the music data, and that the filter function creation means calculates a division-value spectrum Div(f, j) by dividing, for each corresponding frequency j, a value based on the speech maximum-value spectrum Vv(j) by a value based on the music average-value spectrum Vm(f, j) for frame f, and then creates a filter function F(f, j) by multiplying each value of Div(f, j), for each corresponding frequency j, by a value based on the auditory sensitivity correction curve L(j).
This makes it possible to generate, without manual effort, concealed data whose masking effect acts evenly at every playback position.
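The per-frame variant can be sketched in Python as follows. The sketch assumes that the magnitude spectra of the music frames are already available as a two-dimensional array of shape (frames, bins), that the averaging window is truncated at the start and end of the music, and that `weight` stands in for L(j). The function names and the guard value `eps` are hypothetical.

```python
import numpy as np

def local_music_average(music_mag, m):
    """Vm(f, j): mean of the music magnitude spectra over frames f-m..f+m."""
    n_frames = music_mag.shape[0]
    out = np.empty_like(music_mag, dtype=float)
    for f in range(n_frames):
        out[f] = music_mag[max(f - m, 0) : min(f + m + 1, n_frames)].mean(axis=0)
    return out

def per_frame_filter(vv, music_mag, weight, m, eps=1e-12):
    """F(f, j) = Vv(j) / Vm(f, j) * L(j), one filter row per frame."""
    vm = local_music_average(music_mag, m)
    return vv / np.maximum(vm, eps) * weight
```

Because Vm(f, j) follows the local level of the music, quiet passages receive a larger gain than loud ones, which is how the masking effect is evened out across playback positions.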

It is also desirable that the first invention further comprises music data storage means that stores a plurality of pieces of the music data and music data selection means that selects a single piece of music data from those stored by the music data storage means, and that the concealed data is generated based on the single piece of music data selected by the music data selection means.
This makes it possible to generate a plurality of pieces of concealed data from a plurality of pieces of music data.

A second invention is a concealment device comprising: concealed data storage means that stores a plurality of pieces of the concealed data generated by the concealed data generation device of the first invention; concealed data selection means that selects a single piece of concealed data from those stored by the concealed data storage means; and concealed data reproduction means that plays back the single piece of concealed data selected by the concealed data selection means.
According to the second invention, the concealed data generation device of the first invention can be physically separated, and concealed data created in advance can be played back at any time without operating the concealed data generation device of the first invention.

The concealed data reproduction means in the second invention is desirably a flat-panel speaker with a mechanism that radiates the concealed data uniformly from a given plane as a sound wave whose wavefront is close to a plane wave.
As a result, the sound energy lost in propagating to the position to be concealed is smaller for the concealed data than for the conversational speech, so the energy of the concealed data becomes relatively larger than that of the conversational speech, and the masking effect can be enhanced.

A third invention is a concealed data generation method for generating concealed data, which is music data for concealing conversational speech, the method comprising: a frequency analysis step of performing frequency analysis on each of speech data and music data stored in advance, calculating a speech maximum-value spectrum Vv(j) (where j is frequency), which is the maximum of the spectrum of the speech data along the time axis, and calculating a music average-value spectrum Vm(j), which is the spectrum of the music data averaged along the time axis; a filter function creation step of calculating a division-value spectrum Div(j) by dividing, for each corresponding frequency j, a value based on the speech maximum-value spectrum Vv(j) by a value based on the music average-value spectrum Vm(j), and then creating a filter function F(j) by multiplying each value of the division-value spectrum Div(j), for each corresponding frequency j, by a value based on an auditory sensitivity correction curve L(j) that defines a weight of human auditory sensitivity; and a filtering step of generating the concealed data by dividing the music data into frames f of a predetermined length, Fourier transforming each frame f, multiplying by the filter function F(j), and applying an inverse Fourier transform.
According to the third invention, the masking effect on speech signals is enhanced while the timbre of the reproduced music is kept equivalent to the original, and a prescribed masking effect can be obtained even when the music is played at reduced volume.

A fourth invention is a concealment method comprising: a concealed data storage step of storing a plurality of pieces of the concealed data generated by the concealed data generation method of the third invention; a concealed data selection step of selecting a single piece of concealed data from those stored in the concealed data storage step; and a concealed data reproduction step of playing back the single piece of concealed data selected in the concealed data selection step.
According to the fourth invention, the concealed data generation method of the third invention can be physically separated, and concealed data created in advance can be played back at any time without executing the concealed data generation method of the third invention.

A fifth invention is a computer-readable program for causing a computer to execute: a frequency analysis step of performing frequency analysis on each of speech data and music data stored in advance, calculating a speech maximum-value spectrum Vv(j) (where j is frequency), which is the maximum of the spectrum of the speech data along the time axis, and calculating a music average-value spectrum Vm(j), which is the spectrum of the music data averaged along the time axis; a filter function creation step of calculating a division-value spectrum Div(j) by dividing, for each corresponding frequency j, a value based on the speech maximum-value spectrum Vv(j) by a value based on the music average-value spectrum Vm(j), and then creating a filter function F(j) by multiplying each value of the division-value spectrum Div(j), for each corresponding frequency j, by a value based on an auditory sensitivity correction curve L(j) that defines a weight of human auditory sensitivity; and a filtering step of generating the concealed data by dividing the music data into frames f of a predetermined length, Fourier transforming each frame f, multiplying by the filter function F(j), and applying an inverse Fourier transform.
By installing the program of the fifth invention on a general-purpose computer, the concealed data generation device of the first invention or the concealed data generation method of the third invention can be realized on that computer.

The present invention removes the need to set the playback volume high, which previously arose because filtering emphasized components of the BGM music in frequency bands where human hearing is less sensitive, and it avoids unnatural changes in timbre. A masking effect equal to or better than before can thus be obtained at a lower BGM playback volume. In other words, the concealing effect can be improved in a comfortable acoustic environment with a lower playback volume than before.

Schematic diagram of the concealment device
Hardware configuration diagram of the concealed data generation device
Diagram showing an example of equal-loudness contours
Diagram showing an example of the auditory sensitivity correction curve
Flowchart showing the flow of the concealment processing
Diagram showing the flow of the concealed data generation processing
Diagram explaining the frequency analysis processing (1)
Diagram explaining the frequency analysis processing (2)
Diagram explaining the filter function creation processing (1)
Diagram explaining the filter function creation processing (2)
Diagram explaining the filter function creation processing (3)
Diagram explaining the filtering processing (1)
Diagram explaining the filtering processing (2)
Diagram explaining the filtering processing (3)
First installation example of the concealment device
Second installation example of the concealment device
Diagram showing the speech maximum-value spectrum of the example and the comparative example
Diagram showing the music average-value spectrum of the example and the comparative example
Diagram showing the filter function of the comparative example
Diagram showing the music signal after filtering in the comparative example
Diagram showing the auditory sensitivity correction curve of the example
Diagram showing the filter function of the example
Diagram showing the music signal after filtering in the example (without compression)
Diagram showing the music signal after filtering in the example (with compression)

Embodiments of the present invention are described in detail below with reference to the drawings.
FIG. 1 is a schematic diagram of the concealment device 1. As shown in FIG. 1, the concealment device 1 comprises at least a concealed data generation device 2 and a music playback device 3.
The concealed data generation device 2 is, for example, a computer, and generates concealed data 7, which is music data for concealing conversational speech. The storage unit of the concealed data generation device 2 stores at least speech data 4, music data 5, and an auditory sensitivity correction curve 6. These data are described later.
The music playback device 3 consists of a music player and a speaker, and plays back the concealed data 7. The storage unit of the music playback device 3 stores at least the concealed data 7 generated by the concealed data generation device 2.

The concealment device 1 can take various configurations depending on the application. The concealed data generation device 2 and the music playback device 3 that make up the concealment device 1 may be in separate housings, as shown in FIG. 1, or in a single housing.
The concealed data generation device 2 and the music playback device 3 may be connected by wire as shown in FIG. 1, connected wirelessly, connected via a network, or not connected at all.
When the concealed data generation device 2 and the music playback device 3 are not connected, the concealed data generation device 2 outputs the concealed data 7 to a storage medium (a medium readable by both computers and music players, such as a CD, MD, USB memory, or SD card), and the music playback device 3 reads the concealed data 7 from that storage medium.

At least the music playback device 3 is installed in the acoustic space where concealment of conversational speech is desired. One example of such a space is a waiting room adjacent to a reception counter, such as that of a dispensing pharmacy. The music playback device 3 plays back the concealed data 7 in such a waiting room.
The concealed data 7 generated by the concealed data generation device 2 according to the embodiment of the present invention can, at normal volume and even with no partition at all between the reception counter and the waiting room, conceal the conversation at the reception counter to the extent that people in the waiting room cannot make out its content.
Other acoustic spaces where the music playback device 3 may be installed include waiting areas adjacent to counters at financial institutions, insurance companies, and mobile phone shops, corridors adjacent to interview rooms at law firms and the like, reception rooms at companies, and private rooms at restaurants.

図２は、秘匿化データ生成装置２のハードウエア構成図である。尚、図２のハードウエア構成は一例であり、用途、目的に応じて様々な構成を採ることが可能である。
秘匿化データ生成装置２は、制御部２１、記憶部２２、メディア入出力部２３、通信制御部２４、入力部２５、表示部２６、周辺機器Ｉ／Ｆ部２７等が、バス２８を介して接続される。 FIG. 2 is a hardware configuration diagram of the concealed data generation device 2. Note that the hardware configuration in FIG. 2 is an example, and various configurations can be adopted depending on the application and purpose.
In the concealment data generation device 2, a control unit 21, a storage unit 22, a media input/output unit 23, a communication control unit 24, an input unit 25, a display unit 26, a peripheral device I/F unit 27, and the like are connected via a bus 28.

制御部２１は、ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）等で構成される。
ＣＰＵは、記憶部２２、ＲＯＭ、記録媒体等に格納されるプログラムをＲＡＭ上のワークメモリ領域に呼び出して実行し、バス２８を介して接続された各装置を駆動制御し、秘匿化データ生成装置２が行う後述する処理を実現する。
ＲＯＭは、不揮発性メモリであり、秘匿化データ生成装置２のブートプログラムやＢＩＯＳ等のプログラム、データ等を恒久的に保持している。
ＲＡＭは、揮発性メモリであり、記憶部２２、ＲＯＭ、記録媒体等からロードしたプログラム、データ等を一時的に保持するとともに、制御部１１が各種処理を行う為に使用するワークエリアを備える。 The control unit 21 includes a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like.
The CPU loads a program stored in the storage unit 22, the ROM, a recording medium, or the like into a work memory area on the RAM and executes it, drives and controls each device connected via the bus 28, and thereby realizes the processing, described later, that the concealment data generation device 2 performs.
The ROM is a non-volatile memory and permanently holds the boot program of the concealment data generation device 2, programs such as the BIOS, data, and the like.
The RAM is a volatile memory that temporarily holds programs, data, and the like loaded from the storage unit 22, the ROM, a recording medium, and the like, and provides a work area that the control unit 21 uses to perform various processes.

記憶部２２は、ＨＤＤ（ハードディスクドライブ）であり、制御部２１が実行するプログラム、プログラム実行に必要なデータ、ＯＳ（オペレーティングシステム）等が格納される。プログラムに関しては、ＯＳ（オペレーティングシステム）に相当する制御プログラムや、後述する処理をコンピュータに実行させるためのアプリケーションプログラムが格納されている。
これらの各プログラムコードは、制御部２１により必要に応じて読み出されてＲＡＭに移され、ＣＰＵに読み出されて各種の手段として実行される。 The storage unit 22 is an HDD (hard disk drive), and stores a program executed by the control unit 21, data necessary for program execution, an OS (operating system), and the like. With respect to the program, a control program corresponding to an OS (operating system) and an application program for causing a computer to execute processing described later are stored.
Each of these program codes is read by the control unit 21 as necessary, transferred to the RAM, read by the CPU, and executed as various means.

メディア入出力部２３（ドライブ装置）は、データの入出力を行い、例えば、ＣＤドライブ（−ＲＯＭ、−Ｒ、−ＲＷ等）、ＤＶＤドライブ（−ＲＯＭ、−Ｒ、−ＲＷ等）、ＭＤドライブ等のメディア入出力装置を有する。
通信制御部２４は、通信制御装置、通信ポート等を有し、秘匿化データ生成装置２とネットワーク間の通信を媒介する通信インタフェースであり、ネットワークを介して、他の装置間との通信制御を行う。ネットワークは、有線、無線を問わない。 The media input/output unit 23 (drive device) inputs and outputs data and has media input/output devices such as a CD drive (-ROM, -R, -RW, etc.), a DVD drive (-ROM, -R, -RW, etc.), and an MD drive.
The communication control unit 24 has a communication control device, a communication port, and the like. It is a communication interface that mediates communication between the concealment data generation device 2 and a network, and it controls communication with other devices via the network. The network may be wired or wireless.

入力部２５は、データの入力を行い、例えば、キーボード、マウス等のポインティングデバイス、テンキー等の入力装置を有する。
入力部２５を介して、秘匿化データ生成装置２に対して、操作指示、動作指示、データ入力等を行うことができる。
表示部２６は、ＣＲＴモニタ、液晶パネル等のディスプレイ装置、ディスプレイ装置と連携して秘匿化データ生成装置２のビデオ機能を実現するための論理回路等（ビデオアダプタ等）を有する。 The input unit 25 inputs data and includes, for example, a keyboard, a pointing device such as a mouse, and an input device such as a numeric keypad.
Operating instructions, action instructions, data, and the like can be entered into the concealment data generation device 2 via the input unit 25.
The display unit 26 includes a display device such as a CRT monitor and a liquid crystal panel, and a logic circuit (such as a video adapter) for realizing the video function of the concealed data generation device 2 in cooperation with the display device.

周辺機器Ｉ／Ｆ（インタフェース）部２７は、秘匿化データ生成装置２に周辺機器を接続させるためのポートであり、秘匿化データ生成装置２は周辺機器Ｉ／Ｆ部２７を介して周辺機器とのデータの送受信を行う。周辺機器Ｉ／Ｆ部２７は、ＵＳＢやＳＤカードリーダ等で構成されている。
バス２８は、各装置間の制御信号、データ信号等の授受を媒介する経路である。 The peripheral device I / F (interface) unit 27 is a port for connecting the peripheral device to the concealed data generation device 2, and the concealed data generation device 2 is connected to the peripheral device via the peripheral device I / F unit 27. Send and receive data. The peripheral device I / F unit 27 is configured by a USB, an SD card reader, or the like.
The bus 28 is a path that mediates transmission / reception of control signals, data signals, and the like between the devices.

図３は、等ラウドネス曲線の一例を示す図である。等ラウドネス曲線は、ＩＳＯ２２６によって規格化されている。等ラウドネス曲線は、ラウドネス（音の聴覚的な強さ）のレベルごとに、周波数の変化に基づいてヒトが感覚的に同じラウドネルレベルに聴取される物理的に計測される音圧レベルの変化を示す曲線である。ラウドネスレベルの単位は、ｐｈｏｎ（フォン、ホン、ホーン）である。音圧レベルの単位は、ｄＢ（デシベル）である。
図３では、横軸が周波数［Ｈｚ］、縦軸が音圧レベル［ｄＢ］であり、ラウドネスレベルごとに等ラウドネス曲線が定義される。図３では、０（最小可聴音場）、１０、２０、３０、・・・、１３０［ｐｈｏｎ］の等ラウドネス曲線が図示されている。 FIG. 3 is a diagram illustrating an example of equal-loudness curves. Equal-loudness curves are standardized in ISO 226. For each loudness level (the perceived strength of a sound), an equal-loudness curve shows how the physically measured sound pressure level that a person perceives as that same loudness level varies with frequency. The unit of loudness level is the phon. The unit of sound pressure level is dB (decibel).
In FIG. 3, the horizontal axis represents frequency [Hz] and the vertical axis represents sound pressure level [dB], and an equal-loudness curve is defined for each loudness level. FIG. 3 shows the equal-loudness curves for 0 (minimum audible field), 10, 20, 30, ..., 130 [phon].

図３を見ると分かるように、ラウドネスレベルが大きくなるにつれて、等ラウドネス曲線ごとの最大音圧レベルと最小音圧レベルとの差は小さくなる。すなわち、０［ｐｈｏｎ］の等ラウドネス曲線における最大音圧レベルと最小音圧レベルとの差が一番大きく、１３０［ｐｈｏｎ］の等ラウドネス曲線における最大音圧レベルと最小音圧レベルとの差が一番小さい。
本発明の実施の形態では、通常の音声や音楽を聴取する際の平均的なラウドネスレベルである４０［ｐｈｏｎ］の等ラウドネス曲線を用いて、後述する「聴覚感度補正曲線」を定義する。尚、秘匿化データ７が再生される音響空間の環境がある程度予測できる場合、環境に合わせて等ラウドネス曲線を選択するようにしても良い。 As can be seen from FIG. 3, as the loudness level increases, the difference between the maximum and minimum sound pressure levels on each equal-loudness curve decreases. That is, the difference is largest on the 0 [phon] curve and smallest on the 130 [phon] curve.
In the embodiment of the present invention, the “auditory sensitivity correction curve” described later is defined using the 40 [phon] equal-loudness curve, 40 [phon] being the average loudness level when listening to ordinary speech and music. If the environment of the acoustic space where the concealment data 7 is played back can be predicted to some extent, an equal-loudness curve may be selected to suit that environment.

図４は、聴覚感度補正曲線の一例を示す図である。聴覚感度補正曲線は、秘匿化データ生成装置２によって利用される。聴覚感度補正曲線は、後述する「フィルタ関数」を補正する際に用いられる。
図４では、各周波数に対する上段が、等ラウドネス曲線の音圧レベルを示しており、各周波数に対する下段が、５００Ｈｚを基準（０ｄＢ）とした時の聴覚感度補正曲線の音圧レベルを示している。例えば、周波数が２０[Ｈｚ]に対して、等ラウドネス曲線の音圧レベルが９０．０［ｄＢ］、聴覚感度補正曲線の音圧レベルが−５３．０［ｄＢ］である。また、例えば、周波数が３０[Ｈｚ]に対して、等ラウドネス曲線の音圧レベルが７７．０［ｄＢ］、聴覚感度補正曲線の音圧レベルが−４０．０［ｄＢ］である。
図４に示す例では、“聴覚感度補正曲線の音圧レベル（下段の値）＝等ラウドネス曲線の音圧レベルの５００Ｈｚにおける極小値（＝３７．０）−等ラウドネス曲線の音圧レベル（上段の値）”によって、聴覚感度補正曲線の音圧レベルを求めている。
聴覚感度補正曲線の算出処理は、この例に限られず、例えば、図３の等ラウドネス曲線を、横軸に平行な所定の直線に従って折り返すことによって、聴覚感度補正曲線を求めても良い。また、聴覚感度補正曲線の基準とする周波数は、等ラウドネス曲線上の極小値になる５００Ｈｚに設定する必要もない。 FIG. 4 is a diagram illustrating an example of an auditory sensitivity correction curve. The auditory sensitivity correction curve is used by the concealment data generation device 2. The auditory sensitivity correction curve is used when correcting a “filter function” described later.
In FIG. 4, the upper part for each frequency indicates the sound pressure level of the equal loudness curve, and the lower part for each frequency indicates the sound pressure level of the auditory sensitivity correction curve when 500 Hz is used as a reference (0 dB). . For example, for a frequency of 20 [Hz], the sound pressure level of the equal loudness curve is 90.0 [dB], and the sound pressure level of the auditory sensitivity correction curve is −53.0 [dB]. For example, for a frequency of 30 [Hz], the sound pressure level of the equal loudness curve is 77.0 [dB], and the sound pressure level of the auditory sensitivity correction curve is −40.0 [dB].
In the example shown in FIG. 4, the sound pressure level of the auditory sensitivity correction curve is obtained as: sound pressure level of the auditory sensitivity correction curve (lower value) = local minimum of the equal-loudness sound pressure level at 500 Hz (= 37.0) − sound pressure level of the equal-loudness curve (upper value).
The auditory sensitivity correction curve calculation process is not limited to this example. For example, the auditory sensitivity correction curve may be obtained by folding the equal loudness curve of FIG. 3 according to a predetermined straight line parallel to the horizontal axis. Further, the frequency used as a reference for the auditory sensitivity correction curve does not need to be set to 500 Hz which is a minimum value on the equal loudness curve.
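The subtraction described for FIG. 4 can be sketched as follows. This is a minimal illustration that uses only the three sound pressure levels quoted in the text (20 Hz, 30 Hz, and the 500 Hz reference); the names `EQUAL_LOUDNESS_DB` and `sensitivity_correction_db` are hypothetical and not part of the original description.

```python
# Equal-loudness sound pressure levels [dB] quoted in the text for FIG. 4.
# Only the values given in the description are used; other frequencies
# would come from whichever equal-loudness curve is selected.
EQUAL_LOUDNESS_DB = {20: 90.0, 30: 77.0, 500: 37.0}


def sensitivity_correction_db(levels_db, reference_hz=500):
    """Return correction[f] = level(reference) - level(f), in dB."""
    reference = levels_db[reference_hz]
    return {hz: reference - level for hz, level in levels_db.items()}


correction = sensitivity_correction_db(EQUAL_LOUDNESS_DB)
print(correction)  # {20: -53.0, 30: -40.0, 500: 0.0}
```

The printed values match the lower-row values quoted above for 20 Hz and 30 Hz, with the reference frequency mapped to 0 dB.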

ここで、聴覚感度補正曲線の意義について説明する。
ヒト聴覚系の感度は周波数に依存して変化し、４ｋＨｚ近辺をピークに３００Ｈｚ以下の低音または５ｋＨｚ以上の高音になるほど感度が低下する特性をもつ。ところが、音声信号には音楽信号には比較的少ない５ｋＨｚ〜１０ｋＨｚの周波数帯の成分が多く含まれるため、フィルタ関数はこれらの周波数帯の成分を強調するように働く。この強調される周波数帯域はヒトの聴覚感度が比較的低い帯域であるため、再生時には聴覚感度が高い４ｋＨｚ以下の周波数帯域を基準に音量を設定するようになる。そうすると、これに連動して５ｋＨｚ〜１０ｋＨｚの周波数帯の音量が不自然に大きくなり、全体的に騒がしくなる。そこで、後述するように、聴感特性曲線を重畳してフィルタ関数を生成することによって、５ｋＨｚ〜１０ｋＨｚの周波数帯の強調度合いが抑えられて不自然な音色になることを回避できる。また、ヒトの聴覚感度が高く音声を識別するためのフォルマント成分に富む３００Ｈｚ〜３．４ｋＨｚの周波数帯が強調されることになり、再生音量をあまり上げなくてもマスキングが有効に働きやすくなる。 Here, the significance of the auditory sensitivity correction curve will be described.
The sensitivity of the human auditory system varies with frequency: it peaks near 4 kHz and falls off toward low frequencies below 300 Hz and toward high frequencies above 5 kHz. Speech signals, however, contain many components in the 5 kHz to 10 kHz band, which are relatively scarce in music signals, so the filter function acts to emphasize the components in this band. Because the emphasized band is one where human auditory sensitivity is relatively low, the playback volume ends up being set with reference to the band at or below 4 kHz, where sensitivity is high. The volume of the 5 kHz to 10 kHz band then becomes unnaturally loud, and the overall sound becomes noisy. Therefore, as described later, the filter function is generated with the auditory characteristic curve superimposed on it, which restrains the emphasis of the 5 kHz to 10 kHz band and avoids an unnatural timbre. In addition, the 300 Hz to 3.4 kHz band, where human auditory sensitivity is high and which is rich in the formant components used to identify speech, is emphasized, so masking works effectively without raising the playback volume very much.

図５は、秘匿化処理の流れを示すフローチャートである。
図５に示すように、秘匿化データ生成装置２の制御部２１は、音声データ４及び音楽データ５を記憶部２２に記憶する（Ｓ１０１）。音楽データ５は、複数記憶するようにしても良い。
音声データ４は、秘匿化対象の音響空間における対話音声ではなく、固定のサンプルデータとする。すなわち、本発明の実施の形態における秘匿化データ生成装置２は、リアルタイムにサンプリングされた秘匿化対象の対話音声は使用しない。音声データ４は、予め録音された種々の男声、女声が混在した対話音声である。
音楽データ５は任意である。例えば、聴取者にとって意味のあるメロディ・リズム・和声進行が含まれている必要は必ずしもなく、川のせせらぎ音などの自然音でもかまわない。秘匿化対象の対話音声に類似した周波数成分を多く含む音楽データであれば、マスキング効果が働きやすくなるので、マスキング効果を高めるという意味では、声楽データが含まれていることが望ましい。但し、声楽データが含まれると騒がしくなるため、器楽データのみであり、楽器編成が少ない室内楽曲などが現実的である。秘匿化データ生成装置２は、音楽データ５ごとに秘匿化データ７を生成する。 FIG. 5 is a flowchart showing the flow of concealment processing.
As illustrated in FIG. 5, the control unit 21 of the concealed data generation device 2 stores the audio data 4 and the music data 5 in the storage unit 22 (S101). A plurality of music data 5 may be stored.
The voice data 4 is fixed sample data, not the dialogue voice in the acoustic space to be concealed. That is, the concealment data generation device 2 according to the embodiment of the present invention does not use dialogue voice sampled in real time from the concealment target. The voice data 4 is prerecorded dialogue voice in which various male and female voices are mixed.
The music data 5 is arbitrary. For example, it does not necessarily have to contain a melody, rhythm, or harmonic progression that is meaningful to the listener; natural sounds such as the murmur of a river are also acceptable. Music data that contains many frequency components similar to those of the dialogue voice to be concealed makes masking work more readily, so from the standpoint of strengthening the masking effect it is desirable that the music include vocals. However, music with vocals tends to sound noisy, so purely instrumental music with a small ensemble, such as chamber music, is the practical choice. The concealment data generation device 2 generates concealment data 7 for each music data 5.

次に、秘匿化データ生成装置２の制御部２１は、単一の音楽データ５を選択する（Ｓ１０２）。音楽データ５の選択は、入力部２５を介してユーザが指示するようにしても良い。
次に、秘匿化データ生成装置２の制御部２１は、Ｓ１０２において選択された単一の音楽データ５に基づいて、秘匿化データ７の生成処理を行う（Ｓ１０３）。秘匿化データ７の生成処理の詳細は後述する。
Ｓ１０２及びＳ１０３の処理を繰り返し、複数の秘匿化データ７を生成するようにしても良い。 Next, the control part 21 of the concealment data generation apparatus 2 selects the single music data 5 (S102). Selection of the music data 5 may be instructed by the user via the input unit 25.
Next, the control unit 21 of the concealed data generation device 2 performs concealment data 7 generation processing based on the single music data 5 selected in S102 (S103). Details of the process for generating the concealment data 7 will be described later.
A plurality of the concealment data 7 may be generated by repeating the processes of S102 and S103.

次に、音楽再生装置３は、Ｓ１０３にて生成された秘匿化データ７を記憶する（Ｓ１０４）。秘匿化データ７は、複数記憶するようにしても良い。
次に、音楽再生装置３は、単一の秘匿化データ７を選択する（Ｓ１０５）。秘匿化データ７の選択は、あらかじめ定義されたプレイリスト（再生プログラム）に基づいて自動的に行われるようにする方法が一般的であるが、ユーザが指示するようにしても良い。
次に、音楽再生装置３は、Ｓ１０５において選択された単一の秘匿化データ７を再生する（Ｓ１０６）。再生音量は、環境の変化に応じて、ユーザの指示により適宜変更される。 Next, the music playback device 3 stores the concealment data 7 generated in S103 (S104). A plurality of concealment data 7 may be stored.
Next, the music playback device 3 selects a single item of concealment data 7 (S105). The concealment data 7 is generally selected automatically on the basis of a predefined playlist (playback program), but the selection may instead be made by a user instruction.
Next, the music playback device 3 plays back the single item of concealment data 7 selected in S105 (S106). The playback volume is changed as appropriate by user instruction in response to changes in the environment.

以上により、秘匿化装置１は、音響空間Ａにおける対話音声が、所定の距離だけ離れている音響空間Ｂにいる人に聴取されないように秘匿化することができる。
以下では、秘匿化データ７の生成処理の詳細について説明する。 As described above, the concealment device 1 can conceal the conversation voice in the acoustic space A so that it is not heard by a person in the acoustic space B separated by a predetermined distance.
The details of the process for generating the concealment data 7 are described below.

図６は、秘匿化データ生成処理の流れを示す図である。図６に示すように、秘匿化データ生成処理は、フレーム抽出処理３１、周波数解析処理３２、フィルタ関数作成処理３３、及びフィルタリング処理３４を含む。
ここでは、各処理の概要について説明し、詳細は後述する。 FIG. 6 is a diagram showing the flow of the concealment data generation process. As shown in FIG. 6, the concealment data generation process includes a frame extraction process 31, a frequency analysis process 32, a filter function creation process 33, and a filtering process 34.
Here, an outline of each process will be described, and details will be described later.

フレーム抽出処理３１は、音声データ４及び音楽データ５を入力し、各々に対して所定の区間単位のフレームｆに分割し、音声フレーム群１０及び音楽フレーム群１１を生成する。 The frame extraction process 31 takes the audio data 4 and the music data 5 as input, divides each into frames f of a predetermined interval, and generates the audio frame group 10 and the music frame group 11.

周波数解析処理３２は、音声フレーム群１０及び音楽フレーム群１１を入力し、音声最大値スペクトルデータ１２及び音楽平均値スペクトルデータ１３を出力する。周波数解析処理３２は、秘匿化データ生成装置２の制御部２１が、音声フレーム群１０及び音楽フレーム群１１の各クレームに対して周波数解析を行い、音声フレームの時間軸方向に最大のスペクトルである単一の音声最大値スペクトルＶｖ（ｊ）（ｊは周波数）を算出し、音楽フレームの時間軸方向に平均化したスペクトルである音楽平均値スペクトルＶｍ（ｊ）を算出する処理である。
尚、Ｖｖ（ｊ）の添え字「ｖ」は、ｖｏｉｃｅの頭文字である。また、Ｖｍ（ｆ、ｊ）の添え字「ｍ」は、ｍｕｓｉｃの頭文字である。 The frequency analysis process 32 takes the audio frame group 10 and the music frame group 11 as input and outputs the audio maximum value spectrum data 12 and the music average value spectrum data 13. In the frequency analysis process 32, the control unit 21 of the concealment data generation device 2 performs frequency analysis on each frame of the audio frame group 10 and the music frame group 11, calculates a single audio maximum value spectrum Vv(j) (j is a frequency), which is the maximum spectrum along the time axis of the audio frames, and calculates a music average value spectrum Vm(j), which is the spectrum averaged along the time axis of the music frames.
Note that the subscript “v” of Vv (j) is an initial of voice. Further, the subscript “m” of Vm (f, j) is an initial of music.

また、周波数解析処理３２は、フレームｆごとに音楽平均値スペクトルデータ１３を出力しても良い。すなわち、秘匿化データ生成装置２の制御部２１は、音楽平均値スペクトルＶｍ（ｆ,ｊ）として、音楽フレームの前後Ｍフレーム（Ｍ個）に渡って時間軸方向に平均化したスペクトルを算出するようにしても良い。 Further, the frequency analysis processing 32 may output the music average value spectrum data 13 for each frame f. That is, the control unit 21 of the concealed data generation device 2 calculates a spectrum averaged in the time axis direction over the M frames before and after the music frame (M) as the music average value spectrum Vm (f, j). You may do it.

ここで、Ｍは、例えば、「Ｍ（個）×フレームの長さ（秒）」が数秒程度であることが望ましい。これは、「Ｍ（個）×フレームの長さ（秒）」が短すぎると、音楽が不自然に聞こえてしまい、「Ｍ（個）×フレームの長さ（秒）」が長すぎると、マスキング効果、即ち音声の秘匿化が適切に働かない箇所が目立つようになるからである。 Here, as for M, for example, “M (number) × frame length (second)” is preferably about several seconds. This is because if “M (pieces) × frame length (seconds)” is too short, the music will sound unnatural, and if “M (pieces) × frame length (seconds)” is too long, This is because the masking effect, that is, the part where the voice concealment does not work properly becomes conspicuous.

音声データ４は、スペクトルの時系列変動が大きく、無音部も含まれるため、平均値では適切な評価ができない。そこで、本発明の実施の形態では、音声最大値スペクトルＶｖ（ｊ）を１つだけ算出する。
音楽データ５は、フレーム単位の各瞬時スペクトル（位相成分は無視したエネルギー量）に対して、時間軸方向に瞬時スペクトルを平均化した音楽平均値スペクトルＶｍ（ｊ）に置換される。又は、音楽データ５は、フレームｆごとに、前後所定のフレーム数に対応する瞬時スペクトルを平均化した音楽平均値スペクトルＶｍ（ｆ,ｊ）に置換される。 Since the audio data 4 has a large spectrum time series variation and includes a silent part, the average value cannot be appropriately evaluated. Therefore, in the embodiment of the present invention, only one voice maximum value spectrum Vv (j) is calculated.
The music data 5 is replaced with a music average value spectrum Vm (j) obtained by averaging the instantaneous spectrum in the time axis direction for each instantaneous spectrum in units of frames (the energy amount ignoring the phase component). Alternatively, the music data 5 is replaced with a music average value spectrum Vm (f, j) obtained by averaging instantaneous spectra corresponding to a predetermined number of frames before and after every frame f.

フィルタ関数作成処理３３は、音声最大値スペクトルデータ１２及び音楽平均値スペクトルデータ１３を入力し、フレームｆごとに、フィルタ関数データ１４を出力する。フィルタ関数作成処理３３は、秘匿化データ生成装置２の制御部２１が、音声最大値スペクトルＶｖ（ｊ）に基づく値を、音楽平均値スペクトルＶｍ（ｊ）に基づく値によって互いに対応する周波数ｊごとに除した値である除算値スペクトルＤｉｖ（ｊ）を算出し、更に、除算値スペクトルＤｉｖ（ｊ）の各値に対して、互いに対応する周波数ｊごとにヒト聴覚感度の重みを定義した聴覚感度補正曲線Ｌ（ｊ）に基づく値を乗算することにより、フィルタ関数Ｆ（ｊ）を作成する処理である。ここで、聴覚感度補正曲線Ｌ（ｊ）の単位は、図４記載のｄＢではなく無次元に換算した値で、具体的には図４記載のｄＢ値をｄとすれば、１０^ｄ／２０で与えられる。 The filter function creation process 33 takes the audio maximum value spectrum data 12 and the music average value spectrum data 13 as input and outputs the filter function data 14 for each frame f. In the filter function creation process 33, the control unit 21 of the concealment data generation device 2 calculates a division value spectrum Div(j) by dividing, for each corresponding frequency j, a value based on the audio maximum value spectrum Vv(j) by a value based on the music average value spectrum Vm(j). It then creates the filter function F(j) by multiplying each value of Div(j), for each corresponding frequency j, by a value based on the auditory sensitivity correction curve L(j), which defines a weight for human auditory sensitivity. Here, L(j) is expressed not in the dB values shown in FIG. 4 but as a dimensionless value: if d is the dB value shown in FIG. 4, L(j) is given by 10^(d/20).
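As a rough sketch of this step, the filter function can be computed bin by bin as F(j) = (Vv(j) / Vm(j)) × 10^(d(j)/20). The function names, the toy spectra, and the `eps` guard against division by zero are assumptions made for illustration; the description speaks of values "based on" Vv(j) and Vm(j), so the inputs here stand in for whatever corrected spectra are actually used.

```python
def db_to_linear(d):
    """Convert a level in dB to the dimensionless factor 10^(d/20)."""
    return 10.0 ** (d / 20.0)


def filter_function(vv, vm, correction_db, eps=1e-12):
    """F(j) = (Vv(j) / Vm(j)) * L(j), with L(j) = 10^(d(j)/20).

    eps guards against division by zero; it is an assumption of this
    sketch, not something the description specifies.
    """
    return [
        (v / max(m, eps)) * db_to_linear(d)
        for v, m, d in zip(vv, vm, correction_db)
    ]


# Toy spectra with three frequency bins (hypothetical values).
F = filter_function(
    vv=[2.0, 4.0, 1.0],
    vm=[1.0, 2.0, 4.0],
    correction_db=[0.0, -20.0, -40.0],
)
```

In this toy case the second bin has the same speech-to-music ratio as the first, but the −20 dB correction reduces its gain to one tenth.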

また、フィルタ関数作成処理３３は、フレームｆごとにフィルタ関数データ１４を出力しても良い。すなわち、秘匿化データ生成装置２の制御部２１は、除算値スペクトルＤｉｖ（ｆ,ｊ）として、音声最大値スペクトルＶｖ（ｊ）に基づく値を、フレームｆに対応する音楽平均値スペクトルＶｍ（ｆ,ｊ）に基づく値によって互いに対応する周波数ｊごとに除した値を算出し、更に、除算値スペクトルＤｉｖ（ｆ,ｊ）の各値に対して、互いに対応する周波数ｊごとに聴覚感度補正曲線Ｌ（ｊ）に基づく値を乗算することにより、フィルタ関数Ｆ（ｆ,ｊ）を作成するようにしても良い。 Further, the filter function creation processing 33 may output the filter function data 14 for each frame f. That is, the control unit 21 of the concealed data generation device 2 uses the value based on the maximum speech value spectrum Vv (j) as the divided value spectrum Div (f, j) as the music average value spectrum Vm (f , j) is calculated by dividing each frequency j corresponding to each other by a value based on, j), and for each value of the divided value spectrum Div (f, j), an auditory sensitivity correction curve for each frequency j corresponding to each other. The filter function F (f, j) may be created by multiplying a value based on L (j).

フィルタリング処理３４は、音楽データ５及びフィルタ関数データ１５を入力し、フレームｆごとに、秘匿化データ７を出力する。フィルタリング処理３４は、秘匿化データ生成装置２の制御部２１が、音楽データ５を所定の区間単位であるフレームｆに分割し、分割された各フレームｆをフーリエ変換し、フィルタ関数Ｆ（ｊ）を乗じ、フーリエ逆変換することによって、秘匿化データ７を生成する処理である。 The filtering process 34 takes the music data 5 and the filter function data 15 as input and outputs the concealment data 7 for each frame f. In the filtering process 34, the control unit 21 of the concealment data generation device 2 divides the music data 5 into frames f of a predetermined interval, applies a Fourier transform to each frame f, multiplies the result by the filter function F(j), and applies an inverse Fourier transform, thereby generating the concealment data 7.
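The transform, multiply, and inverse-transform sequence for a single frame might look like the sketch below. It uses a naive O(N²) DFT from the standard library purely for illustration (a practical implementation would use an FFT), and it assumes that bins above N/2 reuse the gain of their mirror bin so the output stays real. Both choices, and the names `dft` and `filter_frame`, are assumptions rather than details taken from the description.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(
            x[i] * cmath.exp(sign * 2j * cmath.pi * i * k / n)
            for i in range(n)
        )
        out.append(acc / n if inverse else acc)
    return out


def filter_frame(frame, gains):
    """Scale each frequency bin of one frame by gains[j], j = 0..N/2.

    Bins above N/2 reuse the gain of their mirror bin N - j so that the
    inverse transform remains real-valued.
    """
    n = len(frame)
    spectrum = dft(frame)
    for j in range(n):
        spectrum[j] *= gains[j if j <= n // 2 else n - j]
    return [value.real for value in dft(spectrum, inverse=True)]


frame = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]    # pure tone in bin 2
unity = filter_frame(frame, [1.0] * 5)                 # gain 1 everywhere
halved = filter_frame(frame, [1.0, 1.0, 0.5, 1.0, 1.0])  # halve bin 2 only
```

With unit gains the frame is returned unchanged, and halving the gain of bin 2 halves the amplitude of the tone.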

以下では、周波数解析処理３２及びフィルタ関数作成処理３３が、それぞれ、フレームｆごとに、音楽平均値スペクトルＶｍ（ｆ,ｊ）及びフィルタ関数Ｆ（ｆ,ｊ）を作成する場合を例にして説明する。尚、この例を説明することによって、周波数解析処理３２及びフィルタ関数作成処理３３が、フレームｆごとではなく音楽平均値スペクトルＶｍ（ｊ）及びフィルタ関数Ｆ（ｊ）を作成する場合も説明されることは、言うまでもない。 The following describes, as an example, the case where the frequency analysis process 32 and the filter function creation process 33 create the music average value spectrum Vm(f, j) and the filter function F(f, j), respectively, for each frame f. Needless to say, describing this example also covers the case where the frequency analysis process 32 and the filter function creation process 33 create the music average value spectrum Vm(j) and the filter function F(j) rather than per-frame versions.

図７、図８は、周波数解析処理を説明する図である。図７、図８に示すように、周波数解析処理３２は、（狭義の）周波数解析３２ａ、瞬時スペクトル算出処理４１、平均スペクトル算出処理４２を含む。 7 and 8 are diagrams for explaining the frequency analysis processing. As shown in FIGS. 7 and 8, the frequency analysis process 32 includes a (narrow sense) frequency analysis 32 a, an instantaneous spectrum calculation process 41, and an average spectrum calculation process 42.

最初に、音声データ４に対する周波数解析処理について説明する。
例えば、サンプリング周波数Ｆｓを「４４１００Ｈｚ」、サンプル数Ｎを「４０９６」とする。サンプリング周波数Ｆｓ及びサンプル数Ｎによって、音声データ４に含まれるフレーム数Ｆｖが定まる。
フレーム抽出処理３１では、秘匿化データ生成装置２の制御部２１が、サンプリング周波数Ｆｓのモノラル音声信号（ステレオの場合はＬＲ（左右）の合算値とする。）に対して、各々Ｎ／２サンプル間隔ごとに（すなわち、Ｎ／２サンプル分ずつ重複する。）、Ｎ個ずつ、各々Ｆｖフレーム抽出する。 First, frequency analysis processing for the audio data 4 will be described.
For example, the sampling frequency Fs is “44100 Hz”, and the number of samples N is “4096”. The number of frames Fv included in the audio data 4 is determined by the sampling frequency Fs and the number of samples N.
In the frame extraction process 31, the control unit 21 of the concealment data generation device 2 extracts Fv frames of N samples each from the monaural audio signal with sampling frequency Fs (for stereo, the sum of the left and right channels), taking one frame every N/2 samples (that is, consecutive frames overlap by N/2 samples).
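A minimal sketch of this half-overlapping frame extraction, using N = 4096 and Fs = 44100 Hz from the text, is shown below. Dropping a trailing partial frame is an assumption of the sketch, as are the function names.

```python
def to_mono(left, right):
    """Sum the left and right channels, as the text describes for stereo."""
    return [l + r for l, r in zip(left, right)]


def extract_frames(samples, n=4096):
    """Split samples into frames of n samples taken every n // 2 samples.

    A trailing partial frame is dropped; the description does not say how
    the end of the signal is handled, so this is an assumption.
    """
    hop = n // 2
    return [
        samples[start:start + n]
        for start in range(0, len(samples) - n + 1, hop)
    ]


one_second = [0.0] * 44100           # 1 s of silence at Fs = 44100 Hz
frames = extract_frames(one_second)
print(len(frames))                   # 20
```

Under these assumptions, one second of audio yields 20 frames, and each frame starts 2048 samples (about 46 ms) after the previous one.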

次に、周波数解析処理３２ａでは、制御部２１は、抽出したｆ番目のフレームデータＸｖ（ｆ、ｉ）（ｆ＝０、・・・、Ｆｖ−１；ｉ＝０、・・・、Ｎ−１）に対して、ハニング窓関数Ｈ（ｉ）＝０．５−０．５ｃｏｓ（２πｉ／Ｎ）を用いてフーリエ変換を行う。
次に、制御部２１は、変換データの実部Ａｖ（ｆ、ｊ）（ｆ＝０、・・・、Ｆｖ−１；ｊ＝０、・・・、Ｎ−１）、虚部Ｂｖ（ｆ、ｊ）（ｆ＝０、・・・、Ｆｖ−１；ｊ＝０、・・・、Ｎ−１）及び強度値の時系列の最大値スペクトルＶｖ（ｊ）を各々、次式のように算出する。 Next, in the frequency analysis 32a, the control unit 21 applies a Fourier transform, using the Hanning window function H(i) = 0.5 − 0.5 cos(2πi/N), to the extracted f-th frame data Xv(f, i) (f = 0, ..., Fv−1; i = 0, ..., N−1).
Next, the control unit 21 calculates the real part Av(f, j) (f = 0, ..., Fv−1; j = 0, ..., N−1) and the imaginary part Bv(f, j) (f = 0, ..., Fv−1; j = 0, ..., N−1) of the transformed data, and the maximum value spectrum Vv(j) of the time series of intensity values, according to the following expressions.
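The expressions referred to here are not reproduced in this text. Assuming a standard Hanning-windowed discrete Fourier transform, with the intensity value taken as the magnitude of each bin, they would take roughly the following form. This is a reconstruction from the surrounding definitions only; the normalization and the sign convention of the sine term in the original may differ.

```latex
\begin{aligned}
A_v(f,j) &= \sum_{i=0}^{N-1} X_v(f,i)\,H(i)\cos\!\left(\frac{2\pi i j}{N}\right),\\
B_v(f,j) &= \sum_{i=0}^{N-1} X_v(f,i)\,H(i)\sin\!\left(\frac{2\pi i j}{N}\right),\\
V_v(j)   &= \max_{0 \le f \le F_v-1}\sqrt{A_v(f,j)^2 + B_v(f,j)^2}.
\end{aligned}
```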

図７には、音声フレームデータＸｖ（ｆ、ｉ）のフレーム１（ｆ＝０に対応）〜フレームＦ（ｆ＝Ｆｖ−１に対応）に対して、周波数解析３２ａが行われ、音声スペクトル１〜音声スペクトルＦが算出され、音声最大値スペクトルＶｖ（ｊ）が算出されることが図示されている。 FIG. 7 illustrates that the frequency analysis 32a is performed on frame 1 (corresponding to f = 0) through frame F (corresponding to f = Fv−1) of the audio frame data Xv(f, i), that audio spectrum 1 through audio spectrum F are calculated, and that the audio maximum value spectrum Vv(j) is calculated.
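The per-bin maximum over frames can be sketched as follows, using the Hanning window H(i) given in the text. The naive DFT, the use of the bin magnitude as the intensity value, and the names `hann`, `magnitude_spectrum`, and `max_spectrum` are assumptions for illustration.

```python
import cmath
import math


def hann(n):
    """Hanning window H(i) = 0.5 - 0.5 cos(2*pi*i/N) from the text."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrum(frame):
    """Magnitude of the windowed DFT for bins j = 0..N/2 (naive O(N^2))."""
    n = len(frame)
    w = hann(n)
    return [
        abs(sum(frame[i] * w[i] * cmath.exp(-2j * cmath.pi * i * j / n)
                for i in range(n)))
        for j in range(n // 2 + 1)
    ]


def max_spectrum(frames):
    """Vv(j): per-bin maximum over all frames."""
    spectra = [magnitude_spectrum(f) for f in frames]
    return [max(column) for column in zip(*spectra)]


n = 16
silence = [0.0] * n
tone = [math.sin(2 * math.pi * 4 * i / n) for i in range(n)]  # tone in bin 4
vv = max_spectrum([silence, tone, silence])
```

Because the maximum is taken rather than the mean, the two silent frames do not lower the value at bin 4, which is the reason the description gives for not averaging speech.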

次に、音楽データ５に対する周波数解析処理について説明する。
音声データ４と同様、サンプリング周波数Ｆｓを「４４１００Ｈｚ」、サンプル数Ｎを「４０９６」とする。サンプリング周波数Ｆｓ及びサンプル数Ｎによって、音楽データ５に含まれるフレーム数Ｆｍが定まる。
フレーム抽出処理３１では、秘匿化データ生成装置２の制御部２１が、サンプリング周波数Ｆｓのモノラル音楽信号（ステレオの場合はＬＲ（左右）の合算値とする。）に対して、各々Ｎ／２サンプル間隔ごとに（すなわち、Ｎ／２サンプル分ずつ重複する。）、Ｎ個ずつ、各々Ｆｍフレーム抽出する。 Next, frequency analysis processing for music data 5 will be described.
As with the audio data 4, the sampling frequency Fs is “44100 Hz” and the number of samples N is “4096”. The number of frames Fm included in the music data 5 is determined by the sampling frequency Fs and the number of samples N.
In the frame extraction process 31, the control unit 21 of the concealment data generation device 2 extracts Fm frames of N samples each from the monaural music signal with sampling frequency Fs (for stereo, the sum of the left and right channels), taking one frame every N/2 samples (that is, consecutive frames overlap by N/2 samples).

次に、周波数解析処理３２ａでは、制御部２１は、抽出したｆ番目のフレームデータＸｍ（ｆ、ｉ）（ｆ＝０、・・・、Ｆｍ−１；ｉ＝０、・・・、Ｎ−１）に対して、ハニング窓関数Ｈ（ｉ）＝０．５−０．５ｃｏｓ（２πｉ／Ｎ）を用いてフーリエ変換を行う。
次に、制御部２１は、瞬時スペクトル算出処理４１として、フレームごとに、位相成分は無視したエネルギー量である瞬時スペクトルを算出する。また、制御部２１は、平均スペクトル算出処理４２として、前後Ｍフレーム（Ｍ個）の瞬時スペクトルの平均値である平均スペクトルを算出する。 Next, in the frequency analysis 32a, the control unit 21 applies a Fourier transform, using the Hanning window function H(i) = 0.5 − 0.5 cos(2πi/N), to the extracted f-th frame data Xm(f, i) (f = 0, ..., Fm−1; i = 0, ..., N−1).
Next, as the instantaneous spectrum calculation process 41, the control unit 21 calculates, for each frame, an instantaneous spectrum, which is an energy amount that ignores the phase component. As the average spectrum calculation process 42, the control unit 21 also calculates an average spectrum, which is the mean of the instantaneous spectra of the M surrounding frames.

具体的には、制御部２１は、変換データの実部Ａｍ（ｆ、ｊ）（ｆ＝０、・・・、Ｆｍ−１；ｊ＝０、・・・、Ｎ−１）、虚部Ｂｍ（ｆ、ｊ）（ｆ＝０、・・・、Ｆｍ−１；ｊ＝０、・・・、Ｎ−１）、及び、対象フレームを中点として前後Ｍ／２フレーム（Ｍ／２個）ずつ、合計Ｍフレーム（Ｍ個）（Ｍ＜Ｆｍ）の平均値スペクトルＶｍ（ｆ、ｊ）（ｆ＝０、・・・、Ｆｍ−１；ｊ＝０、・・・、Ｎ／２）を各々、次式のように算出する。
但し、音楽データ５の先頭部、すなわち、ｆ＜Ｍ／２の場合、前後Ｍ／２フレーム（Ｍ／２個）ずつの平均を取ることができないことから、Ｖｍ（ｆ、ｊ）＝Ｖｍ（Ｍ／２、ｊ）とする。同様に、音楽データ５の後尾部、すなわち、ｆ＞Ｆｍ−Ｍ／２の場合、前後Ｍ／２フレーム（Ｍ／２個）ずつの平均を取ることができないことから、Ｖｍ（ｆ、ｊ）＝Ｖｍ（Ｆｍ−Ｍ／２−１、ｊ）とする。 Specifically, the control unit 21 calculates the real part Am(f, j) (f = 0, ..., Fm−1; j = 0, ..., N−1) and the imaginary part Bm(f, j) (f = 0, ..., Fm−1; j = 0, ..., N−1) of the transformed data, and the average value spectrum Vm(f, j) (f = 0, ..., Fm−1; j = 0, ..., N/2) over a total of M frames (M < Fm), M/2 frames before and M/2 frames after the target frame taken as the midpoint, according to the following expressions.
However, at the head of the music data 5, that is, when f < M/2, the average over M/2 frames before and after cannot be taken, so Vm(f, j) = Vm(M/2, j). Similarly, at the tail of the music data 5, that is, when f > Fm − M/2, the average over M/2 frames before and after cannot be taken, so Vm(f, j) = Vm(Fm − M/2 − 1, j).
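The moving average with its head and tail rules can be sketched as below. The averaging window is taken here as frames f − M/2 through f + M/2 − 1 (M frames in total), which is only one possible reading of "M/2 frames before and after", since the defining expression is not reproduced in this text. The function name and the toy data are hypothetical.

```python
def moving_average_spectra(instant, m):
    """Vm(f, j): mean of the instantaneous spectra around frame f.

    instant is a list of Fm spectra (lists of equal length) and m is the
    even window length M < Fm. The window f - M/2 .. f + M/2 - 1 is an
    assumption of this sketch.
    """
    fm, half = len(instant), m // 2

    def window_mean(center):
        rows = instant[center - half:center + half]
        return [sum(col) / len(rows) for col in zip(*rows)]

    out = []
    for f in range(fm):
        if f < half:                 # head: reuse Vm(M/2, j)
            center = half
        elif f > fm - half:          # tail: reuse Vm(Fm - M/2 - 1, j)
            center = fm - half - 1
        else:
            center = f
        out.append(window_mean(center))
    return out


# Six frames with a single bin each, M = 4 (hypothetical values).
vm = moving_average_spectra(
    [[0.0], [2.0], [4.0], [6.0], [8.0], [10.0]], m=4
)
```

With 2048-sample frame spacing at 44100 Hz, a window spanning a few seconds corresponds to M on the order of several dozen frames.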

図７には、一例として、音楽データ５のフレームｆとフレームｆ＋１に対する周波数解析処理が示されている。
図７には、音楽フレームデータＸｍ（ｆ、ｉ）のフレーム１〜フレームＭ＋１に対して、周波数解析３２ａが行われ、フレーム１〜フレームＭまでの時系列平均が算出され、フレームｆに対する音楽平均値スペクトルＶｍ（ｆ、ｊ）が算出されることが図示されている。同様に、図７には、フレーム２〜フレームＭ＋１までの時系列平均が算出され、フレームｆ＋１に対する音楽平均値スペクトルＶｍ（ｆ＋１、ｊ）が算出されることが図示されている。 FIG. 7 shows frequency analysis processing for the frame f and the frame f + 1 of the music data 5 as an example.
In FIG. 7, the frequency analysis 32a is performed on the frames 1 to M + 1 of the music frame data Xm (f, i), the time series average from the frames 1 to M is calculated, and the music average for the frame f is calculated. It is illustrated that a value spectrum Vm (f, j) is calculated. Similarly, FIG. 7 illustrates that the time series average from frame 2 to frame M + 1 is calculated, and the music average value spectrum Vm (f + 1, j) for frame f + 1 is calculated.

また、図８には、図７の補足的な説明として、音楽データ５を入力とし、瞬時スペクトル算出処理４１によって、フレームごとに瞬時スペクトルが算出されることが図示されている。また、平均スペクトル算出処理４２によって、処理対象のフレームに対して、前後Ｍフレーム（Ｍ個）の瞬時スペクトルの平均値が算出され、平均値スペクトルに置換され、音楽平均値スペクトルデータ１３が出力されることが図示されている。 In addition, FIG. 8 illustrates, as a supplementary explanation of FIG. 7, that the instantaneous spectrum is calculated for each frame by the instantaneous spectrum calculation process 41 using the music data 5 as an input. Also, the average spectrum calculation process 42 calculates the average value of the instantaneous spectrum of the preceding and following M frames (M) for the frame to be processed, replaces it with the average value spectrum, and outputs the music average value spectrum data 13. It is shown in the figure.

図９〜図１１は、フィルタ関数作成処理を説明する図である。フィルタ関数作成処理３３は、図９に示す臨界帯域幅補正処理４３、図１０に示す除算処理４４、並びに、図１１に示す聴覚感度補正処理４５及び平滑化処理４６を含む。 9 to 11 are diagrams illustrating the filter function creation process. The filter function creation process 33 includes a critical bandwidth correction process 43 shown in FIG. 9, a division process 44 shown in FIG. 10, and an auditory sensitivity correction process 45 and a smoothing process 46 shown in FIG. 11.

まず、図９を参照して臨界帯域幅補正処理４３について説明する。
臨界帯域幅補正処理４３は、秘匿化データ生成装置２の制御部２１が、音声最大値スペクトルＶｖ（ｊ）を、周波数ｊごとに所定の範囲内の最大値に置換することによって、単一の置換音声最大値スペクトルＶｖ’（ｊ）を作成する処理である。また、臨界帯域幅補正処理４３は、フレームｆごとに、音楽平均値スペクトルＶｍ（ｆ、ｊ）を、周波数ｊごとに所定の範囲内の平均値に置換することによって、置換音楽平均値スペクトルＶｍ’（ｆ、ｊ）を作成する処理である。
図９には、一例として、フレームｆとフレームｆ＋１に対する臨界帯域幅補正処理が示されている。 First, the critical bandwidth correction processing 43 will be described with reference to FIG.
In the critical bandwidth correction processing 43, the control unit 21 of the concealed data generation device 2 replaces the voice maximum value spectrum Vv (j) with a maximum value within a predetermined range for each frequency j. This is a process for creating a replacement speech maximum value spectrum Vv ′ (j). Also, the critical bandwidth correction processing 43 replaces the music average value spectrum Vm (f, j) for each frame f with an average value within a predetermined range for each frequency j, thereby replacing the replacement music average value spectrum Vm. This is a process for creating '(f, j).
FIG. 9 shows critical bandwidth correction processing for frame f and frame f + 1 as an example.

臨界帯域幅とは、ある周波数ｊの周波数成分Ｖｖ（ｊ）またはＶｍ（ｆ、ｊ）を中心にマスキングが及ぶ周波数の範囲（臨界帯域幅、Ｂａｒｋと呼ばれる。）である。臨界帯域幅の近似式としては、次式に示すＥ．Ｚｗｉｃｋｅｒの式が知られている。尚、一般に、周波数が高くなると、臨界帯域幅は広くなることが分かっている。 The critical bandwidth is the range of frequencies over which masking extends around the frequency component Vv(j) or Vm(f, j) at a given frequency j (called the critical bandwidth, or Bark). E. Zwicker's formula, shown in the following expression, is known as an approximation of the critical bandwidth. In general, the critical bandwidth is known to widen as the frequency rises.

式（７）におけるｆｒの単位も「Ｈｚ」である。ｆｒとＢｚ（ｆｒ）を本実施の形態におけるフーリエ変換のポイント数の次元に変換すると、次式となる。 The unit of fr in Equation (7) is also “Hz”. When fr and Bz (fr) are converted into the dimension of the number of points of Fourier transform in the present embodiment, the following expression is obtained.
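Equations (7) and (8) are not reproduced in this text. The sketch below assumes the commonly cited form of Zwicker's approximation, Bz(fr) = 25 + 75 (1 + 1.4 (fr/1000)²)^0.69 with fr and Bz in Hz, and assumes that the conversion to Fourier-transform points is a multiplication by N/Fs. Both are assumptions and may not match the original expressions exactly; the function names are hypothetical.

```python
def critical_bandwidth_hz(fr_hz):
    """Commonly cited Zwicker approximation of the critical bandwidth [Hz]."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (fr_hz / 1000.0) ** 2) ** 0.69


def hz_to_bins(value_hz, n=4096, fs=44100.0):
    """Convert a frequency or bandwidth in Hz to DFT bin units (N / Fs)."""
    return value_hz * n / fs


def critical_bandwidth_bins(j, n=4096, fs=44100.0):
    """Bz(j): critical bandwidth, in bins, centred on bin j."""
    return hz_to_bins(critical_bandwidth_hz(j * fs / n), n, fs)


print(critical_bandwidth_hz(0.0))  # 100.0
```

Under this approximation the bandwidth is about 100 Hz at low frequencies and grows with frequency, consistent with the general tendency stated in the text.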

臨界帯域幅補正処理４３では、秘匿化データ生成装置２の制御部２１は、音声信号スペクトルに対して、周波数ｊごとに周波数成分Ｖｖ（ｊ）をｊｃ＝ｊ−（１−α）×Ｂｚ（ｊ）からｊｃ＝ｊ＋α×Ｂｚ（ｊ）の範囲の最大値に置換する。即ち、制御部２１は、ｊ＝０、・・・、Ｎ／２に対して、置換後のスペクトル（置換音声最大値スペクトル）Ｖｖ’（ｊ）を次式のように算出する。 In the critical bandwidth correction process 43, the control unit 21 of the concealment data generation device 2 replaces, for each frequency j of the audio signal spectrum, the frequency component Vv(j) with the maximum value over the range from jc = j − (1 − α) × Bz(j) to jc = j + α × Bz(j). That is, for j = 0, ..., N/2, the control unit 21 calculates the replaced spectrum (replacement audio maximum value spectrum) Vv'(j) by the following expression.

αは０から１までの実数であり、通常はα＝１．０とする。式（９）によって、音声スペクトルを周波数方向に低音側に非線形シフトする補正を行っていることになる。
マスキングは、高音側（周波数が高域側）に働きやすいという性質がある為、音声最大値スペクトルＶｖ（ｊ）を、周波数ｊよりも高域側の範囲内の最大値に置換すれば、音声スペクトルを周波数方向に低音側に非線形シフトする補正を行っていることになり、ひいては、マスキング効果を高めることができる。 α is a real number from 0 to 1, and usually α = 1.0. According to Expression (9), correction is performed to nonlinearly shift the voice spectrum in the frequency direction to the low frequency side.
Masking tends to act toward the high-pitched side (the higher-frequency side). Therefore, replacing the voice maximum value spectrum Vv (j) with the maximum value within a range on the higher-frequency side of the frequency j amounts to a correction that nonlinearly shifts the voice spectrum toward the low-pitched side in the frequency direction, which in turn enhances the masking effect.

一方、音楽信号スペクトルに対しては、制御部２１は、フレームｆごとに処理を行い、周波数ｊごとに周波数成分Ｖｍ（ｆ、ｊ）をｊｃ＝ｊ−０．５×Ｂｚ（ｊ）からｊｃ＝ｊ＋０．５×Ｂｚ（ｊ）の範囲の平均値に置換する。即ち、制御部２１は、ｊ＝０、・・・、Ｎ／２に対して、置換後のスペクトル（置換音楽平均値スペクトル）Ｖｍ’（ｆ、ｊ）を次式のように算出する。 On the other hand, for the music signal spectrum, the control unit 21 performs processing for each frame f and replaces, for each frequency j, the frequency component Vm (f, j) with the average value in the range from jc = j − 0.5 × Bz (j) to jc = j + 0.5 × Bz (j). That is, for j = 0, ..., N / 2, the control unit 21 calculates the replaced spectrum (replacement music average value spectrum) Vm ′ (f, j) as in the following equation.

式（１０）によって、音楽スペクトルを周波数方向に平滑化をかけていることになる。 According to the equation (10), the music spectrum is smoothed in the frequency direction.
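The two replacements described above can be sketched in Python as follows. This is only an illustrative sketch: the equation images for the critical bandwidth are not reproduced in this text, so `zwicker_bandwidth_hz` uses the widely cited Zwicker approximation as an assumption, and the conversion to FFT bins in `bandwidth_in_bins`, the rounding of range limits, and all function names are hypothetical.

```python
def zwicker_bandwidth_hz(fr_hz):
    """Critical bandwidth in Hz (assumed form of the Zwicker approximation)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (fr_hz / 1000.0) ** 2) ** 0.69


def bandwidth_in_bins(j, n_fft, fs):
    """Bz(j): critical bandwidth at bin j, in FFT bins (assumed conversion)."""
    fr_hz = j * fs / n_fft
    return zwicker_bandwidth_hz(fr_hz) * n_fft / fs


def replace_voice_max(vv, n_fft, fs, alpha=1.0):
    """Vv'(j): maximum of Vv over [j - (1 - alpha) * Bz(j), j + alpha * Bz(j)]."""
    last = len(vv) - 1
    out = []
    for j in range(len(vv)):
        bz = bandwidth_in_bins(j, n_fft, fs)
        lo = max(0, int(round(j - (1.0 - alpha) * bz)))
        hi = min(last, int(round(j + alpha * bz)))
        out.append(max(vv[lo:hi + 1]))
    return out


def replace_music_mean(vm_f, n_fft, fs):
    """Vm'(f, j): mean of one frame's Vm over [j - 0.5 * Bz(j), j + 0.5 * Bz(j)]."""
    last = len(vm_f) - 1
    out = []
    for j in range(len(vm_f)):
        bz = bandwidth_in_bins(j, n_fft, fs)
        lo = max(0, int(round(j - 0.5 * bz)))
        hi = min(last, int(round(j + 0.5 * bz)))
        window = vm_f[lo:hi + 1]
        out.append(sum(window) / len(window))
    return out
```

With α = 1.0 the search range lies entirely above j, so a single voice peak is copied into the bins below it and never into the bins above it, which is the shift toward the low-pitched side described in the text.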

図９では、Ｗ（ｊ）が、置換の際の計算範囲を示している。音声最大値スペクトルＶｖ（ｊ）に対して、単一の置換音声最大値スペクトルＶｖ’（ｊ）が算出されることが図示されている。また、音楽平均値スペクトルＶｍ（ｆ、ｊ）に対しては、置換音楽平均値スペクトルＶｍ’（ｆ、ｊ）が算出され、音楽平均値スペクトルＶｍ（ｆ＋１、ｊ）に対しては、置換音楽平均値スペクトルＶｍ’（ｆ＋１、ｊ）が算出されることが図示されている。 In FIG. 9, W (j) represents the calculation range at the time of replacement. It is illustrated that a single replacement speech maximum value spectrum Vv ′ (j) is calculated for the speech maximum value spectrum Vv (j). A replacement music average value spectrum Vm ′ (f, j) is calculated for the music average value spectrum Vm (f, j), and a replacement music is calculated for the music average value spectrum Vm (f + 1, j). It is shown that the average value spectrum Vm ′ (f + 1, j) is calculated.

次に、図１０を参照して、除算処理４４について説明する。
除算処理４４は、秘匿化データ生成装置２の制御部２１が、フレームｆごとに、音声最大値スペクトルＶｖ（ｊ）に基づく値を音楽平均値スペクトルＶｍ（ｊ）に基づく値によって互いに対応する周波数ｊごとに除した値を除算値スペクトルＤｉｖ（ｆ、ｊ）として算出する処理である。特に、制御部２１は、フレームｆごとに、置換音声最大値スペクトルＶｖ’（ｊ）を置換音楽平均値スペクトルＶｍ’（ｆ、ｊ）によって除した値を除算値スペクトルＤｉｖ（ｆ、ｊ）とすることが望ましい。
図１０には、一例として、フレームｆとフレームｆ＋１に対する除算処理が示されている。 Next, the division process 44 will be described with reference to FIG.
The division processing 44 is a process in which the control unit 21 of the concealed data generation device 2 calculates, for each frame f, a divided value spectrum Div (f, j) by dividing a value based on the voice maximum value spectrum Vv (j) by a value based on the music average value spectrum Vm (j) at each mutually corresponding frequency j. In particular, it is desirable that, for each frame f, the control unit 21 use as the divided value spectrum Div (f, j) the value obtained by dividing the replacement voice maximum value spectrum Vv ′ (j) by the replacement music average value spectrum Vm ′ (f, j).
FIG. 10 shows division processing for frame f and frame f + 1 as an example.
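A minimal sketch of the division for a single frame is given below. The function name and the handling of a zero denominator are assumptions made for illustration; falling back to an upper limit of 10 follows the limits mentioned later in the description, but the exact behaviour of the embodiment is not stated here.

```python
def divide_spectra(vv_rep, vm_rep_f, upper=10.0):
    """Div(f, j) = Vv'(j) / Vm'(f, j) for one frame f, bin by bin."""
    div = []
    for v, m in zip(vv_rep, vm_rep_f):
        # A zero music bin would divide by zero; use the upper limit (assumed policy).
        div.append(v / m if m > 0.0 else upper)
    return div
```

Because Vv ′ (j) does not depend on the frame, the same voice spectrum is passed in for every frame, while `vm_rep_f` changes from frame to frame.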

次に、図１１を参照して、聴覚感度補正処理４５及び平滑化処理４６について説明する。
聴覚感度補正処理４５は、除算値スペクトルＤｉｖ（ｆ,ｊ）の各値に対して、互いに対応する周波数ｊごとに聴覚感度補正曲線Ｌ（ｊ）に基づく値を乗算することにより、補正除算値スペクトルＤｉｖ’（ｆ、ｊ）を作成する処理である。
図１１には、一例として、フレームｆとフレームｆ＋１に対する聴覚感度補正処理が示されている。 Next, the auditory sensitivity correction process 45 and the smoothing process 46 will be described with reference to FIG.
The auditory sensitivity correction processing 45 is a process of creating a corrected division value spectrum Div ′ (f, j) by multiplying each value of the divided value spectrum Div (f, j) by a value based on the auditory sensitivity correction curve L (j) at each mutually corresponding frequency j.
FIG. 11 shows an auditory sensitivity correction process for frame f and frame f + 1 as an example.

具体的には、制御部２１は、周波数（ｊ＝０，．．，Ｎ／２）ごとに除算値スペクトルＤｉｖ（ｆ,ｊ）の各値に対して、例えば、４０ｐｈｏｎの等ラウドネス曲線に基づいて定義される聴覚感度補正曲線Ｌ（ｊ）に基づく値を乗算する。
例えば、制御部２１は、図４の下段に示す値を変数ｄＢに代入し、倍率値として１０^{ｄＢ／２０}を算出し、この倍率値を乗算する。 Specifically, for each frequency (j = 0, ..., N / 2), the control unit 21 multiplies each value of the divided value spectrum Div (f, j) by a value based on the auditory sensitivity correction curve L (j), which is defined based on, for example, the 40 phon equal loudness curve.
For example, the control unit 21 substitutes the value shown in the lower part of FIG. 4 for the variable dB, calculates 10 ^{dB / 20} as the magnification value, and multiplies this magnification value.

聴覚感度補正曲線Ｌ（ｊ）は、図４に示す例のように、“聴覚感度補正曲線の音圧レベル＝等ラウドネス曲線の音圧レベルの５００Ｈｚ極小値−等ラウドネス曲線の音圧レベル”によって求めても良い。
また、聴覚感度補正曲線Ｌ（ｊ）は、等ラウドネス曲線をマイナス側に適宜オフセットを加えて、符号を反転させるようにしても良い。オフセットを加える理由は、単に符号を反転すると、波形振幅が増幅されてしまうからである。
尚、制御部２１は、聴覚感度補正曲線Ｌ（ｊ）を複数回乗算しても良い。 As in the example shown in FIG. 4, the auditory sensitivity correction curve L (j) may be obtained as “sound pressure level of the auditory sensitivity correction curve = 500 Hz minimum value of the sound pressure level of the equal loudness curve − sound pressure level of the equal loudness curve”.
Alternatively, the auditory sensitivity correction curve L (j) may be obtained by adding an appropriate offset to the equal loudness curve toward the minus side and then inverting the sign. The offset is added because simply inverting the sign would amplify the waveform amplitude.
The control unit 21 may multiply the auditory sensitivity correction curve L (j) a plurality of times.
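The dB-to-magnification conversion and the curve definition quoted above can be sketched as follows. The function names, the `ref_index` parameter (the bin taken as the reference minimum), and the `times` parameter for repeated multiplication are hypothetical; the actual curve values of FIG. 4 are not reproduced here.

```python
def db_to_gain(db):
    """Magnification value 10^(dB / 20) for a level given in dB."""
    return 10.0 ** (db / 20.0)


def sensitivity_curve_db(equal_loudness_db, ref_index):
    """L(j) in dB: reference level of the equal loudness curve minus its level at j."""
    ref = equal_loudness_db[ref_index]
    return [ref - level for level in equal_loudness_db]


def apply_sensitivity(div_f, curve_db, times=1):
    """Div'(f, j): Div(f, j) multiplied by the L(j) gain, optionally several times."""
    return [d * db_to_gain(c) ** times for d, c in zip(div_f, curve_db)]
```

Bins where the ear is less sensitive have a higher equal-loudness level, so their L (j) is negative in dB and the corresponding gain is below 1.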

人間の聴覚器官内では、音声や音楽に対して、聴覚感度補正処理４５と同様の処理が行われると考えられる。従って、制御部２１が聴覚感度補正曲線Ｌ（ｊ）を１回乗算することによって生成した秘匿化データ７は、これを聴取する人間の聴覚器官内において聴覚感度補正曲線Ｌ（ｊ）が１回乗算されると考えられる。すなわち、合計すると、秘匿化データ７は、聴覚感度補正曲線Ｌ（ｊ）が２回乗算されて人間に聴取されると考えられる。一方、秘匿化対象の対話音声も聴覚器官内において聴覚感度補正曲線Ｌ（ｊ）が１回乗算されると考えられるため、秘匿化データ７は１回分の余分な乗算により対話音声に対して優位に働くことになる。 Within the human auditory organ, processing similar to the auditory sensitivity correction processing 45 is considered to be applied to voice and music. Therefore, the concealment data 7 generated by the control unit 21 multiplying by the auditory sensitivity correction curve L (j) once is considered to be multiplied by the auditory sensitivity correction curve L (j) once more within the auditory organ of the person who hears it. That is, in total, the concealment data 7 is considered to be heard after being multiplied by the auditory sensitivity correction curve L (j) twice. On the other hand, the dialogue voice to be concealed is also considered to be multiplied by the auditory sensitivity correction curve L (j) once within the auditory organ, so the concealment data 7 gains an advantage over the dialogue voice through the one extra multiplication.

また、平滑化処理４６は、秘匿化データ生成装置２の制御部２１が、補正除算値スペクトルＤｉｖ’（ｆ、ｊ）を、周波数ｊの前後の範囲内の平均値に置換することによって、補正除算値スペクトルＤｉｖ’（ｆ、ｊ）を平滑化する処理である。
図１１には、一例として、フレームｆとフレームｆ＋１に対する平滑化処理が示されている。 Further, the smoothing process 46 is performed by the control unit 21 of the concealed data generation device 2 replacing the corrected division value spectrum Div ′ (f, j) with an average value within a range before and after the frequency j. This is a process of smoothing the division value spectrum Div ′ (f, j).
FIG. 11 shows a smoothing process for the frame f and the frame f + 1 as an example.

具体的には、制御部２１は、周波数（ｊ＝０、・・・、Ｎ／２）ごとに、補正除算値スペクトルＤｉｖ’（ｆ、ｊ）に対して、所定のタップ数Ｔ（＜Ｎ／２）によって、次式のように、平滑フィルタをかけた結果をＦ（ｆ、ｊ）とする。 Specifically, the control unit 21 performs, for each frequency (j = 0,..., N / 2), a predetermined tap number T (<N) with respect to the corrected division value spectrum Div ′ (f, j). / 2), let F (f, j) be the result of applying a smoothing filter as in the following equation.

βは、音圧を調整するための比例定数（実数値）である。音声信号の音圧と音楽信号の音圧を同程度とする場合、β＝１．０とする。
Ｆ（ｆ、ｊ）の上限値と下限値は予め設定しておく。例えば、中央値を１とすると、上限値を１０倍の「１０」、下限値を１／１０の「０．１」とする。除算結果が上限値を上回る場合、又は、下限値を下回る場合、制御部２１は、それぞれ、Ｆ（ｆ、ｊ）に上限値又は下限値を設定する。 β is a proportionality constant (real value) for adjusting the sound pressure. If the sound pressure of the audio signal is the same as the sound pressure of the music signal, β = 1.0.
An upper limit value and a lower limit value of F (f, j) are set in advance. For example, if the center value is 1, the upper limit value is set to “10” (10 times) and the lower limit value to “0.1” (1/10). When the division result exceeds the upper limit value or falls below the lower limit value, the control unit 21 sets F (f, j) to the upper limit value or the lower limit value, respectively.

図１１に示すように、補正除算値スペクトルＤｉｖ’（ｆ、ｊ）は、極値（極大値及び極小値）を数多く持つ関数となっている。特に、ところどころ０で割り算する箇所が発生してしまい、その箇所では上限値をもつ極値になり不連続点になる。補正除算値スペクトルＤｉｖ’（ｆ、ｊ）をそのままフィルタ関数とすると、人間にとって聞き苦しい秘匿化データ７が生成されてしまう。そこで、本発明の実施の形態では、平滑化処理４６を行っている。
図１１に示すように、平滑化処理４６を行うことで、フィルタ関数Ｆ（ｆ、ｊ）は、極値が少なく、滑らかな関数となっている。 As shown in FIG. 11, the corrected division value spectrum Div ′ (f, j) is a function having many extreme values (maximum values and minimum values). In particular, there are places where division by 0 occurs in some places, and at those places, extreme values having an upper limit value become discontinuous points. If the corrected division value spectrum Div ′ (f, j) is directly used as a filter function, concealment data 7 that is difficult to hear for humans is generated. Therefore, in the embodiment of the present invention, the smoothing process 46 is performed.
As shown in FIG. 11, by performing the smoothing process 46, the filter function F (f, j) is a smooth function with few extreme values.
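The smoothing and limiting steps can be sketched as below. The smoothing equation itself is not reproduced in this text, so the filter shape is an assumption: a plain moving average over T bins on either side of j, truncated at the ends of the spectrum and scaled by β. The function and parameter names are hypothetical.

```python
def smooth_filter(div_corr_f, taps, beta=1.0, lower=0.1, upper=10.0):
    """F(f, j): moving average of Div'(f, .) around j, scaled by beta and limited."""
    last = len(div_corr_f) - 1
    out = []
    for j in range(len(div_corr_f)):
        lo = max(0, j - taps)
        hi = min(last, j + taps)
        window = div_corr_f[lo:hi + 1]
        value = beta * sum(window) / len(window)
        # Keep F(f, j) between the preset lower and upper limits.
        out.append(min(upper, max(lower, value)))
    return out
```

For example, `smooth_filter([1.0, 1.0, 100.0, 1.0, 1.0], taps=1)` spreads the isolated spike over its neighbours, and the limits then cap those three bins at 10.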

尚、平滑化処理４６の後に聴覚感度補正処理４５を行うよりも、聴覚感度補正処理４５の後に平滑化処理４６を行う方が、人間にとってより聞き易い秘匿化データ７を生成することができる。 Note that the concealment data 7 that is easier to hear for humans can be generated by performing the smoothing process 46 after the auditory sensitivity correction process 45 than performing the auditory sensitivity correction process 45 after the smoothing process 46.

図１２〜図１４は、フィルタリング処理を説明する図である。フィルタリング処理３４は、図１２に示すフーリエ変換処理４７及びフィルタ関数乗算処理４８、並びに、図１３に示す周波数次元圧縮処理４９及びフーリエ逆変換処理５０を含む。
前述の周波数解析処理３２及びフィルタ関数作成処理３３では、実数値に対して計算を行っているが、フィルタリング処理３４では、複素数値をもつ瞬時スペクトルに対して計算を行う。 12-14 is a figure explaining a filtering process. The filtering process 34 includes a Fourier transform process 47 and a filter function multiplication process 48 shown in FIG. 12, and a frequency dimension compression process 49 and a Fourier inverse transform process 50 shown in FIG.
In the frequency analysis process 32 and the filter function creation process 33 described above, calculation is performed on a real value, but in the filtering process 34, calculation is performed on an instantaneous spectrum having a complex value.

フーリエ変換処理４７は、秘匿化データ生成装置２の制御部２１が、音楽フレームデータＸｍｌ（ｆ、ｉ）及びＸｍｒ（ｆ、ｉ）（ｆ＝０、・・・、Ｆｍ−１；ｉ＝０、・・・、Ｎ−１）をフーリエ変換し、ソース複素スペクトルを算出する処理である。
図１２には、一例として、フレームｆとフレームｆ＋１に対するフーリエ変換処理が示されている。 In the Fourier transform processing 47, the control unit 21 of the concealment data generation device 2 performs music frame data Xml (f, i) and Xmr (f, i) (f = 0,..., Fm−1; i = 0 ,..., N-1) is Fourier transformed to calculate a source complex spectrum.
FIG. 12 shows, as an example, Fourier transform processing for frame f and frame f + 1.

フーリエ変換処理４７では、制御部２１は、サンプリング周波数Ｆｓのステレオ音声信号（モノラル信号の場合は一方を０とする。）に対して、各々Ｎ／２サンプル間隔ごとに（すなわち、Ｎ／２サンプル分ずつ重複する。）、Ｎ個ずつ、各々Ｆｍフレーム抽出したｆ番目の音楽フレームデータＸｍｌ（ｆ、ｉ）及びＸｍｒ（ｆ、ｉ）に対して、ハニング窓関数Ｈ（ｉ）＝０．５−０．５ｃｏｓ（２πｉ／Ｎ）を用いてフーリエ変換を行い、以下のように、変換データの実部Ａｍｌ（ｆ、ｊ）及びＡｍｒ（ｆ、ｊ）、並びに、虚部Ｂｍｌ（ｆ、ｊ）及びＢｍｒ（ｆ、ｊ）（ｆ＝０、・・・、Ｆｍ）−１；ｊ＝０、・・・、Ｎ−１）を算出する。 In the Fourier transform processing 47, the control unit 21 extracts Fm frames of N samples each from the stereo audio signal with sampling frequency Fs (for a monaural signal, one channel is set to 0) at intervals of N / 2 samples (that is, adjacent frames overlap by N / 2 samples). For the f-th music frame data Xml (f, i) and Xmr (f, i), the control unit 21 performs a Fourier transform using the Hanning window function H (i) = 0.5 − 0.5 cos (2πi / N) and calculates the real parts Aml (f, j) and Amr (f, j) and the imaginary parts Bml (f, j) and Bmr (f, j) of the transformed data (f = 0, ..., Fm − 1; j = 0, ..., N − 1) as follows.
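The framing and windowed transform for one channel can be sketched as follows. The transform equations are not reproduced in this text, so the sign convention of the exponent and the absence of a normalization factor are assumptions, and a direct O(N²) DFT is used only to keep the sketch free of external libraries. All function names are hypothetical.

```python
import cmath
import math


def hann(n):
    """Hanning window H(i) = 0.5 - 0.5 cos(2 pi i / N)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def split_frames(samples, n):
    """Cut frames of n samples every n / 2 samples (adjacent frames overlap by n / 2)."""
    hop = n // 2
    count = max(0, (len(samples) - n) // hop + 1)
    return [samples[f * hop:f * hop + n] for f in range(count)]


def windowed_dft(frame):
    """Complex spectrum of one windowed frame; .real plays the role of A, .imag of B."""
    n = len(frame)
    w = hann(n)
    spectrum = []
    for j in range(n):
        acc = 0j
        for i in range(n):
            acc += w[i] * frame[i] * cmath.exp(-2j * math.pi * i * j / n)
        spectrum.append(acc)
    return spectrum
```

In practice a fast Fourier transform would replace the double loop; the direct form is shown only because it makes the role of the window and the bin index j explicit.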

また、フィルタ関数乗算処理４８は、制御部２１が、ソース複素スペクトルにフィルタ関数Ｆ（ｆ、ｊ）を乗じ、改変複素スペクトルを算出する処理である。
図１２には、一例として、フレームｆとフレームｆ＋１に対するフィルタ関数乗算処理が示されている。 The filter function multiplication process 48 is a process in which the control unit 21 calculates a modified complex spectrum by multiplying the source complex spectrum by the filter function F (f, j).
FIG. 12 shows filter function multiplication processing for frame f and frame f + 1 as an example.

フィルタ関数乗算処理４８では、制御部２１は、Ｆｍ個のフィルタ関数Ｆ（ｆ、ｊ）を用いて、フレームｆごとに所定の周波数区間［ｊ１、ｊ２］の全ての周波数成分に乗算する。即ち、制御部２１は、各フレームｆ＝０、・・・、Ｆｍ−１、及び、各周波数ｊ＝ｊ１、・・・、ｊ２において、次式のように変換を行う。 In the filter function multiplication process 48, the control unit 21 uses the Fm filter functions F (f, j) to multiply, for each frame f, all frequency components in a predetermined frequency section [j1, j2]. That is, the control unit 21 performs the conversion given by the following equations for each frame f = 0, ..., Fm − 1 and each frequency j = j1, ..., j2.
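A sketch of the multiplication for one frame and one channel is shown below. Scaling the mirrored bin N − j by the same factor, so that the inverse transform stays real-valued, is an assumption of this sketch and is not stated in the text; the function name is hypothetical, and j2 is assumed not to exceed N / 2.

```python
def apply_filter(spectrum_f, filt_f, j1, j2):
    """Multiply bins j1..j2 of a complex spectrum by the real filter F(f, j)."""
    n = len(spectrum_f)
    out = list(spectrum_f)
    for j in range(j1, j2 + 1):
        out[j] = spectrum_f[j] * filt_f[j]
        # Assumed: scale the conjugate-symmetric partner bin identically.
        mirror = (n - j) % n
        if mirror != j:
            out[mirror] = spectrum_f[mirror] * filt_f[j]
    return out
```

Because F (f, j) is real, multiplying a complex bin by it scales the real part A and the imaginary part B by the same factor and leaves the phase unchanged.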

次に、図１３を参照して、周波数次元圧縮処理４９及びフーリエ逆変換処理５０について説明する。
周波数次元圧縮処理４９は、フレームｆごとに、フィルタ関数Ｆ（ｊ）が乗算された複素スペクトルに対して、所定の周波数の範囲の中で複素スペクトルの最大スカラー値を求め、更に、複素スペクトルの各要素に対して、当該要素のスカラー値が最大スカラー値を超えない範囲内において所定の１以上のスケール値を乗算させる補正を施し、再改変複素スペクトルを算出する処理である。
図１３には、一例として、フレームｆとフレームｆ＋１に対する周波数次元圧縮処理が示されている。 Next, the frequency dimension compression process 49 and the inverse Fourier transform process 50 will be described with reference to FIG.
The frequency dimension compression processing 49 is a process of calculating a re-modified complex spectrum: for each frame f, the maximum scalar value of the complex spectrum multiplied by the filter function F (j) is obtained within a predetermined frequency range, and each element of the complex spectrum is then corrected by multiplying it by a predetermined scale value of 1 or more, within a range in which the scalar value of the element does not exceed the maximum scalar value.
FIG. 13 shows frequency dimension compression processing for frame f and frame f + 1 as an example.

周波数次元圧縮処理４９では、制御部２１は、フレームｆごとに、フィルタ関数Ｆ（ｆ,ｊ）が乗算された複素スペクトル成分に対して、ｊ＝ｊ１，・・・，ｊ２の範囲の中で、スカラー値｛Ａｍｌ’（ｆ,ｊ）^２＋Ｂｍｌ’（ｆ,ｊ）^２｝^１／２を最大にする値、及び、スカラー値｛Ａｍｒ’（ｆ,ｊ）^２＋Ｂｍｒ’（ｆ,ｊ）^２｝^１／２を最大にする値を、ＬＲチャンネル別に、Ｍｍｌ（ｆ）、及び、Ｍｍｒ（ｆ）として算出する。そして、制御部２１は、次式のように、１以上の実数値Ｓｃｌ（例えば、Ｓｃｌ＝２．０）を乗算する。 In the frequency dimension compression processing 49, for each frame f, the control unit 21 calculates, for the complex spectrum components multiplied by the filter function F (f, j), the maximum of the scalar value {Aml ′ (f, j)^2 + Bml ′ (f, j)^2}^{1/2} and the maximum of the scalar value {Amr ′ (f, j)^2 + Bmr ′ (f, j)^2}^{1/2} within the range j = j1, ..., j2, as Mml (f) and Mmr (f) for the L and R channels respectively. The control unit 21 then multiplies by a real value Scl of 1 or more (for example, Scl = 2.0), as in the following equations.

式（２０）〜（２３）によって乗算された結果のスカラー値が、｛Ａｍｌ’’（ｆ,ｊ）^２＋Ｂｍｌ’’（ｆ,ｊ）^２｝^１／２＞Ｍｍｌ（ｆ）、又は、｛Ａｍｒ’’（ｆ,ｊ）^２＋Ｂｍｒ’’（ｆ,ｊ）^２｝^１／２＞Ｍｍｒ（ｆ）となる場合、以下のようにＭｍｌ（ｆ）及びＭｍｒ（ｆ）を越えないようにＳｃｌを補正して乗算する。 When the scalar value resulting from the multiplication by equations (20) to (23) satisfies {Aml ″ (f, j)^2 + Bml ″ (f, j)^2}^{1/2} > Mml (f) or {Amr ″ (f, j)^2 + Bmr ″ (f, j)^2}^{1/2} > Mmr (f), Scl is corrected as follows so that the result does not exceed Mml (f) or Mmr (f), and the corrected value is used for the multiplication.

｛Ａｍｌ’’（ｆ,ｊ）^２＋Ｂｍｌ’’（ｆ,ｊ）^２｝^１／２＞Ｍｍｌ（ｆ）の場合、制御部２１は、Ｓｃｌ’＝Ｍｍｌ（ｆ）／｛Ａｍｌ’（ｆ,ｊ）^２＋Ｂｍｌ’（ｆ,ｊ）^２｝^１／２を算出する。そして、制御部２１は、次式の通り、Ｓｃｌ’を乗算する。 When {Aml ″ (f, j)^2 + Bml ″ (f, j)^2}^{1/2} > Mml (f), the control unit 21 calculates Scl ′ = Mml (f) / {Aml ′ (f, j)^2 + Bml ′ (f, j)^2}^{1/2}. The control unit 21 then multiplies by Scl ′ as in the following equations.

同様に、｛Ａｍｒ’’（ｆ,ｊ）^２＋Ｂｍｒ’’（ｆ,ｊ）^２｝^１／２＞Ｍｍｒ（ｆ）の場合、制御部２１は、Ｓｃｌ’＝Ｍｍｒ（ｆ）／｛Ａｍｒ’（ｆ,ｊ）^２＋Ｂｍｒ’（ｆ,ｊ）^２｝^１／２を算出する。そして、制御部２１は、次式の通り、Ｓｃｌ’を乗算する。 Similarly, when {Amr ″ (f, j)^2 + Bmr ″ (f, j)^2}^{1/2} > Mmr (f), the control unit 21 calculates Scl ′ = Mmr (f) / {Amr ′ (f, j)^2 + Bmr ′ (f, j)^2}^{1/2}. The control unit 21 then multiplies by Scl ′ as in the following equations.
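The scaling rule described above can be sketched for one frame and one channel as follows. The function name is hypothetical, and the sketch scales only the bins j1..j2 (any mirrored bins are left untouched, which is a simplification).

```python
def compress_frequency(spectrum_f, j1, j2, scl=2.0):
    """Scale bins j1..j2 by scl, capped so no bin exceeds the frame maximum M(f)."""
    peak = max(abs(spectrum_f[j]) for j in range(j1, j2 + 1))  # M(f)
    out = list(spectrum_f)
    for j in range(j1, j2 + 1):
        mag = abs(spectrum_f[j])
        scale = scl
        if mag * scl > peak:
            # Corrected scale Scl' = M(f) / magnitude, so the result equals M(f).
            scale = peak / mag
        out[j] = spectrum_f[j] * scale
    return out
```

Weak components are raised by the full factor Scl while components near the peak are held at M (f), so the spectrum within [j1, j2] becomes flatter without any bin exceeding the original maximum.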

ここで、周波数次元圧縮処理４９の意義について説明する。
聴覚感度補正処理４５を行ってフィルタ関数を作成すると、そのフィルタ関数を用いてフィルタリング処理が行われた音楽信号は、スペクトル特性が１／ｆから１／ｆ^２特性に近づき、低域部の勾配が急峻になる。音楽信号は元々離散的な周波数特性をもつが、このようにフィルタリング処理が行われた音楽信号は、最も効果的にマスキングが働く白色雑音の特性から程遠いことになる。
一方、例えば、特開２０１０−０３１５０１において提案されているエアコンノイズは、１／ｆカーブの連続スペクトル特性をもち、平坦な特性をもつ白色ノイズに比べマスキング効果は若干小さくなる程度である。しかし、音楽信号を連続的なスペクトルに変換させるとノイジーな不快感を加えることになり、音楽ではなくなってしまう。
そこで、本発明の実施の形態では、周波数次元圧縮処理４９によって、離散的なスペクトル特性の状態を維持したまま、低域部を白色雑音のように若干平坦に近づける。これによって、音楽の音色を維持したまま、マスキング効果を高めることができる。 Here, the significance of the frequency dimension compression processing 49 will be described.
When the filter function is created with the auditory sensitivity correction processing 45, the music signal filtered with that filter function has a spectral characteristic that moves from a 1 / f characteristic toward a 1 / f^2 characteristic, and the slope in the low-frequency region becomes steep. A music signal inherently has a discrete frequency characteristic, and a music signal filtered in this way is far from the characteristic of white noise, for which masking works most effectively.
On the other hand, for example, the air conditioner noise proposed in Japanese Patent Laid-Open No. 2010-031501 has a continuous spectrum characteristic of 1 / f curve, and the masking effect is slightly smaller than white noise having a flat characteristic. However, converting a music signal into a continuous spectrum adds noisy discomfort and is no longer music.
Therefore, in the embodiment of the present invention, the low-frequency part is made slightly flat like white noise while maintaining the state of discrete spectral characteristics by the frequency dimension compression processing 49. As a result, the masking effect can be enhanced while maintaining the timbre of the music.

図１４では、一般的な圧縮処理である時間次元圧縮処理と、本発明の実施の形態における周波数次元圧縮処理４９との作用の違いを模式的に示している。
時間次元圧縮処理が施されると、全体的に音圧が大きくなり、時間的起伏が少なくなる。つまり、時間次元圧縮処理を施すことによって生成される秘匿化データ７は、人間にとって煩わしく感じるものとなる。また、周波数特性には大きな変化が無いため、全体的にマスキング効果の増大はあまり期待できない。
一方、周波数次元圧縮処理４９が施されると、フラットな白色雑音特性が増える。また、時間的な振幅変化は維持される。つまり周波数次元圧縮処理４９を施すことによって生成される秘匿化データ７は、マスキング効果が高まると共に、人間にとって煩わしく感じることはない。 FIG. 14 schematically shows the difference in operation between the time-dimensional compression process, which is a general compression process, and the frequency-dimensional compression process 49 in the embodiment of the present invention.
When the time-dimensional compression process is performed, the sound pressure increases as a whole and the temporal undulation decreases. That is, the concealment data 7 generated by the time-dimensional compression process is perceived as annoying by humans. In addition, since the frequency characteristics do not change significantly, little overall increase in the masking effect can be expected.
On the other hand, when the frequency dimension compression processing 49 is performed, flat white noise characteristics increase. Further, the temporal amplitude change is maintained. That is, the concealment data 7 generated by performing the frequency dimension compression processing 49 has a high masking effect and does not feel troublesome for humans.

図１３の説明に戻る。フーリエ逆変換処理５０は、制御部２１が、周波数次元圧縮処理４９によって算出される再改変複素スペクトルのフーリエ逆変換を行い、秘匿化フレームデータＸｍｌ’（ｆ、ｉ）及びＸｍｒ’（ｆ、ｉ）（ｆ＝０、・・・、Ｆｍ−１；ｉ＝０、・・・、Ｎ−１）を算出する処理である。 Returning to the description of FIG. In the inverse Fourier transform process 50, the control unit 21 performs an inverse Fourier transform of the re-modified complex spectrum calculated by the frequency dimension compression process 49, and the concealed frame data Xml ′ (f, i) and Xmr ′ (f, i ) (F = 0,..., Fm-1; i = 0,..., N-1).

各フレームｆのＡｍl’（ｆ、ｊ）、Ｂｍl’（ｆ、ｊ）、Ａｍｒ’（ｆ、ｊ）、Ｂｍｒ’（ｆ、ｊ）の各要素に対して周波数次元圧縮処理４９の結果を各々Ａｍl’’（ｆ、ｊ）、Ｂｍl’’（ｆ、ｊ）、Ａｍｒ’’（ｆ、ｊ）、Ｂｍｒ’’（ｆ、ｊ）とする。
フーリエ逆変換処理５０では、制御部２１は、変換対象のフレームｆの秘匿化フレームデータＸｍｌ’（ｆ、ｉ）及びＸｍｒ’（ｆ、ｉ）に対して、直前に変換されたフレームｆ−１の秘匿化フレームデータＸｍｌ’（ｆ−１、ｉ）及びＸｍｒ’（ｆ−１、ｉ）が存在する場合、両者が時間軸においてＮ／２サンプル分重複することを考慮し、次式のように計算を行う。 Let Aml ″ (f, j), Bml ″ (f, j), Amr ″ (f, j), and Bmr ″ (f, j) denote the results of applying the frequency dimension compression processing 49 to the elements Aml ′ (f, j), Bml ′ (f, j), Amr ′ (f, j), and Bmr ′ (f, j) of each frame f, respectively.
In the inverse Fourier transform process 50, when the concealed frame data Xml ′ (f − 1, i) and Xmr ′ (f − 1, i) of the immediately preceding frame f − 1 exist for the concealed frame data Xml ′ (f, i) and Xmr ′ (f, i) of the frame f being transformed, the control unit 21 takes into account that the two overlap by N / 2 samples on the time axis and performs the calculation as in the following equation.
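The inverse transform and the handling of the N / 2 overlap can be sketched as below. The equation is not reproduced in this text, so a plain overlap-add (summing the overlapping halves of consecutive frames) is assumed here; the 1 / N normalization and the function names are likewise assumptions.

```python
import cmath
import math


def inverse_dft(spectrum):
    """Real part of the inverse DFT of one frame's complex spectrum."""
    n = len(spectrum)
    out = []
    for i in range(n):
        acc = 0j
        for j in range(n):
            acc += spectrum[j] * cmath.exp(2j * math.pi * i * j / n)
        out.append((acc / n).real)
    return out


def overlap_add(frames):
    """Sum time-domain frames that start every n / 2 samples into one signal."""
    n = len(frames[0])
    hop = n // 2
    out = [0.0] * (hop * (len(frames) - 1) + n)
    for f, frame in enumerate(frames):
        for i, x in enumerate(frame):
            out[f * hop + i] += x
    return out
```

Plain summation is a natural choice under these assumptions because two Hanning windows offset by N / 2 samples add up to exactly 1, so an unmodified spectrum would reproduce the original samples in the overlapped region.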

以上、本発明の実施の形態における秘匿化データ生成処理について説明したが、本発明の実施の形態におけるフィルタ関数作成処理３３では、音声最大値スペクトルＶｖ（ｊ）に基づく値を、音楽平均値スペクトルＶｍ（ｊ）に基づく値によって互いに対応する周波数ｊごとに除した値である除算値スペクトルＤｉｖ（ｊ）を算出し、更に、除算値スペクトルＤｉｖ（ｊ）の各値に対して、互いに対応する周波数ｊごとにヒト聴覚感度の重みを定義した聴覚感度補正曲線Ｌ（ｊ）に基づく値を乗算することにより、フィルタ関数Ｆ（ｊ）を作成する。これによって、音声信号に対するマスキング効果を高めつつ、再生される音楽の音色を原音と同等に維持し、音量を絞って再生しても所定のマスキング効果を働かせることができる。
また、本発明の実施の形態におけるフィルタリング処理３４では、フレームｆごとに、フィルタ関数Ｆ（ｊ）が乗算された複素スペクトルに対して、所定の周波数の範囲の中で複素スペクトルの最大スカラー値を求め、更に、複素スペクトルの各要素に対して、当該要素のスカラー値が前記最大スカラー値を超えない範囲内において所定の１以上のスケール値を乗算させる補正を施した後、フーリエ逆変換を行う。これによって、音楽の音色を維持したまま、更にマスキング効果を高めることができる。
そして、本発明の実施の形態では、フィルタ加工を施すことによりＢＧＭ音楽の音色が不自然に変化することを避けることができ、従来よりもＢＧＭ音楽の再生音量を抑えながら、従来と同等以上のマスキング効果を働かせることができ、従来よりも静かで快適な音響環境で秘匿化効果を向上させることができる。 The concealment data generation process according to the embodiment of the present invention has been described above. In the filter function creation process 33 according to the embodiment of the present invention, a divided value spectrum Div (j) is calculated by dividing a value based on the voice maximum value spectrum Vv (j) by a value based on the music average value spectrum Vm (j) at each mutually corresponding frequency j, and the filter function F (j) is then created by multiplying each value of the divided value spectrum Div (j) by a value based on the auditory sensitivity correction curve L (j), which defines the weight of human auditory sensitivity for each mutually corresponding frequency j. As a result, while enhancing the masking effect on the voice signal, the timbre of the reproduced music is kept equivalent to the original sound, and the predetermined masking effect can be obtained even when the music is played at a reduced volume.
Further, in the filtering process 34 according to the embodiment of the present invention, for each frame f, the maximum scalar value of the complex spectrum multiplied by the filter function F (j) is obtained within a predetermined frequency range, each element of the complex spectrum is corrected by multiplying it by a predetermined scale value of 1 or more within a range in which the scalar value of the element does not exceed the maximum scalar value, and the inverse Fourier transform is then performed. As a result, the masking effect can be further enhanced while maintaining the timbre of the music.
In the embodiment of the present invention, it is possible to avoid unnatural changes in the timbre of the BGM music by applying the filter processing, and while suppressing the playback volume of the BGM music more than before, it is equal to or higher than the conventional one. The masking effect can be exerted, and the concealment effect can be improved in a quieter and more comfortable acoustic environment than before.

次に、図１５、図１６を参照しながら、秘匿化装置の設置例について説明する。図１５及び図１６に示す例では、秘匿化データ生成装置２によって秘匿化データ７が生成され、音楽再生装置３である音楽プレーヤ５２に記憶されているものとする。 Next, an installation example of the concealment device will be described with reference to FIGS. 15 and 16. In the example illustrated in FIGS. 15 and 16, it is assumed that the concealment data 7 is generated by the concealment data generation device 2 and stored in the music player 52 that is the music playback device 3.

図１５は、秘匿化装置１の第１の設置例を示している。
図１５に示す例では、平面スピーカ５１ａ及び５１ｂを挟んで左側が面談スペース６０であり、右側が待合スペース６５になっている。
面談スペース６０には、面談カウンターテーブル６１、店員用椅子６２、来客用椅子６３等が設置されている。面談カウンターテーブル６１は、パーティション６４によって区切られている。また、待合スペース６５には、待合ソファー６５が設置されている。顧客は、来店すると待合スペース６５において待機し、順番に面談スペース６０に呼ばれて店員と面談する。 FIG. 15 shows a first installation example of the concealment device 1.
In the example shown in FIG. 15, the left side is the interview space 60 and the right side is the waiting space 65 with the flat speakers 51 a and 51 b interposed therebetween.
In the interview space 60, an interview counter table 61, a clerk chair 62, a visitor chair 63, and the like are installed. The interview counter table 61 is divided by a partition 64. A waiting sofa 65 is installed in the waiting space 65. When the customer visits the store, the customer waits in the waiting space 65, and in turn is called to the interview space 60 to interview the store clerk.

平面スピーカ５１ａ及び５１ｂは、ハニカム構造のパネル及びスピーカ（エキサイタ）から構成されており、例えば、ポスラサウンドパネル（本出願人の登録商標）等である。
平面スピーカ５１ａ及び５１ｂのパネルは、待合スペース６５より面談カウンターテーブル６１にいる店員や来客が覗き込めないパーティション程度の大きさがあること望ましいが、Ａ３サイズ程度の面積しかない立て看板などでも十分に効果を発揮する。すなわち、会話音声７１が、平面スピーカ５１ａ及び５１ｂに物理的に遮られることなく、待合ソファー６５まで到達しても、本発明の秘匿化データ７によって十分なマスキング効果が得られる。
尚、ポスラ（本出願人の登録商標）サウンドパネルは、横幅１メートル程度まで製作可能である。 The flat speakers 51a and 51b are composed of a panel having a honeycomb structure and a speaker (exciter), such as a Posula sound panel (registered trademark of the present applicant).
The panels of the flat speakers 51a and 51b are preferably about the size of a partition that the clerk or visitor at the interview counter table 61 cannot look into from the waiting space 65, but even a signboard with an area of only A3 size is sufficient. Demonstrate the effect. That is, even if the conversation voice 71 reaches the waiting sofa 65 without being physically blocked by the flat speakers 51a and 51b, a sufficient masking effect can be obtained by the concealment data 7 of the present invention.
The Posula (registered trademark of the present applicant) sound panel can be manufactured to a width of about 1 meter.

音楽プレーヤ５２は、平面スピーカ５１ａ及び５１ｂと接続され、本発明の実施の形態に係る秘匿化データ７を再生する。
図１５に示す例では、平面スピーカ５１ａ及び５１ｂが、それぞれ、マスカー音であるＢＧＭサウンドＬ７２ａ及びＢＧＭサウンドＲ７２ｂを出力している（ステレオ再生）。尚、ＢＧＭサウンドは、モノラル再生でも良く、平面スピーカの数や配置位置は、環境に応じて適宜変更すれば良い。 The music player 52 is connected to the flat speakers 51a and 51b, and reproduces the concealment data 7 according to the embodiment of the present invention.
In the example shown in FIG. 15, the planar speakers 51a and 51b output BGM sound L72a and BGM sound R72b, which are masker sounds, respectively (stereo reproduction). The BGM sound may be reproduced in monaural, and the number and arrangement position of the flat speakers may be changed as appropriate according to the environment.

平面スピーカ５１ａ及び５１ｂは、音楽プレーヤ５２によって、秘匿化データ７の波面が平面波に近い音波として、平面から均一に放射する機構を有することが望ましい。これによって、待合スペース６５に伝搬される過程で減衰する音波のエネルギー量が、面談スペース６０から発声される会話音声７１に比べ平面スピーカ５１ａ及び５１ｂから出力されるＢＧＭサウンド７２ａ及び７２ｂの方が小さくなり、相対的にＢＧＭサウンド７２ａ及び７２ｂのエネルギー量が面談スペース６０から発声される会話音声７１に比べ大きくなるため、マスキング効果を高めることができる。このような平面スピーカ５１ａ及び５１ｂの一例としては、特開２００７−３０１８８８号公報に開示されている。特開２００７−３０１８８８号公報に開示されているスピーカは、微細な管構造アレイのパネルによって構成されており、平面波に近い音波を均一に放射する。 The flat speakers 51a and 51b preferably have a mechanism that causes the music player 52 to uniformly radiate from the plane as sound waves in which the wavefront of the concealment data 7 is close to a plane wave. Thereby, the energy amount of the sound wave attenuated in the process of propagating to the waiting space 65 is smaller in the BGM sounds 72a and 72b output from the flat speakers 51a and 51b than in the conversational sound 71 uttered from the interview space 60. Thus, the amount of energy of the BGM sounds 72a and 72b is relatively larger than that of the conversation voice 71 uttered from the interview space 60, so that the masking effect can be enhanced. An example of such flat speakers 51a and 51b is disclosed in Japanese Patent Laid-Open No. 2007-301888. The speaker disclosed in Japanese Patent Application Laid-Open No. 2007-301888 is constituted by a panel having a fine tube structure array, and uniformly emits sound waves close to plane waves.

ここで、平面スピーカ５１ａ及び５１ｂが平面波に近い音波を放射することによって、マスキング効果を高めることができる理由について説明する。
図１５に示すように、会話音声７１は、球面波の音波として、観測位置である待合スペース６５に到達する。同様に、通常のダイナミックスピーカから再生されるＢＧＭも、球面波の音波である。
ここで、球面波の場合、距離の２乗に比例して伝搬される表面積が大きくなり音源に集中していたエネルギーが分散するため、エネルギー（音圧）が距離の２乗に反比例して減衰していくことが知られている。一方、平面波の場合、距離が離れてもエネルギーがあまり減衰しない。 Here, the reason why the planar speakers 51a and 51b can enhance the masking effect by radiating sound waves close to a plane wave will be described.
As shown in FIG. 15, the conversation voice 71 reaches the waiting space 65 as the observation position as a spherical sound wave. Similarly, BGM reproduced from a normal dynamic speaker is also a spherical sound wave.
Here, in the case of a spherical wave, the surface area over which the wave propagates grows in proportion to the square of the distance, and the energy that was concentrated at the sound source is dispersed. The energy (sound pressure) is therefore known to attenuate in inverse proportion to the square of the distance. In the case of a plane wave, on the other hand, the energy attenuates little even as the distance increases.

すなわち、通常のダイナミックスピーカから再生されるＢＧＭは、球面波の音波であり、離れるとエネルギーが減衰するから、面談スペース６０により近い位置に待機している顧客に合わせて音量を調節すると、面談スペース６０により遠い位置に待機している顧客にはマスキング効果が十分に働かない場合がある。
一方、平面波に近い音波を放射する平面スピーカ５１ａ及び５１ｂを用いれば、再生されるＢＧＭサウンドＬ７２ａ、ＢＧＭサウンドＲ７２ｂは、平面波の音波であり、離れてもエネルギーがあまり減衰しないから、面談スペース６０により近い位置に待機している顧客に合わせて音量を調節しても、面談スペース６０により遠い位置に待機している顧客に対して十分なマスキング効果が働く。 That is, BGM reproduced from a normal dynamic speaker is a sound wave of a spherical wave, and energy is attenuated when leaving, so if the volume is adjusted according to a customer who is waiting closer to the interview space 60, the interview space The masking effect may not work sufficiently for customers who are standing farther from 60.
On the other hand, if the flat speakers 51a and 51b that radiate sound waves close to plane waves are used, the reproduced BGM sound L72a and BGM sound R72b are plane-wave sound waves whose energy attenuates little with distance. Therefore, even if the volume is adjusted for a customer waiting closer to the interview space 60, a sufficient masking effect is still obtained for a customer waiting farther from the interview space 60.
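The difference between the two wave types can be illustrated with a short calculation. The sketch assumes an ideal point source in a free field, whose intensity falls with the inverse square of the distance, and an ideal plane wave with no spreading loss; real rooms and real panel speakers fall somewhere in between. The function name and the distances are hypothetical.

```python
import math


def spherical_drop_db(r_near, r_far):
    """Level drop in dB between two distances for an ideal spherical wave."""
    # Intensity ratio is (r_near / r_far)^2, i.e. 20 * log10(r_far / r_near) dB.
    return 20.0 * math.log10(r_far / r_near)


# Listeners at 1 m and 2 m from a conventional (spherical-wave) speaker.
drop = spherical_drop_db(1.0, 2.0)
print(f"spherical wave: {drop:.1f} dB lower at 2 m than at 1 m")
print("ideal plane wave: 0.0 dB lower (no spreading loss)")
```

Under these assumptions, each doubling of distance lowers a spherical-wave masker by about 6 dB, while the conversation voice, also a spherical wave, falls at the same rate from its own source; a masker that loses less level with distance therefore keeps its margin over the voice for listeners farther away.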

図１６は、秘匿化装置１の第２の設置例を示している。
図１６に示す例では、平面スピーカ５１ｃ及び５１ｄを挟んで左側が第１応接スペース８１ａであり、右側が第２応接スペース８１ｂになっている。
第１応接スペース８１ａ及び第２応接スペース８１ｂには、それぞれ、１つの応接テーブル８２と４つの椅子８３が設置されている。
第１応接スペース８１ａ及び第２応接スペース８１ｂでは、それぞれ独立して、別々の顧客を応接するようになっている。 FIG. 16 shows a second installation example of the concealment device 1.
In the example shown in FIG. 16, the first reception space 81a is on the left side of the flat speakers 51c and 51d, and the second reception space 81b is on the right side.
One reception table 82 and four chairs 83 are respectively installed in the first reception space 81a and the second reception space 81b.
In the first reception space 81a and the second reception space 81b, different customers are received independently.

平面スピーカ５１ｃ及び５１ｄは、ハニカム構造のパネル及びスピーカ（エキサイタ）から構成されており、例えば、ポスラサウンドパネル（本出願人の登録商標）等である。図１６に示す平面スピーカ５１ｃ及び５１ｄは、第１の設置例よりも横幅のサイズを大きくして、パーティションの機能も果たすものである。
平面スピーカ５１ｃ及び５１ｄには、複数のスピーカ（エキサイタ）を備えており、それぞれのスピーカから、マスカー音であるＢＧＭサウンドＬ７２ａ、ＢＧＭサウンドＲ７２ｂが出力される。
第１の設置例と同様、平面スピーカ５１ｃ及び５１ｄは、音楽プレーヤ５２によって、秘匿化データ７の波面が平面波に近い音波として、平面から均一に放射する機構を有することが望ましい。 The flat speakers 51c and 51d are composed of a honeycomb-structured panel and a speaker (exciter), such as a Posula sound panel (registered trademark of the present applicant). The flat speakers 51c and 51d shown in FIG. 16 have a larger width than the first installation example and also function as a partition.
The planar speakers 51c and 51d include a plurality of speakers (exciters), and BGM sound L72a and BGM sound R72b, which are masker sounds, are output from each speaker.
As in the first installation example, it is desirable that the flat speakers 51c and 51d have a mechanism that causes the music player 52 to uniformly radiate the wavefront of the concealment data 7 from the plane as a sound wave close to a plane wave.

図１６に示すように、マスキー音である第１会話音声７１ａは、球面波の音波として、観測位置である第２応接スペース８１ｂに到達する。同様に、マスキー音である第２会話音声７１ｂは、球面波の音波として、観測位置である第１応接スペース８１ａに到達する。
第１会話音声７１ａに対しては、第２応接スペース８１ｂにおいて、平面スピーカ５１ｄから出力されるＢＧＭサウンドＬ７２ａ、ＢＧＭサウンドＲ７２ｂがマスカー音となり、マスキング効果を発揮する。同様に、第２会話音声７１ｂに対しては、第１応接スペース８１ａにおいて、平面スピーカ５１ｃから出力されるＢＧＭサウンドＬ７２ａ、ＢＧＭサウンドＲ７２ｂがマスカー音となり、マスキング効果を発揮する。 As shown in FIG. 16, the first conversation voice 71a, which is a maskee sound (the sound to be masked), reaches the second reception space 81b, which is the observation position, as a spherical sound wave. Similarly, the second conversation voice 71b, which is a maskee sound, reaches the first reception space 81a, which is the observation position, as a spherical sound wave.
For the first conversation voice 71a, in the second reception space 81b, the BGM sound L72a and BGM sound R72b output from the flat speaker 51d become masker sounds and exhibit a masking effect. Similarly, for the second conversation voice 71b, in the first reception space 81a, the BGM sound L72a and BGM sound R72b output from the flat speaker 51c become masker sounds and exhibit a masking effect.

Installation examples of the concealment device 1 have been described above. As noted earlier, using a flat speaker that radiates a sound wave close to a plane wave to reproduce the music signal makes it possible to obtain the voice concealment effect even when the BGM is played at a relatively low volume.
The flat speaker can take various forms, from a standing signboard of about A3 size to a partition about 1 meter wide.
In addition, interior materials such as wallpaper, or poster advertisements, can be used as the design on the panel surface of the flat speaker, which avoids the visual awkwardness of an exposed speaker in the interior.

In the above description the flat speaker takes the form of a standing signboard or a partition, but embodiments of the present invention are not limited to this. For example, speakers can be built into the walls of a room so that BGM sound, which is the masker sound, is output from all four sides of the room.

Next, an example and a comparative example will be described with reference to FIGS. 17 to 24. FIGS. 17 and 18 show the data used in the example and the comparative example. FIGS. 19 and 20 show the results of the comparative example. FIGS. 21 to 24 show the results of the example.
In the example, the concealment data 7 was generated with the auditory sensitivity correction process 45 and the compression process 49 applied. In the comparative example, the concealment data was generated without the auditory sensitivity correction process 45 or the compression process 49.

FIG. 17 is a diagram showing the voice maximum value spectrum of the example and the comparative example. It shows the voice maximum value spectrum of the voice data 4 output by the frequency analysis process 32. This voice maximum value spectrum has a peak at 12 to 13 kHz.

FIG. 18 is a diagram showing the music average value spectrum of the example and the comparative example. It shows the music average value spectrum of the music data 5 output by the frequency analysis process 32.
FIG. 18 shows that this music average value spectrum is close to a 1/f curve.
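The two spectra in FIGS. 17 and 18 correspond to what the claims call the voice maximum value spectrum Vv(j) (maximum over the time axis) and the music average value spectrum Vm(j) (average over the time axis). The following is a minimal sketch of that idea only. The function names, the non-overlapping rectangular frames, and the naive DFT are assumptions made for illustration; this section does not specify the frame length, window, or overlap used by the frequency analysis process 32.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short demo frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def split_frames(samples, frame_len):
    """Assumption: non-overlapping frames; a trailing partial frame is dropped."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]

def max_spectrum(samples, frame_len):
    """Per-bin maximum over all frames (analogue of Vv(j))."""
    spectra = [magnitude_spectrum(f) for f in split_frames(samples, frame_len)]
    return [max(column) for column in zip(*spectra)]

def mean_spectrum(samples, frame_len):
    """Per-bin average over all frames (analogue of Vm(j))."""
    spectra = [magnitude_spectrum(f) for f in split_frames(samples, frame_len)]
    return [sum(column) / len(column) for column in zip(*spectra)]
```

Taking the maximum for the voice and the average for the music makes the ratio of the two conservative: the music is shaped against the loudest voice level observed in each band rather than a typical one.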

FIG. 19 is a diagram showing the filter function of the comparative example. The filter function of the comparative example is created without the auditory sensitivity correction process 45 or the compression process 49.
Compared with FIG. 22 described later, FIG. 19 shows high values for the frequency components from 5 kHz to 10 kHz. Consequently, filtering with the filter function of the comparative example tends to emphasize the 5 kHz to 10 kHz components, which contain many human voice signal components. Although human auditory sensitivity is relatively low in the 5 kHz to 10 kHz band, filtering a music signal with this filter function changes the timbre unnaturally and makes it annoying to listen to.

FIG. 20 is a diagram showing the music signal after the filtering process of the comparative example, that is, a music signal filtered with the filter function of the comparative example.
As described above, the timbre of the music signal shown in FIG. 20 changes unnaturally and is annoying to listen to.
In addition, compared with FIGS. 23 and 24 described later, FIG. 20 shows high values in the high-frequency region of 10 kHz and above. Human voice signal components are present to some degree at 10 kHz and above, but because human auditory sensitivity is low there, these components contribute little to masking. The playback volume therefore has to be set with reference to the band of 4 kHz and below, where auditory sensitivity is high, which raises the overall sound pressure level unnecessarily. As a result, a masking effect is difficult to obtain unless the volume is raised considerably.

FIG. 21 is a diagram showing the auditory sensitivity correction curve of the example. It plots, for each frequency, the sound pressure level of the auditory sensitivity correction curve 6 shown in the lower part of FIG. 4.

FIG. 22 is a diagram showing the filter function of the example. It shows the filter function created by the filter function creation process 33 using the auditory sensitivity correction curve 6 shown in FIG. 21.
Compared with FIG. 19, FIG. 22 shows low values for the frequency components from 5 kHz to 10 kHz. Consequently, when the filtering process 34 is performed with the filter function of the example, the emphasis on the 5 kHz to 10 kHz band is suppressed and an unnatural timbre is avoided. In addition, the 300 Hz to 3.4 kHz band, where human auditory sensitivity is high and which is rich in the formant components used to identify speech, is emphasized, so masking works effectively without raising the playback volume much.
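The difference between FIG. 19 and FIG. 22 can be illustrated with the relation stated in the claims: the division value spectrum Div(j) is the voice maximum value spectrum divided by the music average value spectrum, and the filter function F(j) is Div(j) multiplied by a weight taken from the auditory sensitivity correction curve L(j). The sketch below is a simplified, hypothetical rendering of that relation. The function name, the `eps` guard against division by zero, and the numeric weights are assumptions; the critical bandwidth correction and smoothing processes listed in the description of symbols are omitted.

```python
def filter_function(vv, vm, weight, eps=1e-12):
    """Per-bin F(j) = (Vv(j) / Vm(j)) * L(j).

    vv, vm and weight are equal-length lists indexed by frequency bin j.
    eps is an assumed guard so that an empty music bin does not divide by zero.
    """
    if not (len(vv) == len(vm) == len(weight)):
        raise ValueError("vv, vm and weight must have the same length")
    return [(v / max(m, eps)) * w for v, m, w in zip(vv, vm, weight)]

# Illustrative values only: three bins, with the highest bin de-weighted.
vv = [2.0, 4.0, 8.0]
vm = [1.0, 2.0, 4.0]
comparative = filter_function(vv, vm, [1.0, 1.0, 1.0])   # no sensitivity correction
corrected = filter_function(vv, vm, [0.5, 1.0, 0.25])    # hypothetical L(j) weights
```

With all weights set to 1 the result is just Div(j), which corresponds to the comparative example; a weight below 1 in a band lowers the filter gain there, which is the kind of change FIG. 22 shows for 5 kHz to 10 kHz.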

FIG. 23 is a diagram showing the music signal after the filtering process of the example (without compression). It shows a music signal on which the filtering process 34 was performed with the filter function of the example, but on which the frequency-dimension compression process 49 was not performed.
As described above, the music signal shown in FIG. 23 has a natural timbre because the emphasis on the 5 kHz to 10 kHz band is suppressed. In addition, the 300 Hz to 3.4 kHz band, where human auditory sensitivity is high and which is rich in the formant components used to identify speech, is emphasized, so masking works effectively without raising the playback volume much.
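The filtering process 34 is described in the claims as dividing the music data into frames, Fourier transforming each frame, multiplying by the filter function F(j), and applying an inverse Fourier transform. The following is a rough sketch of that sequence under simplifying assumptions: non-overlapping rectangular frames, a naive DFT instead of an FFT, and a real-valued gain per bin mirrored onto the negative-frequency bins. All names are hypothetical, and a practical implementation would likely use windowed, overlapping frames.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
            for k in range(n)]

def idft(spec):
    n = len(spec)
    return [sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(spec)).real / n
            for t in range(n)]

def filter_frames(samples, frame_len, gains):
    """Apply per-bin gains (bins 0..frame_len//2) to every full frame.

    A trailing partial frame is dropped (assumption made to keep the sketch short).
    """
    if len(gains) != frame_len // 2 + 1:
        raise ValueError("gains must cover bins 0..frame_len//2")
    out = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        spec = dft(samples[start:start + frame_len])
        for k in range(frame_len):
            j = min(k, frame_len - k)  # bin k and its mirror share one gain
            spec[k] *= gains[j]
        out.extend(idft(spec))
    return out
```

Applying the same gain to a bin and its mirror keeps the filtered frame real-valued, so the output can be written back as ordinary audio samples.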

FIG. 24 is a diagram showing the music signal after the filtering process of the example (with compression). It shows a music signal on which the filtering process 34 was performed with the filter function of the example and on which the frequency-dimension compression process 49 was also performed.
As described above, the music signal shown in FIG. 24 has a natural timbre because the emphasis on the 5 kHz to 10 kHz band is suppressed. In addition, the 300 Hz to 3.4 kHz band, where human auditory sensitivity is high and which is rich in the formant components used to identify speech, is emphasized, so masking works effectively without raising the playback volume much.
Furthermore, comparing FIG. 24 with FIG. 23 shows that the low-frequency region, while still taking discrete values, is flatter overall. That is, the low-frequency region is brought somewhat closer to flat, like white noise, while its discrete spectral character is maintained. This enhances the masking effect while preserving the timbre of the music.
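One way to read the flattening visible in FIG. 24 is through the compression described in the claims: within a predetermined frequency range, the largest magnitude of the filtered complex spectrum is found, and each element is then scaled up by a factor of at least 1 without being allowed to exceed that maximum. The sketch below follows that reading. It is an interpretation, not the patent's confirmed procedure; the function name, the bin range arguments, and the example scale factor are all assumptions.

```python
def compress_band(spectrum, lo, hi, scale):
    """Raise magnitudes in bins lo..hi-1 toward the band peak.

    spectrum: list of complex values for one frame.
    scale: assumed boost factor, must be >= 1.
    Each bin is multiplied by min(scale, peak / magnitude), so no bin ends up
    above the original band peak and the phase of every bin is unchanged.
    """
    if scale < 1.0:
        raise ValueError("scale must be 1 or more")
    out = list(spectrum)
    band = out[lo:hi]
    if not band:
        return out
    peak = max(abs(c) for c in band)
    for k in range(lo, min(hi, len(out))):
        mag = abs(out[k])
        if mag == 0.0:
            continue  # empty bins stay empty, keeping the spectrum discrete
        out[k] = out[k] * min(scale, peak / mag)
    return out

frame_spectrum = [1 + 0j, 4 + 0j, 0 + 2j, 8 + 0j]
flattened = compress_band(frame_spectrum, 0, 3, 3.0)
```

Because weak components are lifted while the strongest one is left alone, the band becomes flatter without zero bins being filled in, which matches the description of a spectrum that stays discrete while approaching a flat shape.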

Preferred embodiments of the concealment data generation device and the like according to the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to these examples. It is clear that those skilled in the art can conceive of various changes or modifications within the scope of the technical idea disclosed in the present application, and these are naturally understood to belong to the technical scope of the present invention.

DESCRIPTION OF SYMBOLS
1 ......... Concealment device
2 ......... Concealment data generation device
3 ......... Music playback device
4 ......... Voice data
5 ......... Music data
6 ......... Auditory sensitivity correction curve
7 ......... Concealment data
10 ......... Voice frame group
11 ......... Music frame group
12 ......... Voice maximum value spectrum data
13 ......... Voice average value spectrum data
14 ......... Filter function data
31 ......... Frame extraction process
32 ......... Frequency analysis process
32a ......... Frequency analysis
33 ......... Filter function creation process
34 ......... Filtering process
41 ......... Instantaneous spectrum calculation process
42 ......... Average spectrum calculation process
43 ......... Critical bandwidth correction process
44 ......... Division process
45 ......... Auditory sensitivity correction process
46 ......... Smoothing process
47 ......... Fourier transform process
48 ......... Filter function multiplication process
49 ......... Frequency-dimension compression process
50 ......... Inverse Fourier transform process

Claims

A concealment data generation device that generates concealment data, which is music data for concealing dialogue voice, the device comprising:
frequency analysis means for performing frequency analysis on each of voice data and music data stored in advance, calculating a voice maximum value spectrum Vv(j) (j is a frequency), which is the maximum spectrum of the voice data in the time axis direction, and calculating a music average value spectrum Vm(j), which is the spectrum of the music data averaged in the time axis direction;
filter function creating means for calculating a division value spectrum Div(j) by dividing, for each corresponding frequency j, a value based on the voice maximum value spectrum Vv(j) by a value based on the music average value spectrum Vm(j), and for creating a filter function F(j) by multiplying, for each corresponding frequency j, each value of the division value spectrum Div(j) by a value based on an auditory sensitivity correction curve L(j) that defines a weight of human auditory sensitivity; and
filtering means for generating the concealment data by dividing the music data into frames f, each being a unit of a predetermined section, applying a Fourier transform to each divided frame f, multiplying the result by the filter function F(j), and applying an inverse Fourier transform.

The concealment data generation device according to claim 1, wherein the auditory sensitivity correction curve L(j) used by the filter function creating means is defined based on the 40-phon equal-loudness curve.

The concealment data generation device according to claim 1, wherein the filtering means obtains, for the complex spectrum of each frame f multiplied by the filter function F(j), the maximum scalar value of the complex spectrum within a predetermined frequency range, applies a correction that multiplies each element of the complex spectrum by a predetermined scale value of 1 or more within a range in which the scalar value of the element does not exceed the maximum scalar value, and then performs the inverse Fourier transform.

The concealment data generation device according to any one of claims 1 to 3, wherein the filter function creating means:
calculates a replacement voice maximum value spectrum by replacing the voice maximum value spectrum Vv(jc) (jc is a specific frequency) with the maximum value in a range higher than the frequency jc;
calculates a replacement music average value spectrum by replacing the music average value spectrum Vm(jc) with the average value within a range before and after the frequency jc; and
uses, as the division value spectrum Div(j), the value obtained by dividing the replacement voice maximum value spectrum by the replacement music average value spectrum.

The concealment data generation device according to claim 1, wherein the filter function creating means smooths the filter function F(j) by multiplying each value of the filter function F(j) by a value based on the auditory sensitivity correction curve L(j) and then replacing it with the average value within a range before and after the frequency j.

The concealment data generation device according to claim 1, wherein the frequency analysis means calculates, for each frame f, a spectrum averaged in the time axis direction over M frames before and after that frame f of the music data as the music average value spectrum Vm(f, j), and
the filter function creating means calculates, as the division value spectrum Div(f, j), a value obtained by dividing, for each corresponding frequency j, a value based on the voice maximum value spectrum Vv(j) by a value based on the music average value spectrum Vm(f, j) corresponding to the frame f, and creates the filter function F(f, j) by multiplying, for each corresponding frequency j, each value of the division value spectrum Div(f, j) by a value based on the auditory sensitivity correction curve L(j).

The concealment data generation device according to claim 1, further comprising:
music data storage means for storing a plurality of pieces of the music data; and
music data selection means for selecting a single piece of music data from the music data stored by the music data storage means,
wherein the concealment data is generated based on the single piece of music data selected by the music data selection means.

A concealment device comprising:
concealment data storage means for storing a plurality of pieces of the concealment data generated by the concealment data generation device according to any one of claims 1 to 7;
concealment data selection means for selecting a single piece of concealment data from the concealment data stored by the concealment data storage means; and
concealment data reproducing means for reproducing the single piece of concealment data selected by the concealment data selection means.

The concealment device according to claim 8, wherein the concealment data reproducing means is constituted by a flat speaker having a mechanism that radiates the concealment data uniformly from a predetermined plane as a sound wave whose wavefront is close to a plane wave.

A concealment data generation method for generating concealment data, which is music data for concealing dialogue voice, the method comprising:
a frequency analysis step of performing frequency analysis on each of voice data and music data stored in advance, calculating a voice maximum value spectrum Vv(j) (j is a frequency), which is the maximum spectrum of the voice data in the time axis direction, and calculating a music average value spectrum Vm(j), which is the spectrum of the music data averaged in the time axis direction;
a filter function creating step of calculating a division value spectrum Div(j) by dividing, for each corresponding frequency j, a value based on the voice maximum value spectrum Vv(j) by a value based on the music average value spectrum Vm(j), and of creating a filter function F(j) by multiplying, for each corresponding frequency j, each value of the division value spectrum Div(j) by a value based on an auditory sensitivity correction curve L(j) that defines a weight of human auditory sensitivity; and
a filtering step of generating the concealment data by dividing the music data into frames f, each being a unit of a predetermined section, applying a Fourier transform to each divided frame f, multiplying the result by the filter function F(j), and applying an inverse Fourier transform.

A concealment method comprising:
a concealment data storage step of storing a plurality of pieces of the concealment data generated by the concealment data generation method according to claim 10;
a concealment data selection step of selecting a single piece of concealment data from the concealment data stored in the concealment data storage step; and
a concealment data reproduction step of reproducing the single piece of concealment data selected in the concealment data selection step.

A computer-readable program for causing a computer to execute:
a frequency analysis step of performing frequency analysis on each of voice data and music data stored in advance, calculating a voice maximum value spectrum Vv(j) (j is a frequency), which is the maximum spectrum of the voice data in the time axis direction, and calculating a music average value spectrum Vm(j), which is the spectrum of the music data averaged in the time axis direction;
a filter function creating step of calculating a division value spectrum Div(j) by dividing, for each corresponding frequency j, a value based on the voice maximum value spectrum Vv(j) by a value based on the music average value spectrum Vm(j), and of creating a filter function F(j) by multiplying, for each corresponding frequency j, each value of the division value spectrum Div(j) by a value based on an auditory sensitivity correction curve L(j) that defines a weight of human auditory sensitivity; and
a filtering step of generating the concealment data by dividing the music data into frames f, each being a unit of a predetermined section, applying a Fourier transform to each divided frame f, multiplying the result by the filter function F(j), and applying an inverse Fourier transform.