JP5662712B2

JP5662712B2 - Voice changing device, voice changing method and voice information secret talk system

Info

Publication number: JP5662712B2
Application number: JP2010145039A
Authority: JP
Inventors: 中井孝芳 (Takayoshi Nakai); 川上福司 (Fukushi Kawakami)
Original assignee: Nippon Sheet Glass Environment Amenity Co Ltd
Current assignee: Nippon Sheet Glass Environment Amenity Co Ltd
Priority date: 2010-06-25
Filing date: 2010-06-25
Publication date: 2015-02-04
Anticipated expiration: 2030-06-25
Also published as: JP2012008392A

Description

The present invention relates to a voice changing device that changes speech using a nonlinear function, a voice changing method, and a voice information secret talk system that includes the voice changing device.

With the enforcement of the Act on the Protection of Personal Information, the need to protect conversation content in banks and offices has grown. Conventional measures include sound insulation and soundproofing, which physically divide the space, and BGM/masking systems, which conceal conversational speech in open-plan offices with other noise, music, and the like.

Conventional approaches to concealing speech information include:
(1) Masking systems, which mask the target speech with other steady noise;
(2) Shading systems, which conceal the target speech with the room's background noise or air-conditioning noise;
(3) Sound insulation and soundproofing, which spatially partition the target room and separate it acoustically.
Approach (1) tries to (forcibly) erase the very presence of the speech and is classed as energy masking. It is used, for example, in booths and conference rooms in open-plan offices.

An example of system (1) is reported in Non-Patent Document 1. In that system, a dedicated generator and speakers are installed, for example in the ceiling, to produce a masking sound that conceals speech. The principle is to generate music or noise that does not interfere with conversation and is unrelated to it. This lowers the so-called S/N ratio, which conceals the speech content and reduces its clarity and intelligibility until the conversation can no longer be understood. The system includes a control device (signal processor) and a power amplifier, among other components, which keep the masking sound at an optimum level according to the conversation level, the room's background noise, and similar factors.

Another application of this technique radiates masking noise from a partition into a booth. This confines the target area to the booth and limits the rise in the noise level of the room as a whole.

An example of system (2) is reported in Non-Patent Document 2, which describes a "Sound Shading System" that radiates the room's own background noise, or familiar everyday air-conditioning noise, as masking noise. Bank counters use partitions that block the view to give customers visual privacy. In this system, speakers are mounted on top of these partitions to protect the privacy of the conversation as well. The masking sound played from the speakers is intended to prevent conversation content from leaking to people on the other side of the partition. The reproduced sound is either generated from street-crowd noise or is the room's air-conditioning noise.

Examples of approach (3) include sound insulation by partitioning off a separate room and soundproofing by means of partitions.

Patent Document 1: JP 2008-233671 A

Non-Patent Document 1: KOKUYO press release, "Sound Masking," October 18, 2006.
Non-Patent Document 2: Akiko Sugimoto, Takahiro Nakamura, Shiro Ise, "Evaluation of Sound Shading System that generates a sound field in consideration of ease of conversation and privacy," Proceedings of the Acoustical Society of Japan Spring Meeting 2005, p. 817.
Non-Patent Document 3: The Institute of Electronics, Information and Communication Engineers, Auditory and Speech (聴覚と音声), 1973, pp. 370-371.
Non-Patent Document 4: Kajita, Kobayashi, Takeda, Itakura, "Analysis of phonetic features contained in human speech-like noise," Journal of the Acoustical Society of Japan, 53(5), pp. 337-345, May 1, 1997.

The inventors have identified the following problems with the masking/shading techniques described above.
(I) Because a new sound unrelated to the original speech is emitted, it can feel out of place and can raise the noise level of the room.
(II) Noise, i.e. the masking sound, can be heard even during so-called "silent" periods when no one is speaking.
(III) Emitting a separate sound (noise or music) unrelated to the conversation can make the talker, the other party, and others in the room noticeably uncomfortable.
(IV) Concealing speech information with noise or BGM is fundamentally difficult because of a property of hearing: sounds of different character are recognized as distinct. Speech waveforms with similar envelopes and spectra are harder for the ear to tell apart.

Regarding (I), experience shows that completely masking the original speech requires noise about 15 dB above the speech level (see Non-Patent Document 3). From this viewpoint, concealing speech by playing noise or music (the masking approach) requires noise or music considerably louder than the original speech. Both masking and shading can therefore substantially raise the room's noise level.
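To make the 15 dB figure concrete, the sketch below adds the energies of two incoherent sound sources, using the standard power-sum rule for uncorrelated signals. The 60 dB speech level is a hypothetical value chosen for illustration and does not come from the source.

```python
import math


def combine_levels_db(*levels_db: float) -> float:
    """Power (energy) sum of uncorrelated sources whose levels are given in dB."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


speech_db = 60.0              # hypothetical speech level at the listener
masker_db = speech_db + 15.0  # masker ~15 dB above speech for complete masking
total_db = combine_levels_db(speech_db, masker_db)
print(f"total level: {total_db:.1f} dB")  # roughly 75.1 dB, about 15 dB above the speech alone
```

The result shows why the masking approach raises room noise: the combined level is set almost entirely by the masker, not the speech.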

Regarding (II), hearing sound when no one is speaking feels unnatural. Moreover, playing noise or music when there is no speech does nothing to conceal conversation content. Beyond being wasteful, it can raise the room's equivalent noise level, L_Aeq (A-weighted equivalent sound level: the mean-square sound pressure level of the A-weighted signal over a given interval, i.e. the average noise level). Even if "HSL noise" (human speech-like noise; see Non-Patent Document 4) generated from music or speech is played instead of plain noise, it is hard to distinguish from ordinary BGM.
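The equivalent level defined above can be sketched as 10·log10(mean(p²)/p_ref²). This minimal example assumes the input pressure samples are already A-weighted; the A-weighting filter itself is not implemented here. The function name is hypothetical.

```python
import numpy as np

P_REF = 20e-6  # reference sound pressure in pascals (20 micropascals)


def equivalent_level_db(pressure_pa) -> float:
    """Equivalent continuous level 10*log10(mean(p^2) / p_ref^2).

    Assumes `pressure_pa` has already been A-weighted (not done here).
    """
    mean_square = np.mean(np.square(np.asarray(pressure_pa, dtype=float)))
    return float(10.0 * np.log10(mean_square / P_REF ** 2))


fs = 48000
t = np.arange(fs) / fs
tone = np.sqrt(2.0) * np.sin(2 * np.pi * 1000 * t)  # 1 Pa RMS test tone
speech_then_silence = np.concatenate([tone, np.zeros(fs)])
masker_in_silence = np.concatenate([tone, tone])   # sound also played during the pause
print(equivalent_level_db(speech_then_silence), equivalent_level_db(masker_in_silence))
```

Filling the silent second with sound at the same level raises the equivalent level by about 3 dB, which illustrates point (II) numerically.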

Approach (3) is quite costly and undermines the sense of openness, so it is unsuitable for open-plan offices and similar spaces.

The sound masking system described in Patent Document 1 analyzes the speaking rate of the input sound (speech), divides the sound into frames whose length depends on that rate, processes the frames, and synthesizes processed speech. However, the system "temporarily stores the input sound (speech) in units of about 2 seconds and then performs a series of processes." The processed speech is therefore generated from past speech rather than from the speech it is meant to mask. As a result, the processed speech bears little relation to its masking target, and the masking effect is insufficient.

The present invention was made in view of these problems. Its object is to provide a technique that conceals speech content in real time, or at a processing speed close to real time, while limiting increases in noise level and listener discomfort.

One aspect of the present invention relates to a voice changing device. The device includes the following units:
- a partial extraction unit that extracts the signal of a change target portion from a voice signal representing speech being uttered;
- a nonlinear changing unit that uses a nonlinear function to change the signal of the change target portion extracted by the partial extraction unit;
- an output unit that outputs at least the signal changed by the nonlinear changing unit to a voice output means capable of outputting sound into the area where the speech being uttered is heard.

According to this aspect, a nonlinearly processed version of the speech being uttered can be output substantially in real time into the area where that speech is being heard.

Another aspect of the present invention is a voice information secret talk system. The system includes:
- a sound collecting means that receives the speech being uttered and generates a voice signal representing it;
- a voice changing device that changes the voice signal generated by the sound collecting means;
- a voice output means that converts the changed voice signal into sound and outputs it into the area where the speech being uttered is heard.
The voice changing device in turn includes a partial extraction unit, a nonlinear changing unit, and an output unit. The partial extraction unit extracts the signal of a change target portion from the voice signal generated by the sound collecting means. The nonlinear changing unit uses a nonlinear function to change the signal extracted by the partial extraction unit. The output unit outputs at least the signal changed by the nonlinear changing unit to the voice output means.

Any combination of the above constituent elements is also effective as an embodiment of the present invention. The same applies to the constituent elements and expressions of the present invention when they are interchanged among devices, methods, systems, computer programs, recording media storing computer programs, and the like.

According to the present invention, speech content can be concealed while limiting increases in noise level and listener discomfort.

FIG. 1 is an explanatory diagram classifying conventional approaches to masking and the approach according to the embodiment.
FIG. 2 is a perspective view schematically showing a booth provided with the voice information secret talk system according to the embodiment.
FIG. 3 is a block diagram schematically showing the functions and configuration of the voice information secret talk system of FIG. 2.
FIG. 4 is a side view showing the configuration of the IT partition of FIG. 2.
FIG. 5 is a block diagram showing the functions and configuration of the SD controller unit SD of FIG. 3.
FIG. 6 is an explanatory diagram illustrating the criteria by which the part determination unit determines the signal of the change target portion.
FIG. 7 is an explanatory diagram showing an example of processing in the second changing unit.
FIG. 8 is a waveform diagram showing the waveforms of the voice signals representing the maskee and the masker at the listener position.
FIG. 9 is a flowchart showing a series of processes in the SD controller unit and the speaker.
FIG. 10 is a graph showing the relationship between the masker-maskee difference and the recognition rate.
FIG. 11 is a block diagram schematically showing the functions and configuration of a voice information secret talk system according to a first modification.
FIG. 12 is a block diagram schematically showing the functions and configuration of a voice information secret talk system according to a second modification.

The present invention is described below on the basis of preferred embodiments with reference to the drawings. Identical or equivalent components, members, and processes shown in the drawings are given the same reference numerals, and redundant descriptions are omitted as appropriate.

In offices in particular, the goal is to conceal only the speech information, i.e. the content of speech, without sacrificing the openness and ease of communication of an open-plan space. Conventional techniques using BGM or masking, however, basically add a separate sound whose character differs from speech. They therefore tend to feel unnatural and to raise the room's background noise.

In the embodiment of the present invention, the structure of the voice signal itself, as collected by a microphone or the like, is changed using a nonlinear function. The changed sound is then output substantially in real time relative to the original speech. This conceals or blocks the content of the conversation (ideally, only its content) without raising the room's background noise, and so provides a smooth and comfortable environment for confidential conversation.

FIG. 1 is an explanatory diagram classifying conventional approaches to masking and the approach according to the embodiment.
- (a) is SR (Sound Reinforcement)/PA (Public Address) using electroacoustics. These conventional techniques raise volume and clarity to make speech "easier to hear."
- (f) is sound insulation, a conventional technique that separates spaces acoustically to make speech "inaudible" as far as possible.
- (e) SD (Speech Deformation) is the approach according to the embodiment. It processes the talker's own original speech and outputs it in near real time. The aim is not to make the conversation inaudible but to make its content "unintelligible," so SD is a kind of speech-information disruption (auditory confusion) technique.

The conventional techniques (b) EM, (c) SS, and (d) IM all raise the noise level of the room or target area to some degree and can increase discomfort and unnaturalness. By contrast, (e) SD causes almost no increase in noise level.

In the embodiment of the present invention, the original speech being uttered is hereinafter called the "maskee" and the processed speech the "masker." To reduce the overall volume of maskee plus masker, SD can be combined with the following measure:
To enhance the information-concealing effect of the masker, ANC (Active Noise Control) or fixed-parameter PNC (Passive Noise Control) is used in combination.

FIG. 2 is a perspective view schematically showing a booth 2 provided with the voice information secret talk system 100 according to the embodiment. FIG. 3 is a block diagram schematically showing the functions and configuration of the voice information secret talk system 100 of FIG. 2.
The voice information secret talk system 100 is installed in a booth 2 partitioned by simple partitions, such as a bank consultation counter. The system 100 includes a microphone Mic, an SD controller unit SD, two power amplifiers PA, and two speakers SP. The speakers SP and the SD controller unit SD may be built into an IT partition 4 that visually separates the booths.

In this example, the talker is a customer 6 who is conversing with a consultant.
1. The microphone Mic, placed at or near the counter, picks up the talker's maskee H'(t) and converts it into a voice signal.
2. The voice signal is sent to the SD controller unit SD, which changes it nonlinearly.
3. The processed signal passes through a power amplifier PA and is output from the speakers SP as the masker H(t) into the adjacent booths 2' on the left and right.

The maskee H'(t) travels through the air around the partition into the adjacent booth 2'. The speech of customer 6 could therefore be heard by a listener 8, a person other than customer 6, in that booth. In the present embodiment, however, the leaking maskee H'(t) combines with the masker H(t) before it reaches listener 8. The disturbance caused by the masker H(t) prevents listener 8 from understanding the conversation contained in the maskee H'(t).

The speaker SP outputs the masker H(t) toward the adjacent booth 2'. This is the booth next to booth 2, where the microphone Mic and the SD controller unit SD connected to the speaker are installed. The adjacent booth 2' is the area in which the maskee H'(t) leaking through the air is heard. In other words, the speaker SP outputs the masker H(t) so that the maskee H'(t) and the masker H(t) reach listener 8 substantially in real time, i.e. together.

Either the SD controller unit SD or the speaker SP may be responsible for guaranteeing this real-time behavior. The following describes the case in which the SD controller unit SD accounts for the timing relationship between the maskee H'(t) and the masker H(t) and outputs the changed voice signal to the speaker SP.

FIG. 4 is a side view showing the configuration of the IT partition 4 of FIG. 2. The IT partition 4 has a laminated structure in which a first sound-absorbing layer 142, a sound-insulating layer 144, and a second sound-absorbing layer 146 are stacked in this order. The first sound-absorbing layer 142 and the second sound-absorbing layer 146 are each 20 mm thick glass wool. The sound-insulating layer 144 is 12 mm thick gypsum board.

FIG. 5 is a block diagram showing the functions and configuration of the SD controller unit SD of FIG. 3. In hardware, each block shown here can be realized by elements such as a computer's CPU (central processing unit) or by mechanical devices. In software, each block can be realized by a computer program or the like. The figure depicts the functional blocks realized by the cooperation of the two. Those skilled in the art who read this specification will therefore understand that these functional blocks can be realized in various forms by combinations of hardware and software.

The SD controller unit SD includes an A/D unit 20, a partial extraction unit 30, a nonlinear changing unit 40, and an output unit 50.

The microphone Mic converts the maskee H'(t) it picks up into a voice signal, which reaches the A/D unit 20 via a microphone amplifier (not shown). The A/D unit 20 converts this analog voice signal into digital data. The digitized voice signal is, for example, data in which each time is associated with a voltage value corresponding to the sound pressure.

The partial extraction unit 30 extracts the signal of the change target portion from the voice signal digitized by the A/D unit 20. The partial extraction unit 30 includes a signal dividing unit 32, a part determination unit 34, a first envelope acquisition unit 36, and a first switch 39.
For extracting the change target portion, the partial extraction unit 30 has at least two selectable modes: an approximate single-peak extraction mode and a random division mode. In the present embodiment, the user selects the mode with the first switch 39. The first switch 39 may be implemented as either a hardware switch or a software switch.

(Approximate single-peak extraction mode)
When the first switch 39 connects the A/D unit 20 to the first envelope acquisition unit 36, the partial extraction unit 30 operates in the approximate single-peak extraction mode. In this mode, the first envelope acquisition unit 36 acquires data representing the envelope of the voice signal. This data is, for example, digital data in which each time is associated with a voltage value corresponding to the envelope magnitude. Hereinafter, data representing an envelope is simply called an envelope. The first envelope acquisition unit 36 includes a squared sound pressure acquisition unit 37 and a low-pass filter 38.

The squared sound pressure acquisition unit 37 obtains the squared sound pressure waveform of the voice signal digitized by the A/D unit 20. It does so by squaring the voice signal and, if necessary, multiplying the result by a predetermined coefficient.

The low-pass filter 38 averages the squared sound pressure waveform obtained by the squared sound pressure acquisition unit 37 with a time constant of a few msec to a few hundred msec. In other words, the low-pass filter 38 applies low-pass filtering to the squared waveform. This removes variations faster than roughly the time constant and yields a smooth waveform, which in the present embodiment is the envelope of the voice signal. Those skilled in the art will understand that the envelope may also be obtained by other methods. In the broad sense, the envelope in this embodiment is data representing the change in amplitude of the voice signal.
If necessary, the low-pass filter 38 takes the square root of the low-pass-filtered data.
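The envelope chain described above (square, smooth with a time constant, optionally take the square root) can be sketched as follows. The source does not specify the low-pass filter type. This sketch assumes a first-order (one-pole) IIR smoother and a hypothetical default time constant of 20 ms, which falls within the stated range of a few msec to a few hundred msec.

```python
import numpy as np


def envelope(signal, fs: float, tau_s: float = 0.02, take_sqrt: bool = True) -> np.ndarray:
    """Envelope via squared pressure + one-pole low-pass smoothing (assumed filter type)."""
    squared = np.square(np.asarray(signal, dtype=float))  # role of unit 37: squared pressure
    alpha = np.exp(-1.0 / (tau_s * fs))                   # one-pole coefficient for time constant tau_s
    smoothed = np.empty_like(squared)
    acc = 0.0
    for i, x in enumerate(squared):                       # role of low-pass filter 38
        acc = alpha * acc + (1.0 - alpha) * x
        smoothed[i] = acc
    # Optional square root turns the mean-square envelope back into an amplitude envelope.
    return np.sqrt(smoothed) if take_sqrt else smoothed
```

For a steady sine of amplitude A, the amplitude envelope settles near A/√2, its RMS value. A longer tau_s gives a smoother envelope but responds more slowly to syllable onsets.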

The part determination unit 34 first detects a rising portion in the envelope obtained by the low-pass filter 38. In a rising portion, the level rises continuously by a few dB to a few tens of dB, for example by 5 dB or more. It then detects a falling portion after the rising portion, in which the level falls continuously by a few dB to a few tens of dB, for example by 5 dB or more. The voice signal between the rising portion and the corresponding falling portion becomes the signal of the change target portion. The envelope of a change target portion determined in this way is often roughly single-peaked.

FIG. 6 is an explanatory diagram illustrating the criteria by which the part determination unit 34 determines the signal of the change target portion. FIG. 6(a) illustrates the case in which the signal is determined from detected rising and falling portions. It shows an example voice signal waveform 211 and its envelope 208. The part determination unit 34 detects a rising portion 202 from the rate of change of the envelope 208 and then detects the falling portion 204 that follows it. It determines the voice signal in the section 206 between these two portions to be the signal of the change target portion. Section 206 runs from a time t1 before the peak 203 to a time t2 after it.
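A possible sketch of the rise/fall rule of FIG. 6(a) follows. It operates on an amplitude envelope, hence 20·log10. The source requires a "continuous" rise and fall of about 5 dB. This sketch approximates that with a simple hysteresis state machine, an assumption that tolerates small wiggles during the rise. The returned section runs from the trough before the rise to the point where the fall is confirmed, a stand-in for t1 and t2 in the figure. The function name and defaults are hypothetical.

```python
import numpy as np


def single_peak_segments(env, rise_db: float = 5.0, fall_db: float = 5.0, floor: float = 1e-12):
    """Return (start, end) index pairs (inclusive) of roughly single-peaked sections.

    Hysteresis approximation of "rise by >= rise_db, then fall by >= fall_db".
    `env` is assumed to be an amplitude envelope, so it is converted with 20*log10.
    """
    level = 20.0 * np.log10(np.maximum(np.asarray(env, dtype=float), floor))
    segments = []
    trough = 0       # index of the lowest point while waiting for a rise
    peak = 0         # index of the highest point after a confirmed rise
    rising = False   # True once a rise of at least rise_db has been confirmed
    for i in range(1, len(level)):
        if not rising:
            if level[i] <= level[trough]:
                trough = i                    # move the start to the latest low point
            elif level[i] - level[trough] >= rise_db:
                rising, peak = True, i        # rising portion confirmed
        else:
            if level[i] > level[peak]:
                peak = i                      # keep following the peak upward
            elif level[peak] - level[i] >= fall_db:
                segments.append((trough, i))  # falling portion confirmed
                rising, trough = False, i     # search for the next rise from here
    return segments
```

Raising rise_db and fall_db merges small bumps into fewer, longer sections. Lowering them splits the signal more finely.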

The part determination unit 34 may also determine the signal of the change target portion in other ways:
- It may detect a part where the envelope bulges and take the corresponding voice signal as the change target portion.
- It may detect a peak of the envelope and take the voice signal in a section of predetermined length before and after the peak.
- It may take the voice signal in a continuous section where the envelope exceeds a predetermined level.

FIG. 6(b) illustrates the case in which the part determination unit 34 determines the signal of the change target portion by peak detection. It shows an example voice signal waveform 212 and its envelope 214. The part determination unit 34 detects a peak 216 of the envelope 214. It then determines the voice signal in a section 218 of predetermined length before and after the peak 216 to be the signal of the change target portion.
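The peak-centred rule of FIG. 6(b) can be sketched as below. Peaks are taken as simple local maxima above an optional minimum level. The window is given in samples (multiply a duration by the sampling rate fs to convert). The function name, the min_level parameter, and the local-maximum test are illustrative assumptions, not the patent's specification.

```python
import numpy as np


def peak_window_segments(env, half_width: int, min_level: float = 0.0):
    """Return (start, end) half-open index pairs centred on local envelope peaks."""
    env = np.asarray(env, dtype=float)
    n = len(env)
    segments = []
    for i in range(1, n - 1):
        # Local maximum: strictly above the left neighbour, not below the right one.
        is_peak = env[i - 1] < env[i] >= env[i + 1]
        if is_peak and env[i] > min_level:
            # Fixed-length window before and after the peak, clipped to the signal.
            segments.append((max(0, i - half_width), min(n, i + half_width + 1)))
    return segments
```

In practice a real envelope has many small ripples. A higher min_level, or heavier smoothing of the envelope, keeps only the prominent peaks.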

FIG. 6(c) illustrates the case in which the part determination unit 34 determines the signal of the change target portion from the envelope level. It shows an example voice signal waveform 220 and its envelope 222. The part determination unit 34 detects a continuous section 226 in which the envelope 222 exceeds a predetermined level 224. It determines the voice signal in section 226 to be the signal of the change target portion. Depending on how the predetermined level is chosen, the change target portion may then contain two or more peaks.
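The level-threshold rule of FIG. 6(c) reduces to finding contiguous runs where the envelope exceeds a threshold. The sketch below (function name hypothetical) pads the boolean mask so that every run has a start edge and an end edge, then reads the runs off np.diff. The second test case shows a low threshold yielding one section that contains two peaks, as the text notes.

```python
import numpy as np


def above_threshold_segments(env, threshold: float):
    """Return (start, end) half-open index pairs of runs where env > threshold."""
    above = np.asarray(env) > threshold
    # Pad with False so every run has a rising edge (+1) and a falling edge (-1).
    padded = np.concatenate(([False], above, [False])).astype(np.int8)
    edges = np.flatnonzero(np.diff(padded))
    starts, ends = edges[0::2], edges[1::2]
    return list(zip(starts.tolist(), ends.tolist()))
```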

As described above, various methods for determining the change-target signal are conceivable. Having many options is advantageous because it gives a large degree of freedom in making the concealment of conversation content by SD more effective.

What these methods have in common is that a coherent chunk of signal is identified from the waveform of the audio signal, in particular from its statistical properties, and that chunk is taken as the change-target signal. In other words, the change-target part is determined adaptively according to the incoming audio signal. Based on the inventor's experience as a person skilled in the art and on preliminary experiments, this approach was found to disturb the conversation content more effectively than, for example, cutting out the audio signal at predetermined fixed intervals. In particular, experiments by the inventor found that extracting approximately one peak of the envelope as the unit of change gives a higher disturbance effect than cutting at a fixed period or using consonants or vowels as the unit of change.

Returning to FIG. 5, the part determination unit 34 outputs the portions of the audio signal that were not determined to be change-target signals to the delay adjustment unit 52.

(Random division mode)
When the first switch 39 is set to connect the A/D unit 20 to the signal division unit 32, the partial extraction unit 30 operates in random division mode. In this mode, the signal division unit 32 divides the audio signal digitized by the A/D unit 20 into periods of random length. The period length varies between several tens and several hundreds of milliseconds, or alternatively varies within ± several tens to several hundreds of percent around a fixed period. For example, the period length may change as …, 11 msec, 10 msec, 12 msec, ….

The part determination unit 34 takes the signal corresponding to one of the periods produced by the signal division unit 32 as a change-target signal. It may select every divided period as a change-target part, or, for example, only every other period. In the latter case, it outputs the audio signal of the unselected periods to the delay adjustment unit 52.
In random division mode, the randomness added to the period length improves the naturalness of the masker H(t).

The nonlinear changing unit 40 changes the change-target part extracted by the partial extraction unit 30 in real time or near real time using a nonlinear function. The nonlinear changing unit 40 includes a first changing unit 42, a second changing unit 44, a third changing unit 46, and a second switch 48.
The nonlinear changing unit 40 has at least three selectable modes: a first change mode, a second change mode, and a third change mode. In this embodiment, the user selects the mode with the second switch 48, which may be implemented either as a hardware switch or as a software switch.

(First change mode)
When the second switch 48 is set to connect the part determination unit 34 to the first changing unit 42, the nonlinear changing unit 40 operates in the first change mode. In this mode, the first changing unit 42 acquires the change-target signal determined by the part determination unit 34 and applies nonlinear processing to it. The first changing unit 42 includes a second envelope acquisition unit 62, a first nonlinear processing unit 64, and an integrating (multiplying) unit 66.

The second envelope acquisition unit 62 has the same configuration as the first envelope acquisition unit 36; that is, it acquires an envelope from the change-target signal extracted by the partial extraction unit 30. Alternatively, when the partial extraction unit 30 uses the approximately-one-peak mode, the second envelope acquisition unit 62 may take the portion of the envelope already acquired by the first envelope acquisition unit 36 that corresponds to the change-target signal.
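This section does not say how the envelope acquisition units 36 and 62 compute an envelope. A common choice, assumed here purely for illustration, is full-wave rectification followed by a centred moving average; the window length is a free parameter and the function name is hypothetical.

```python
from typing import List


def envelope(signal: List[float], window: int) -> List[float]:
    """Full-wave rectify, then smooth with a centred moving average."""
    rectified = [abs(x) for x in signal]
    half = window // 2
    out: List[float] = []
    for i in range(len(rectified)):
        lo = max(0, i - half)                  # window shrinks at the edges
        hi = min(len(rectified), i + half + 1)
        out.append(sum(rectified[lo:hi]) / (hi - lo))
    return out


print(envelope([0.0, 0.0, 3.0, 0.0, 0.0], 3))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

A Hilbert-transform envelope or a one-pole smoother on the rectified signal would serve equally well; the moving average is chosen only because it is short and dependency-free.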

The first nonlinear processing unit 64 processes the change-target signal extracted by the partial extraction unit 30 using a nonlinear function, for example one based on the absolute value and a logarithmic transformation. Specifically, it computes the base-2 logarithm of the absolute value of the change-target signal $y(t)$, that is, $\log_2|y(t)|$.

The integrating unit 66 modifies the signal processed by the first nonlinear processing unit 64 based on the envelope acquired by the second envelope acquisition unit 62. Specifically, it multiplies that envelope by the output of the first nonlinear processing unit 64. As a result, even if the processing in the first nonlinear processing unit 64 distorts the shape of the envelope, the integrating unit 66 restores it.

The first changing unit 42 repeats the above processing for each change-target signal determined by the part determination unit 34 and outputs the processed signals to the delay adjustment unit 52.
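The first change mode amounts to computing $\log_2|y(t)|$ sample by sample and multiplying it by the envelope. A minimal sketch follows, assuming the segment and its envelope are equal-length lists. The text does not say how silent samples ($y = 0$, where the logarithm diverges) are handled, so the floor value below is an assumption, as is the function name.

```python
import math
from typing import List


def first_change_mode(segment: List[float], env: List[float],
                      floor: float = 1e-6) -> List[float]:
    """Unit 64 (log2|y|) followed by unit 66 (multiply by the envelope)."""
    if len(segment) != len(env):
        raise ValueError("segment and envelope must have the same length")
    out: List[float] = []
    for y, e in zip(segment, env):
        # log2|y(t)|; the floor avoids log2(0) = -inf on silent samples (assumption)
        z = math.log2(max(abs(y), floor))
        out.append(e * z)   # re-impose the envelope on the nonlinear result
    return out


print(first_change_mode([4.0, -8.0, 0.5], [1.0, 2.0, 3.0]))  # [2.0, 6.0, -3.0]
```

Note that the absolute value discards the sign of $y(t)$, so the output no longer resembles the input waveform; only the slowly varying envelope is carried over, which matches the stated aim of preserving intonation while destroying content.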

(Second change mode)
When the second switch 48 is set to connect the part determination unit 34 to the second changing unit 44, the nonlinear changing unit 40 operates in the second change mode. In this mode, the second changing unit 44 acquires the change-target signal determined by the part determination unit 34 and applies nonlinear processing to it. The second changing unit 44 includes a replacement unit 68 and a second nonlinear processing unit 70.

The replacement unit 68 swaps the signal value at one time with the signal value at another time within the change-target signal.
The second nonlinear processing unit 70 processes the change-target signal, after replacement by the replacement unit 68, using a nonlinear function.

FIG. 7 is an explanatory diagram showing an example of processing in the second changing unit 44. The horizontal axis represents time and the vertical axis represents voltage. The solid line 228 shows the waveform of the audio signal input to the A/D unit 20 as an analog signal. Suppose the partial extraction unit 30 extracts the audio signal in section 230 as the change-target signal. This signal is digital data with voltage values $y_0 = f(t_0), y_1 = f(t_1), \ldots, y_N = f(t_N)$ at times $t_0, t_1, \ldots, t_N$ ($N$ a natural number), where $t_0 < t_N$ and the times are equally spaced.
In FIG. 7, the first data point 232 is $(t_0, y_0)$, the second data point 234 is $(t_{N-i}, y_{N-i})$ (where $i$ is an integer with $0 \le i \le N$), and the third data point 236 is $(t_N, y_N)$.

The replacement unit 68 modifies the change-target signal using the mapping $y'_i = f(t_N - t_i)$. For example, at time $t_i$ it replaces $y_i$ with $y'_i = f(t_N - t_i) = f(t_{N-i}) = y_{N-i}$, where the middle equality holds when time is measured from $t_0 = 0$, since the samples are equally spaced. The fourth data point 238 after this replacement is $(t_i, y'_i = y_{N-i})$. The dash-dotted line 240 in FIG. 7 shows the waveform of the signal after replacement by the replacement unit 68.

The second nonlinear processing unit 70 modifies the replaced signal using a nonlinear function $Y = g(y')$ such as a logarithm. For the fourth data point 238, for example, it maps $y'_i$ to $Y_i = g(y'_i) = g(y_{N-i})$. The resulting fifth data point 242 is $(t_i, Y_i = g(y_{N-i}))$. The two-dot chain line 244 in FIG. 7 shows the waveform of the signal after modification by the second nonlinear processing unit 70.
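Taken together, the second change mode is a time reversal within the segment ($y'_i = y_{N-i}$) followed by a pointwise nonlinearity $g$. The text names a logarithm only as an example of $g$; because a plain logarithm is undefined for negative samples, the sketch below assumes a sign-preserving $g(v) = \operatorname{sign}(v)\log_2(1 + |v|)$. Both that choice and the function names are illustrative.

```python
import math
from typing import Callable, List


def reverse_segment(y: List[float]) -> List[float]:
    """Replacement unit 68: y'_i = y_{N-i}, i.e. time-reverse the segment."""
    n = len(y) - 1
    return [y[n - i] for i in range(len(y))]


def signed_log(v: float) -> float:
    """Illustrative g: a sign-preserving logarithm (an assumption)."""
    return math.copysign(math.log2(1.0 + abs(v)), v)


def second_change_mode(y: List[float],
                       g: Callable[[float], float] = signed_log) -> List[float]:
    """Unit 68 then unit 70: Y_i = g(y_{N-i})."""
    return [g(v) for v in reverse_segment(y)]


print(second_change_mode([0.0, 1.0, 3.0]))  # [2.0, 1.0, 0.0]
```

Any other monotone or non-monotone $g$ can be passed in, which also covers the text's remark that the ordering and the form of $f$ may be varied.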

Returning to FIG. 5, the second changing unit 44 repeats the above processing for each change-target signal determined by the part determination unit 34 and outputs the processed signals to the delay adjustment unit 52.
The processing in the second changing unit 44 is not limited to the above. For example, a different ordering of $t_0$ and $t_N$ or a different form of the function $f$ may be adopted.

(Third change mode)
When the second switch 48 is set to connect the part determination unit 34 to the third changing unit 46, the nonlinear changing unit 40 operates in the third change mode. In this mode, the third changing unit 46 acquires the change-target signal determined by the part determination unit 34 and applies nonlinear processing to it. The third changing unit 46 includes a preprocessing unit 72, an LPC analysis unit 74, a residual processing unit 76, a frequency characteristic conversion unit 78, and a synthesis unit 80.

The third changing unit 46 applies formant conversion to the change-target signal. Formant conversion technology is used, for example, in deep-sea work with helium gas mixtures, to restore distorted speech to something close to the original voice.

The formant conversion proceeds as follows. The preprocessing unit 72 applies pre-emphasis to the change-target signal. The LPC analysis unit 74 performs linear prediction (LPC) analysis on the pre-emphasized signal and separates it into the frequency characteristic of the vocal tract and the sound source (residual signal). The frequency characteristic conversion unit 78 transforms the vocal-tract frequency characteristic. The residual processing unit 76 downsamples the residual signal to a desired frequency, or alternatively uses the residual signal as it is. The synthesis unit 80 combines the outputs of the frequency characteristic conversion unit 78 and the residual processing unit 76. Compared with the original change-target signal, the synthesized signal represents transformed speech with the same pitch frequency (the fundamental frequency of the voice) but altered formants. The content of this transformed speech is therefore generally unintelligible.
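The text lists the stages but not the vocal-tract transformation itself. The sketch below implements pre-emphasis, LPC analysis by the autocorrelation method with the Levinson-Durbin recursion, inverse filtering to obtain the residual, and all-pole resynthesis. For unit 78 it substitutes a placeholder, bandwidth scaling $a_k \gamma^k$, which broadens the formant peaks rather than moving them; a genuine formant shift (for example, rotating the LPC pole angles) would go in its place. The residual is used as-is (the text's stated alternative to downsampling), the final de-emphasis is an added assumption, and all names are hypothetical.

```python
from typing import List


def pre_emphasis(x: List[float], alpha: float = 0.97) -> List[float]:
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def de_emphasis(x: List[float], alpha: float = 0.97) -> List[float]:
    out, prev = [], 0.0
    for v in x:
        prev = v + alpha * prev            # exact inverse of pre_emphasis
        out.append(prev)
    return out


def lpc(x: List[float], order: int) -> List[float]:
    """Autocorrelation + Levinson-Durbin; returns [1, a1, ..., ap] for the
    inverse filter A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    r = [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]
    a, err = [1.0] + [0.0] * order, r[0]
    for i in range(1, order + 1):
        if err <= 0.0:                     # silent or degenerate segment
            break
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(order + 1)]
        a[i] = k                           # reflection coefficient
        err *= 1.0 - k * k
    return a


def inverse_filter(x: List[float], a: List[float]) -> List[float]:
    """Residual (sound source) e[n] = x[n] + sum_k a_k x[n-k]."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n >= k)
            for n in range(len(x))]


def synthesis_filter(e: List[float], a: List[float]) -> List[float]:
    """All-pole resynthesis y[n] = e[n] - sum_{k>=1} a_k y[n-k]."""
    y: List[float] = []
    for n, v in enumerate(e):
        y.append(v - sum(a[k] * y[n - k] for k in range(1, len(a)) if n >= k))
    return y


def formant_convert(segment: List[float], order: int = 10, gamma: float = 0.9,
                    alpha: float = 0.97) -> List[float]:
    emphasized = pre_emphasis(segment, alpha)              # unit 72
    a = lpc(emphasized, order)                             # unit 74
    residual = inverse_filter(emphasized, a)               # unit 76 (used as-is)
    a_mod = [c * gamma ** k for k, c in enumerate(a)]      # unit 78 placeholder
    return de_emphasis(synthesis_filter(residual, a_mod), alpha)  # unit 80


x = [((n * 37) % 11 - 5) / 5.0 for n in range(200)]       # toy periodic input
changed = formant_convert(x, order=4, gamma=0.8)
```

With `gamma=1.0` the analysis and synthesis cancel and the input is reproduced up to rounding, which is a convenient sanity check; because the residual is untouched, its periodicity (the pitch) survives any change to the coefficients.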

The third changing unit 46 repeats the above processing for each change-target signal determined by the part determination unit 34 and outputs the processed signals to the delay adjustment unit 52.

The output unit 50 acquires the nonlinearly processed change-target signals from the nonlinear changing unit 40 and the signals that are not change targets from the partial extraction unit 30, converts them into an analog signal, and outputs it to the speaker SP via the power amplifier PA. The output unit 50 includes a delay adjustment unit 52 and a D/A unit 54.

The delay adjustment unit 52 joins the nonlinearly processed change-target signals and the signals that are not change targets to generate the output audio signal. It adjusts the timing at which the output audio signal leaves the output unit 50 according to the time the maskee H'(t) takes to propagate. Specifically, it applies a predetermined delay to the output audio signal. This delay is set so that the lag of the masker H(t) behind the maskee H'(t) at the position of the listener 8 stays within a range where the two can be regarded as substantially simultaneous (real time).
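The joining and delaying done by the delay adjustment unit 52 can be sketched offline as follows, assuming segments are keyed by half-open sample ranges. A real-time device would use a FIFO delay line rather than building whole buffers; the names and the "silence for unclaimed samples" rule are assumptions.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # half-open [start, end) sample range (assumption)


def assemble_output(n_samples: int,
                    changed: Dict[Segment, List[float]],
                    unchanged: Dict[Segment, List[float]],
                    delay_samples: int) -> List[float]:
    """Put changed and unchanged segments back in time order, then delay."""
    if delay_samples < 0:
        raise ValueError("a causal delay line cannot advance the signal")
    out = [0.0] * n_samples                # samples claimed by no segment stay silent
    for table in (changed, unchanged):
        for (start, end), samples in table.items():
            if len(samples) != end - start:
                raise ValueError(f"segment {(start, end)} has {len(samples)} samples")
            out[start:end] = samples        # each segment returns to its original slot
    return [0.0] * delay_samples + out      # the predetermined delay as leading zeros


print(assemble_output(5, {(0, 2): [1.0, 1.0]}, {(2, 5): [2.0, 2.0, 2.0]}, 2))
# [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 2.0]
```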

That the maskee H'(t) and the masker H(t) are substantially in real time (near real time) means, for example, that they at least partially overlap within the adjacent booth 2'. Alternatively, it means that the change-target signal output from the output unit 50 is converted into sound by the speaker SP, and that sound is emitted into the adjacent booth 2' while the maskee H'(t) is being heard there. Alternatively, it means that the converted sound is emitted into the adjacent booth 2' while the portion of the maskee H'(t) corresponding to that change-target signal is being heard there. In other words, the portion of the maskee H'(t) corresponding to a change-target signal and the portion of the masker H(t) generated from that same signal at least partially overlap within the adjacent booth 2'.
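The last criterion, partial overlap of a maskee portion and the masker portion generated from it, reduces to an interval test at the listener position. The sketch assumes both portions are described by (start, end) arrival times in seconds and that the masker portion has the same duration, shifted by a lag; the names are hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end) arrival times in seconds (assumption)


def overlaps(maskee: Interval, masker: Interval) -> bool:
    """True if the intervals share some time; merely touching ends does not count."""
    return max(maskee[0], masker[0]) < min(maskee[1], masker[1])


def masker_interval(maskee: Interval, lag_s: float) -> Interval:
    """Masker portion generated from a maskee portion, arriving lag_s later."""
    return maskee[0] + lag_s, maskee[1] + lag_s


seg = (0.0, 0.25)                                 # a 250 ms maskee portion
print(overlaps(seg, masker_interval(seg, 0.08)))  # True
print(overlaps(seg, masker_interval(seg, 0.30)))  # False
```

Under these assumptions a portion of duration $L$ overlaps its lagged copy exactly when the lag magnitude is below $L$, which is consistent with the later preference for a delay smaller than the width of one peak.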

When the voice information secret talk system 100 is installed, the positions of the microphone Mic and the speaker SP are fixed, and the expected positions of the customer 6 and the listener 8 are also known to some extent. The processing time in the SD controller unit SD can likewise be estimated. The propagation times of the maskee H'(t) and of the masker H(t) from the customer 6 to the listener 8 can therefore be estimated at installation. The delay in the delay adjustment unit 52 is set by working back from the desired lag of the masker H(t) behind the maskee H'(t) at the position of the listener 8.
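The back-calculation can be written out as a small function. It assumes straight-line acoustic paths, a speed of sound of 343 m/s, and a masker path of customer to microphone, then SD processing, then speaker to listener; distances, names and the sound speed are illustrative assumptions.

```python
SPEED_OF_SOUND_M_S = 343.0  # roughly air at 20 degC (assumption)


def required_delay_s(d_customer_mic_m: float, d_speaker_listener_m: float,
                     d_customer_listener_m: float, processing_s: float,
                     desired_lag_s: float, c: float = SPEED_OF_SOUND_M_S) -> float:
    """Delay for unit 52 so that the masker lags the maskee by desired_lag_s
    at the listener. A negative result means adding delay cannot reach the
    target lag; the SD processing would have to be made faster instead."""
    maskee_arrival = d_customer_listener_m / c              # direct path
    masker_arrival = (d_customer_mic_m / c                  # customer -> Mic
                      + processing_s                        # SD controller unit
                      + d_speaker_listener_m / c)           # SP -> listener
    return desired_lag_s - (masker_arrival - maskee_arrival)


print(round(required_delay_s(0.0, 3.43, 6.86, 0.02, 0.03), 6))  # 0.02
```

In the example the masker would arrive 10 ms after the maskee without added delay, so a 30 ms target lag needs about 20 ms of extra delay. A negative return value corresponds to the case, discussed below, where the masker arrives too late even with no added delay.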

If the masker H(t) lags the maskee H'(t) by too much, echo or reverberation may be perceived at the position of the listener 8. The delay adjustment unit 52 therefore gives the output audio signal a delay such that this lag stays small enough not to cause such an unnatural impression. The delay is determined by experiment but is typically about 100 msec or less.
As noted above, the inventor also found that handling the audio signal in units of approximately one peak is advantageous for controlling how well the speech can be understood. From this viewpoint, the delay in the delay adjustment unit 52 is preferably set according to the time width of an approximately-one-peak portion of the audio signal, and in particular smaller than that width, because an interaction between the approximately-one-peak portions of the masker and of the maskee is then expected.

The D/A unit 54 converts the output audio signal delayed by the delay adjustment unit 52 into an analog audio signal for driving the speaker SP and outputs it to the power amplifier PA.

FIG. 8 shows waveforms of the audio signals representing the maskee H'(t) and the masker H(t) at the position of the listener 8. FIG. 8(a) shows the audio signal representing the maskee H'(t), obtained by converting with the microphone Mic the original utterance "ANO KARETOWA MOSOTONAGAINDAYO ZITSUWA" (roughly, "Um, actually, I've been with him for quite a long time now"). In FIG. 8(a), the vertical axis represents signal intensity in arbitrary units and the horizontal axis represents time. FIG. 8(b) shows the audio signal generated by processing the signal of FIG. 8(a) in the SD controller unit SD using the approximately-one-peak extraction mode and the first change mode. The portion marked N in the waveform of FIG. 8(b) corresponds to the portion marked M in the waveform of FIG. 8(a); the same applies to FIG. 8(c). The signals of FIGS. 8(b) and 8(c) differ only in the delay applied by the delay adjustment unit 52.

Comparing the envelope of FIG. 8(a) with those of FIGS. 8(b) and 8(c) shows that it changes little; that is, the intonation and inflection of the voice change little. However, when the signal of FIG. 8(b) or 8(c) is converted into sound by the speaker SP and output as the masker H(t), the listener 8 hears the maskee H'(t) and the masker H(t) combined, and the meaning becomes hard to understand. The typical response is "I can't make it out" (sometimes the speech is heard as different words).

Depending on the positional relationship among the microphone Mic, the speaker SP, the customer 6, and the listener 8, the masker H(t) may reach the listener 8 earlier than the maskee H'(t) if the delay adjustment unit 52 adds no delay; that is, the lag of the masker H(t) behind the maskee H'(t) at the position of the listener 8 may be negative. With a small delay in the delay adjustment unit 52, for example, the lag can be −D1 (D1 > 0), as shown in FIG. 8(c). In this case the listener 8 hears a masker H(t) generated from a future part of the maskee H'(t) that the listener has not yet heard.

As the delay applied by the delay adjustment unit 52 increases, the lag of the masker H(t) behind the maskee H'(t) at the position of the listener 8 reaches zero at some value and then becomes positive. For example, as shown in FIG. 8(b), it can be D2 (D2 > 0). From the standpoint of temporal masking, delaying the masker slightly relative to the maskee sometimes gives a stronger masking effect than aligning them exactly, because hearing partly relies on the time variation of the speech envelope to understand content. In such cases it is preferable to increase the delay in the delay adjustment unit 52 so that this lag is positive.

Conversely, depending on the positional relationship among the microphone Mic, the speaker SP, the customer 6, and the listener 8, the masker H(t) may reach the listener 8 considerably later than the maskee H'(t) even when the delay adjustment unit 52 adds no delay. In that case, to conceal information by combining the maskee H'(t) and the masker H(t) substantially in real time at the position of the listener 8, the SD processing time in the SD controller unit SD must be shortened. This time constraint may force a sacrifice in the accuracy of the nonlinear processing. However, the aim of this embodiment is to reduce the clarity and intelligibility of speech, not to execute the intended processing precisely. Processing accuracy is therefore not a major issue as long as superimposing the masker H(t) makes the meaning of the maskee H'(t) hard to understand, because there are countless ways of satisfying that condition.

FIG. 9 is a flowchart of the processing in the voice information secret talk system 100. The microphone Mic picks up the maskee H'(t) and generates an audio signal (step 302). The A/D unit 20 acquires the audio signal representing the maskee H'(t) from the microphone Mic (step 304). The partial extraction unit 30 extracts change-target signals from the audio signal acquired and A/D-converted by the A/D unit 20 (step 306). The nonlinear changing unit 40 modifies the extracted change-target signals using a nonlinear function (step 308). The output unit 50 outputs the modified signals to the speaker SP (step 310). The speaker SP converts the received signal into sound as the masker H(t) and emits it into the adjacent booth 2', where the maskee H'(t) is being heard (step 312).
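Steps 306 to 310 can be strung together in one self-contained sketch operating on a buffer of samples. It assumes a rectify-and-average envelope, change-target segments found as above-threshold runs (the FIG. 6(c) rule), the first change mode with a log floor, and a delay given in samples; every name and parameter is a hypothetical stand-in, and the sketch processes a whole buffer rather than streaming in real time.

```python
import math
from typing import Callable, List

Change = Callable[[List[float], List[float]], List[float]]


def moving_envelope(x: List[float], window: int) -> List[float]:
    """Rectify and average over a centred window (illustrative envelope)."""
    half = window // 2
    env = []
    for i in range(len(x)):
        chunk = x[max(0, i - half):i + half + 1]
        env.append(sum(abs(v) for v in chunk) / len(chunk))
    return env


def log_times_envelope(seg: List[float], env: List[float]) -> List[float]:
    """First change mode: envelope x log2|y|, floored to avoid log2(0)."""
    return [e * math.log2(max(abs(y), 1e-6)) for y, e in zip(seg, env)]


def sd_pipeline(x: List[float], level: float, window: int = 5,
                delay_samples: int = 0,
                change: Change = log_times_envelope) -> List[float]:
    env = moving_envelope(x, window)
    out = list(x)                              # non-target samples pass through
    i = 0
    while i < len(x):
        if env[i] <= level:
            i += 1
            continue
        j = i
        while j < len(x) and env[j] > level:
            j += 1                             # step 306: one target run [i, j)
        out[i:j] = change(x[i:j], env[i:j])    # step 308: nonlinear change
        i = j
    return [0.0] * delay_samples + out         # step 310: delayed output


x = [0.0, 0.0, 4.0, 4.0, 4.0, 0.0, 0.0]
print(sd_pipeline(x, level=1.0, window=1, delay_samples=1))
# [0.0, 0.0, 0.0, 8.0, 8.0, 8.0, 0.0, 0.0]
```

Passing a different `change` callable, for example `lambda s, e: s[::-1]` for the time reversal of the second change mode, swaps the nonlinear stage without touching the segmentation; this mirrors the role of the second switch 48.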

The operation of the voice information secret talk system 100 configured as above is described next. Suppose a customer 6 sits in booth 2 of a bank and consults a bank adviser about, for example, a loan, while a listener 8 in the adjacent booth 2' is applying to open an account. The customer 6 explains why a loan is needed, for example that the cash flow of their business has deteriorated. Naturally, such a conversation should not be overheard by the listener 8. In the voice information secret talk system 100 according to this embodiment, a nonlinearly processed version of mainly the speech the customer 6 is uttering reaches the listener 8 in near real time, so the listener 8 cannot understand what the customer 6 is saying. In addition, when the customer 6 is not speaking, the speaker SP outputs substantially nothing into the adjacent booth 2', so the noise level there is not raised unnecessarily.

In the above embodiment, the SD controller unit SD may include a storage device such as a hard disk or memory. Those skilled in the art will also understand from this description that each block can be implemented by a CPU (not shown), modules of installed application programs or system programs, memory that temporarily stores data read from the hard disk, and the like.

The voice information secret talk system 100 according to this embodiment provides the following effects.

(1) The voice information secret talk system 100 according to this embodiment conceals the content of the conversation, that is, the information contained in the speech, rather than concealing or erasing the existence of the conversation itself. In this regard, the inventor recognized the following.
In open-plan offices and at lobby counters of banks and securities companies, especially customer service counters separated only by simple partitions, the purpose of concealment is fully served if people other than those talking cannot understand what is said. In other words, the voice itself may be audible as long as the content does not leak. Indeed, where the speaker can be seen, it is more natural for the spectrum and envelope of the voice (timbre, intonation, and inflection) to be preserved. The voice information secret talk system 100 according to this embodiment meets these needs and conceals conversation content in a more natural way.

Although the envelope is preserved, the present embodiment tolerates some deviation: for example, the masker envelope may be slightly shifted in time relative to the maskee envelope, or the two envelopes may differ slightly in shape. In other words, the masker envelope and the maskee envelope are preserved only to the extent that they remain similar. Based on the inventor's experience as a person skilled in the art and on preliminary experiments, the speech-information disturbance effect was found to be higher when the masker envelope and the maskee envelope are similar but not identical.

FIG. 10 is a graph showing the relationship between the masker-maskee difference and the recognition rate. The vertical axis of FIG. 10 shows the recognition rate in arbitrary units (percent (%) in the example of FIG. 10), and the horizontal axis shows the degree of difference between the masker and the maskee in arbitrary units. The recognition rate is the rate at which content words are recognized while both the masker and the maskee are being heard. Here, the difference between the masker and the maskee means the difference between their envelopes.
When the difference between the masker and the maskee is close to zero, the recognition rate is high (almost 100%). When the difference is large, the recognition rate is also high, because the auditory system can easily tell the two apart. The inventor conceived that between these extremes lies a region where the masker and the maskee differ yet are hard to distinguish, so that the recognition rate is lowest; there, hearing is, so to speak, thrown into confusion. In the present embodiment it is also possible, for example, to adjust the delay in the delay adjustment unit 52 so that the masker-maskee difference lands at this minimum of the recognition rate.
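The delay tuning described above can be sketched as follows, purely as an illustration. The specification does not define how the "degree of difference" is measured, so this sketch assumes a hypothetical measure (mean absolute difference of peak-normalized moving-average envelopes), and the helper names `envelope`, `delay`, `envelope_difference`, and `choose_delay` are made up for the example. The target difference `target_diff` stands for the value at the recognition-rate minimum, which would have to be found by listening tests.

```python
def envelope(x, win=32):
    # Moving average of |x| over the last `win` samples (an illustrative envelope).
    out, acc = [], 0.0
    for i, v in enumerate(x):
        acc += abs(v)
        if i >= win:
            acc -= abs(x[i - win])
        out.append(acc / min(i + 1, win))
    return out


def delay(x, n):
    # Delay x by n >= 0 samples, keeping the original length.
    n = min(max(n, 0), len(x))
    return [0.0] * n + list(x[:len(x) - n])


def envelope_difference(maskee, masker, win=32):
    # Hypothetical difference measure: mean |difference| of peak-normalized envelopes.
    ea, eb = envelope(maskee, win), envelope(masker, win)
    pa, pb = (max(ea) or 1.0), (max(eb) or 1.0)
    return sum(abs(a / pa - b / pb) for a, b in zip(ea, eb)) / len(ea)


def choose_delay(maskee, masker, candidates, target_diff, win=32):
    # Pick the candidate delay (in samples) whose envelope difference is closest
    # to target_diff, assumed to be measured beforehand in listening tests.
    return min(candidates,
               key=lambda n: abs(envelope_difference(maskee, delay(masker, n), win) - target_diff))
```

With a single speech-like burst, a larger delay gives a larger envelope difference, and a zero delay gives no difference at all; in practice the masker passed in would be the nonlinearly changed signal, not a copy of the maskee.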

(2) Suppose masking is attempted by playing, in the adjacent booth 2', processed speech generated from speech that has little relation to the maskee H'(t) heard in that booth, for example past speech. Differences in the positions of silent portions, differences in articulation, and the like then prevent the expected information-concealment effect from being obtained, and the result sounds more unnatural. In contrast, in the voice information secret talk system 100 according to the present embodiment, the nonlinearly processed masker H(t) reaches the ear of the listener 8 substantially in real time together with the maskee H'(t). Compared with the case above, therefore, the degree of information concealment is higher and the unnaturalness is lower.

(3) In the approximately single-peak mode of the embodiment, a signal whose envelope forms approximately a single peak (one hump) is extracted as the signal of the change target portion. Cutting and pasting are then performed where the signal level of the maskee H'(t) is low, so click noise caused by the nonlinear processing is reduced. That is, if the maskee H'(t) is continuous in time, the masker H(t) is also nearly continuous. This avoids both the click noise at cut points that can arise when the signal is divided at fixed time intervals, and the collapse of the envelope shape (loss of intonation) caused by the windowing that would otherwise be applied to suppress such noise.
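A minimal sketch of approximately single-peak segmentation is shown below, assuming that the envelope is approximated by frame RMS and that cut points are placed at envelope valleys (local minima). The frame length and the names `frame_rms` and `single_peak_segments` are assumptions for illustration, not the extraction procedure of the partial extraction unit 30 itself.

```python
import math


def frame_rms(x, frame=64):
    # RMS level of each complete frame: a coarse envelope (illustrative choice).
    n = len(x) // frame
    return [math.sqrt(sum(v * v for v in x[k * frame:(k + 1) * frame]) / frame)
            for k in range(n)]


def single_peak_segments(x, frame=64):
    """Cut x at valleys of its frame envelope.

    Each returned (start, end) sample range then holds roughly one hump of the
    envelope, and every cut falls where the signal level is low.
    """
    env = frame_rms(x, frame)
    cuts = [0]
    for i in range(1, len(env) - 1):
        # A valley: not higher than the previous frame, strictly lower than the next.
        if env[i] <= env[i - 1] and env[i] < env[i + 1]:
            cuts.append(i * frame)
    cuts.append(len(x))
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if b > a]
```

For two speech-like bursts separated by a short pause, the sketch places one cut at the end of the pause, so each segment contains one burst and the cut lands in a low-level region.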

(4) The masker H(t) is generated from the talker's own maskee H'(t) and output from the loudspeaker in parallel with the original voice. Particularly in the first and second change modes, therefore, the spectrum and envelope of the maskee H'(t) can be preserved to some extent in the masker H(t). As a result, the spectrum and intonation of the masker H(t) are almost the same as those of the maskee H'(t), so listeners perceive it naturally, without much sense of incongruity.

(5) When there is no maskee H'(t) on the time axis, that is, when no one is talking, the masker H(t) is not output either; the two overlap substantially in time. Therefore, the masker H(t) does not raise the room noise level during silent periods when no speech is produced.

(6) The system avoids two kinds of discomfort that conventional techniques can cause: discomfort from masker interruption and level fluctuation (the masker cutting out or dropping in level when conversation stops), and discomfort to talkers, conversation partners, and other occupants from emitting sounds (noise or music) unrelated to the conversation.

(7) Unlike the physical sound insulation or private rooms of the prior art, no spatial separation or relocation is needed, so openness and communication are less likely to be hindered.

(8) Because the SD controller unit SD and the loudspeaker SP are built into the IT partition 4, installing and mounting the system is greatly simplified. In some cases the microphone Mic may also be built into the IT partition 4, which simplifies installation further.

(9) The IT partition 4 itself is treated for sound absorption. Sound leakage to the adjacent booth can therefore be reduced while the intelligibility of conversation within the booth is improved.

(10) Because of the nonlinear processing, the masker H(t) is only weakly correlated, as an electrical signal, with the maskee H'(t) (the original voice). Feedback problems such as howling are therefore unlikely while the voice information secret talk system 100 is operating.

(11) In the first and second change modes of the embodiment, the carrier of the audio signal representing the maskee H'(t) can be regarded as undergoing nonlinear processing while its envelope is substantially preserved. Such change processing can therefore be performed in a short time.
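The idea of applying a nonlinear function to the carrier while keeping the envelope can be sketched roughly as below. This is only an illustration under stated assumptions: the envelope is tracked as per-frame RMS, the nonlinear function is a hard-driven `tanh` chosen for the example (the specification only requires "a nonlinear function"), and `change_preserving_envelope` is a hypothetical name, not the first or second change mode as actually implemented.

```python
import math


def rms(v):
    # Root-mean-square level of a list of samples.
    return math.sqrt(sum(s * s for s in v) / len(v)) if v else 0.0


def change_preserving_envelope(x, frame=64, drive=8.0):
    """Bend each frame's waveform with tanh, then restore the frame's RMS level.

    The frame-level envelope is kept while the waveform (carrier) changes.
    """
    y = []
    for start in range(0, len(x), frame):
        seg = x[start:start + frame]
        level = rms(seg)
        if level == 0.0:
            y.extend(seg)  # silent frame: nothing to change
            continue
        peak = max(abs(s) for s in seg)
        bent = [math.tanh(drive * s / peak) for s in seg]  # assumed nonlinear function
        gain = level / (rms(bent) or 1.0)
        y.extend(b * gain for b in bent)  # rescale so the frame RMS matches the input
    return y
```

Matching the level frame by frame is one simple way to keep the envelope; a finer-grained envelope follower would track intonation more closely at higher cost.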

The configuration and operation of the voice information secret talk system 100 according to the embodiment, and of the SD controller unit SD included in it, have been described above. This embodiment is illustrative, and those skilled in the art will understand that various modifications are possible in the combination of its components and processes, and that such modifications also fall within the scope of the present invention.

In the embodiment, the masker H(t) is output from one side of the adjacent booth, but the invention is not limited to this. For example, by adding signals, the masker H(t) may be output from both the left and right sides of the adjacent booth. FIG. 11 is a block diagram schematically showing the functions and configuration of the voice information secret talk system according to a first modification. This system includes a microphone Mic, an SD controller unit SD, four loudspeakers SPa to SPd (SPd not shown), four power amplifiers PAa to PAd (PAd not shown), and four adders 210a to 210d (210d not shown).

The audio signal processed by the SD controller unit SD is input to four adders: the adder 210a for the left loudspeaker SPa of booth 2, the adder 210b for the right loudspeaker SPb of booth 2, the adder 210c for the left loudspeaker SPc of the adjacent booth 2' to the left of booth 2, and the adder 210d (not shown) for the right loudspeaker SPd (not shown) of the adjacent booth to the right of booth 2. The signals input to the adders 210a to 210d are output from the loudspeakers SPa to SPd via the corresponding power amplifiers PAa to PAd. Each adder obtains the audio signals processed by the SD controller unit SD from the two booths on either side of the booth into which its loudspeaker outputs sound, and sums them.
According to this modification, the masker H(t) is output from both the left and right sides of the adjacent booth 2', so the conversation in booth 2 is less likely to be understood by the listener 8.
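The summing rule for the adders can be sketched as below. This is an illustrative reading of the last rule above, under the assumption that the booths stand in a single row, that `processed[i]` is the masker produced for booth i, and that all maskers have equal length; `speaker_feeds` is a hypothetical name, and a real installation would route the signals according to the wiring of FIG. 11.

```python
from typing import List, Sequence

Signal = List[float]


def speaker_feeds(processed: Sequence[Signal]) -> List[Signal]:
    """Return the adder output for each booth in a single row.

    The feed for booth i is the sample-wise sum of the maskers produced for
    its left neighbour (i - 1) and right neighbour (i + 1), where they exist.
    """
    n = len(processed)
    length = len(processed[0]) if n else 0
    feeds: List[Signal] = []
    for i in range(n):
        acc = [0.0] * length
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                acc = [a + b for a, b in zip(acc, processed[j])]
        feeds.append(acc)
    return feeds
```

For three booths, the middle booth receives the sum of both outer maskers, and each end booth receives only the middle booth's masker.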

A PNC (Passive Noise Controller) may also be used in combination to reduce the level of the maskee H'(t). The PNC runs known ANC (Active Noise Control) adaptively during adjustment, and during operation uses it with the adjusted parameters fixed.
FIG. 12 is a block diagram schematically showing the functions and configuration of the voice information secret talk system according to a second modification. In this modification, the SD controller unit SD in FIG. 11 is replaced with the portion enclosed by the broken line in FIG. 12. In this portion, the SD controller unit SD and the PNC unit PNC are arranged in parallel, and the audio signal from the microphone Mic is input to both. A switch SW1 on the output side of the SD controller unit SD turns the output of the SD controller unit SD on and off. The output of the switch SW1 and the output of the PNC unit PNC are summed by an adder 406 and output as sound from the loudspeaker SP via the power amplifier PA.

In this modification, the PNC unit PNC is identified with a head and torso simulator (HATS), connected to a sound source 402 via an amplifier 404, placed at the talker position P. The switch SW1 is opened to disable the SD controller unit SD, a suitable audio signal is emitted from the HATS, and the PNC unit PNC is adapted so that the output of a microphone Mic' placed at the listener position Q in the adjacent booth 2' is minimized. This performs the system identification.

At this point, the impulse response of the path through the microphone Mic and the loudspeaker SP becomes -h(x), whose magnitude is approximately equal to the impulse response h(x) between the talker and the listener. The switch SW1 is then closed, and the PNC unit is operated with the identified parameters fixed. Because the talker and listener positions P and Q and the positions of the microphone Mic and the loudspeaker SP are nearly fixed, the level of the maskee H'(t) is effectively reduced and the masker H(t) becomes dominant. As a result, the information masking effect is strengthened. If the level of the masker H(t) is lowered as needed, the level of the whole system including the maskee H'(t), that is, the room noise level, can be reduced further.
The PNC function described above may also be built into the computer in which the SD controller unit SD is implemented.
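The two-phase PNC procedure (adapt with SW1 open, then freeze) can be sketched with a normalized LMS (NLMS) filter, as below. This is a simplified illustration, not the patented PNC: it assumes the microphone Mic picks up the talker signal directly, that the loudspeaker-to-Q path is transparent (identity), and that there is no acoustic feedback into Mic, so plain NLMS suffices; a real system would also have to model the loudspeaker path (for example with filtered-x LMS). The path `h` and the function names are hypothetical.

```python
import random


def fir(taps, hist):
    # hist[k] holds x[n - k]; returns the FIR output at time n.
    return sum(t * v for t, v in zip(taps, hist))


def identify_pnc(h, n_iter=5000, mu=0.5, seed=0):
    """Adjustment phase (switch SW1 open): adapt taps w with NLMS so that the
    signal at Mic' (position Q), e = h*x + w*x, is driven toward zero."""
    rng = random.Random(seed)
    w = [0.0] * len(h)
    hist = [0.0] * len(h)
    for _ in range(n_iter):
        hist = [rng.uniform(-1.0, 1.0)] + hist[:-1]  # test signal emitted by the HATS
        e = fir(h, hist) + fir(w, hist)              # residual picked up at Q
        norm = sum(v * v for v in hist) + 1e-12
        w = [wk - mu * e * v / norm for wk, v in zip(w, hist)]
    return w


def run_frozen(h, w, x):
    """Operation phase: taps fixed; returns the residual maskee signal at Q."""
    hist = [0.0] * len(h)
    out = []
    for s in x:
        hist = [s] + hist[:-1]
        out.append(fir(h, hist) + fir(w, hist))
    return out
```

Under these idealized assumptions the identified taps converge to -h, matching the statement that the Mic-to-SP path becomes -h(x), and the frozen filter then cancels the direct path at Q for any input.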

Although ANC/PNC is an existing technology, it is not suited to controlling a wide sound field throughout three-dimensional space. However, when the listener's head is at a nearly fixed position in a small space enclosed by counter partitions, it becomes an effective means of sound reduction even in three dimensions.

When the signal of the change target portion is processed in the embodiment, a time window such as a Hanning window, or zero-cross detection, may be used to remove click sounds that can arise at cut points. This further reduces the discomfort that the listener 8 or other occupants may feel.
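Both click-suppression options mentioned above can be sketched briefly: moving a cut point to the nearest zero crossing, and fading segment edges with half-Hann ramps. The search width, ramp length, and the names `snap_to_zero_crossing` and `hann_fade` are assumptions made for this example.

```python
import math


def snap_to_zero_crossing(x, idx, search=32):
    """Move cut index idx to the nearest sign change of x within +/- search samples.

    Returns idx unchanged if no zero crossing is found in that range.
    """
    best, best_dist = idx, search + 1
    for j in range(max(1, idx - search), min(len(x), idx + search + 1)):
        crossing = (x[j - 1] <= 0.0 < x[j]) or (x[j - 1] >= 0.0 > x[j])
        if crossing and abs(j - idx) < best_dist:
            best, best_dist = j, abs(j - idx)
    return best


def hann_fade(seg, ramp=16):
    """Taper the first and last `ramp` samples of seg with half-Hann ramps."""
    n = len(seg)
    ramp = min(ramp, n // 2)
    out = list(seg)
    for k in range(ramp):
        g = 0.5 - 0.5 * math.cos(math.pi * k / ramp)  # rises from 0 toward 1
        out[k] *= g
        out[n - 1 - k] *= g
    return out
```

Snapping keeps the full segment shape and so preserves the envelope better, while a short fade is a fallback when no crossing lies close to the intended cut.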

In the embodiment, the partial extraction unit 30 extracts the signal of the change target portion from the audio signal in the approximately single-peak extraction mode or the random division mode, but the invention is not limited to this. For example, the partial extraction unit may treat silent portions of the maskee H'(t), or portions below a certain level, as "silent parts" and exclude them from the change target portion. The output unit 50 may then output the portions excluded as silent parts unchanged, as silence. This keeps the increase in the volume (sound pressure level) of the masker H(t), and hence in the room noise level, as small as possible. Conversely, when the disturbance effect needs to be emphasized, the extracted envelope may be subjected to processing such as logarithmic compression or expansion. The partial extraction unit may also extract the entire audio signal as the signal of the change target portion.
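The silent-part handling and the envelope compression described above can be sketched as follows, assuming frame-based RMS detection with a hypothetical threshold, and using a mu-law-style curve as one possible choice of logarithmic compression (the specification does not name a particular curve). All function names and parameter values are illustrative.

```python
import math


def silent_mask(x, frame=64, threshold=1e-3):
    # True for each frame whose RMS is below the (hypothetical) threshold.
    mask = []
    for start in range(0, len(x), frame):
        seg = x[start:start + frame]
        level = math.sqrt(sum(v * v for v in seg) / len(seg))
        mask.append(level < threshold)
    return mask


def gate_silent_parts(changed, mask, frame=64):
    # Output frames flagged as silent parts as silence; keep the changed signal elsewhere.
    out = list(changed)
    for k, silent in enumerate(mask):
        if silent:
            for i in range(k * frame, min((k + 1) * frame, len(out))):
                out[i] = 0.0
    return out


def log_compress(env, mu=255.0):
    # Mu-law-style logarithmic compression of an envelope normalized to [0, 1].
    return [math.log1p(mu * e) / math.log1p(mu) for e in env]


def log_expand(c, mu=255.0):
    # Inverse of log_compress.
    return [math.expm1(ci * math.log1p(mu)) / mu for ci in c]
```

Compression flattens the envelope (quiet parts are raised relative to loud ones), which alters the masker's level contour; expansion does the opposite. Which direction emphasizes the disturbance effect more would have to be judged by listening.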

The present invention has been described above on the basis of embodiments. The embodiments merely illustrate the principles and applications of the invention, and many modifications and changes in arrangement are possible without departing from the spirit of the invention as defined in the claims.

2 booth, 4 IT partition, 6 customer, 8 listener, 20 A/D unit, 30 partial extraction unit, 40 nonlinear changing unit, 50 output unit, 100 voice information secret talk system.

Claims

A partial extraction unit that extracts a signal of a change target portion from an audio signal representing a voice being uttered;
A nonlinear changing unit that changes, using a nonlinear function, the signal of the change target portion extracted by the partial extraction unit;
An output unit that outputs at least the signal of the change target portion changed by the nonlinear changing unit to a sound output unit capable of outputting sound to the area where the voice being uttered is heard;
A voice changing device characterized in that the partial extraction unit determines, as the signal of the change target portion, a substantially single-peaked signal in the section between a first time before a peak of the envelope of the waveform of the audio signal and a second time after the peak.

The nonlinear changing unit includes:
An envelope acquisition unit that acquires, from the signal of the change target portion extracted by the partial extraction unit, information indicating the envelope of its waveform; and
A nonlinear processing unit that processes the signal of the change target portion extracted by the partial extraction unit using a nonlinear function,
The voice changing device according to claim 1, wherein the nonlinear changing unit changes the signal of the change target portion extracted by the partial extraction unit on the basis of the envelope information acquired by the envelope acquisition unit and the signal of the change target portion processed by the nonlinear processing unit.

The voice changing device according to claim 1, wherein the nonlinear changing unit applies a formant transformation to the signal of the change target portion extracted by the partial extraction unit.

The voice changing device according to any one of claims 1 to 3, further comprising a timing adjustment unit that adjusts, according to the time taken for the voice being uttered to propagate, the timing at which the signal of the change target portion changed by the nonlinear changing unit is output from the output unit.

Sound collecting means that receives the voice being uttered and generates an audio signal representing the voice;
A voice changing device that changes the audio signal generated by the sound collecting means;
Audio output means that converts the audio signal changed by the voice changing device into sound and outputs it to the area where the voice being uttered is heard,
The voice changing device including:
a partial extraction unit that extracts a signal of a change target portion from the audio signal generated by the sound collecting means;
a nonlinear changing unit that changes, using a nonlinear function, the signal of the change target portion extracted by the partial extraction unit; and
an output unit that outputs at least the signal of the change target portion changed by the nonlinear changing unit to the audio output means,
A voice information secret talk system characterized in that the partial extraction unit determines, as the signal of the change target portion, a substantially single-peaked signal in the section between a first time before a peak of the envelope of the waveform of the audio signal and a second time after the peak.

Extracting a signal of a change target portion from an audio signal representing a voice being uttered;
Changing the extracted signal of the change target portion using a nonlinear function;
Converting the changed signal of the change target portion into sound and outputting the converted sound to the area where the voice being uttered is heard,
A voice changing method characterized in that the extracting step includes determining, as the signal of the change target portion, a substantially single-peaked signal in the section between a first time before a peak of the envelope of the waveform of the audio signal and a second time after the peak.