JP2007233284A

JP2007233284A - Voice processing device and voice processing method

Info

Publication number: JP2007233284A
Application number: JP2006058095A
Authority: JP
Inventors: Yoshihiro Irie; 佳洋入江; Yoshitane Tanaka; 良種田中
Original assignee: Glory Ltd
Current assignee: Glory Ltd
Priority date: 2006-03-03
Filing date: 2006-03-03
Publication date: 2007-09-13
Anticipated expiration: 2026-03-03
Also published as: JP4785563B2

Abstract

PROBLEM TO BE SOLVED: To protect a speaker's privacy without causing discomfort to people by effectively preventing the sound that is output superimposed on the speaker's voice from becoming shrill.

SOLUTION: Data on a plurality of different spectral envelopes is stored in a spectral envelope database 15. A spectral fine structure 14 is extracted from the speaker's voice signal, spectral envelope data is selected from the data stored in the spectral envelope database 15, and the selected spectral envelope is combined with the spectral fine structure to generate a spectrum 17 of the anti-hearing sound that is output superimposed on the speaker's voice.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a voice processing device and a voice processing method that generate the spectrum of a sound to be output superimposed on a speaker's voice in order to protect the speaker's privacy (including keeping conversations confidential). In particular, it relates to a voice processing device and a voice processing method that can effectively prevent the superimposed sound from becoming shrill and can protect the speaker's privacy without causing discomfort to people.

Conversations involving private matters take place frequently in open spaces such as banks, hospitals, and securities firms. To protect speakers' privacy, masking devices that output a masking sound against the speaker's voice (conversational speech) have been developed (see, for example, Patent Document 1).

Specifically, an interfering sound that obscures the speaker's speech, such as white noise or background music (BGM), is output over the speech as the "masking sound". Drowning out the speech makes its content hard to catch and thereby protects the speaker's privacy.

Patent Document 1: Japanese Unexamined Patent Application Publication No. H6-175666 (JP-A-6-175666)

However, the conventional technique described above cannot protect the speaker's privacy reliably. Because it outputs, as the masking sound, a sound that has little relation to the speaker's voice, listeners nearby perceive the speaker's voice and the masking sound as two separate sounds, and the speaker's privacy is not reliably protected.

One conceivable way to solve this problem is to generate an anti-hearing sound (explained later) from the speaker's own speech and output it as the masking sound. Specifically, the spectrum of the speaker's speech is detected, and the anti-hearing sound is generated by deforming that spectrum, that is, by inverting and shifting the positions of its peaks and valleys.

However, simply inverting and shifting the peaks and valleys of the speech spectrum makes the anti-hearing sound shrill, which is unpleasant for people who hear it. An important issue is therefore how to effectively prevent the anti-hearing sound from becoming shrill so that it does not cause discomfort to those who hear it.

The present invention has been made to solve the above problems of the conventional technique. Its object is to provide a voice processing device and a voice processing method that can effectively prevent the anti-hearing sound from becoming shrill and can protect the speaker's privacy without causing discomfort to people.

To solve the above problems and achieve the object, the voice processing device according to the invention of claim 1 is a voice processing device that generates the spectrum of a sound to be output superimposed on a speaker's voice, comprising: a spectral envelope database that stores data on a plurality of different spectral envelopes; spectral fine structure extraction means for extracting a spectral fine structure from the speaker's voice signal; spectral envelope selection means for selecting spectral envelope data from among the spectral envelope data stored in the spectral envelope database; and voice spectrum generation means for generating the spectrum of the sound to be output superimposed on the speaker's voice by combining the spectral envelope selected by the spectral envelope selection means with the spectral fine structure.

In the voice processing device according to the invention of claim 2, in the invention of claim 1, the spectral envelope selection means newly selects spectral envelope data from the spectral envelope database when the amount of temporal change in the speaker's voice is equal to or greater than a predetermined value, and the voice spectrum generation means newly generates the spectrum of the sound to be output superimposed on the speaker's voice by combining the spectral envelope newly selected by the spectral envelope selection means with the spectral fine structure.

In the voice processing device according to the invention of claim 3, in the invention of claim 1 or 2, the spectral envelope selection means randomly selects spectral envelope data from among the spectral envelope data stored in the spectral envelope database.

The voice processing device according to the invention of claim 4, in the invention of claim 1 or 2, further comprises spectral envelope extraction means for extracting a spectral envelope from the speaker's voice signal, and the spectral envelope selection means selects spectral envelope data from among the spectral envelope data stored in the spectral envelope database on the basis of the similarity between the spectral envelope extracted by the spectral envelope extraction means and the spectral envelopes whose data is stored in the spectral envelope database.

The voice processing device according to the invention of claim 5 is a voice processing device that generates the spectrum of a sound to be output superimposed on a speaker's voice, comprising: spectral fine structure extraction means for extracting a spectral fine structure from the speaker's voice signal; voice spectrum generation means for generating the spectrum of the sound to be output superimposed on the speaker's voice by combining the spectral fine structure extracted by the spectral fine structure extraction means with a predetermined spectral envelope; and frequency intensity correction means for correcting the voice spectrum generated by the voice spectrum generation means by suppressing its spectral intensity in a predetermined frequency region.

In the voice processing device according to the invention of claim 6, in the invention of claim 5, the frequency intensity correction means sets the amount of spectral intensity correction on the basis of the difference between the voice spectrum obtained from the speaker's voice signal and the voice spectrum generated by the voice spectrum generation means.

The voice processing method according to the invention of claim 7 is a voice processing method that generates the spectrum of a sound to be output superimposed on a speaker's voice, comprising: a spectral fine structure extraction step of extracting a spectral fine structure from the speaker's voice signal; a spectral envelope selection step of selecting, when the spectral fine structure has been extracted in the spectral fine structure extraction step, spectral envelope data from among data on a plurality of different spectral envelopes stored in advance in a spectral envelope database; and a voice spectrum generation step of generating the spectrum of the sound to be output superimposed on the speaker's voice by combining the spectral envelope selected in the spectral envelope selection step with the spectral fine structure.

The voice processing method according to the invention of claim 8 is a voice processing method that generates the spectrum of a sound to be output superimposed on a speaker's voice, comprising: a spectral fine structure extraction step of extracting a spectral fine structure from the speaker's voice signal; a voice spectrum generation step of generating the spectrum of the sound to be output superimposed on the speaker's voice by combining the spectral fine structure extracted in the spectral fine structure extraction step with a predetermined spectral envelope; and a frequency intensity correction step of correcting the voice spectrum generated in the voice spectrum generation step by suppressing its spectral intensity in a predetermined frequency region.

According to the invention of claim 1 or 7, data on a plurality of different spectral envelopes is stored in a spectral envelope database, a spectral fine structure is extracted from the speaker's voice signal, spectral envelope data is selected from the data stored in the database, and the selected spectral envelope is combined with the spectral fine structure to generate the spectrum of the sound to be output superimposed on the speaker's voice. Because the anti-hearing sound is generated from a spectral fine structure that retains the speaker's sound source information, it blends with the speaker's conversational speech and makes the content of the speech hard to catch. In addition, because the anti-hearing sound is generated from a spectral envelope registered in advance in the spectral envelope database rather than by deforming the speaker's voice spectrum, it is effectively prevented from becoming shrill and causing discomfort to people.

According to the invention of claim 2, when the amount of temporal change in the speaker's voice is equal to or greater than a predetermined value, spectral envelope data is newly selected from the spectral envelope database, and the newly selected spectral envelope is combined with the spectral fine structure to newly generate the spectrum of the sound to be output superimposed on the speaker's voice. A spectral envelope suited to making the speaker's speech hard to catch can therefore be selected in step with changes in the speaker's voice.

According to the invention of claim 3, spectral envelope data is selected at random from among the spectral envelope data stored in the spectral envelope database. The spectral envelope used to generate an anti-hearing sound that does not cause discomfort can therefore be selected efficiently.

According to the invention of claim 4, a spectral envelope is additionally extracted from the speaker's voice signal, and spectral envelope data is selected from the spectral envelope database on the basis of the similarity between the extracted spectral envelope and the spectral envelopes whose data is stored in the database. A spectral envelope representing a phoneme far removed from the speaker's phoneme can therefore be selected effectively, and an anti-hearing sound that makes the speaker's speech hard to catch can be generated.

According to the invention of claim 5 or 8, a spectral fine structure is extracted from the speaker's voice signal, the extracted spectral fine structure is combined with a predetermined spectral envelope to generate the spectrum of the sound to be output superimposed on the speaker's voice, and the generated spectrum is corrected by suppressing its spectral intensity in a predetermined frequency region. Suppressing the spectral intensity in the frequency region that makes the anti-hearing sound shrill effectively prevents the sound from causing discomfort to people.

According to the invention of claim 6, the amount of spectral intensity correction is set on the basis of the difference between the voice spectrum generated by combining the spectral fine structure extracted from the speaker's voice signal with a predetermined spectral envelope and the voice spectrum obtained from the speaker's voice signal. The amount of correction applied in the frequency region that makes the anti-hearing sound shrill can therefore be set appropriately.
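As a rough illustration of the correction described for claims 5, 6, and 8, the sketch below suppresses the generated spectrum inside one frequency band by an amount derived from its difference from the original speech spectrum. It is only a sketch under stated assumptions: the text does not give the band or the correction rule, so the 1 kHz to 4 kHz band (borrowed from the discussion of FIG. 2), the `ratio` parameter, the dB representation, and the function name `correct_band` are all hypothetical.

```python
def correct_band(generated_db, original_db, freqs_hz,
                 band_hz=(1000.0, 4000.0), ratio=1.0):
    """Suppress in-band bins where the generated spectrum exceeds the original.

    All inputs are per-bin lists of equal length; levels are in dB.
    ratio=1.0 removes the whole excess, ratio=0.5 removes half of it.
    The band and the rule are illustrative assumptions, not taken from the patent.
    """
    low, high = band_hz
    corrected = []
    for g, o, f in zip(generated_db, original_db, freqs_hz):
        excess = g - o  # difference between generated and original spectrum
        if low <= f <= high and excess > 0:
            corrected.append(g - ratio * excess)
        else:
            corrected.append(g)  # outside the band, or no excess: leave as is
    return corrected
```

With `ratio=1.0` an in-band bin that is louder than the original is pulled down to the original level; bins outside the band are never changed.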

Preferred embodiments of the voice processing device and the voice processing method according to the present invention are described in detail below with reference to the accompanying drawings.

First, the concept of the voice processing according to the first embodiment is described. FIG. 1 is a diagram illustrating the concept of the voice processing according to the first embodiment.

As shown in FIG. 1, this voice processing generates a sound that obscures the speaker's conversational speech. Because its purpose is to prevent the conversation from being overheard by third parties, this sound is hereinafter called the anti-hearing sound. To generate it, the speaker's voice signal is acquired with a microphone or the like, and spectrum analysis is performed on the voice signal at predetermined time intervals to extract features of the conversational speech such as sound pressure and frequency distribution. FIG. 1 shows an example of a spectrogram 11 obtained by applying spectrum analysis to a speech waveform 10 of the Japanese vowels "a", "i", "u", "e", and "o".

From a short-time spectrum 12 obtained from the spectrogram 11, a spectral envelope 13 representing phonological information and a spectral fine structure 14 representing sound source information are extracted.

Separately, in this voice processing, representative human voice signals are extracted in advance using a statistical technique such as clustering, and the spectral envelopes of the extracted voice signals are registered in a spectral envelope database 15.
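One way to picture how such a database could be populated is a plain k-means pass over envelope vectors, keeping the cluster centers as the registered envelopes. The text only says "a statistical technique such as clustering", so the choice of k-means, the use of centroids as database entries, the deterministic seeding, and the name `build_envelope_database` are assumptions made for this sketch.

```python
import math

def build_envelope_database(envelopes, k, iterations=20):
    """Cluster envelope vectors with k-means and return the k centroids.

    envelopes: list of equal-length lists of floats (for example, low-quefrency
    cepstral coefficients). Seeding with the first k vectors is an illustrative
    simplification so that the result is deterministic.
    """
    centroids = [list(v) for v in envelopes[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in envelopes:
            # Assign each vector to its nearest centroid (Euclidean distance).
            nearest = min(range(k), key=lambda i: math.dist(v, centroids[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids
```

A production system would more likely use a library clustering routine with several random restarts; the point here is only that each registered envelope stands for a group of similar speech frames.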

Then, from among the spectral envelopes registered in the spectral envelope database 15, the one least similar to the spectral envelope 13 obtained by spectrum analysis of the speaker's conversational speech (that is, the one at the greatest spectral distance) is selected, and the selected spectral envelope 16 replaces the spectral envelope 13 obtained from the speaker's speech.

The least similar spectral envelope is selected because generating the anti-hearing sound from a spectral envelope that represents a phoneme far removed from the phoneme of the speaker's voice makes the content of the speaker's speech hard to catch.

Next, the spectral envelope 16, which has replaced the spectral envelope 13 obtained from the speaker's speech, is combined with the spectral fine structure 14 to generate a spectrum 17 of the anti-hearing sound. The anti-hearing sound is then generated from this spectrum 17 and output from a loudspeaker.

In this way, the voice processing generates the spectrum of the anti-hearing sound from the spectral fine structure 14, which retains the speaker's sound source information, and the spectral envelope of a representative human voice signal. Because the anti-hearing sound retains the speaker's sound source information, it blends with the speaker's conversational speech and makes the content of the speech hard to catch.

Furthermore, the anti-hearing sound is generated by using the spectral envelope of a representative human voice signal as it is, rather than by inverting and shifting the positions of the peaks and valleys in the speaker's voice spectrum. This prevents the anti-hearing sound from becoming unnaturally shrill and causing discomfort to people who hear it.

FIG. 2 compares the spectrogram of the anti-hearing sound of the first embodiment with that of a conventional anti-hearing sound. It shows a spectrogram 20 of the original speech uttered by the speaker, a spectrogram 21 of the conventional anti-hearing sound, and a spectrogram 22 of the anti-hearing sound of this embodiment. In each spectrogram in FIG. 2, darker regions correspond to higher intensity.

Here, the spectrogram 21 of the conventional anti-hearing sound is the spectrogram of an anti-hearing sound generated by inverting and shifting the positions of the peaks and valleys in the spectral envelope 13 of the speaker's conversational speech and combining the result with the spectral fine structure 14. The spectrogram 22 of this embodiment is the spectrogram of an anti-hearing sound generated as described with reference to FIG. 1.

As shown in FIG. 2, in the spectrogram 21 of the conventional anti-hearing sound, the intensity in the mid frequency range (1 kHz to 4 kHz) is higher than in the spectrogram 20 of the original speech. This increase in mid-range intensity is what causes the shrill sound.

In contrast, in the spectrogram 22 of the anti-hearing sound of this embodiment, the increase in mid-range intensity is suppressed. Suppressing the increase in intensity in the mid frequency range effectively prevents the anti-hearing sound from becoming shrill.

When the spectrum of the anti-hearing sound is generated from the spectral envelope of a representative human voice signal, as in this embodiment, the spectral envelope used to generate the anti-hearing sound is reselected in accordance with the temporal change of the spectrogram 20 of the original speech shown in FIG. 2.

Specifically, as shown in FIG. 2, when the temporal change in the original speech reaches or exceeds a predetermined value, the spectral envelope least similar to the spectral envelope extracted from the speaker's voice signal is newly selected from among the spectral envelopes registered in the spectral envelope database 15, and the spectrum of the anti-hearing sound is generated using the selected spectral envelope. A spectral envelope suited to making the speaker's speech hard to catch can thus be selected appropriately in step with changes in the original speech.

Next, the functional configuration of the voice processing device according to the first embodiment is described. FIG. 3 is a diagram showing the functional configuration of a voice processing device 30 according to the first embodiment. As shown in FIG. 3, the voice processing device 30 has an input unit 31, a display unit 32, a voice input reception unit 33, a spectral envelope database 34, a voice generation unit 35, a voice output unit 36, and a control unit 37.

The input unit 31 is an input device, such as a keyboard or mouse, used to enter various kinds of information. The display unit 32 is a display device, such as a monitor, that outputs various kinds of information. The voice input reception unit 33 receives the speaker's voice signal from a microphone or the like, performs A/D conversion and amplification, and outputs the result to the control unit 37.

The spectral envelope database 34 stores data on the spectral envelopes that are candidates for replacing the spectral envelope extracted from the speaker's voice signal when the spectrum of the anti-hearing sound is generated as described with reference to FIG. 1.

The voice generation unit 35 generates the audio signal of the anti-hearing sound from the spectrum of the anti-hearing sound generated by the control unit 37, which is described later. The voice output unit 36 performs D/A conversion and amplification on the audio signal generated by the voice generation unit 35 and outputs it to a loudspeaker.

The control unit 37 has a memory for storing control programs such as an OS (Operating System), programs that define the procedures of various processes, and various data, and it executes various kinds of processing.

The control unit 37 has a spectrum analysis unit 37a, a spectral fine structure extraction unit 37b, a spectral envelope extraction unit 37c, a spectral envelope selection unit 37d, and a spectrum generation unit 37e.

The spectrum analysis unit 37a receives the digitized voice signal from the voice input reception unit 33 and performs cepstrum analysis. Of the resulting cepstrum coefficients, it outputs the high-quefrency part to the spectral fine structure extraction unit 37b and the low-quefrency part to the spectral envelope extraction unit 37c.

Specifically, the spectrum analysis unit 37a applies a predetermined window function, such as a Hann (Hanning) window or a Hamming window, to the voice signal and performs short-time spectrum analysis using the fast Fourier transform (FFT).

Next, the spectrum analysis unit 37a takes the absolute value of the values obtained by the fast Fourier transform and then computes the logarithm of that absolute value. It then applies the inverse fast Fourier transform (IFFT) to the logarithmic values to compute the cepstrum coefficients.

After that, the spectrum analysis unit 37a extracts the high-quefrency part and the low-quefrency part by liftering the computed cepstrum coefficients with a cepstrum window.
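The cepstrum analysis performed by the spectrum analysis unit 37a (window, Fourier transform, log magnitude, inverse transform, liftering) can be sketched as follows. This is a minimal illustration rather than the device's actual implementation: a naive DFT stands in for the FFT so that the example needs no external library, and the function names, the Hamming window, the `eps` guard against log(0), and the rectangular lifter with a `cutoff` parameter are all assumptions.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, used here in place of an FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def hamming(n):
    """Hamming window of length n (n > 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def real_cepstrum(frame, eps=1e-12):
    """Window the frame, take log|DFT|, then inverse-transform to get the cepstrum."""
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
    log_mag = [math.log(abs(v) + eps) for v in dft(windowed)]
    return [c.real for c in dft(log_mag, inverse=True)]

def lifter(cepstrum, cutoff):
    """Split cepstrum into (low-quefrency, high-quefrency) parts.

    The low part keeps indices below `cutoff` and their mirror images at the end
    of the array, because the cepstrum of a real signal is symmetric.
    """
    n = len(cepstrum)
    low = [c if (q < cutoff or q > n - cutoff) else 0.0 for q, c in enumerate(cepstrum)]
    high = [c - l for c, l in zip(cepstrum, low)]
    return low, high
```

Applying a forward transform to the `low` part gives a smooth log-spectral envelope, and applying it to the `high` part gives the log-spectral fine structure; the two parts always add back up to the full cepstrum.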

The spectrum analysis unit 37a also stores the spectrum of the voice signal previously received from the voice input reception unit 33. It computes the spectral distance between the spectrum of the newly received voice signal and the spectrum of the previously received voice signal, and when that distance reaches or exceeds a predetermined value, it instructs the spectral envelope selection unit 37d to select a new spectral envelope.
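The reselection trigger can be pictured as a small stateful check. The sketch below assumes a Euclidean spectral distance and compares each new spectrum against the spectrum stored at the time of the last selection; the text does not say which past spectrum is used as the reference, so that choice, the class name `ReselectionTrigger`, and the `threshold` parameter are hypothetical.

```python
import math

class ReselectionTrigger:
    """Signals when the speech spectrum has changed enough to pick a new envelope."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.reference = None  # spectrum stored at the last selection (assumption)

    def update(self, spectrum):
        """Return True when a new spectral envelope should be selected."""
        if (self.reference is None
                or math.dist(spectrum, self.reference) >= self.threshold):
            self.reference = list(spectrum)
            return True
        return False
```

Comparing against the immediately preceding frame instead would react to abrupt changes only; comparing against the last selection point, as here, also catches slow drift.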

The spectral fine structure extraction unit 37b receives the high-quefrency part from the spectrum analysis unit 37a and extracts the spectral fine structure by applying the fast Fourier transform to it. The spectral envelope extraction unit 37c receives the low-quefrency part from the spectrum analysis unit 37a and extracts the spectral envelope by applying the fast Fourier transform to it.

The spectral envelope selection unit 37d computes the spectral distance between the spectral envelope extracted by the spectral envelope extraction unit 37c and each spectral envelope registered in the spectral envelope database 34. From the registered spectral envelopes, it selects the one at the greatest spectral distance as the replacement for the spectral envelope extracted by the spectral envelope extraction unit 37c.

ここで、スペクトル距離としては、低ケフレンシ部の成分からなるベクトルのユークリッド距離が用いられる。なお、ここで用いられるスペクトル距離はこれに限定されず、ＦＦＴによるスペクトル距離や、線形予測（LPC, Linear Predictive Coding）分析により得られたスペクトル包絡に基づくスペクトル距離など、従来提案されているさまざまなスペクトル距離を用いてもよい。 Here, the Euclidean distance between vectors composed of the low-quefrency components is used as the spectral distance. The spectral distance is not limited to this; any of the various conventionally proposed spectral distances may be used, such as an FFT-based spectral distance or a spectral distance based on the spectrum envelope obtained by linear predictive coding (LPC) analysis.
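The maximum-distance selection described above can be sketched as follows. This is an illustration only: the database is modeled as a plain dictionary of low-quefrency vectors, and the labels and numeric values are made up.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two low-quefrency vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_farthest_envelope(extracted, database):
    """Return the (name, vector) entry farthest from the extracted envelope."""
    return max(database.items(),
               key=lambda item: euclidean_distance(extracted, item[1]))


# Hypothetical registered envelopes (values are illustrative, not measured).
envelope_db = {
    "a": [1.0, 0.5, 0.1],
    "i": [0.2, -0.4, 0.3],
    "u": [0.9, 0.6, 0.0],
}
speaker = [1.0, 0.6, 0.1]  # low-quefrency vector of the current frame
name, vector = select_farthest_envelope(speaker, envelope_db)
```

Choosing the farthest entry rather than the nearest is the point of the design: the replacement envelope should represent a phoneme unlike the one being spoken.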

スペクトル生成部３７ｅは、スペクトル微細構造抽出部３７ｂにより抽出されたスペクトル微細構造と、スペクトル包絡選択部３７ｄにより選択されたスペクトル包絡とを合成して防聴音のスペクトルを生成する生成部である。 The spectrum generation unit 37e is a generation unit that synthesizes the spectral fine structure extracted by the spectral fine structure extraction unit 37b and the spectrum envelope selected by the spectrum envelope selection unit 37d to generate the spectrum of the hearing-proof sound.

つぎに、実施例１に係る音声処理の処理手順について説明する。図４は、実施例１に係る音声処理の処理手順を示すフローチャートである。図４に示すように、まず、音声処理装置３０の音声入力受付部３３は、マイクロフォンから音声信号の入力を受け付ける（ステップＳ１０１）。 Next, a processing procedure of audio processing according to the first embodiment will be described. FIG. 4 is a flowchart of the sound processing procedure according to the first embodiment. As shown in FIG. 4, first, the voice input receiving unit 33 of the voice processing device 30 receives an input of a voice signal from the microphone (step S101).

そして、スペクトル分析部３７ａは、入力された音声信号の音声波形のスペクトル分析を実行し、ケプストラム係数における高ケフレンシ部および低ケフレンシ部を算出する（ステップＳ１０２）。 Then, the spectrum analysis unit 37a performs spectrum analysis of the speech waveform of the input speech signal, and calculates a high quefrency portion and a low quefrency portion in the cepstrum coefficient (step S102).

続いて、スペクトル微細構造抽出部３７ｂは、スペクトル分析部３７ａから高ケフレンシ部を取得して、スペクトル微細構造を抽出する（ステップＳ１０３）。そして、スペクトル包絡抽出部３７ｃは、スペクトル分析部３７ａから低ケフレンシ部を取得して、スペクトル包絡を抽出する（ステップＳ１０４）。 Subsequently, the spectral fine structure extraction unit 37b acquires a high quefrency part from the spectral analysis unit 37a and extracts the spectral fine structure (step S103). And the spectrum envelope extraction part 37c acquires a low quefrency part from the spectrum analysis part 37a, and extracts a spectrum envelope (step S104).

その後、スペクトル分析部３７ａは、入力された音声信号のスペクトルと、過去に入力された音声信号のスペクトルとを比較して、スペクトルの時間変動が所定値以上となったか否かを調べる（ステップＳ１０５）。 After that, the spectrum analysis unit 37a compares the spectrum of the input voice signal with the spectrum of a previously input voice signal and checks whether the temporal variation of the spectrum is equal to or greater than a predetermined value (step S105).

スペクトルの時間変動が所定値以上でない場合には（ステップＳ１０５，Ｎｏ）、スペクトル包絡選択部３７ｄは、スペクトル包絡が選択済みか否かを調べる（ステップＳ１０６）。 When the time variation of the spectrum is not equal to or greater than the predetermined value (No at Step S105), the spectrum envelope selection unit 37d checks whether or not the spectrum envelope has been selected (Step S106).

そして、スペクトル包絡が選択済みでない場合には（ステップＳ１０６，Ｎｏ）、スペクトル包絡選択部３７ｄは、スペクトル包絡データベース３４に登録されたスペクトル包絡のデータを読み込む（ステップＳ１０７）。 If the spectrum envelope has not been selected (No at Step S106), the spectrum envelope selection unit 37d reads the spectrum envelope data registered in the spectrum envelope database 34 (Step S107).

続いて、スペクトル包絡選択部３７ｄは、スペクトル包絡抽出部３７ｃにより抽出されたスペクトル包絡とスペクトル距離が最も大きいスペクトル包絡をスペクトル包絡データベース３４に登録されたスペクトル包絡の中から選択する（ステップＳ１０８）。 Subsequently, the spectrum envelope selection unit 37d selects, from the spectrum envelopes registered in the spectrum envelope database 34, the one with the largest spectral distance from the spectrum envelope extracted by the spectrum envelope extraction unit 37c (step S108).

その後、スペクトル生成部３７ｅは、選択されたスペクトル包絡と、スペクトル微細構造抽出部３７ｂにより抽出されたスペクトル微細構造とを合成した防聴音のスペクトルを生成する（ステップＳ１０９）。 Thereafter, the spectrum generation unit 37e generates the spectrum of the hearing-proof sound by synthesizing the selected spectrum envelope and the spectral fine structure extracted by the spectral fine structure extraction unit 37b (step S109).

ステップＳ１０５において、スペクトルの時間変動が所定値以上である場合には（ステップＳ１０５，Ｙｅｓ）、ステップＳ１０７に移行して、それ以後の処理を継続する。また、ステップＳ１０６において、スペクトル包絡が選択済みである場合には（ステップＳ１０６，Ｙｅｓ）、ステップＳ１０９に移行して、それ以後の処理を継続する。 In step S105, when the time variation of the spectrum is equal to or greater than the predetermined value (step S105, Yes), the process proceeds to step S107, and the subsequent processing is continued. In step S106, when the spectrum envelope has been selected (step S106, Yes), the process proceeds to step S109, and the subsequent processing is continued.

ステップＳ１０９の後、音声生成部３５は、スペクトル生成部３７ｅにより生成された防聴音のスペクトルから防聴音の音声信号を生成する（ステップＳ１１０）。そして、音声出力部３６は、音声生成部３５により生成された防聴音の音声信号をスピーカに出力する（ステップＳ１１１）。 After step S109, the voice generation unit 35 generates the voice signal of the hearing-proof sound from the hearing-proof sound spectrum generated by the spectrum generation unit 37e (step S110). The voice output unit 36 then outputs the hearing-proof sound signal generated by the voice generation unit 35 to the speaker (step S111).

その後、制御部３７は、防聴音の出力処理の終了指示がなされたか否かを調べ（ステップＳ１１２）、終了指示がなされた場合には（ステップＳ１１２，Ｙｅｓ）、この処理を終了する。終了指示がなされていない場合には（ステップＳ１１２，Ｎｏ）、ステップＳ１０１に移行して、それ以後の処理を繰り返す。 Thereafter, the control unit 37 checks whether or not an instruction to end the hearing-proof sound output process has been issued (step S112). If an instruction to end is given (Yes in step S112), the process ends. If the end instruction has not been given (No at Step S112), the process proceeds to Step S101, and the subsequent processing is repeated.

ステップＳ１０５において、スペクトルの時間変動が所定値以上となった場合には（ステップＳ１０５，Ｙｅｓ）、スペクトル生成部３７ｅは、設定済みのスペクトル包絡を棄却し、ステップＳ１０７に移行して、スペクトル包絡を新たに設定する処理をおこなう。 In step S105, when the temporal variation of the spectrum is equal to or greater than the predetermined value (step S105, Yes), the spectrum generation unit 37e discards the spectrum envelope that has already been set, and the process proceeds to step S107 to set a new spectrum envelope.
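The branching in steps S105 to S108 can be summarized in a small sketch: an envelope is selected once and reused until the frame-to-frame change reaches the threshold, at which point it is discarded and reselected. The class and method names are hypothetical, and for brevity the same vector is used both for measuring temporal variation and for measuring distance to the database entries, which is a simplifying assumption.

```python
import math


def spectral_distance(a, b):
    """Euclidean distance, used here as the spectral distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class EnvelopeSelector:
    """Keeps the selected envelope until the input changes enough (S105-S108)."""

    def __init__(self, database, threshold):
        self.database = database    # list of registered envelope vectors
        self.threshold = threshold  # the "predetermined value" of step S105
        self.previous = None        # vector of the previous frame
        self.selected = None        # currently selected envelope, if any

    def update(self, current):
        # S105: has the spectrum changed by at least the threshold?
        changed = (self.previous is not None and
                   spectral_distance(current, self.previous) >= self.threshold)
        if changed:
            self.selected = None    # discard the envelope set so far
        # S106: reuse an existing selection; otherwise S107-S108: reselect.
        if self.selected is None:
            self.selected = max(self.database,
                                key=lambda env: spectral_distance(current, env))
        self.previous = current
        return self.selected


db = [[0.0, 0.0], [10.0, 10.0]]               # made-up envelopes
selector = EnvelopeSelector(db, threshold=5.0)
first = selector.update([1.0, 1.0])    # nothing selected yet, so select
second = selector.update([2.0, 2.0])   # small change, selection is kept
third = selector.update([9.0, 9.0])    # large change, reselect
```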

なお、上記実施例では、スペクトル包絡データベース３４に登録されたスペクトル包絡の中からスペクトル包絡をスペクトル距離に基づいて選択することとしたが、スペクトル包絡データベース３４に登録されたスペクトル包絡の中からスペクトル包絡をランダムに選択してもよく、あるいは、その他の方法で選択してもよい。 In the above embodiment, the spectrum envelope is selected from those registered in the spectrum envelope database 34 on the basis of the spectral distance. However, the spectrum envelope may instead be selected at random from the registered spectrum envelopes, or by some other method.

上述してきたように、実施例１によれば、異なる複数のスペクトル包絡に係るデータをスペクトル包絡データベース３４が記憶し、スペクトル微細構造抽出部３７ｂが、話者の音声信号からスペクトル微細構造を抽出し、スペクトル包絡選択部３７ｄが、スペクトル包絡データベース３４に記憶されたスペクトル包絡に係るデータの中からスペクトル包絡に係るデータを選択し、スペクトル生成部３７ｅが、選択されたスペクトル包絡とスペクトル微細構造とを合成することにより話者の音声に被せて出力される防聴音のスペクトルを生成することとしたので、話者の音源情報を保持したスペクトル微細構造とスペクトル包絡とを利用して防聴音のスペクトルを生成するので、防聴音は話者の音源情報を保持しているため、話者の会話音声と融合し、話者の発言内容を聞き取りにくくすることができるとともに、話者の音声スペクトルを変形して防聴音を生成するのではなく、スペクトル包絡データベース３４にあらかじめ登録されたスペクトル包絡を用いて防聴音を生成するので、防聴音が甲高い音になって、人に不快感を与えてしまうことを効果的に防止することができる。 As described above, according to the first embodiment, the spectrum envelope database 34 stores data on a plurality of different spectrum envelopes, the spectral fine structure extraction unit 37b extracts the spectral fine structure from the speaker's voice signal, the spectrum envelope selection unit 37d selects spectrum envelope data from the data stored in the spectrum envelope database 34, and the spectrum generation unit 37e synthesizes the selected spectrum envelope and the spectral fine structure to generate the spectrum of the hearing-proof sound that is output over the speaker's voice. Because the hearing-proof sound is generated from the spectral fine structure, which retains the speaker's sound source information, it fuses with the speaker's conversational speech and makes the content of the speech difficult to make out. In addition, because the hearing-proof sound is generated with a spectrum envelope registered in advance in the spectrum envelope database 34 rather than by deforming the speaker's own speech spectrum, it is effectively prevented from becoming a high-pitched sound that causes discomfort.

また、実施例１によれば、スペクトル包絡選択部３７ｄが、話者の音声の時間変化量が所定値以上である場合にスペクトル包絡データベース３４からスペクトル包絡に係るデータを新たに選択し、スペクトル生成部３７ｅが、新たに選択されたスペクトル包絡とスペクトル微細構造とを合成することにより話者の音声に被せて出力される音声の音声スペクトルを新たに生成することとしたので、話者の発言内容を聞き取りにくくするのに適したスペクトル包絡を話者の音声の変化に追従して選択することができる。 Further, according to the first embodiment, when the temporal change in the speaker's voice is equal to or greater than a predetermined value, the spectrum envelope selection unit 37d newly selects spectrum envelope data from the spectrum envelope database 34, and the spectrum generation unit 37e synthesizes the newly selected spectrum envelope and the spectral fine structure to newly generate the speech spectrum of the sound that is output over the speaker's voice. A spectrum envelope suited to making the content of the speech difficult to make out can therefore be selected in step with changes in the speaker's voice.

また、実施例１によれば、スペクトル包絡選択部３７ｄが、スペクトル包絡データベース３４に記憶されたスペクトル包絡に係るデータの中からスペクトル包絡に係るデータをランダムに選択することとしたので、人に不快感を与えることのない防聴音の生成に用いられるスペクトル包絡を効率的に選択することができる。 Further, according to the first embodiment, the spectrum envelope selection unit 37d selects spectrum envelope data at random from the data stored in the spectrum envelope database 34, so the spectrum envelope used to generate a hearing-proof sound that does not cause discomfort can be selected efficiently.

また、実施例１によれば、スペクトル包絡抽出部３７ｃが、話者の音声信号からスペクトル包絡を抽出し、スペクトル包絡選択部３７ｄが、抽出されたスペクトル包絡と、スペクトル包絡データベース３４にデータが記憶されたスペクトル包絡との間のスペクトル距離に基づいて、スペクトル包絡データベース３４に記憶されたスペクトル包絡に係るデータの中からスペクトル包絡に係るデータを選択することとしたので、話者の音韻とかけ離れた音韻を表すスペクトル包絡を効果的に選択することができ、話者の発言内容を聞き取りにくくする防聴音を生成することができる。 Further, according to the first embodiment, the spectrum envelope extraction unit 37c extracts the spectrum envelope from the speaker's voice signal, and the spectrum envelope selection unit 37d selects spectrum envelope data from the data stored in the spectrum envelope database 34 on the basis of the spectral distance between the extracted spectrum envelope and the stored spectrum envelopes. A spectrum envelope representing a phoneme far removed from the speaker's phoneme can therefore be selected effectively, and a hearing-proof sound that makes the content of the speech difficult to make out can be generated.

ところで、実施例１では、人の代表的な音声信号のスペクトル包絡を利用して、甲高さが抑制された防聴音のスペクトルを生成することとしたが、生成された防聴音のスペクトルにおいて甲高さの原因となる周波数領域のスペクトル強度を抑制することにより、防聴音が甲高い音になるのを防止することとしてもよい。そこで、本実施例２では、防聴音のスペクトルにおいて甲高さの原因となる周波数領域のスペクトル強度を抑制する場合について説明する。 In the first embodiment, the spectrum envelopes of representative human voice signals are used to generate a hearing-proof sound spectrum with suppressed shrillness. Alternatively, the hearing-proof sound may be prevented from becoming high-pitched by suppressing the spectral intensity of the frequency region that causes the shrillness in the generated hearing-proof sound spectrum. The second embodiment therefore describes the case in which the spectral intensity of the frequency region that causes shrillness in the hearing-proof sound spectrum is suppressed.

まず、本実施例２に係る音声処理の概念について説明する。図５は、実施例２に係る音声処理を説明する図である。 First, the concept of audio processing according to the second embodiment will be described. FIG. 5 is a diagram illustrating audio processing according to the second embodiment.

この音声処理では、防聴音のスペクトルにおいて甲高さの原因となる周波数領域のスペクトル強度補正量４０をあらかじめ算出しておく。図５の例では、１ｋＨｚ〜２ｋＨｚの周波数領域が甲高さの大きな原因となっており、特に補正量が大きくなっている。 In this voice processing, a spectral intensity correction amount 40 for the frequency region that causes shrillness in the hearing-proof sound spectrum is calculated in advance. In the example of FIG. 5, the frequency region from 1 kHz to 2 kHz is a major cause of the shrillness, and the correction amount is particularly large there.

図５に示したようなスペクトル強度補正量４０は、さまざまな話者の音声信号のスペクトルの特徴と、それらの話者の音声信号に基づいて生成された防聴音のスペクトルとを比較することにより算出される。 The spectral intensity correction amount 40 shown in FIG. 5 is calculated by comparing the spectral characteristics of the voice signals of various speakers with the spectra of the hearing-proof sounds generated from those speakers' voice signals.

図６は、スペクトル強度補正量４０の算出方法について説明する図である。図６に示すように、スペクトル強度補正量４０を算出する場合には、さまざまな話者の音声信号のスペクトルを収集し、収集したスペクトルの平均値（原音のスペクトル平均）を算出する。一方で、さまざまな話者の音声信号から生成された防聴音の音声信号のスペクトルを収集し、収集したスペクトルの平均値（防聴音のスペクトル平均）を算出する。 FIG. 6 is a diagram explaining how the spectral intensity correction amount 40 is calculated. As shown in FIG. 6, the spectra of the voice signals of various speakers are collected, and their average (the spectrum average of the original sound) is calculated. Likewise, the spectra of the hearing-proof sound signals generated from the voice signals of various speakers are collected, and their average (the spectrum average of the hearing-proof sound) is calculated.

そして、防聴音のスペクトル平均から原音のスペクトル平均を差し引いたスペクトルの増加分（防聴音のスペクトル増加分）を算出する。そして、防聴音のスペクトル増加分が正の値である周波数帯域を検出し、その周波数帯域における防聴音のスペクトル増加分を防聴音のスペクトルのスペクトル強度から減ずるスペクトル強度補正量４０として設定する。 Next, the spectrum increase of the hearing-proof sound is calculated by subtracting the spectrum average of the original sound from the spectrum average of the hearing-proof sound. The frequency bands in which this increase is positive are then detected, and the increase in those bands is set as the spectral intensity correction amount 40 to be subtracted from the spectral intensity of the hearing-proof sound spectrum.

このようにして、甲高さの原因となる周波数領域のスペクトル強度を抑制することにより、防聴音が甲高い音になることを効果的に防止することができる。 By suppressing the spectral intensity of the frequency region that causes the shrillness in this way, the hearing-proof sound can be effectively prevented from becoming a high-pitched sound.
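The calculation of the correction amount can be sketched as below. The sketch assumes each spectrum is a list of per-band levels (for example in dB) with identical band layouts; the function names and sample numbers are illustrative, not taken from the patent.

```python
def mean_spectrum(spectra):
    """Band-wise average over a collection of spectra of equal length."""
    count = len(spectra)
    return [sum(band) / count for band in zip(*spectra)]


def correction_amount(original_spectra, masking_spectra):
    """Positive part of (mean hearing-proof spectrum - mean original spectrum).

    Bands in which the hearing-proof sound is not stronger than the original
    receive a correction of 0.0.
    """
    original_mean = mean_spectrum(original_spectra)
    masking_mean = mean_spectrum(masking_spectra)
    increase = [m - o for m, o in zip(masking_mean, original_mean)]
    return [max(delta, 0.0) for delta in increase]


# Made-up three-band spectra from two speakers.
originals = [[10.0, 20.0, 30.0], [12.0, 22.0, 28.0]]   # mean: 11, 21, 29
maskings = [[9.0, 27.0, 30.0], [11.0, 25.0, 26.0]]     # mean: 10, 26, 28
correction = correction_amount(originals, maskings)
```

In this toy data only the middle band is stronger in the hearing-proof sound, so only that band receives a non-zero correction.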

つぎに、実施例２に係る音声処理装置の機能構成について説明する。図７は、実施例２に係る音声処理装置５０の機能構成を示す図である。なお、ここでは、音声処理装置５０は、防聴音のスペクトルを話者の音声信号から抽出したスペクトル包絡を変化させることにより生成することとする。しかしながら、防聴音のスペクトルの生成方法はこれに限定されず、実施例１で説明したような方法など、その他の方法で生成することとしてもよい。 Next, a functional configuration of the sound processing apparatus according to the second embodiment will be described. FIG. 7 is a diagram illustrating a functional configuration of the speech processing apparatus 50 according to the second embodiment. Here, it is assumed that the speech processing device 50 generates the spectrum of the hearing-proof sound by changing the spectrum envelope extracted from the speech signal of the speaker. However, the generation method of the hearing-proof sound spectrum is not limited to this, and may be generated by other methods such as the method described in the first embodiment.

図７に示すように、この音声処理装置５０は、入力部５１、表示部５２、音声入力受付部５３、音声生成部５４、音声出力部５５、制御部５６を有する。 As illustrated in FIG. 7, the voice processing device 50 includes an input unit 51, a display unit 52, a voice input reception unit 53, a voice generation unit 54, a voice output unit 55, and a control unit 56.

入力部５１は、各種情報の入力に用いられるキーボードやマウスなどの入力デバイスである。表示部５２は、各種情報を出力するディスプレイなどの表示デバイスである。音声入力受付部５３は、マイクロフォンなどから話者の音声信号を受け付け、Ａ／Ｄ変換および増幅処理をおこなって制御部５６に出力する受付部である。 The input unit 51 is an input device such as a keyboard and a mouse used for inputting various information. The display unit 52 is a display device such as a display that outputs various types of information. The voice input reception unit 53 is a reception unit that receives a speaker's voice signal from a microphone or the like, performs A / D conversion and amplification processing, and outputs the result to the control unit 56.

音声生成部５４は、後に説明する制御部５６により生成された防聴音のスペクトルから防聴音の音声信号を生成する生成部である。音声出力部５５は、音声生成部５４により生成された音声信号のＤ／Ａ変換および増幅処理をおこなってスピーカに出力する出力部である。 The voice generation unit 54 is a generation unit that generates the voice signal of the hearing-proof sound from the hearing-proof sound spectrum generated by the control unit 56 described later. The voice output unit 55 is an output unit that performs D/A conversion and amplification on the voice signal generated by the voice generation unit 54 and outputs the result to a speaker.

制御部５６は、ＯＳなどの制御プログラム、各種処理の処理手順を規定したプログラム、および、各種データを格納するためのメモリを有し、種々の処理を実行する制御部である。 The control unit 56 includes a control program such as an OS, a program that defines processing procedures for various processes, and a memory for storing various data, and is a control unit that executes various processes.

この制御部５６は、スペクトル分析部５６ａ、スペクトル微細構造抽出部５６ｂ、スペクトル包絡抽出部５６ｃ、スペクトル包絡変形部５６ｄ、スペクトル生成部５６ｅ、周波数強度補正量算出部５６ｆ、周波数強度補正部５６ｇを有する。 The control unit 56 includes a spectrum analysis unit 56a, a spectrum fine structure extraction unit 56b, a spectrum envelope extraction unit 56c, a spectrum envelope deformation unit 56d, a spectrum generation unit 56e, a frequency intensity correction amount calculation unit 56f, and a frequency intensity correction unit 56g. .

スペクトル分析部５６ａは、実施例１で説明したスペクトル分析部５６ａと同様にして、音声入力受付部５３からデジタル化された音声信号を受け付けてケプストラム分析をおこない、その結果得られるケプストラム係数のうち、高ケフレンシ部と低ケフレンシ部とをスペクトル微細構造抽出部５６ｂ、スペクトル包絡抽出部５６ｃにそれぞれ出力する分析部である。 Similarly to the spectrum analysis unit 56a described in the first embodiment, the spectrum analysis unit 56a receives a digitized voice signal from the voice input reception unit 53, performs cepstrum analysis, and among the cepstrum coefficients obtained as a result, It is an analysis part which outputs a high quefrency part and a low quefrency part to the spectrum fine structure extraction part 56b and the spectrum envelope extraction part 56c, respectively.

スペクトル微細構造抽出部５６ｂは、スペクトル分析部５６ａから高ケフレンシ部を受け付け、高速フーリエ変換を適用することによりスペクトル微細構造を抽出する抽出部である。スペクトル包絡抽出部５６ｃは、スペクトル分析部５６ａから低ケフレンシ部を受け付け、高速フーリエ変換を適用することによりスペクトル包絡を抽出する抽出部である。 The spectral fine structure extraction unit 56b is an extraction unit that receives a high quefrency part from the spectral analysis unit 56a and extracts a spectral fine structure by applying a fast Fourier transform. The spectrum envelope extraction unit 56c is an extraction unit that receives a low quefrency unit from the spectrum analysis unit 56a and extracts a spectrum envelope by applying a fast Fourier transform.

スペクトル包絡変形部５６ｄは、抽出されたスペクトル包絡の山や谷の位置を変化させることによりスペクトル包絡の形状を変形させる変形部である。具体的には、スペクトル包絡変形部５６ｄは、スペクトル包絡に対して所定の反転軸を設定して、その反転軸を中心として山や谷の位置を反転させる。 The spectrum envelope deformation unit 56d is a deformation unit that deforms the shape of the spectrum envelope by changing the positions of the peaks and valleys of the extracted spectrum envelope. Specifically, the spectrum envelope deforming unit 56d sets a predetermined inversion axis with respect to the spectrum envelope, and inverts the positions of peaks and valleys around the inversion axis.
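The text does not specify how the inversion axis is chosen or along which direction the inversion is performed. The following sketch shows one possible reading, assumed here purely for illustration: the log-amplitude envelope is reflected about a horizontal axis at a given level (by default its mean), which turns peaks into valleys and vice versa.

```python
def invert_envelope(envelope, axis=None):
    """Reflect envelope values about a horizontal axis (default: their mean).

    This is an assumed interpretation of the "inversion axis"; the source
    does not define the axis or its orientation.
    """
    if axis is None:
        axis = sum(envelope) / len(envelope)
    return [2 * axis - value for value in envelope]


original = [1.0, 3.0, 2.0, 6.0]        # made-up log-amplitude envelope
inverted = invert_envelope(original)   # the peak at index 3 becomes a valley
```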

スペクトル生成部５６ｅは、スペクトル微細構造抽出部５６ｂにより抽出されたスペクトル微細構造と、スペクトル包絡変形部５６ｄにより変形されたスペクトル包絡とを合成して防聴音のスペクトルを生成する生成部である。 The spectrum generation unit 56e is a generation unit that generates the spectrum of the hearing-proof sound by synthesizing the spectral fine structure extracted by the spectral fine structure extraction unit 56b and the spectrum envelope deformed by the spectrum envelope deformation unit 56d.

周波数強度補正量算出部５６ｆは、スペクトル生成部５６ｅにより生成された防聴音のスペクトルにおけるスペクトル強度の補正量を算出する算出部である。具体的には、周波数強度補正量算出部５６ｆは、スペクトル分析部５６ａから、さまざまな話者の音声信号のスペクトルの情報を受信し、受信したスペクトルの平均値（原音のスペクトル平均）を算出する。 The frequency intensity correction amount calculation unit 56f is a calculation unit that calculates the correction amount for the spectral intensity of the hearing-proof sound spectrum generated by the spectrum generation unit 56e. Specifically, it receives information on the spectra of the voice signals of various speakers from the spectrum analysis unit 56a and calculates the average of the received spectra (the spectrum average of the original sound).

また、周波数強度補正量算出部５６ｆは、スペクトル生成部５６ｅからさまざまな話者の音声信号に基づいて生成された防聴音のスペクトルの情報を受信し、受信したスペクトルの平均値（防聴音のスペクトル平均）を算出する。 The frequency intensity correction amount calculation unit 56f also receives, from the spectrum generation unit 56e, information on the hearing-proof sound spectra generated from the voice signals of various speakers, and calculates the average of the received spectra (the spectrum average of the hearing-proof sound).

その後、周波数強度補正量算出部５６ｆは、防聴音のスペクトル平均から原音のスペクトル平均を差し引いたスペクトルの増加分（防聴音のスペクトル増加分）を算出する。そして、周波数強度補正量算出部５６ｆは、防聴音のスペクトル増加分が正の値である周波数帯域を検出し、その周波数帯域における防聴音のスペクトル増加分をスペクトル強度の補正量として設定する。 Thereafter, the frequency intensity correction amount calculation unit 56f calculates the spectrum increase of the hearing-proof sound by subtracting the spectrum average of the original sound from the spectrum average of the hearing-proof sound. It then detects the frequency bands in which this increase is positive and sets the increase in those bands as the correction amount for the spectral intensity.

周波数強度補正部５６ｇは、スペクトル生成部５６ｅにより生成された防聴音のスペクトルの所定の周波数領域におけるスペクトル強度を補正し、スペクトル強度が補正された防聴音のスペクトルを音声生成部５４に出力する補正部である。具体的には、周波数強度補正部５６ｇは、周波数強度補正量算出部５６ｆにより算出された補正量の情報に基づいて、防聴音のスペクトルのスペクトル強度を補正する。 The frequency intensity correction unit 56g is a correction unit that corrects the spectral intensity in a predetermined frequency region of the hearing-proof sound spectrum generated by the spectrum generation unit 56e and outputs the corrected hearing-proof sound spectrum to the voice generation unit 54. Specifically, it corrects the spectral intensity of the hearing-proof sound spectrum on the basis of the correction amount calculated by the frequency intensity correction amount calculation unit 56f.

つぎに、実施例２に係る音声処理の処理手順について説明する。図８は、実施例２に係る音声処理の処理手順を示すフローチャートである。図８に示すように、まず、音声処理装置５０の音声入力受付部５３は、マイクロフォンから音声信号の入力を受け付ける（ステップＳ２０１）。 Next, a processing procedure of audio processing according to the second embodiment will be described. FIG. 8 is a flowchart of the sound processing procedure according to the second embodiment. As shown in FIG. 8, first, the voice input receiving unit 53 of the voice processing device 50 receives an input of a voice signal from the microphone (step S201).

そして、スペクトル分析部５６ａは、入力された音声信号の音声波形のスペクトル分析を実行し、ケプストラム係数における高ケフレンシ部および低ケフレンシ部を算出する（ステップＳ２０２）。 Then, the spectrum analysis unit 56a performs spectrum analysis of the speech waveform of the input speech signal, and calculates a high quefrency portion and a low quefrency portion in the cepstrum coefficient (step S202).

続いて、スペクトル微細構造抽出部５６ｂは、スペクトル分析部５６ａから高ケフレンシ部を取得して、スペクトル微細構造を抽出する（ステップＳ２０３）。そして、スペクトル包絡抽出部５６ｃは、スペクトル分析部５６ａから低ケフレンシ部を取得して、スペクトル包絡を抽出する（ステップＳ２０４）。 Subsequently, the spectral fine structure extraction unit 56b acquires a high quefrency part from the spectral analysis unit 56a and extracts the spectral fine structure (step S203). And the spectrum envelope extraction part 56c acquires a low quefrency part from the spectrum analysis part 56a, and extracts a spectrum envelope (step S204).

その後、スペクトル包絡変形部５６ｄは、スペクトル包絡抽出部５６ｃにより抽出されたスペクトル包絡の山と谷の位置を変化させることによりスペクトル包絡を変形する（ステップＳ２０５）。 Thereafter, the spectrum envelope deforming unit 56d deforms the spectrum envelope by changing the positions of the peaks and valleys of the spectrum envelope extracted by the spectrum envelope extracting unit 56c (step S205).

そして、スペクトル生成部５６ｅは、スペクトル包絡変形部５６ｄにより変形されたスペクトル包絡と、スペクトル微細構造抽出部５６ｂにより抽出されたスペクトル微細構造とを合成した防聴音のスペクトルを生成する（ステップＳ２０６）。 The spectrum generation unit 56e then generates the spectrum of the hearing-proof sound by synthesizing the spectrum envelope deformed by the spectrum envelope deformation unit 56d and the spectral fine structure extracted by the spectral fine structure extraction unit 56b (step S206).

続いて、周波数強度補正部５６ｇは、スペクトル生成部５６ｅにより生成された防聴音のスペクトルのあらかじめ設定された周波数領域におけるスペクトル強度を補正する（ステップＳ２０７）。このスペクトル強度の補正量の設定手順は、後に詳しく説明する。 Subsequently, the frequency intensity correction unit 56g corrects the spectral intensity in the preset frequency region of the hearing-proof sound spectrum generated by the spectrum generation unit 56e (step S207). The procedure for setting this correction amount is described in detail later.

そして、音声生成部５４は、周波数強度補正部５６ｇによりスペクトル強度が補正された防聴音のスペクトルから防聴音の音声信号を生成する（ステップＳ２０８）。そして、音声出力部５５は、音声生成部５４により生成された防聴音の音声信号をスピーカに出力する（ステップＳ２０９）。 The voice generation unit 54 then generates the voice signal of the hearing-proof sound from the hearing-proof sound spectrum whose spectral intensity has been corrected by the frequency intensity correction unit 56g (step S208). The voice output unit 55 outputs the hearing-proof sound signal generated by the voice generation unit 54 to the speaker (step S209).

その後、制御部５６は、防聴音の出力処理の終了指示がなされたか否かを調べ（ステップＳ２１０）、終了指示がなされた場合には（ステップＳ２１０，Ｙｅｓ）、この処理を終了する。終了指示がなされていない場合には（ステップＳ２１０，Ｎｏ）、ステップＳ２０１に移行して、それ以後の処理を繰り返す。 Thereafter, the control unit 56 checks whether or not an instruction to end the hearing-proof sound output process has been issued (step S210), and if an instruction to end is given (step S210, Yes), the process ends. If no termination instruction has been given (No at step S210), the process proceeds to step S201, and the subsequent processing is repeated.

つぎに、スペクトル強度補正量の設定処理の処理手順について説明する。図９は、スペクトル強度補正量の設定処理の処理手順を示すフローチャートである。図９に示すように、まず、音声処理装置５０の周波数強度補正量算出部５６ｆは、スペクトル分析部５６ａから入力音声のスペクトルの情報を受信して、その情報を蓄積する（ステップＳ３０１）。 Next, a processing procedure for setting the spectral intensity correction amount will be described. FIG. 9 is a flowchart showing a processing procedure for setting the spectral intensity correction amount. As shown in FIG. 9, first, the frequency intensity correction amount calculation unit 56f of the speech processing device 50 receives the spectrum information of the input speech from the spectrum analysis unit 56a and accumulates the information (step S301).

また、周波数強度補正量算出部５６ｆは、スペクトル生成部５６ｅから入力音声に基づいて生成された防聴音のスペクトルの情報を取得して、その情報を蓄積する（ステップＳ３０２）。 The frequency intensity correction amount calculation unit 56f also acquires, from the spectrum generation unit 56e, information on the spectrum of the hearing-proof sound generated from the input voice, and accumulates that information (step S302).

その後、周波数強度補正量算出部５６ｆは、入力音声のスペクトルの平均（原音のスペクトル平均）を算出する（ステップＳ３０３）。また、周波数強度補正量算出部５６ｆは、防聴音のスペクトルの平均（防聴音のスペクトル平均）を算出する（ステップＳ３０４）。 Thereafter, the frequency intensity correction amount calculation unit 56f calculates the average of the input voice spectra (the spectrum average of the original sound) (step S303). It also calculates the average of the hearing-proof sound spectra (the spectrum average of the hearing-proof sound) (step S304).

そして、周波数強度補正量算出部５６ｆは、防聴音のスペクトル平均から入力音声信号のスペクトル平均を差し引いたスペクトルの増加分（防聴音のスペクトル増加分）を算出する（ステップＳ３０５）。 The frequency intensity correction amount calculation unit 56f then calculates the spectrum increase of the hearing-proof sound by subtracting the spectrum average of the input voice signal from the spectrum average of the hearing-proof sound (step S305).

その後、周波数強度補正量算出部５６ｆは、防聴音のスペクトル増加分が正の値である周波数帯域を検出し（ステップＳ３０６）、その周波数帯域における防聴音のスペクトル増加分をスペクトル強度の補正量として設定する（ステップＳ３０７）。 Thereafter, the frequency intensity correction amount calculation unit 56f detects the frequency bands in which the spectrum increase of the hearing-proof sound is positive (step S306) and sets the increase in those bands as the correction amount for the spectral intensity (step S307).

なお、上記実施例２では、周波数強度補正部５６ｇが、防聴音のスペクトルのスペクトル強度をあらかじめ設定された補正量だけ自動的に補正することとしたが、周波数強度補正部５６ｇが、入力部５１を介してユーザにより入力された補正量を受け付け、その補正量分だけ防聴音のスペクトルのスペクトル強度を補正することとしてもよい。 In the second embodiment, the frequency intensity correction unit 56g automatically corrects the spectral intensity of the hearing-proof sound spectrum by a preset correction amount. Alternatively, the frequency intensity correction unit 56g may accept a correction amount entered by the user via the input unit 51 and correct the spectral intensity of the hearing-proof sound spectrum by that amount.

この場合、周波数強度補正部５６ｇは、図６に示したような防聴音のスペクトル増加分の情報を表示部５２に出力するとともに、周波数領域ごとにスペクトル強度の補正量の指定をユーザから受け付けるスペクトル強度補正受付画面を表示部５２に出力する。 In this case, the frequency intensity correction unit 56g outputs information on the spectrum increase of the hearing-proof sound, as shown in FIG. 6, to the display unit 52, together with a spectral intensity correction reception screen on which the user specifies the correction amount of the spectral intensity for each frequency region.

図１０は、表示部５２に出力されるスペクトル強度補正受付画面６０の一例を示す図である。ユーザは、防聴音が甲高い音になることを防止するため、防聴音のスペクトル増加分の情報を参照し、周波数帯域ごとにマウス等を操作してスペクトル強度の増減を調節するスライダを動かし、スペクトル強度の補正量を決定する。 FIG. 10 is a diagram illustrating an example of the spectral intensity correction reception screen 60 output to the display unit 52. To prevent the hearing-proof sound from becoming high-pitched, the user refers to the information on the spectrum increase of the hearing-proof sound and, for each frequency band, uses a mouse or similar device to move a slider that raises or lowers the spectral intensity, thereby determining the correction amount.

そして、周波数強度補正部５６ｇは、このスペクトル強度の補正量の情報を受け付け、受け付けた補正量の情報に基づいて防聴音のスペクトルのスペクトル強度の補正をおこなう。 The frequency intensity correction unit 56g then accepts this correction amount and corrects the spectral intensity of the hearing-proof sound spectrum accordingly.
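Both correction modes, the preset amount and the user-specified amount, can be sketched with one small function. The per-band list representation, the dB interpretation, and the `user_overrides` mapping are assumptions made for illustration; the source only states that the correction amount is subtracted from the spectral intensity.

```python
def apply_correction(spectrum, preset, user_overrides=None):
    """Subtract a per-band correction from a hearing-proof sound spectrum.

    spectrum, preset: per-band levels and correction amounts (assumed dB).
    user_overrides: optional {band_index: amount} values entered by the
    user, which replace the preset amount for those bands.
    """
    if len(spectrum) != len(preset):
        raise ValueError("spectrum and preset must have the same number of bands")
    amounts = list(preset)
    for band, amount in (user_overrides or {}).items():
        amounts[band] = amount
    return [level - amount for level, amount in zip(spectrum, amounts)]


masking = [10.0, 26.0, 28.0]   # made-up hearing-proof sound spectrum
preset = [0.0, 5.0, 0.0]       # made-up preset correction amounts
automatic = apply_correction(masking, preset)
manual = apply_correction(masking, preset, {1: 3.0, 2: 1.0})
```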

上述してきたように、実施例２によれば、スペクトル微細構造抽出部５６ｂが、話者の音声信号からスペクトル微細構造を抽出し、スペクトル生成部５６ｅが、抽出されたスペクトル微細構造と所定のスペクトル包絡とを合成することにより話者の音声に被せて出力される防聴音のスペクトルを生成し、周波数強度補正部５６ｇが、生成されたスペクトルの所定の周波数領域におけるスペクトル強度を抑制することにより当該スペクトルを補正することとしたので、防聴音が甲高い音になる原因となる周波数領域のスペクトル強度を抑制することにより、人に不快感を与えてしまうことを効果的に防止することができる。 As described above, according to the second embodiment, the spectral fine structure extraction unit 56b extracts the spectral fine structure from the speaker's voice signal, the spectrum generation unit 56e synthesizes the extracted spectral fine structure and a predetermined spectrum envelope to generate the spectrum of the hearing-proof sound that is output over the speaker's voice, and the frequency intensity correction unit 56g corrects that spectrum by suppressing the spectral intensity in a predetermined frequency region. Suppressing the spectral intensity of the frequency region that makes the hearing-proof sound high-pitched effectively prevents the sound from causing discomfort.

また、実施例２によれば、周波数強度補正量算出部５６ｆが、話者の音声信号から抽出されたスペクトル微細構造および所定のスペクトル包絡を合成することにより生成された防聴音のスペクトルと、話者の音声信号から得られるスペクトルとの差に基づいてスペクトル強度の補正量を設定することとしたので、防聴音が甲高い音になる原因となる周波数領域のスペクトル強度の補正量を適切に設定することができる。 Further, according to the second embodiment, the frequency intensity correction amount calculation unit 56f sets the correction amount for the spectral intensity on the basis of the difference between the hearing-proof sound spectrum, generated by synthesizing the spectral fine structure extracted from the speaker's voice signal with a predetermined spectrum envelope, and the spectrum obtained from the speaker's voice signal. The correction amount for the frequency region that makes the hearing-proof sound high-pitched can therefore be set appropriately.

さて、これまで本発明の実施例について説明したが、本発明は上述した実施例以外にも、上記特許請求の範囲に記載した技術的思想の範囲内において種々の異なる実施例にて実施されてもよいものである。 Although embodiments of the present invention have been described above, the present invention may also be implemented in various other embodiments within the scope of the technical idea set forth in the claims.

Among the processes described in the above embodiments, all or part of the processes described as being performed automatically may be performed manually, and all or part of the processes described as being performed manually may be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and the drawings may be changed arbitrarily unless otherwise specified.

Each component of each illustrated device is a functional concept and need not be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to that illustrated; all or part of each device may be functionally or physically distributed or integrated in arbitrary units according to various loads and usage conditions. Furthermore, all or any part of the processing functions performed by each device may be realized by a CPU and a program analyzed and executed by that CPU, or as hardware using wired logic.

The speech processing method described in the above embodiments can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. This program can be distributed via a network such as the Internet. The program can also be recorded on a computer-readable recording medium such as a hard disk, a flexible disk (FD), a CD-ROM, an MO, or a DVD, and executed by being read from the recording medium by the computer.

As described above, the speech processing device and speech processing method according to the present invention are useful for speech processing systems that need to effectively prevent the sound output over a speaker's voice from becoming high-pitched and to protect the speaker's privacy without causing discomfort to people.

A diagram explaining the concept of the speech processing according to the first embodiment.
A diagram comparing the spectrogram of the anti-hearing sound in the first embodiment with the spectrogram of a conventional anti-hearing sound.
A diagram showing the functional configuration of the speech processing device 30 according to the first embodiment.
A flowchart showing the processing procedure of the speech processing according to the first embodiment.
A diagram explaining the speech processing according to the second embodiment.
A diagram explaining the method of calculating the spectral intensity correction amount 40.
A diagram showing the functional configuration of the speech processing device 50 according to the second embodiment.
A flowchart showing the processing procedure of the speech processing according to the second embodiment.
A flowchart showing the processing procedure for setting the spectral intensity correction amount.
A diagram showing an example of the spectral intensity change acceptance screen 60 output to the display unit 52.

Explanation of symbols

10 Speech waveform
11 Spectrogram
12 Short-time spectrum
13 Spectral envelope
14 Spectral fine structure
15 Spectral envelope database
16 Replaced spectral envelope
17 Spectrum of anti-hearing sound
20 Spectrogram of original speech
21 Spectrogram of conventional anti-hearing sound
22 Spectrogram of anti-hearing sound of the present embodiment
30 Speech processing device
31 Input unit
32 Display unit
33 Speech input reception unit
34 Spectral envelope database
35 Speech generation unit
36 Speech output unit
37 Control unit
37a Spectrum analysis unit
37b Spectral fine structure extraction unit
37c Spectral envelope extraction unit
37d Spectral envelope selection unit
37e Spectrum generation unit
40 Spectral intensity correction amount
50 Speech processing device
51 Input unit
52 Display unit
53 Speech input reception unit
54 Speech generation unit
55 Speech output unit
56 Control unit
56a Spectrum analysis unit
56b Spectral fine structure extraction unit
56c Spectral envelope extraction unit
56d Spectral envelope deformation unit
56e Spectrum generation unit
56f Frequency intensity correction amount calculation unit
56g Frequency intensity correction unit
60 Spectral intensity correction acceptance screen

Claims

A speech processing device that generates a speech spectrum of speech to be output over a speaker's voice, the device comprising:
a spectral envelope database that stores data relating to a plurality of different spectral envelopes;
spectral fine structure extraction means for extracting a spectral fine structure from the speaker's speech signal;
spectral envelope selection means for selecting data relating to a spectral envelope from among the data relating to spectral envelopes stored in the spectral envelope database; and
speech spectrum generation means for generating the speech spectrum of the speech to be output over the speaker's voice by synthesizing the spectral envelope selected by the spectral envelope selection means with the spectral fine structure.

The speech processing device according to claim 1, wherein the spectral envelope selection means newly selects data relating to a spectral envelope from the spectral envelope database when the amount of temporal change in the speaker's voice is equal to or greater than a predetermined value, and the speech spectrum generation means newly generates the speech spectrum of the speech to be output over the speaker's voice by synthesizing the spectral envelope newly selected by the spectral envelope selection means with the spectral fine structure.

The speech processing device according to claim 1 or 2, wherein the spectral envelope selection means randomly selects data relating to a spectral envelope from among the data relating to spectral envelopes stored in the spectral envelope database.

The speech processing device according to claim 1 or 2, further comprising spectral envelope extraction means for extracting a spectral envelope from the speaker's speech signal, wherein the spectral envelope selection means selects data relating to a spectral envelope from among the data relating to spectral envelopes stored in the spectral envelope database, based on the similarity between the spectral envelope extracted by the spectral envelope extraction means and each spectral envelope whose data is stored in the spectral envelope database.

A speech processing device that generates a speech spectrum of speech to be output over a speaker's voice, the device comprising:
spectral fine structure extraction means for extracting a spectral fine structure from the speaker's speech signal;
speech spectrum generation means for generating the speech spectrum of the speech to be output over the speaker's voice by synthesizing the spectral fine structure extracted by the spectral fine structure extraction means with a predetermined spectral envelope; and
frequency intensity correction means for correcting the speech spectrum by suppressing the spectral intensity in a predetermined frequency region of the speech spectrum generated by the speech spectrum generation means.

The speech processing device according to claim 5, wherein the frequency intensity correction means sets a correction amount for the spectral intensity based on the difference between the speech spectrum obtained from the speaker's speech signal and the speech spectrum generated by the speech spectrum generation means.

A speech processing method for generating a speech spectrum of speech to be output over a speaker's voice, the method comprising:
a spectral fine structure extraction step of extracting a spectral fine structure from the speaker's speech signal;
a spectral envelope selection step of selecting, when the spectral fine structure has been extracted in the spectral fine structure extraction step, data relating to a spectral envelope from among data relating to a plurality of different spectral envelopes stored in advance in a spectral envelope database; and
a speech spectrum generation step of generating the speech spectrum of the speech to be output over the speaker's voice by synthesizing the spectral envelope selected in the spectral envelope selection step with the spectral fine structure.

A speech processing method for generating a speech spectrum of speech to be output over a speaker's voice, the method comprising:
a spectral fine structure extraction step of extracting a spectral fine structure from the speaker's speech signal;
a speech spectrum generation step of generating the speech spectrum of the speech to be output over the speaker's voice by synthesizing the spectral fine structure extracted in the spectral fine structure extraction step with a predetermined spectral envelope; and
a frequency intensity correction step of correcting the speech spectrum by suppressing the spectral intensity in a predetermined frequency region of the speech spectrum generated in the speech spectrum generation step.