JP5982671B2

JP5982671B2 - Audio signal processing method and audio signal processing system

Info

Publication number: JP5982671B2
Application number: JP2012098973A
Authority: JP
Inventors: 慶華孫; 永松　健司; 藤田　雄介
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2012-04-24
Filing date: 2012-04-24
Publication date: 2016-08-31
Anticipated expiration: 2032-04-24
Also published as: JP2013228472A

Description

The present invention relates to an audio signal processing method executed by a system that performs audio signal processing on input audio data, and more particularly to an audio signal processing method for concealing personality.

In recent years, advances in technologies for storing audio data in a storage area and for searching stored audio data have made it increasingly common to accumulate large volumes of audio data and put that data to use. In particular, medical settings, call centers, and web content services (for example, Facebook (registered trademark; the same applies hereinafter) and YOUTUBE (registered trademark; the same applies hereinafter)) are required to handle carefully the personal information and personality information contained in stored audio data.

Here, personal information is information contained in the content of audio data that can identify an individual, such as a name, an address, or a telephone number. For example, the statement "Mr./Ms. A has disease B. Full recovery will take C days." in audio data recording a nurse's speech is personal information. Likewise, the statement "You are Mr./Ms. E, living in D, correct? I will change your PIN to F." in an operator's audio data recorded at a call center is personal information. A known technique for protecting personal information automatically detects preset keywords and replaces each detected keyword with another word.

Personality information, on the other hand, is the part of the information carried by the voice itself (sound that contains a human voice) that can identify an individual. Examples include the pitch, accent, loudness, and vocal tract shape reflected in the speech waveform of the audio data. It is well known that an individual can be identified from voiceprint information, which is one kind of personality information.

Personality information is therefore a personal right comparable to portrait rights, and because it may be misused if it becomes known to others indiscriminately, it should be protected. However, when a person calls a call center or uploads audio data to a server, the speaker's voice may be heard by an unspecified number of people, so control of personality information is required.

Personality information comes in various types, for example personality information contained in the utterance content, in the prosody, and in the phonemes.

Personality information contained in the utterance content includes, for example, speech habits (such as how often fillers like "ano" (あのー) and "eto" (えーと) are used), education level (such as grammar and word choice), and how often dialect expressions (such as "dabe" (だべー)) are used.

Personality information contained in the prosody includes gender (voice pitch, etc.), dialect and non-native speech (accent, etc.), emotion (intonation, etc.), and physical condition (vocal cord noise, nasality, etc.).

Personality information contained in the phonemes includes physical characteristics (vocal tract shape, etc.), speech habits (clarity of articulation, how the mouth is opened, and how the tongue is positioned), and the recording environment (background sound, reverberation, etc.).

A voice changer that raises the pitch of the voice is a known technique for controlling personality information. Voice changers are used, for example, on television to make it difficult to identify a person appearing on screen. Besides voice changers, a technique that converts a female voice into a male voice is also known as a technique for controlling personality information (see, for example, Patent Document 1).

Japanese Unexamined Patent Application Publication No. H10-240292 (JP-A-10-240292)

However, a voice changer only raises the pitch of the voice and therefore cannot remove personality information. Likewise, the technique described in Patent Document 1 only changes the pitch of the voice and cannot remove other personality information. An individual may still be identified from personality information such as accent, voiceprint, voice quality, and speaking style.

Removing all personality information uniformly is also conceivable, but the processing required to do so degrades the sound quality of the audio data, markedly reduces its audibility, and leaves the audio data with no practical value.

In view of the above, an object of the present invention is to provide an audio signal processing method that removes strongly individual personality information while preventing degradation of the audio data as far as possible.

A representative example of the present invention is an audio signal processing method performed on input audio data, the method being executed by a system including a CPU, a storage area, and an interface, and comprising: a feature quantity calculation step of calculating at least one feature quantity that quantifies the input audio data; a personality degree calculation step of calculating, based on the feature quantity calculated in the feature quantity calculation step, a personality degree that quantifies the strength of the personality of the feature quantity; and an audio signal processing step of, when any personality degree fails to satisfy a predetermined personality degree condition, performing on the input audio data audio signal processing that causes that personality degree to satisfy the personality degree condition.

The effect obtained by a representative one of the inventions disclosed in this application is, briefly, as follows: strongly individual personality information can be removed while degradation of the audio data is prevented as far as possible.

A hardware configuration diagram of the audio signal processing system according to the first embodiment of the present invention.
A functional block diagram of the audio signal processing system according to the first embodiment.
A functional block diagram of the voice personality feature analysis unit according to the first embodiment.
An explanatory diagram of the speech waveform of the audio data in the first embodiment.
An explanatory diagram of the language feature quantity in the first embodiment.
An explanatory diagram of the time series of speech waveform power calculated as a prosodic feature quantity in the first embodiment.
An explanatory diagram of the time series of the fundamental frequency of the speech waveform calculated as a prosodic feature quantity in the first embodiment.
An explanatory diagram of the time series of phoneme durations of the speech waveform calculated as a prosodic feature quantity in the first embodiment.
An explanatory diagram of the phoneme spectrum calculated as a phonemic feature quantity in the first embodiment.
An explanatory diagram of the personality feature quantity list in the first embodiment.
An explanatory diagram of the personality degree calculation model corresponding to the mean F0 value in the first embodiment.
A functional block diagram of the audio signal processing parameter determination unit according to the first embodiment.
An explanatory diagram of the personality degree condition in the first embodiment.
An explanatory diagram of the audio signal processing parameters in the first embodiment.
An explanatory diagram of the audio signal processing parameter adjustment method in the first embodiment.
A functional block diagram of the audio signal processing unit according to the first embodiment.
Time series data of the fundamental frequency before the mean F0 value is adjusted in the first embodiment.
Time series data of the fundamental frequency after the mean F0 value is adjusted in the first embodiment.
An explanatory diagram of the configuration of the information processing system according to the first embodiment.
A functional block diagram of the server according to the second embodiment.
A functional block diagram of the audio signal processing parameter determination unit according to the second embodiment.
An explanatory diagram of a specific example of the parameter adjustment termination conditions in the second embodiment.
A graph showing the relationship between sound quality degradation and the change in the personality degree of the mean F0 value in the second embodiment.
An explanatory diagram of a specific example of the personality authority setting rules in the second embodiment.
A functional block diagram of the audio signal processing parameter determination unit according to the third embodiment.
A graph showing the relationship between the delay time and the amount of change in the audio data caused by audio signal processing in the third embodiment.

First, an outline of the present invention is described.

In the present invention, personality feature quantities that quantify the input audio data are calculated, and a personality degree that quantifies the strength of the personality of each calculated personality feature quantity is calculated. When a personality degree does not satisfy the personality degree condition, audio signal processing is performed so that the personality degree satisfies the personality degree condition. This removes strongly individual personality information while preventing degradation of the audio data as far as possible.

Analysis of audio data from a performer who impersonates celebrities showed that the physical feature quantities of the performer's voice bear no resemblance to those of the celebrity being impersonated. In other words, when people identify a speaker by voice, speaking style, rhythm, intonation, and the like matter more than vocal tract features derived from the speaker's physical characteristics. Further analysis of various audio data showed that the physical feature quantities (personality feature quantities) that act as voice personality features (cues for identifying an individual; synonymous with personality information) differ from speaker to speaker.

In the prior art, the same audio signal processing is applied to every speaker, so personality features cannot be removed completely. Even if personality features could be removed completely by applying audio signal processing to every physical parameter that can act as a personality feature (prosodic, phonemic, and language feature quantities, etc.), with current speech processing technology such heavy signal processing degrades the speech until the user can no longer make it out, which is impractical.

This patent therefore defines a personality degree that allows the strength of the personality of a personality feature quantity to be evaluated quantitatively, and performs audio signal processing on the physical parameters of the audio data so that the personality degree satisfies a predetermined personality degree condition. Personality features whose strength is at or above a predetermined value can thereby be removed.
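The selective approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the feature names, the 0-to-1 scale of the personality degree, and the form of the condition (a per-feature upper limit) are all hypothetical.

```python
def features_to_process(degrees, limits):
    """Return the names of features whose personality degree exceeds its limit.

    degrees: feature name -> personality degree (assumed scale: 0.0 to 1.0)
    limits:  feature name -> largest degree the condition allows (assumed form)
    Features without a limit are treated as unrestricted.
    """
    return sorted(name for name, degree in degrees.items()
                  if degree > limits.get(name, 1.0))


# Hypothetical values for one speaker.
degrees = {"mean_f0": 0.9, "filler_rate": 0.2, "speech_rate": 0.7}
limits = {"mean_f0": 0.5, "filler_rate": 0.5, "speech_rate": 0.5}
selected = features_to_process(degrees, limits)
print(selected)  # ['mean_f0', 'speech_rate']
```

Only the selected features would be modified, which is how the approach limits the amount of signal processing and therefore the loss of sound quality.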

Preferred embodiments of the speech synthesis apparatus and speech synthesis method according to the present invention are described in detail below with reference to the drawings.
(First embodiment)
The first embodiment of the present invention is described with reference to FIGS. 1 to 15.

FIG. 1 is a hardware configuration diagram of the audio signal processing system 100 according to the first embodiment of the present invention.

The audio signal processing system 100 includes a CPU 103, a memory 104 serving as the main storage device, an auxiliary storage device 101, an audio input interface (I/F) 102, and an audio output interface (I/F) 105. The CPU 103, the memory 104, the auxiliary storage device 101, the audio input I/F 102, and the audio output I/F 105 are connected to one another via a bus 106.

For example, the audio signal processing system 100 is built into a computing device such as a mobile phone or a computer as an audio processing unit. In this case, the hardware configuration shown in FIG. 1 may be realized using the hardware of the computing device in which the audio signal processing system 100 is installed, or may be provided separately from that computing device.

The CPU 103 controls the audio signal processing system 100 as a whole. The memory 104 is used as the work area of the CPU 103. The auxiliary storage device 101 is a nonvolatile storage medium such as an HDD (Hard Disk Drive), an FD (Flexible Disk), or flash memory. The auxiliary storage device 101 stores various programs, including a program for calculating personality feature quantities and a program for calculating personality degrees, as well as various data.

The audio input I/F 102 is an interface for inputting audio data to the audio signal processing system 100. An audio input device such as a microphone may be connected directly to the audio input I/F 102 so that audio data is input to the audio signal processing system 100 directly from that device. Audio data may also be input from a storage device such as a portable storage medium, or received via a network.

The audio output I/F 105 is an interface for outputting audio data from the audio signal processing system 100. An audio output device such as a speaker may be connected directly to the audio output I/F 105 so that audio data is output from that device. The audio signal processing system 100 may also output audio data to a storage device such as a portable storage medium, or transmit it from the audio output I/F 105 via a network.

FIG. 2 is a functional block diagram of the audio signal processing system 100 according to the first embodiment of the present invention.

The audio signal processing system 100 includes the audio input I/F 102, a voice personality feature analysis unit 110, an audio signal processing parameter determination unit 120, an audio signal processing unit 130, and the audio output I/F 105.

The audio input I/F 102 is an interface that accepts audio data, for example an interface that connects a microphone, a storage medium, or a network to the audio signal processing system 100.

When the audio input I/F 102 connects a microphone, audio data is input to the audio signal processing system 100 as the electrical signal into which the microphone converts sound. When the audio input I/F 102 connects a storage medium, audio data is input by loading the audio data stored on that medium. When the audio input I/F 102 connects a network, audio data is input, for example, as audio stream data.

The voice personality feature analysis unit 110 calculates information about the personality of the audio data input to the audio input I/F 102 (speech feature quantities, personality feature quantities, and personality degrees). It includes a speech feature calculation unit 111 that calculates speech feature quantities, a personality feature calculation unit 112 that calculates personality feature quantities, and a personality degree calculation unit 113 that calculates personality degrees.

The speech feature calculation unit 111 analyzes the audio data input to the audio input I/F 102 using a known speech feature analysis model and calculates one or more speech feature quantities, each a time series, such as phonemic feature quantities (for example, the mel-cepstrum) and prosodic feature quantities (for example, the fundamental frequency (F0) pattern and power).

The personality feature calculation unit 112 calculates a plurality of personality feature quantities from the speech feature quantities calculated by the speech feature calculation unit 111, using known models such as a speech recognition model, a filler identification model, an accent identification model, a vocal tract shape identification model, and a vocal cord noise extraction model. In other words, the personality feature quantities quantify the audio data by means of a plurality of preset models.
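As a hedged illustration of one such personality feature quantity, the sketch below computes a filler usage rate from already-recognized tokens. The tokenized input, the tiny filler set, and the function name `filler_rate` are assumptions made for the example; the text does not specify how the filler identification model works.

```python
# Fillers named in the description; a real identification model would
# cover many more and would work from recognition output, not a fixed set.
FILLERS = {"あのー", "えーと"}


def filler_rate(tokens):
    """Fraction of tokens that are fillers (0.0 for an empty utterance)."""
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token in FILLERS) / len(tokens)


# Hypothetical recognition result, already split into tokens.
tokens = ["えーと", "暗証番号", "を", "変更", "します"]
rate = filler_rate(tokens)
print(rate)  # 0.2
```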

The personality degree calculation unit 113 calculates, from the personality feature quantities calculated by the personality feature calculation unit 112, a personality degree that quantifies the strength of the personality of each personality feature quantity.
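The passage above does not say how a personality degree is computed. Purely as an assumption for illustration, the sketch below scores a feature value by its distance from a population mean, so that an unusual value yields a degree near 1 and a typical value a degree near 0. The scoring formula, the function name, and the population statistics are hypothetical.

```python
import math


def personality_degree(value, population_mean, population_std):
    """Map a feature value to a degree in [0, 1].

    Assumed model (not taken from the patent text): the further the value
    lies from the population mean, measured in standard deviations, the
    more it distinguishes the speaker, so the higher the degree.
    """
    if population_std <= 0:
        raise ValueError("population_std must be positive")
    z = abs(value - population_mean) / population_std
    return 1.0 - math.exp(-0.5 * z * z)


# Hypothetical population statistics for the mean F0, in Hz.
typical = personality_degree(125.0, 120.0, 20.0)
unusual = personality_degree(180.0, 120.0, 20.0)
print(typical < unusual)  # True
```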

When any of the personality degrees calculated by the personality degree calculation unit 113 does not satisfy a predetermined personality degree condition 1001 (see FIG. 11), the audio signal processing parameter determination unit 120 determines an audio signal processing parameter 1003 (see FIG. 12), which is a parameter for performing audio signal processing that causes that personality degree to satisfy the personality degree condition 1001. As described in detail later, the audio signal processing unit 130 performs audio signal processing on the audio data based on the audio signal processing parameter 1003. The audio signal processing parameter determination unit 120 includes a personality degree condition determination unit 121, an audio signal processing parameter adjustment unit 122, and a personality degree recalculation unit 123.

The personality degree condition determination unit 121 determines whether a personality degree satisfies the predetermined personality degree condition 1001. If it determines that the personality degree satisfies the condition, it outputs the audio signal processing parameter 1003 to the audio signal processing unit 130. If it determines that the personality degree does not satisfy the condition, it passes processing to the audio signal processing parameter adjustment unit 122.

When the personality degree condition determination unit 121 determines that a personality degree does not satisfy the personality degree condition 1001, the audio signal processing parameter adjustment unit 122 adjusts the audio signal processing parameter 1003 so that the condition is satisfied. Based on the adjusted audio signal processing parameter 1003, the personality degree recalculation unit 123 recalculates the personality degree that the audio data would have if the audio signal processing were performed, and outputs the result to the personality degree condition determination unit 121. The personality degree condition determination unit 121 then determines whether the recalculated personality degree satisfies the personality degree condition 1001.

In the audio signal processing parameter determination unit 120, the personality degree condition determination unit 121, the audio signal processing parameter adjustment unit 122, and the personality degree recalculation unit 123 repeat this adjustment of the audio signal processing parameter 1003 until the personality degree condition determination unit 121 determines that the personality degree satisfies the personality degree condition 1001.
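The determine, adjust, and recalculate cycle can be sketched as a simple loop. This is an illustration under assumptions: the parameter is taken to be a shift of the mean F0 in Hz, the condition is taken to be an upper limit on the degree, and the linear degree model, the reference statistics, the step size, and the iteration cap are all hypothetical.

```python
REF_MEAN_HZ = 120.0   # hypothetical reference mean F0
REF_STD_HZ = 20.0     # hypothetical reference spread


def personality_degree(mean_f0_hz):
    """Assumed model: distance from the reference mean, clipped to [0, 1]."""
    return min(1.0, abs(mean_f0_hz - REF_MEAN_HZ) / (3.0 * REF_STD_HZ))


def determine_f0_shift(mean_f0_hz, limit=0.5, step_hz=5.0, max_iter=100):
    """Grow the F0 shift until the recalculated degree meets the limit."""
    shift_hz = 0.0
    for _ in range(max_iter):
        # Recalculate the degree the audio would have after processing
        # (role of unit 123; the first pass uses the unshifted value).
        degree = personality_degree(mean_f0_hz + shift_hz)
        # Condition check (role of unit 121).
        if degree <= limit:
            return shift_hz, degree
        # Adjustment (role of unit 122): one step toward the reference.
        shift_hz += -step_hz if mean_f0_hz > REF_MEAN_HZ else step_hz
    raise RuntimeError("condition not met within max_iter adjustments")


shift, degree = determine_f0_shift(180.0)
print((shift, degree))  # (-30.0, 0.5)
```

Stopping at the first parameter value that satisfies the condition keeps the change to the audio as small as the step size allows.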

The audio signal processing unit 130 performs audio signal processing based on the audio signal processing parameter 1003 input from the audio signal processing parameter determination unit 120. It includes a personality feature removal unit 131 and a speech synthesis unit 132.

The personality feature removal unit 131 performs audio signal processing on the audio data based on the audio signal processing parameter 1003 so as to eliminate any personality degree that does not satisfy the personality degree condition. The speech synthesis unit 132 synthesizes the audio data on which the personality feature removal unit 131 has performed audio signal processing.

The audio output I/F 105 is an interface for outputting audio data, for example an interface that connects a speaker, a storage medium, or a network to the audio signal processing system 100.

When the audio output I/F 105 connects a speaker, the audio data is converted into an electrical signal and output from the speaker. When the audio output I/F 105 connects a storage medium, the audio data is output by writing the processed audio data to that medium. When the audio output I/F 105 connects a network, the audio data is output to the network, for example, as audio stream data.

Each of the components described above is realized by the CPU 103 executing the corresponding program. Some or all of the components may instead be realized in hardware, for example by designing them as an integrated circuit. Although the case where the components are realized in software by interpreting and executing programs has been described, the programs, tables, files, and other information that realize the functions of the components can be stored not only in memory but also in a recording device such as a hard disk or SSD (Solid State Drive), or on a recording medium such as an IC card, SD card, or DVD, and can of course be downloaded and installed via a network or the like as needed.

Next, the operation of each component of the audio signal processing system is described in detail.

(Operation of the voice personality feature analysis unit 110)
First, the operation of the voice personality feature analysis unit 110 is described with reference to FIG. 3. FIG. 3 is a functional block diagram of the voice personality feature analysis unit 110 according to the first embodiment of the present invention.

Audio data is input to the voice personality feature analysis unit 110 from the audio input I/F 102.

Next, the speech feature calculation unit 111 is described. The speech feature calculation unit 111 includes at least one module for calculating a predefined speech feature quantity. Each module calculates the speech feature quantity 310 that corresponds to it and passes the result to the personality feature calculation unit 112.

音声特徴量３１０は、音声データのテキスト情報に関する言語特徴量３１１、音声データの韻律に関する韻律特徴量３１２、及び、音声データの音韻に関する音韻特徴量３１３に大きく分類される。図３に示す音声特徴量算出部１１１は、言語特徴量算出モジュール３０１、韻律特徴量算出モジュール３０２、及び音韻特徴量算出モジュール３０３を含む。 The audio feature amount 310 is broadly classified into a language feature amount 311 related to the text information of the audio data, a prosodic feature amount 312 related to the prosody of the audio data, and a phoneme feature amount 313 related to the phonemes of the audio data. The audio feature amount calculation unit 111 shown in FIG. 3 includes a language feature amount calculation module 301, a prosodic feature amount calculation module 302, and a phoneme feature amount calculation module 303.

言語特徴量算出モジュール３０１は、入力された音声データの音声波形に基づいて言語特徴量３１１を算出する。言語特徴量３１１は、例えば、発話テキスト及びフィラー情報などであり、図５で詳細を説明する。 The language feature amount calculation module 301 calculates a language feature amount 311 based on the speech waveform of the input speech data. The language feature quantity 311 is, for example, an utterance text and filler information, and will be described in detail with reference to FIG.

韻律特徴量算出モジュール３０２は、入力された音声データの音声波形に基づいて韻律特徴量３１２を算出する。韻律特徴量３１２は、音声データの音声波形のパワー、音声データの音声波形の音声基本周波数（Ｆ０）、及びリズム（音素継続長）などであり、図６Ａ〜図６Ｃで詳細を説明する。 The prosodic feature amount calculation module 302 calculates the prosodic feature amount 312 based on the speech waveform of the input audio data. The prosodic feature amount 312 includes, for example, the power of the speech waveform, the fundamental frequency (F0) of the speech waveform, and the rhythm (phoneme duration); details are described with reference to FIGS. 6A to 6C.

音韻特徴量算出モジュール３０３は、入力された音声データの音声波形から音韻特徴量３１３を算出する。音韻特徴量３１３は、例えば、音声データの音声波形の音韻スペクトルであり、図７で詳細を説明する。 The phoneme feature quantity calculation module 303 calculates a phoneme feature quantity 313 from the speech waveform of the input speech data. The phoneme feature quantity 313 is, for example, a phoneme spectrum of a voice waveform of voice data, and details will be described with reference to FIG.

図４が音声データとして音声個人性特徴分析部１１０に入力された場合の言語特徴量３１１、韻律特徴量３１２、及び音韻特徴量３１３について、図５〜図７を用いて説明する。 The language feature value 311, prosodic feature value 312, and phoneme feature value 313 when FIG. 4 is input to the speech individuality feature analysis unit 110 as speech data will be described with reference to FIGS. 5 to 7.

図４は、本発明の第１実施形態の音声データの音声波形の説明図である。 FIG. 4 is an explanatory diagram of an audio waveform of audio data according to the first embodiment of this invention.

図４に示す縦軸は、音声個人性特徴分析部１１０に入力された音声データの音声波形の音圧を示し、横軸は時間を示す。 The vertical axis shown in FIG. 4 represents the sound pressure of the speech waveform of the speech data input to the speech individuality feature analysis unit 110, and the horizontal axis represents time.

図５は、本発明の第１実施形態の言語特徴量３１１の説明図である。 FIG. 5 is an explanatory diagram of the language feature quantity 311 according to the first embodiment of this invention.

図４に示す音声データに基づいて言語特徴量算出モジュール３０１によって算出された言語特徴量３１１を図５に示す。図５では、「（えーど）こんにちは、（えーど）わたしは（えーど）田中ともうします」との発話テキスト及びフィラー情報が言語特徴量算出モジュール３０１によって言語特徴量３１１として算出される。なお、図５に示す小括弧内の部分がフィラー情報である。 FIG. 5 shows the language feature amount 311 calculated by the language feature amount calculation module 301 based on the audio data shown in FIG. 4. In FIG. 5, the utterance text and filler information "(eedo) konnichiwa, (eedo) watashi wa (eedo) Tanaka to moushimasu" (roughly, "(um) Hello, (um) I am (um) Tanaka") are calculated as the language feature amount 311 by the language feature amount calculation module 301. The portions in parentheses in FIG. 5 are the filler information.

韻律特徴量算出モジュール３０２は、上述したように、韻律特徴量３１２として、音声データの音声波形のパワーの時系列データ（図６Ａ参照）、音声データの音声波形の音声基本周波数（Ｆ０）の時系列データ（図６Ｂ参照）、及びリズム（音素継続長）（図６Ｃ参照）を算出する。 As described above, the prosodic feature amount calculation module 302 calculates, as the prosodic feature amount 312, time-series data of the power of the speech waveform (see FIG. 6A), time-series data of the fundamental frequency (F0) of the speech waveform (see FIG. 6B), and the rhythm (phoneme duration) (see FIG. 6C).

図６Ａは、本発明の第１実施形態の韻律特徴量３１２として算出された音声波形のパワーの時系列データの説明図である。 FIG. 6A is an explanatory diagram of power waveform time-series data calculated as the prosodic feature value 312 according to the first embodiment of this invention.

図６Ａの縦軸は音声波形のパワーを示し、図６Ａの横軸は時間を示す。音声波形のパワーは、音圧実効値の２乗値を基準音圧の２乗値で割った値の常用対数を１０倍することによって算出される。 The vertical axis in FIG. 6A indicates the power of the speech waveform, and the horizontal axis in FIG. 6A indicates time. The power of the speech waveform is calculated by multiplying the common logarithm of the value obtained by dividing the square value of the sound pressure effective value by the square value of the reference sound pressure by 10.
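
The power definition above (ten times the common logarithm of the squared effective sound pressure divided by the squared reference sound pressure) can be sketched as follows. This is only an illustration under stated assumptions: the samples are taken to be sound pressures in pascals, the reference pressure is assumed to be the conventional 20 µPa, and the name `frame_power_db` is hypothetical; the specification does not fix any of these.

```python
import math

P_REF = 20e-6  # assumed reference sound pressure in pascals (20 µPa)


def frame_power_db(samples, p_ref=P_REF):
    """Return 10 * log10(p_rms**2 / p_ref**2) for one frame of samples."""
    if not samples:
        raise ValueError("frame must contain at least one sample")
    # Mean of the squared samples equals the squared effective (RMS) value.
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence has no finite level
    return 10.0 * math.log10(mean_square / (p_ref * p_ref))
```

Applying this function to successive short frames of the waveform would give a time series of the kind plotted in FIG. 6A; the frame length is a design choice the text does not specify.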

図６Ｂは、本発明の第１実施形態の韻律特徴量３１２として算出された音声データの音声波形の基本周波数の時系列データの説明図である。 FIG. 6B is an explanatory diagram of time-series data of the fundamental frequency of the speech waveform of speech data calculated as the prosodic feature value 312 according to the first embodiment of this invention.

図６Ｂの縦軸は基本周波数を示し、図６Ｂの横軸は時間を示す。 The vertical axis in FIG. 6B indicates the fundamental frequency, and the horizontal axis in FIG. 6B indicates time.

図６Ｃは、本発明の第１実施形態の韻律特徴量３１２として算出された音声データの音声波形の音素継続長の時系列データの説明図である。 FIG. 6C is an explanatory diagram of time-series data of phoneme durations of the speech waveform of speech data calculated as the prosodic feature value 312 according to the first embodiment of this invention.

図６Ｃでは、図４に示す音声波形を音素ごとに区分し、各音素が発声されている時間を音素継続長として点線から次の点線までの時間で示す。 In FIG. 6C, the speech waveform shown in FIG. 4 is divided for each phoneme, and the time during which each phoneme is uttered is shown as the time from the dotted line to the next dotted line as the phoneme duration.

図７は、本発明の第１実施形態の音韻特徴量３１３として算出された音韻スペクトルの説明図である。 FIG. 7 is an explanatory diagram of the phoneme spectrum calculated as the phoneme feature quantity 313 according to the first embodiment of this invention.

図７の縦軸は周波数を示し、図７の横軸は時間を示す。図７では、色が濃い周波数ほど分布が大きいことを示す。 The vertical axis in FIG. 7 indicates frequency, and the horizontal axis indicates time. In FIG. 7, a darker color at a given frequency indicates a larger distribution at that frequency.

次に、個人性特徴量算出部１１２について説明する。個人性特徴量算出部１１２は予め定義された個人性特徴量を算出するための少なくとも一つのモジュールを含む。個人性特徴量算出部１１２に含まれるモジュールは、自身に対応する個人性特徴量３２０を算出し、算出した個人性特徴量３２０を個人性度算出部１１３に入力する。 Next, the personality feature amount calculation unit 112 will be described. The personality feature quantity calculation unit 112 includes at least one module for calculating a predefined personality feature quantity. The module included in the personality feature amount calculation unit 112 calculates a personality feature amount 320 corresponding to itself, and inputs the calculated personality feature amount 320 to the personality degree calculation unit 113.

個人性特徴量３２０のリストを図８に示す。図８は、本発明の第１実施形態の個人性特徴量リスト８００の説明図である。 A list of personality feature quantities 320 is shown in FIG. FIG. 8 is an explanatory diagram of the personality feature quantity list 800 according to the first embodiment of this invention.

図８に示すように、個人性特徴量算出部１１２は、例えば、フィラー出現頻度、単語使用頻度、文法誤り頻度、方言出現頻度、方言アクセント出現頻度、平均Ｆ０値、Ｆ０パターン形状、音韻特徴量、及び声質特徴量を算出する。なお、個人性特徴量算出部１１２は、図８に示す個人性特徴量リスト８００の個人性特徴量３２０以外の個人性特徴量３２０を算出してもよいし、図８に示す個人性特徴量リスト８００のすべての個人性特徴量３２０を算出しなくてもよい。 As shown in FIG. 8, the personality feature amount calculation unit 112 calculates, for example, the filler appearance frequency, word usage frequency, grammatical error frequency, dialect appearance frequency, dialect accent appearance frequency, average F0 value, F0 pattern shape, phoneme feature amount, and voice quality feature amount. The personality feature amount calculation unit 112 may calculate personality feature amounts 320 other than those in the personality feature amount list 800 shown in FIG. 8, and need not calculate all of the personality feature amounts 320 in the list 800.

フィラー出現頻度は、一つのアクセント句にフィラー情報が出現する確率である。例えば、図５に示す言語特徴量３１１には、「（えーど）こんにちは、」、「（えーど）わたしは」、「（えーど）田中と」、及び「もうします。」の四つのアクセント句があり、これらのアクセント句のうちフィラー情報（えーど）が出現するアクセント句は三つである。したがって、図５に示す言語特徴量３１１のフィラー出現頻度は３／４＝０．７５となる。 The filler appearance frequency is the probability that filler information appears in one accent phrase. For example, the language feature amount 311 shown in FIG. 5 contains four accent phrases, "(eedo) konnichiwa,", "(eedo) watashi wa", "(eedo) Tanaka to", and "moushimasu.", and the filler "(eedo)" appears in three of them. The filler appearance frequency of the language feature amount 311 shown in FIG. 5 is therefore 3/4 = 0.75.

単語使用頻度は、一つのアクセント句に、個人性が特定可能な予め定義された単語が出現する確率である。例えば、「田中」という単語が定義されていた場合、図５に示す言語特徴量の単語出現頻度は１／４＝０．２５となる。 The word usage frequency is the probability that a predefined word capable of identifying an individual appears in one accent phrase. For example, when the word "Tanaka" is defined, it appears in only one of the four accent phrases in FIG. 5, so the word usage frequency of the language feature amount shown in FIG. 5 is 1/4 = 0.25.
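
The two worked figures above (3/4 and 1/4) can be reproduced with a short sketch. It assumes the utterance has already been split into accent phrases and that fillers are marked with parentheses as in FIG. 5; the function names and the regular expression are illustrative assumptions, not part of the specification.

```python
import re

# Assumption: fillers are written in full-width or ASCII parentheses, as in FIG. 5.
FILLER = re.compile(r"（[^）]*）|\([^)]*\)")


def filler_frequency(accent_phrases):
    """Fraction of accent phrases that contain at least one filler mark."""
    if not accent_phrases:
        return 0.0
    hits = sum(1 for phrase in accent_phrases if FILLER.search(phrase))
    return hits / len(accent_phrases)


def keyword_frequency(accent_phrases, keywords):
    """Fraction of accent phrases that contain any predefined keyword."""
    if not accent_phrases:
        return 0.0
    hits = sum(1 for phrase in accent_phrases if any(k in phrase for k in keywords))
    return hits / len(accent_phrases)


phrases = ["（えーど）こんにちは、", "（えーど）わたしは", "（えーど）田中と", "もうします。"]
print(filler_frequency(phrases))             # 0.75
print(keyword_frequency(phrases, {"田中"}))  # 0.25
```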

文法誤り頻度は、一つのセンテンスに文法的な誤りが含まれる確率である。方言出現頻度は、一つのアクセント句に、方言として予め定義された単語が出現する確率である。 The grammatical error frequency is the probability that a sentence contains a grammatical error. Dialect appearance frequency is the probability that a word predefined as a dialect will appear in one accent phrase.

方言アクセント出現頻度は、一つのアクセント句に、方言アクセントとして予め定義されたアクセントが出現する確率である。なお、方言アクセント出現頻度は、韻律特徴量３１２の基本周波数に基づいて算出される。 The dialect accent appearance frequency is a probability that an accent defined in advance as a dialect accent appears in one accent phrase. The dialect accent appearance frequency is calculated based on the fundamental frequency of the prosodic feature value 312.

平均Ｆ０値は、韻律特徴量３１２として算出された基本周波数の平均値であり、発話者の声の高さの平均値である。例えば、図６に示す基本周波数の平均Ｆ０値は２３０Ｈｚである。Ｆ０パターン形状は、韻律特徴量３１２として算出された基本周波数に基づいて算出される値であり、発話者のイントネーションの平均的なパターンを示す。 The average F0 value is the average of the fundamental frequency calculated as the prosodic feature amount 312, that is, the average pitch of the speaker's voice. For example, the average F0 value of the fundamental frequency shown in FIG. 6B is 230 Hz. The F0 pattern shape is a value calculated from the fundamental frequency calculated as the prosodic feature amount 312 and indicates the speaker's average intonation pattern.

音韻特徴量及び声質特徴量は、音韻特徴量３１３の音韻スペクトルに基づいて算出され、音韻特徴量は発話者の声紋（声道形状）を示す値である。 The phoneme feature amount and the voice quality feature amount are calculated based on the phoneme spectrum of the phoneme feature amount 313; the phoneme feature amount is a value representing the speaker's voiceprint (vocal tract shape).

図３に示す個人性特徴量算出部１１２は、方言特徴量算出モジュール３０４、アクセント特徴量算出モジュール３０５、及び声道形状特徴量算出モジュール３０６を含む。 The personality feature amount calculation unit 112 shown in FIG. 3 includes a dialect feature amount calculation module 304, an accent feature amount calculation module 305, and a vocal tract shape feature amount calculation module 306.

方言特徴量算出モジュール３０４は、言語特徴量３１１に基づいて上述した方言特徴量３２１を算出する。 The dialect feature quantity calculation module 304 calculates the dialect feature quantity 321 described above based on the language feature quantity 311.

アクセント特徴量算出モジュール３０５は、韻律特徴量３１２として算出された基本周波数に基づいて、上述した方言アクセント出現頻度をアクセント特徴量３２２として算出する。なお、アクセント特徴量算出モジュール３０５は、一つのアクセント句にアクセント誤りが含まれる確率をアクセント特徴量３２２として算出してもよい。 The accent feature quantity calculation module 305 calculates the dialect accent appearance frequency as the accent feature quantity 322 based on the fundamental frequency calculated as the prosodic feature quantity 312. Note that the accent feature quantity calculation module 305 may calculate the probability that an accent error is included in one accent phrase as the accent feature quantity 322.

声道形状特徴量算出モジュール３０６は、音韻特徴量３１３として算出された音韻スペクトルに基づいて、声道形状特徴量３２３を算出する。 The vocal tract shape feature amount calculation module 306 calculates a vocal tract shape feature amount 323 based on the phoneme spectrum calculated as the phoneme feature amount 313.

次に、個人性度算出部１１３について説明する。 Next, the individuality degree calculation unit 113 will be described.

個人性度算出部１１３は、予め定義された個人性度算出モデル３４０を用いて、個人性特徴量算出部１１２によって算出された各個人性特徴量３２０の個人性度３３０を算出する。個人性度算出モデル３４０は、各個人性特徴量３２０に対応して定義され、個人性度を算出するための情報である。例えば、個人性度算出モデル３４０は、各個人性特徴量３２０の値の人数分布を含む統計データである。 The individuality degree calculation unit 113 calculates the individuality degree 330 of each individuality feature amount 320 calculated by the individuality feature amount calculation unit 112 using a predefined individuality degree calculation model 340. The individuality degree calculation model 340 is defined to correspond to each individuality feature amount 320 and is information for calculating the individuality degree. For example, the individuality degree calculation model 340 is statistical data including the number distribution of the individuality feature amount 320 values.

図９を用いて、個人性特徴量算出部１１２によって算出された平均Ｆ０値に対応する個人性度算出モデル３４０について説明する。図９は、本発明の第１実施形態の平均Ｆ０値に対応する個人性度算出モデル３４０の説明図である。 The individuality degree calculation model 340 corresponding to the average F0 value calculated by the individuality feature amount calculation unit 112 will be described with reference to FIG. FIG. 9 is an explanatory diagram of the individuality degree calculation model 340 corresponding to the average F0 value according to the first embodiment of this invention.

図９では、大量の音声データの平均Ｆ０値の人数分布の統計データが用意されているものとする。図９に示す統計データは、男性の平均Ｆ０値の統計データである男性平均Ｆ０値統計データ９０１、女性の平均Ｆ０値の統計データである女性平均Ｆ０値統計データ９０２、及び全性別の平均Ｆ０値の統計データである全性別平均Ｆ０値統計データ９０３を含む。例えば、男性平均Ｆ０値統計データ９０１では、平均Ｆ０値が１２０Ｈｚである人数が最大であることを示す。 In FIG. 9, it is assumed that statistical data of the number distribution of average F0 values of a large amount of audio data is prepared. The statistical data shown in FIG. 9 includes male average F0 value statistical data 901 which is statistical data of male average F0 values, female average F0 value statistical data 902 which is statistical data of female average F0 values, and average F0 of all genders. All gender average F0 value statistical data 903 which is statistical data of values is included. For example, the male average F0 value statistical data 901 indicates that the number of persons whose average F0 value is 120 Hz is the maximum.

男性平均Ｆ０値統計データ９０１に基づいて男性の平均Ｆ０値に対応する個人性度算出モデルである男性平均Ｆ０値個人性度算出モデル９１１が算出され、女性平均Ｆ０値統計データ９０２に基づいて女性の平均Ｆ０値に対応する個人性度算出モデルである女性平均Ｆ０値個人性度算出モデル９１２が算出され、全性別平均Ｆ０値統計データ９０３に基づいて全性別の平均Ｆ０値に対応する個人性度算出モデルである全性別平均Ｆ０値個人性度算出モデル９１３が算出される。 The male average F0 value individuality degree calculation model 911, which is the individuality degree calculation model corresponding to the male average F0 value, is calculated from the male average F0 value statistical data 901. Likewise, the female average F0 value individuality degree calculation model 912 is calculated from the female average F0 value statistical data 902, and the all-gender average F0 value individuality degree calculation model 913 is calculated from the all-gender average F0 value statistical data 903.

各個人性度算出モデル９１１〜９１３は、人数分布と個人性度とが反比例するように生成される。換言すれば、人数分布が大きい平均Ｆ０値の個人性度が小さくなり、人数分布が小さい平均Ｆ０値の個人性度が大きくなるように、個人性度算出モデル９１１〜９１３が生成される。 Each individuality degree calculation model 911 to 913 is generated such that the number distribution and the individuality degree are inversely proportional. In other words, the individuality degree calculation models 911 to 913 are generated so that the individuality degree of the average F0 value having a large number distribution is small and the individuality degree of the average F0 value having a small number distribution is large.

例えば、男性平均Ｆ０値個人性度算出モデル９１１では、男性平均Ｆ０値統計データ９０１で人数分布が最大となる平均Ｆ０値１２０Ｈｚは個人性度が最低となる。また、男性平均Ｆ０値統計データ９０１では、平均Ｆ０値が１２０Ｈｚから離れるにつれて人数分布が少なくなるので、男性平均Ｆ０値個人性度算出モデル９１１では、平均Ｆ０値が１２０Ｈｚから離れるにつれて個人性度が大きくなる。例えば、ある男性の音声データの平均Ｆ０値が３００Ｈｚである場合、平均Ｆ０値が３００Ｈｚの人数分布は非常に小さいので、この男性の平均Ｆ０値の個人性度は非常に大きくなる。したがって、個人性度算出モデル３４０として個人性特徴量の人数分布を用いて算出された個人性度は、人数が最大である基準個人性特徴量の逸脱度合いを定量化したものともいえる。 For example, in the male average F0 value individuality degree calculation model 911, the average F0 value of 120 Hz, at which the number distribution in the male average F0 value statistical data 901 is largest, has the lowest individuality degree. In the statistical data 901 the number of people decreases as the average F0 value moves away from 120 Hz, so in the model 911 the individuality degree increases as the average F0 value moves away from 120 Hz. For example, when the average F0 value of a certain male's voice data is 300 Hz, very few people have an average F0 value of 300 Hz, so the individuality degree of this male's average F0 value is very large. The individuality degree calculated using the number distribution of a personality feature amount as the individuality degree calculation model 340 can therefore be regarded as quantifying the degree of deviation from the reference personality feature amount, that is, the value shared by the largest number of people.
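
One way to turn a number distribution into an individuality degree model with the inverse relationship described above is sketched below. The specification does not give the actual formula, so the ratio `peak / count`, the cap for empty bins, and the histogram counts are all illustrative assumptions.

```python
def build_individuality_model(histogram, cap=1000.0):
    """Map each feature-value bin to a degree that grows as the bin gets rarer.

    histogram: dict of feature-value bin (for example, Hz) -> number of speakers.
    The degree is peak_count / count, so the most common bin scores 1.0.
    """
    if not histogram or max(histogram.values()) <= 0:
        raise ValueError("histogram needs at least one non-empty bin")
    peak = max(histogram.values())
    model = {}
    for value, count in histogram.items():
        # Empty bins would divide by zero, so they receive the cap instead.
        model[value] = cap if count == 0 else min(cap, peak / count)
    return model


# Made-up counts, loosely shaped like a distribution peaking at 120 Hz.
male_counts = {100: 40, 120: 200, 140: 40, 300: 2}
model = build_individuality_model(male_counts)
print(model[120], model[300])  # 1.0 100.0
```

With such a model the most common value always receives the lowest degree, and a rare value such as 300 Hz receives a very large one, matching the qualitative behaviour described for the model 911.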

平均Ｆ０値は性別に依存するので、個人性度算出部１１３が性別を判定可能な場合、性別ごとの個人性度算出モデル９１１又は９１２を用いて、個人性度を算出する。これによって、個人性度を正確に算出できる。なお、個人性度算出部１１３が性別を判定不可能な場合、全性別平均Ｆ０値個人性度算出モデル９１３を用いて個人性度を算出する。 Since the average F0 value depends on gender, the individuality degree is calculated using the individuality degree calculation model 911 or 912 for each sex when the individuality degree calculation unit 113 can determine the gender. Thereby, the individuality degree can be accurately calculated. If the individuality degree calculation unit 113 cannot determine the gender, the individuality degree is calculated using the average F0 value individuality degree calculation model 913 for all genders.
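
The model selection rule just described can be sketched in a few lines. The dictionary keys "male", "female", and "all" and the function name are hypothetical labels standing in for the models 911, 912, and 913.

```python
def select_model(models, gender=None):
    """Return the gender-specific model when gender is known, else the all-gender one."""
    if gender in ("male", "female") and gender in models:
        return models[gender]
    return models["all"]  # fallback corresponding to model 913


# Placeholder strings stand in for real model objects.
models = {"male": "model 911", "female": "model 912", "all": "model 913"}
print(select_model(models, "male"))  # model 911
print(select_model(models))          # model 913
```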

例えば、個人性特徴量算出部１１２が図６（Ｂ）に示す基本周波数から算出した平均Ｆ０値が２３０Ｈｚであり、当該平均Ｆ０値の音声データは男性のものである場合、男性平均Ｆ０値個人性度算出モデル９１１に基づいて、この男性の平均Ｆ０値の個人性度は２００と算出される。 For example, when the average F0 value calculated by the personality feature amount calculation unit 112 from the fundamental frequency shown in FIG. 6B is 230 Hz and the voice data is that of a male, the individuality degree of this male's average F0 value is calculated as 200 based on the male average F0 value individuality degree calculation model 911.

図３に示す個人性度算出部１１３は、方言個人性度算出モジュール３０７、アクセント個人性度算出モジュール３０８、及び声道形状個人性度算出モジュール３０９を含む。 The individuality degree calculation unit 113 shown in FIG. 3 includes a dialect individuality degree calculation module 307, an accent individuality degree calculation module 308, and a vocal tract shape individuality degree calculation module 309.

方言個人性度算出モジュール３０７は、方言特徴量３２１に対応する個人性度算出モデル３４０を用いて、方言特徴量３２１の個人性度である方言個人性度３３１を算出する。アクセント個人性度算出モジュール３０８は、アクセント特徴量３２２に対応する個人性度算出モデル３４０を用いて、アクセント特徴量３２２の個人性度であるアクセント個人性度３３２を算出する。声道形状個人性度算出モジュール３０９は、声道形状特徴量３２３に対応する個人性度算出モデル３４０を用いて、声道形状特徴量３２３の個人性度である声道形状個人性度３３３を算出する。 The dialect individuality degree calculation module 307 calculates a dialect individuality degree 331 that is the individuality degree of the dialect feature quantity 321 using the individuality degree calculation model 340 corresponding to the dialect feature quantity 321. The accent individuality degree calculation module 308 calculates an accent individuality degree 332 that is the individuality degree of the accent feature amount 322 using the individuality degree calculation model 340 corresponding to the accent feature amount 322. The vocal tract shape individuality degree calculation module 309 uses the individuality degree calculation model 340 corresponding to the vocal tract shape feature quantity 323 to calculate the vocal tract shape individuality degree 333 which is the individuality degree of the vocal tract shape feature quantity 323. calculate.

なお、個人性度算出モデル３４０は、個人性特徴量の人数分布の統計データに限定されず、専門家などが定義した計算式及びルールであってもよいし、個人性特徴量の平均値又は分散値から自動的に生成されたモデルであってもよい。 The individuality degree calculation model 340 is not limited to statistical data on the number distribution of a personality feature amount. It may be a calculation formula and rules defined by an expert or the like, or a model generated automatically from the mean or variance of the personality feature amount.

（音声信号処理パラメータ決定部１２０の動作） (Operation of the audio signal processing parameter determination unit 120)
次に、音声信号処理パラメータ決定部１２０の動作について、図１０を用いて説明する。図１０は、本発明の第１実施形態の音声信号処理パラメータ決定部１２０の機能ブロック図である。 Next, the operation of the audio signal processing parameter determination unit 120 will be described with reference to FIG. 10. FIG. 10 is a functional block diagram of the audio signal processing parameter determination unit 120 according to the first embodiment of this invention.

音声信号処理パラメータ決定部１２０は、パラメータ決定用データ群１００２として、個人性特徴量算出部１１２によって算出された個人性特徴量３２０、個人性度算出部１１３によって算出された個人性度３３０、及び音声信号処理パラメータ１００３を取得し、取得したパラメータ決定用データ群１００２を個人性度条件判定部１２１に入力する。音声信号処理パラメータ１００３の詳細は図１２で説明する。 The audio signal processing parameter determination unit 120 acquires, as the parameter determination data group 1002, the personality feature amount 320 calculated by the personality feature amount calculation unit 112, the individuality degree 330 calculated by the individuality degree calculation unit 113, and the audio signal processing parameter 1003, and inputs the acquired parameter determination data group 1002 to the individuality degree condition determination unit 121. Details of the audio signal processing parameter 1003 will be described with reference to FIG. 12.

個人性度条件判定部１２１は、個人性度条件判定モジュール１００４を含む。個人性度条件判定モジュール１００４は、個人性度３３０が予め定義された個人性度条件１００１を満たすか否かを判定する。個人性度条件１００１の詳細は図１１で説明する。 The personality condition determination unit 121 includes a personality condition determination module 1004. The individuality degree condition determination module 1004 determines whether or not the individuality degree 330 satisfies a predefined individuality degree condition 1001. Details of the individuality condition 1001 will be described with reference to FIG.

個人性度３３０が個人性度条件１００１を満たすと、個人性度条件判定モジュール１００４によって判定された場合（Ｙｅｓ）、音声信号処理パラメータ決定部１２０は処理を終了し、音声信号処理パラメータ１００３を音声信号処理部１３０に入力する。一方、個人性度３３０が個人性度条件１００１を満たさないと、個人性度条件判定モジュール１００４によって判定された場合（Ｎｏ）、音声信号処理パラメータ決定部１２０は、音声信号処理パラメータ調整部１２２に処理を移行する。音声信号処理パラメータ調整部１２２は、図１３で詳細を説明する音声信号処理パラメータ調整手法に基づいて、音声信号処理パラメータ１００３を調整し、調整後の音声信号処理パラメータ１００３の値でパラメータ決定用データ群１００２の音声信号処理パラメータ１００３を更新する。そして、音声信号処理パラメータ調整部１２２は、調整後の音声信号処理パラメータ１００３を個人性度再算出部１２３に入力する。 When the individuality degree condition determination module 1004 determines that the individuality degree 330 satisfies the individuality degree condition 1001 (Yes), the audio signal processing parameter determination unit 120 ends its processing and inputs the audio signal processing parameter 1003 to the audio signal processing unit 130. When the module 1004 determines that the individuality degree 330 does not satisfy the individuality degree condition 1001 (No), the audio signal processing parameter determination unit 120 passes processing to the audio signal processing parameter adjustment unit 122. The audio signal processing parameter adjustment unit 122 adjusts the audio signal processing parameter 1003 according to the audio signal processing parameter adjustment method described in detail with reference to FIG. 13, and updates the audio signal processing parameter 1003 in the parameter determination data group 1002 with the adjusted value. The audio signal processing parameter adjustment unit 122 then inputs the adjusted audio signal processing parameter 1003 to the individuality degree recalculation unit 123.

なお、音声信号処理パラメータ１００３の初期値は、音声信号処理部１３０によって音声信号処理が実行されない値に設定される。つまり、最初の個人性度条件判定モジュール１００４による処理で、すべての個人性度３３０が個人性度条件１００１を満たすと判定された場合、音声信号処理部１３０は音声データに対して何も処理を実行せずに、音声データを音声出力Ｉ／Ｆ１０５に入力する。 The initial value of the audio signal processing parameter 1003 is set so that the audio signal processing unit 130 performs no audio signal processing. That is, when the first pass through the individuality degree condition determination module 1004 determines that every individuality degree 330 satisfies the individuality degree condition 1001, the audio signal processing unit 130 passes the audio data to the audio output I/F 105 without processing it.
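
The determination flow described above (test the condition, adjust the parameter, re-estimate the feature, recompute the degree, and repeat) can be sketched as a loop. Everything here is an illustrative assumption: the parameter is reduced to a single additive adjustment of one feature, the callbacks are hypothetical, and the toy degree function is not the model 911.

```python
def determine_adjustment(feature, degree_of, satisfies, adjust, max_iter=100):
    """Return the accumulated adjustment at which the condition is first met."""
    adjustment = 0.0  # initial parameter: no signal processing
    for _ in range(max_iter):
        estimated = feature + adjustment  # role of an estimation step
        degree = degree_of(estimated)     # role of a recalculation step
        if satisfies(degree):             # role of the condition check
            return adjustment
        adjustment += adjust(estimated)   # role of the adjustment step
    raise RuntimeError("condition not met within max_iter iterations")


# Toy stand-ins: the degree is simply the distance from an assumed 120 Hz peak.
result = determine_adjustment(
    feature=230.0,
    degree_of=lambda f0: abs(f0 - 120.0),
    satisfies=lambda degree: degree <= 30.0,
    adjust=lambda f0: -10.0 if f0 > 120.0 else 10.0,
)
print(result)  # -80.0
```

Because the initial adjustment is zero, an input that already satisfies the condition returns 0.0 on the first pass, which corresponds to passing the audio through unprocessed.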

個人性度条件１００１について図１１を用いて説明する。図１１は、本発明の第１実施形態の個人性度条件１００１の説明図である。 The personality condition 1001 will be described with reference to FIG. FIG. 11 is an explanatory diagram of the individuality condition 1001 according to the first embodiment of this invention.

個人性度条件１００１は音声信号処理システム１００のユーザが秘匿を所望する個人性度に応じて設定されるものであり、種々の個人性度条件１００１が設定される。例えば、性別の秘匿を所望する場合、図１１に示す例１のように、平均Ｆ０値個人性度が所定値（例えば２０）以下であることが個人性度条件１００１として設定される。 The personality condition 1001 is set according to the individuality that the user of the audio signal processing system 100 desires to conceal, and various personality conditions 1001 are set. For example, when it is desired to conceal gender, the individuality condition 1001 is set such that the average F0 value individuality is not more than a predetermined value (for example, 20) as in Example 1 shown in FIG.

また、図１１に示す例３の「平均Ｆ０値個人性度≦１５、且つ、フィラー使用頻度個人性度≦２０、且つ、声質特徴個人性度≦１０」のように、複数の個人性特徴量３２０に対応する個人性度３３０に対して、個人性度条件１００１を設定することも可能である。 In addition, a plurality of individuality feature quantities such as “average F0 value individuality degree ≦ 15, filler usage frequency individuality degree ≦ 20, and voice quality characteristic individuality degree ≦ 10” in Example 3 shown in FIG. It is also possible to set the individuality degree condition 1001 for the individuality degree 330 corresponding to 320.

また、個人性度に何らかの計算をして、計算結果に対して条件を設定してもよい。例えば、図１１に示す例４のように、すべての個人性特徴量に対応する個人性度の平均値を計算し、計算した平均値が所定値（例えば２５）以下であることが個人性度条件１００１として設定されてもよい。 Further, some calculation may be performed on the individuality degree, and a condition may be set for the calculation result. For example, as in Example 4 shown in FIG. 11, the average value of the individuality levels corresponding to all the individuality feature amounts is calculated, and the calculated average value is equal to or less than a predetermined value (for example, 25). The condition 1001 may be set.
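
The example conditions of FIG. 11 mentioned above can be written as simple predicates over a table of individuality degrees. The dictionary keys and function names are hypothetical, and the sample degrees are made up for illustration.

```python
def example1(degrees):
    """Example 1: average F0 individuality degree at most 20."""
    return degrees["average_f0"] <= 20


def example3(degrees):
    """Example 3: three per-feature thresholds that must all hold."""
    return (
        degrees["average_f0"] <= 15
        and degrees["filler_frequency"] <= 20
        and degrees["voice_quality"] <= 10
    )


def example4(degrees):
    """Example 4: mean of all individuality degrees at most 25."""
    return sum(degrees.values()) / len(degrees) <= 25


sample = {"average_f0": 18, "filler_frequency": 20, "voice_quality": 30}
print(example1(sample), example3(sample), example4(sample))  # True False True
```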

音声信号処理パラメータ１００３について図１２を用いて説明する。図１２は、本発明の第１実施形態の音声信号処理パラメータ１００３の説明図である。 The audio signal processing parameter 1003 will be described with reference to FIG. FIG. 12 is an explanatory diagram of the audio signal processing parameter 1003 according to the first embodiment of this invention.

音声信号処理パラメータ１００３は、音声信号処理部１３０が音声データに音声信号処理を実行するためのパラメータである。 The audio signal processing parameter 1003 is a parameter for the audio signal processing unit 130 to execute audio signal processing on the audio data.

音声信号処理パラメータ１００３は、調整対象となる個人性特徴量の種類、調整方法、及び調整量（変化量）を含む。 The audio signal processing parameter 1003 includes the type of personality feature amount to be adjusted, the adjustment method, and the adjustment amount (change amount).

例えば、個人性特徴量３２０のフィラー出現頻度を調整する場合、調整対象となる個人性特徴量の種類にはフィラー出現頻度が登録され、調整方法には調整するフィラーの種類が登録され、調整量には削除するフィラーの個数が登録される。 For example, when adjusting the filler appearance frequency among the personality feature amounts 320, the filler appearance frequency is registered as the type of personality feature amount to be adjusted, the type of filler to be adjusted is registered as the adjustment method, and the number of fillers to be deleted is registered as the adjustment amount.

また、個人性特徴量３２０の平均Ｆ０値を調整する場合、調整対象となる個人性特徴量の種類には平均Ｆ０値が登録され、調整方法には平均Ｆ０値の調整に用いる公知のＰＳＯＬＡ（Pitch Synchronous Overlap Add Method）が登録され、調整量には平均Ｆ０値をどれだけ変化させるかを示す値が登録される。 When adjusting the average F0 value among the personality feature amounts 320, the average F0 value is registered as the type of personality feature amount to be adjusted, the known PSOLA (Pitch Synchronous Overlap Add Method) used to adjust the average F0 value is registered as the adjustment method, and a value indicating how much the average F0 value is to be changed is registered as the adjustment amount.
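
The three fields of the audio signal processing parameter described above (feature type, adjustment method, adjustment amount) could be held in a small record such as the following. The class name, field names, and example values are hypothetical; the specification only names the three kinds of information.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SignalProcessingParameter:
    target_feature: str    # type of personality feature amount to adjust
    method: Optional[str]  # adjustment method, or None when nothing is applied
    amount: float          # adjustment amount (change)

    @property
    def is_noop(self) -> bool:
        """True when the parameter requests no signal processing."""
        return self.method is None or self.amount == 0


# Illustrative values only.
filler_param = SignalProcessingParameter("filler_frequency", "えーど", 2)
f0_param = SignalProcessingParameter("average_f0", "PSOLA", -10.0)
initial = SignalProcessingParameter("average_f0", None, 0.0)
print(f0_param.is_noop, initial.is_noop)  # False True
```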

次に、音声信号処理パラメータ調整手法について図１３を用いて説明する。図１３は、本発明の第１実施形態の音声信号処理パラメータ調整手法の説明図である。 Next, an audio signal processing parameter adjustment method will be described with reference to FIG. FIG. 13 is an explanatory diagram of an audio signal processing parameter adjustment method according to the first embodiment of this invention.

音声信号処理パラメータ調整手法には、調整対象となる個人性特徴量３２０、及び、音声信号処理パラメータ調整部１２２による１回の処理における調整量が、個人性度条件１００１ごとに登録される。 In the audio signal processing parameter adjustment method, the individuality feature amount 320 to be adjusted and the adjustment amount in one process by the audio signal processing parameter adjustment unit 122 are registered for each individuality degree condition 1001.

個人性度３３０が個人性度条件１００１の図１１に示す例１を満たさない場合、音声信号処理パラメータ調整部１２２は、音声信号処理パラメータ調整手法の図１３に示す例１に基づいて音声信号処理パラメータ１００３を調整する。つまり、平均Ｆ０値個人性度を下げる方向に平均Ｆ０値を１０Ｈｚ調整する。なお、音声信号処理パラメータ調整部１２２は、平均Ｆ０値を大きくするか小さくするかについては、個人性度算出モデル３４０を参照して決定する。例えば、平均Ｆ０値が２３０Ｈｚであって、平均Ｆ０値個人性度が２００であって、平均Ｆ０値のしきい値が２０より小さいことが個人性度条件１００１である場合、音声信号処理パラメータ調整部１２２は、図９に示す男性平均Ｆ０値個人性度算出モデル９１１を参照し、平均Ｆ０値を１０Ｈｚ小さくするように音声信号処理パラメータ１００３を調整する。 When the individuality degree 330 does not satisfy Example 1 of the individuality degree condition 1001 shown in FIG. 11, the audio signal processing parameter adjustment unit 122 adjusts the audio signal processing parameter 1003 according to Example 1 of the audio signal processing parameter adjustment method shown in FIG. 13. That is, it adjusts the average F0 value by 10 Hz in the direction that lowers the average F0 value individuality degree. The audio signal processing parameter adjustment unit 122 decides whether to raise or lower the average F0 value by referring to the individuality degree calculation model 340. For example, when the average F0 value is 230 Hz, the average F0 value individuality degree is 200, and the individuality degree condition 1001 requires the average F0 value individuality degree to be smaller than the threshold of 20, the audio signal processing parameter adjustment unit 122 refers to the male average F0 value individuality degree calculation model 911 shown in FIG. 9 and adjusts the audio signal processing parameter 1003 so that the average F0 value is lowered by 10 Hz.
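
The direction decision described above can be sketched as follows, assuming the model has a single minimum at the most common F0 value (120 Hz for the male model in FIG. 9). The function name is hypothetical, and clamping the final step so that it does not overshoot the peak is an added assumption, not something the specification states.

```python
def f0_adjustment(average_f0, mode_f0, step_hz=10.0):
    """Signed change in Hz that moves average_f0 one step toward mode_f0."""
    if average_f0 == mode_f0:
        return 0.0
    # Assumption: never step past the most common value.
    size = min(step_hz, abs(mode_f0 - average_f0))
    return -size if average_f0 > mode_f0 else size


print(f0_adjustment(230.0, 120.0))  # -10.0
print(f0_adjustment(100.0, 120.0))  # 10.0
```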

なお、音声信号処理パラメータ調整手法では、音声信号処理パラメータ調整部１２２の処理負荷軽減の観点から、個人性特徴量３２０が単調増加又は単調減少するように調整量を決定することが望ましい。 In the audio signal processing parameter adjustment method, from the viewpoint of reducing the processing load of the audio signal processing parameter adjustment unit 122, it is desirable to determine the adjustment amount so that the individuality feature amount 320 monotonously increases or decreases monotonously.

図１０に戻り、個人性度再算出部１２３について説明する。 Returning to FIG. 10, the individuality degree recalculation unit 123 will be described.

個人性度再算出部１２３は、個人性特徴量推定モジュール１００５及び個人性度算出モジュール１００６を含む。 The individuality degree recalculation unit 123 includes an individuality feature amount estimation module 1005 and an individuality degree calculation module 1006.

個人性特徴量推定モジュール１００５は、音声信号処理パラメータ調整部１２２によって調整された音声信号処理パラメータ１００３に基づいて、調整後の個人性特徴量３２０を推定し、パラメータ決定用データ群１００２に含まれる個人性特徴量３２０を更新する。 The individuality feature amount estimation module 1005 estimates the adjusted individuality feature amount 320 based on the audio signal processing parameter 1003 adjusted by the audio signal processing parameter adjustment unit 122 and is included in the parameter determination data group 1002. The personality feature amount 320 is updated.

個人性度算出モジュール１００６は、個人性特徴量推定モジュール１００５によって推定された個人性特徴量３２０に基づいて個人性度３３０を再度算出し、パラメータ決定用データ群１００２の個人性度３３０を更新する。個人性度算出モジュール１００６には、音声個人性特徴分析部１１０に含まれる個人性度算出部１１３と同じプログラムを用いることが望ましい。 The individuality degree calculation module 1006 recalculates the individuality degree 330 based on the individuality feature quantity 320 estimated by the individuality feature quantity estimation module 1005, and updates the individuality degree 330 of the parameter determination data group 1002. . It is desirable to use the same program as the individuality degree calculation unit 113 included in the voice individuality feature analysis unit 110 for the individuality degree calculation module 1006.

次に、図４に示す音声データの個人性特徴量３２０である平均Ｆ０値が２３０Ｈｚであり、平均Ｆ０値個人性度が２００である場合を例に、音声信号処理パラメータ決定部１２０の全体動作について説明する。 Next, the overall operation of the audio signal processing parameter determination unit 120 will be described, taking as an example a case where the average F0 value, which is the individuality feature amount 320 of the audio data shown in FIG. 4, is 230 Hz and the average F0 value individuality degree is 200.

ここで、個人性度条件１００１は平均Ｆ０値個人性度が３０以下であると設定されているものとし、音声信号処理パラメータ１００３の初期値は、「平均Ｆ０修正有無＝無、平均Ｆ０値修正手法＝無、平均Ｆ０値調整量＝０Ｈｚ」であるとする。 Here, it is assumed that the individuality degree condition 1001 is set to “the average F0 value individuality degree is 30 or less”, and that the initial value of the audio signal processing parameter 1003 is “average F0 correction presence / absence = none, average F0 value correction method = none, average F0 value adjustment amount = 0 Hz”.

まず、個人性度条件判定部１２１は、平均Ｆ０値個人性度が２００であるので、個人性度条件１００１を満たさないと判定し、パラメータ決定用データ群１００２を音声信号処理パラメータ調整部１２２に入力する。 First, because the average F0 value individuality degree is 200, the individuality degree condition determination unit 121 determines that the individuality degree condition 1001 is not satisfied, and inputs the parameter determination data group 1002 to the audio signal processing parameter adjustment unit 122.

ここで、音声信号処理パラメータ調整手法は、個人性度を下げる方向に平均Ｆ０値を１０Ｈｚずつ調整する調整手法であるとする。音声信号処理パラメータ調整部１２２は、平均Ｆ０値（２３０Ｈｚ）が図９に示す男性の平均Ｆ０値の平均値（１２０Ｈｚ）より大きいので、音声信号処理パラメータ１００３を「平均Ｆ０調整有無＝有、平均Ｆ０値調整手法＝ＰＳＯＬＡ、平均Ｆ０値調整量＝−１０Ｈｚ」のように調整し、パラメータ決定用データ群１００２の音声信号処理パラメータ１００３を更新し、パラメータ決定用データ群１００２を個人性度再算出部１２３に入力する。 Here, it is assumed that the audio signal processing parameter adjustment method adjusts the average F0 value in steps of 10 Hz in the direction that lowers the individuality degree. Since the average F0 value (230 Hz) is larger than the mean (120 Hz) of the male average F0 values shown in FIG. 9, the audio signal processing parameter adjustment unit 122 adjusts the audio signal processing parameter 1003 to “average F0 adjustment presence / absence = present, average F0 value adjustment method = PSOLA, average F0 value adjustment amount = −10 Hz”, updates the audio signal processing parameter 1003 of the parameter determination data group 1002, and inputs the parameter determination data group 1002 to the individuality degree recalculation unit 123.

個人性特徴量推定モジュール１００５は、調整後の音声信号処理パラメータ１００３に基づいて、平均Ｆ０値を１０Ｈｚ減算し、パラメータ決定用データ群１００２の平均Ｆ０値を２２０Ｈｚに更新する。 Based on the adjusted audio signal processing parameter 1003, the individuality feature amount estimation module 1005 subtracts 10 Hz from the average F0 value and updates the average F0 value of the parameter determination data group 1002 to 220 Hz.

次に、個人性度算出モジュール１００６は、更新後の平均Ｆ０値（２２０Ｈｚ）の個人性度３３０を算出し、平均Ｆ０値個人性度を１７０に更新する。 Next, the individuality degree calculation module 1006 calculates the individuality degree 330 of the updated average F0 value (220 Hz) and updates the average F0 value individuality degree to 170.

以上によって、パラメータ決定用データ群１００２は、「平均Ｆ０値＝２２０Ｈｚ」、「平均Ｆ０値個人性度＝１７０」、「平均Ｆ０調整有無＝有、平均Ｆ０値調整手法＝ＰＳＯＬＡ、平均Ｆ０値調整量＝−１０Ｈｚ」に更新される。 As a result, the parameter determination data group 1002 is updated to “average F0 value = 220 Hz”, “average F0 value individuality degree = 170”, and “average F0 adjustment presence / absence = present, average F0 value adjustment method = PSOLA, average F0 value adjustment amount = −10 Hz”.

更新後のパラメータ決定用データ群１００２が個人性度条件判定部１２１に入力され、個人性度条件判定部１２１によって、個人性度３３０が個人性度条件１００１を満たすと判定されるまで、上述した処理を繰り返す。 The updated parameter determination data group 1002 is input to the individuality degree condition determination unit 121, and the processing described above is repeated until the individuality degree condition determination unit 121 determines that the individuality degree 330 satisfies the individuality degree condition 1001.

最後に、パラメータ決定用データ群１００２が、「平均Ｆ０値＝１５０Ｈｚ」、「平均Ｆ０値個人性度＝３０」、「平均Ｆ０調整有無＝有、平均Ｆ０値調整手法＝ＰＳＯＬＡ、平均Ｆ０値調整量＝−８０Ｈｚ」となった場合、個人性度条件判定部１２１は、個人性度３３０が個人性度条件１００１を満たすと判定し、音声信号処理パラメータ決定部１２０のすべての処理が終了する。 Finally, when the parameter determination data group 1002 becomes “average F0 value = 150 Hz”, “average F0 value individuality degree = 30”, and “average F0 adjustment presence / absence = present, average F0 value adjustment method = PSOLA, average F0 value adjustment amount = −80 Hz”, the individuality degree condition determination unit 121 determines that the individuality degree 330 satisfies the individuality degree condition 1001, and all processing of the audio signal processing parameter determination unit 120 ends.
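
The worked example above (230 Hz stepping down to 150 Hz) can be traced with a minimal Python sketch. The text does not give the formula that maps an average F0 value to an individuality degree, so `individuality_degree` below is a purely hypothetical piecewise-linear stand-in that passes through the three value pairs quoted in the example (230 Hz → 200, 220 Hz → 170, 150 Hz → 30). The names `determine_parameters`, `Result`, and `ANCHORS` are likewise illustrative assumptions, not part of the described system.

```python
from typing import NamedTuple

MALE_MEAN_F0_HZ = 120.0  # mean of the male average F0 values quoted in the text
STEP_HZ = 10.0           # per-iteration adjustment step quoted in the text

# Hypothetical anchor points (average F0 in Hz, individuality degree).
# Only these three pairs appear in the text; everything in between is assumed.
ANCHORS = [(150.0, 30.0), (220.0, 170.0), (230.0, 200.0)]


class Result(NamedTuple):
    mean_f0_hz: float
    individuality_degree: float
    adjustment_hz: float


def individuality_degree(mean_f0_hz: float) -> float:
    """Hypothetical stand-in: linear interpolation through ANCHORS, clamped at both ends."""
    if mean_f0_hz <= ANCHORS[0][0]:
        return ANCHORS[0][1]
    for (x0, y0), (x1, y1) in zip(ANCHORS, ANCHORS[1:]):
        if mean_f0_hz <= x1:
            return y0 + (y1 - y0) * (mean_f0_hz - x0) / (x1 - x0)
    return ANCHORS[-1][1]


def determine_parameters(mean_f0_hz: float, max_degree: float,
                         max_iterations: int = 100) -> Result:
    """Repeat: check the condition, adjust by one step, re-estimate the feature and degree."""
    adjustment = 0.0
    estimated = mean_f0_hz
    degree = individuality_degree(estimated)
    for _ in range(max_iterations):  # safety bound, not part of the described method
        if degree <= max_degree:     # role of the condition determination unit 121
            break
        # Role of the adjustment unit 122: move toward the male mean by one step.
        direction = -1.0 if estimated > MALE_MEAN_F0_HZ else 1.0
        adjustment += direction * STEP_HZ
        # Role of the recalculation unit 123: estimate the feature, recompute the degree.
        estimated = mean_f0_hz + adjustment
        degree = individuality_degree(estimated)
    return Result(estimated, degree, adjustment)


result = determine_parameters(230.0, 30.0)
print(result)
```

Under these assumptions the loop stops after eight adjustments with an average F0 value of 150 Hz, an individuality degree of 30, and a total adjustment amount of −80 Hz, matching the final state given in the text.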

（音声信号処理部１３０の動作） (Operation of the audio signal processing unit 130)
次に、音声信号処理部１３０の動作について、図１４を用いて説明する。図１４は、本発明の第１実施形態の音声信号処理部１３０の機能ブロック図である。 Next, the operation of the audio signal processing unit 130 will be described with reference to FIG. 14. FIG. 14 is a functional block diagram of the audio signal processing unit 130 according to the first embodiment of this invention.

音声信号処理部１３０は、個人性特徴除去部１４０１及び音声合成部１４０５を含む。 The voice signal processing unit 130 includes a personality feature removal unit 1401 and a voice synthesis unit 1405.

個人性特徴除去部１４０１は、音声信号処理パラメータ１００３に基づいて、音声データに音声信号処理を実行し、言語個人性特徴除去モジュール１４０２、韻律個人性特徴除去モジュール１４０３、及び音韻個人性特徴除去モジュール１４０４の少なくとも一つを含む。 The personality feature removal unit 1401 executes speech signal processing on speech data based on the speech signal processing parameter 1003, and includes at least one of a language personality feature removal module 1402, a prosodic personality feature removal module 1403, and a phonological personality feature removal module 1404.

言語個人性特徴除去モジュール１４０２は、音声信号処理パラメータ１００３に基づいて、音声入力Ｉ／Ｆ１０２から入力された音声データに対して、言語特徴量３１１を調整する音声信号処理を実行する。韻律個人性特徴除去モジュール１４０３は、音声信号処理パラメータ１００３に基づいて、音声入力Ｉ／Ｆ１０２から入力された音声データに対して、韻律特徴量３１２を調整する音声信号処理を実行する。音韻個人性特徴除去モジュール１４０４は、音声信号処理パラメータ１００３に基づいて、音声入力Ｉ／Ｆ１０２から入力された音声データに対して、音韻特徴量３１３を調整する音声信号処理を実行する。 The language individuality feature removal module 1402 executes voice signal processing for adjusting the language feature quantity 311 on the voice data input from the voice input I / F 102 based on the voice signal processing parameter 1003. The prosodic personality feature removal module 1403 executes speech signal processing for adjusting the prosodic feature value 312 on the speech data input from the speech input I / F 102 based on the speech signal processing parameter 1003. The phoneme personality feature removal module 1404 executes voice signal processing for adjusting the phoneme feature quantity 313 on the voice data input from the voice input I / F 102 based on the voice signal processing parameter 1003.

ただし、言語個人性特徴除去モジュール１４０２、韻律個人性特徴除去モジュール１４０３、及び音韻個人性特徴除去モジュール１４０４は、音声信号処理パラメータ１００３に定義されていないパラメータについては、初期値を用いて、音声信号処理を実行するが、通常、音声信号処理パラメータ１００３の初期値は言語個人性特徴除去モジュール１４０２、韻律個人性特徴除去モジュール１４０３、及び音韻個人性特徴除去モジュール１４０４を実行しないように設定されている。 However, for parameters that are not defined in the speech signal processing parameter 1003, the language personality feature removal module 1402, the prosodic personality feature removal module 1403, and the phonological personality feature removal module 1404 execute speech signal processing using initial values. Normally, the initial values of the speech signal processing parameter 1003 are set so that the language personality feature removal module 1402, the prosodic personality feature removal module 1403, and the phonological personality feature removal module 1404 are not executed.

音声合成部１４０５は、個人性特徴除去部１４０１によって音声信号処理が実行された音声特徴量３１０を用いて、音声を再度合成し、合成した音声を音声出力Ｉ／Ｆ１０５に出力する。音声合成部１４０５の音声の合成手法としては、ＰＳＯＬＡのように、元の音声波形を切断した後で、切断した音声を再接続する手法、及び声道フィルタを用いたパラメータ合成手法などがある。 The speech synthesizer 1405 synthesizes speech again using the speech feature quantity 310 on which speech signal processing has been executed by the personality feature removal unit 1401, and outputs the synthesized speech to the speech output I / F 105. Speech synthesis methods available to the speech synthesizer 1405 include methods such as PSOLA, which cut the original speech waveform into segments and then reconnect the segments, and parametric synthesis methods that use a vocal tract filter.

図１０では、図４に示す音声データの個人性特徴量３２０である平均Ｆ０値が２３０Ｈｚであり、平均Ｆ０値個人性度が２００である場合、音声信号処理パラメータ決定部１２０は、音声信号処理パラメータ１００３を「平均Ｆ０調整有無＝有、平均Ｆ０値調整手法＝ＰＳＯＬＡ、平均Ｆ０値調整量＝−８０Ｈｚ」と決定することを説明した。この音声信号処理パラメータ１００３が音声信号処理部１３０に入力された場合の音声信号処理部１３０の動作について説明する。 In FIG. 10, when the average F0 value, which is the personality feature amount 320 of the audio data shown in FIG. 4, is 230 Hz and the average F0 value individuality is 200, the audio signal processing parameter determination unit 120 performs the audio signal processing. It has been described that the parameter 1003 is determined as “average F0 adjustment presence / absence = present, average F0 value adjustment method = PSOLA, average F0 value adjustment amount = −80 Hz”. The operation of the audio signal processing unit 130 when the audio signal processing parameter 1003 is input to the audio signal processing unit 130 will be described.

平均Ｆ０値は韻律個人性特徴量であるので、平均値Ｆ０値を調整するためには、韻律個人性特徴量を調整しなければならない。このため、上述した音声信号処理パラメータ１００３が音声信号処理部１３０に入力された場合、韻律個人性特徴除去モジュール１４０３のみが音声信号処理を実行する。 Since the average F0 value is a prosodic personality feature amount, in order to adjust the average F0 value, the prosodic personality feature amount must be adjusted. For this reason, when the audio signal processing parameter 1003 described above is input to the audio signal processing unit 130, only the prosodic personality feature removal module 1403 executes the audio signal processing.

具体的には、韻律個人性特徴除去モジュール１４０３は、音声入力Ｉ／Ｆ１０２から入力された音声の有声音の波形から周期波形を検出する。そして、韻律個人性特徴除去モジュール１４０３は、検出した周期波形の間隔が１／８０秒短くなるように検出した周期波形を重畳することによって、平均Ｆ０値を８０Ｈｚ小さくする。 Specifically, the prosodic personality feature removal module 1403 detects a periodic waveform from the waveform of the voiced sound of the voice input from the voice input I / F 102. Then, the prosodic personality feature removal module 1403 reduces the average F0 value by 80 Hz by superimposing the detected periodic waveform so that the interval between the detected periodic waveforms is shortened by 1/80 seconds.

韻律個人性特徴除去モジュール１４０３は、すべての有声音の波形のＦ０値を８０Ｈｚ小さくしてもよいが、音声自体の抑揚及び自然性を確保するために、有声音の波形の振幅を調整後のＦ０値に基づいて変更してもよい。具体的には、韻律個人性特徴除去モジュール１４０３は、有声音の波形のＦ０値を小さくした場合、有声音の波形の振幅を小さくし、有声音の波形のＦ０値を大きくした場合、有声音の波形の振幅を大きくする。 The prosodic personality feature removal module 1403 may reduce the F0 value of all voiced sound waveforms by 80 Hz, but in order to ensure the inflection and naturalness of the sound itself, the amplitude of the voiced sound waveform is adjusted. You may change based on F0 value. Specifically, the prosodic personality feature removal module 1403 reduces the amplitude of the voiced sound waveform when the F0 value of the voiced sound waveform is reduced, and increases the F0 value of the voiced sound waveform when the F0 value of the voiced sound waveform is increased. Increase the waveform amplitude.

図１５Ａは、本発明の第１実施形態の平均Ｆ０値が調整される前の基本周波数の時系列データである。図１５Ａは、図６Ａと同じであり、平均Ｆ０値が２３０Ｈｚであることを示している。 FIG. 15A is time-series data of the fundamental frequency before the average F0 value is adjusted according to the first embodiment of this invention. FIG. 15A is the same as FIG. 6A and shows that the average F0 value is 230 Hz.

図１５Ｂは、本発明の第１実施形態の平均Ｆ０値が調整された後の基本周波数の時系列データである。韻律個人性特徴除去モジュール１４０３によって平均Ｆ０値が８０Ｈｚ小さくされ、平均Ｆ０値が１５０Ｈｚとなる。 FIG. 15B is time-series data of the fundamental frequency after the average F0 value of the first embodiment of the present invention is adjusted. The average F0 value is reduced by 80 Hz by the prosodic personality feature removal module 1403, and the average F0 value becomes 150 Hz.
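
The effect shown in FIGS. 15A and 15B can be illustrated on a toy F0 contour. The sketch below is not PSOLA itself; it only shows the intended result on the fundamental-frequency time series, assuming that unvoiced frames are marked with 0.0 and that every shifted voiced value stays positive. The contour values and the function names `shift_voiced_f0` and `mean_voiced_f0` are hypothetical.

```python
from statistics import mean
from typing import List


def shift_voiced_f0(contour_hz: List[float], shift_hz: float) -> List[float]:
    """Add shift_hz to every voiced frame; unvoiced frames (0.0) are left unchanged."""
    return [f + shift_hz if f > 0.0 else 0.0 for f in contour_hz]


def mean_voiced_f0(contour_hz: List[float]) -> float:
    """Average F0 over voiced frames only; 0.0 if there are none."""
    voiced = [f for f in contour_hz if f > 0.0]
    return mean(voiced) if voiced else 0.0


# Hypothetical contour whose voiced frames average 230 Hz, as in FIG. 15A.
before = [220.0, 240.0, 0.0, 235.0, 225.0, 0.0]
after = shift_voiced_f0(before, -80.0)

print(mean_voiced_f0(before))  # 230.0
print(mean_voiced_f0(after))   # 150.0
```

Because the same offset is applied to every voiced frame, the shape of the contour is preserved and only its average moves, which is consistent with the statement that the average F0 value becomes 150 Hz.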

なお、上述した音声信号処理パラメータ１００３では、平均Ｆ０値のみを調整するので、基本周波数以外の音声特徴量３１０は図５、図６Ａ、図６Ｃ、及び図７から変化しない。 Note that, in the audio signal processing parameter 1003 described above, only the average F0 value is adjusted, so that the audio feature quantity 310 other than the fundamental frequency does not change from FIGS. 5, 6A, 6C, and 7.

本実施形態では、所定の個人性度条件１００１を満たさない個人性度３３０がある場合、当該個人性度３３０が個人性度条件１００１を満たすように音声信号処理パラメータ１００３を決定し、決定した音声信号処理パラメータ１００３に基づいて、個人性度３３０が個人性度条件１００１を満たすように音声信号処理を実行する。これによって、個人性の強い個人性情報のみを除去し、音声データの劣化を可能な限り防止できる。 In the present embodiment, when there is a personality degree 330 that does not satisfy the predetermined personality degree condition 1001, the audio signal processing parameter 1003 is determined so that the individuality degree 330 satisfies the personality degree condition 1001, and the determined voice Based on the signal processing parameter 1003, the audio signal processing is executed so that the individuality degree 330 satisfies the individuality degree condition 1001. As a result, it is possible to remove only personality information with strong personality and prevent deterioration of the audio data as much as possible.

（第２実施の形態） (Second Embodiment)
本発明の第２実施形態を図１６〜図２１を用いて説明する。 A second embodiment of the present invention will be described with reference to FIGS. 16 to 21.

本実施形態では、第１実施形態の音声信号処理システムを、アップロードされた音声データを記憶し、ダウンロード要求に応じて音声データを転送するサーバに適用する。この場合、サーバの十分な処理速度及び記憶装置の十分な容量を利用して、音声データのアップロード時に、複数の個人性度条件ごとに、個人性度条件を満たし、音質の劣化が最小となる音声データを生成しておく。これによって、個人性度条件を満たし、かつ音質の劣化が最小となる音声データを提供できる。 In this embodiment, the audio signal processing system of the first embodiment is applied to a server that stores uploaded audio data and transfers the audio data in response to a download request. In this case, taking advantage of the ample processing speed of the server and the ample capacity of its storage device, when audio data is uploaded the server generates in advance, for each of a plurality of individuality degree conditions, audio data that satisfies that condition with minimal deterioration in sound quality. As a result, it is possible to provide audio data that satisfies the individuality degree condition and has minimal deterioration in sound quality.

図１６は、本発明の第２実施形態の情報処理システムの構成の説明図である。 FIG. 16 is an explanatory diagram of the configuration of the information processing system according to the second embodiment of this invention.

情報処理システムは、サーバ１６００、音声データをサーバ１６００にアップロードするアップロード端末１６１０、及び音声データをサーバ１６００からダウンロードするダウンロード端末１６２０を備える。サーバ１６００、アップロード端末１６１０、及びダウンロード端末１６２０は、互いに通信可能なネットワーク１６０５を介して接続される。なお、図１６では、アップロード端末１６１０及びダウンロード端末１６２０を一つずつ示したが、アップロード端末１６１０及びダウンロード端末１６２０は複数あってもよい。 The information processing system includes a server 1600, an upload terminal 1610 that uploads audio data to the server 1600, and a download terminal 1620 that downloads audio data from the server 1600. The server 1600, the upload terminal 1610, and the download terminal 1620 are connected via a network 1605 that can communicate with each other. In FIG. 16, one upload terminal 1610 and one download terminal 1620 are shown, but a plurality of upload terminals 1610 and download terminals 1620 may be provided.

サーバ１６００は、音声信号処理システム１００として機能し、ＣＰＵ１０３、メモリ１０４、補助記憶装置１０１、及び通信Ｉ／Ｆ（インタフェース）１６０１を備える。ＣＰＵ１０３、メモリ１０４、及び補助記憶装置１０１は、図１に示す音声信号処理システム１００と同じであるので、説明を省略する。なお、ＣＰＵ１０３、メモリ１０４、及び補助記憶装置１０１は、バス１０６を介して互いに接続する。 The server 1600 functions as the audio signal processing system 100 and includes a CPU 103, a memory 104, an auxiliary storage device 101, and a communication I / F (interface) 1601. The CPU 103, the memory 104, and the auxiliary storage device 101 are the same as those of the audio signal processing system 100 shown in FIG. 1, so their description is omitted. The CPU 103, the memory 104, and the auxiliary storage device 101 are connected to each other via the bus 106.

通信Ｉ／Ｆ１６０１は、音声信号処理システム１００をネットワーク１６０５に接続するインタフェースである。通信Ｉ／Ｆ１６０１は、アップロード端末１６１０から音声データを受信する場合、音声入力Ｉ／Ｆ１０２として機能する。通信Ｉ／Ｆ１６０１は、ダウンロード端末１６２０がサーバ１６００から音声データをダウンロードする場合、音声出力Ｉ／Ｆ１０５として機能する。 The communication I / F 1601 is an interface that connects the audio signal processing system 100 to the network 1605. The communication I / F 1601 functions as the voice input I / F 102 when receiving voice data from the upload terminal 1610. The communication I / F 1601 functions as the audio output I / F 105 when the download terminal 1620 downloads audio data from the server 1600.

アップロード端末１６１０は、入力された音声データをサーバ１６００にアップロードする端末であり、ＣＰＵ１０３、メモリ１０４、通信Ｉ／Ｆ１６１１、及び音声入力Ｉ／Ｆ１６１２を備える。ＣＰＵ１０３、メモリ１０４、通信Ｉ／Ｆ１６１１、及び音声入力Ｉ／Ｆ１６１２は、バス１０６を介して接続される。ＣＰＵ１０３及びメモリ１０４は、サーバ１６００と同じなので説明を省略する。 The upload terminal 1610 is a terminal that uploads input voice data to the server 1600, and includes a CPU 103, a memory 104, a communication I / F 1611, and a voice input I / F 1612. The CPU 103, the memory 104, the communication I / F 1611, and the voice input I / F 1612 are connected via the bus 106. Since the CPU 103 and the memory 104 are the same as the server 1600, the description thereof is omitted.

通信Ｉ／Ｆ１６１１は、アップロード端末１６１０をネットワーク１６０５に接続するインタフェースである。音声入力Ｉ／Ｆ１６１２は、アップロード端末１６１０に音声を入力するためのインタフェースである。 The communication I / F 1611 is an interface that connects the upload terminal 1610 to the network 1605. The voice input I / F 1612 is an interface for inputting voice to the upload terminal 1610.

ダウンロード端末１６２０は、サーバ１６００から音声をダウンロードする端末であり、ＣＰＵ１０３、メモリ１０４、通信Ｉ／Ｆ１６２１、及び音声出力Ｉ／Ｆ１６２２を備える。ＣＰＵ１０３、メモリ１０４、通信Ｉ／Ｆ１６２１、及び音声出力Ｉ／Ｆ１６２２は、バス１０６を介して接続される。ＣＰＵ１０３及びメモリ１０４は、サーバ１６００と同じなので説明を省略する。 The download terminal 1620 is a terminal that downloads audio from the server 1600, and includes a CPU 103, a memory 104, a communication I / F 1621, and an audio output I / F 1622. The CPU 103, the memory 104, the communication I / F 1621, and the audio output I / F 1622 are connected via the bus 106. Since the CPU 103 and the memory 104 are the same as those of the server 1600, their description is omitted.

通信Ｉ／Ｆ１６２１は、ダウンロード端末１６２０をネットワーク１６０５に接続するインタフェースである。音声出力Ｉ／Ｆ１６２２は、ダウンロード端末１６２０が音声を出力するためのインタフェースである。 The communication I / F 1621 is an interface that connects the download terminal 1620 to the network 1605. The audio output I / F 1622 is an interface for the download terminal 1620 to output audio.

通信Ｉ／Ｆ１６０１、１６１１、及び１６２１は、サーバ１６００、アップロード端末１６１０、ダウンロード端末１６２０との間で音声データを通信可能である。 The communication I / Fs 1601, 1611, and 1621 allow audio data to be exchanged among the server 1600, the upload terminal 1610, and the download terminal 1620.

なお、サーバ１６００、アップロード端末１６１０、及びダウンロード端末１６２０は、同一の計算機に集約されてもよい。 Note that the server 1600, the upload terminal 1610, and the download terminal 1620 may be integrated into the same computer.

図１７は、本発明の第２実施形態のサーバ１６００の機能ブロック図である。図１７に示すサーバ１６００の機能ブロックのうち図２に示す音声信号処理システム１００と同じ機能ブロックは、同じ符号を付与し、説明を省略する。 FIG. 17 is a functional block diagram of the server 1600 according to the second embodiment of this invention. Among the functional blocks of the server 1600 shown in FIG. 17, those that are the same as the functional blocks of the audio signal processing system 100 shown in FIG. 2 are given the same reference numerals, and their description is omitted.

サーバ１６００の動作の概略について説明する。 An outline of the operation of the server 1600 will be described.

ダウンロード端末１６２０又はダウンロード端末１６２０のユーザには個人性権限が設定される。個人性権限はどれくらいの個人性特徴を開示するかを示す情報であり、個人性権限に対応して個人性度条件１００１が設定される。 An individuality authority is set for the download terminal 1620 or for the user of the download terminal 1620. The individuality authority is information indicating how much of the individuality characteristics may be disclosed, and an individuality degree condition 1001 is set for each individuality authority.

サーバ１６００は、音声データがアップロードされた場合、すべての個人性権限に対応する個人性度条件１００１ごとに、個人性度条件１００１を満たすようにアップロードされた音声データを調整し、調整した音声データを個人性権限ごとに記憶する。 When the audio data is uploaded, the server 1600 adjusts the uploaded audio data so as to satisfy the individuality condition 1001 for each individuality condition 1001 corresponding to all individuality authorities, and the adjusted audio data Is stored for each individuality authority.

また、サーバ１６００は、ダウンロード要求をダウンロード端末１６２０から受信した場合、ダウンロード要求を送信したダウンロード端末１６２０又は当該ダウンロード端末１６２０のユーザの個人性権限を特定し、特定した個人性権限に対応する音声データを検索し、検索した音声データをダウンロード端末１６２０に送信する。 Further, when the server 1600 receives a download request from the download terminal 1620, the server 1600 specifies the personality authority of the download terminal 1620 that transmitted the download request or the user of the download terminal 1620, and the audio data corresponding to the specified personality authority And the searched voice data is transmitted to the download terminal 1620.

サーバ１６００は、通信Ｉ／Ｆ１６０１、音声個人性特徴分析部１１０、音声信号処理パラメータ決定部１７００、音声信号処理部１３０、ユーザ情報抽出部１７３０、個人性権限設定部１７５０、音声検索部１７８０を備える。サーバ１６００の補助記憶装置１０１は、個人性度条件リスト１７１０、及び音声データベース１７２０を記憶する。 The server 1600 includes a communication I / F 1601, a voice personality feature analysis unit 110, a voice signal processing parameter determination unit 1700, a voice signal processing unit 130, a user information extraction unit 1730, a personality authority setting unit 1750, and a voice search unit 1780. . The auxiliary storage device 101 of the server 1600 stores a personality condition list 1710 and an audio database 1720.

まず、音声データがアップロードされた場合のサーバ１６００の処理について説明する。 First, processing of the server 1600 when audio data is uploaded will be described.

通信Ｉ／Ｆ１６０１は、アップロード端末１６１０によってアップロードされた音声データを受信した場合、受信した音声データを音声個人性特徴分析部１１０に入力する。 When the communication I / F 1601 receives voice data uploaded by the upload terminal 1610, the communication I / F 1601 inputs the received voice data to the voice personality characteristic analysis unit 110.

音声個人性特徴分析部１１０は、音声データに基づいて、音声特徴量３１０、個人性特徴量３２０、及び個人性度３３０を算出し、これらを音声信号処理パラメータ決定部１７００に入力する。 The voice personality feature analysis unit 110 calculates a voice feature value 310, a personality feature value 320, and a personality degree 330 based on the voice data, and inputs them to the voice signal processing parameter determination unit 1700.

ここで、個人性度条件リスト１７１０には、ダウンロード端末１６２０のユーザに設定し得るすべての個人性権限に対応する個人性度条件１００１が登録されている。 Here, in the personality degree condition list 1710, personality degree conditions 1001 corresponding to all personality authorities that can be set for the user of the download terminal 1620 are registered.

音声信号処理パラメータ決定部１７００には、個人性度条件リスト１７１０に登録された個人性度条件１００１ごとに、個人性度３３０が個人性度条件１００１を満たすような音声信号処理パラメータ１００３を決定し、決定した音声信号処理パラメータ１００３を音声信号処理部１３０に入力する。 The audio signal processing parameter determination unit 1700 determines an audio signal processing parameter 1003 such that the individuality degree 330 satisfies the individuality degree condition 1001 for each individuality degree condition 1001 registered in the individuality degree condition list 1710. The determined audio signal processing parameter 1003 is input to the audio signal processing unit 130.

なお、音声信号処理パラメータ決定部１７００は、図３に示す第１実施形態の音声信号処理パラメータ決定部１２０と異なる処理を実行するので、図１８で詳細を説明する。 The audio signal processing parameter determination unit 1700 executes processing different from that of the audio signal processing parameter determination unit 120 of the first embodiment shown in FIG. 3 and will be described in detail with reference to FIG.

音声信号処理部１３０は、音声信号処理パラメータ決定部１７００から入力された音声信号処理パラメータ１００３ごとに音声データに対して音声信号処理を実行し、音声信号処理を実行した音声データを音声データベース１７２０に格納する。 The audio signal processing unit 130 performs audio signal processing on the audio data for each audio signal processing parameter 1003 input from the audio signal processing parameter determination unit 1700, and stores the audio data on which the audio signal processing has been performed in the audio database 1720. Store.

音声データの音声データベース１７２０への格納方法について説明する。 A method for storing voice data in the voice database 1720 will be described.

音声データベース１７２０は、各個人性度条件に対応する個人性度条件音声データベース１７２１〜１７２３を含む。音声信号処理部１３０は、音声信号処理パラメータ１００３の決定に用いられた個人性度条件１００１に対応する個人性度条件音声データベース１７２１〜１７２３に、音声信号処理実行後の音声データを格納する。 The voice database 1720 includes personality degree condition voice databases 1721 to 1723 corresponding to the individuality degree conditions. The audio signal processing unit 130 stores the audio data after execution of the audio signal processing in the individuality condition audio databases 1721 to 1723 corresponding to the individuality condition 1001 used for determining the audio signal processing parameter 1003.

以上によって、アップロード端末１６１０からアップロードされた音声データが、個人性度条件リスト１７１０に登録された各個人性度条件１００１を満たすように調整され、各個人性度条件１００１を満たした音声データが、個人性度条件１００１ごとに格納される。 As described above, the audio data uploaded from the upload terminal 1610 is adjusted so as to satisfy each individuality condition 1001 registered in the individuality condition list 1710, and the audio data satisfying each individuality condition 1001 is Stored for each individuality condition 1001.
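
The upload-side flow described above can be sketched as follows. This is a toy model under stated assumptions: the audio data is represented only by its average F0 value, the individuality degree is taken to be the distance from the 120 Hz male mean, and concealment simply moves the value toward that mean in 10 Hz steps. None of these stand-ins comes from the text; `handle_upload`, `toy_degree`, and `toy_conceal` are hypothetical names. Only the overall structure (one processed copy per registered condition, stored in the database for that condition) follows the description.

```python
from typing import Callable, Dict

MALE_MEAN_F0_HZ = 120.0  # male mean quoted earlier in the text


def toy_degree(mean_f0_hz: float) -> float:
    """Hypothetical individuality degree: distance from the male mean."""
    return abs(mean_f0_hz - MALE_MEAN_F0_HZ)


def toy_conceal(mean_f0_hz: float, max_degree: float, step_hz: float = 10.0) -> float:
    """Hypothetical concealment: step toward the mean until the condition holds."""
    if max_degree < 0:
        raise ValueError("max_degree must be non-negative")
    f0 = mean_f0_hz
    while toy_degree(f0) > max_degree:
        delta = min(step_hz, toy_degree(f0))  # never overshoot the mean
        f0 += -delta if f0 > MALE_MEAN_F0_HZ else delta
    return f0


def handle_upload(audio_id: str,
                  audio: float,
                  conditions: Dict[str, float],
                  conceal: Callable[[float, float], float],
                  database: Dict[str, Dict[str, float]]) -> None:
    """For every registered condition, store a copy processed to satisfy it."""
    for db_name, max_degree in conditions.items():
        database.setdefault(db_name, {})[audio_id] = conceal(audio, max_degree)


# Keys reuse the reference numerals of the per-condition databases; thresholds are made up.
conditions = {"1721": 110.0, "1722": 60.0, "1723": 30.0}
database: Dict[str, Dict[str, float]] = {}
handle_upload("clip-1", 230.0, conditions, toy_conceal, database)
print(database)
```

With these made-up thresholds, the least restrictive database keeps the clip at 230 Hz, while the others hold copies lowered to 180 Hz and 150 Hz.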

次に、サーバ１６００から音声データがダウンロードされる場合のサーバ１６００の動作について説明する。 Next, the operation of server 1600 when audio data is downloaded from server 1600 will be described.

サーバ１６００のダウンロード動作は、通信Ｉ／Ｆ１６０１がダウンロード端末１６２０から受信したダウンロード要求に応じて、ダウンロード端末１６２０のユーザに対応する個人性権限の音声データを音声データベース１７２０から検索し、検索した音声データを、ダウンロード要求を送信したダウンロード端末１６２０に通信Ｉ／Ｆ１６０１を介して送信する動作である。 In the download operation of the server 1600, in response to the download request received by the communication I / F 1601 from the download terminal 1620, the voice database 1720 searches the voice database 1720 for voice data having personality authority corresponding to the user of the download terminal 1620. Is transmitted via the communication I / F 1601 to the download terminal 1620 that has transmitted the download request.

サーバ１６００は、ダウンロード動作を実現するための構成として、ユーザ情報抽出部１７３０、個人性権限設定部１７５０、及び音声検索部１７８０を備える。 The server 1600 includes a user information extraction unit 1730, a personality authority setting unit 1750, and a voice search unit 1780 as a configuration for realizing the download operation.

まず、ユーザ情報抽出部１７３０は、通信Ｉ／Ｆ１６０１から入力されたダウンロード要求からユーザ情報１７４０を抽出し、抽出したユーザ情報１７４０を個人性権限設定部１７５０に入力する。ユーザ情報１７４０は、ダウンロード要求に含まれ、例えば、ダウンロード端末１６２０の種類、ダウンロード端末１６２０のユーザを特定可能なユーザ識別情報を含む。 First, the user information extraction unit 1730 extracts user information 1740 from the download request input from the communication I / F 1601 and inputs the extracted user information 1740 to the personality authority setting unit 1750. The user information 1740 is included in the download request and includes, for example, the type of the download terminal 1620 and user identification information that can identify the user of the download terminal 1620.

個人性権限設定部１７５０は、個人性権限設定ルール１７６０を参照し、ユーザ情報抽出部１７３０から入力されたユーザ情報１７４０に対応する個人性権限１７７０を特定し、特定した個人性権限１７７０を音声検索部１７８０に入力する。個人性権限設定ルール１７６０には、ダウンロード端末１６２０のユーザ情報１７４０と個人性権限１７７０との対応関係が登録される。個人性権限設定ルール１７６０は図２１で詳細を説明する。 The personality authority setting unit 1750 refers to the personality authority setting rule 1760, specifies the personality authority 1770 corresponding to the user information 1740 input from the user information extraction unit 1730, and inputs the specified personality authority 1770 to the voice search unit 1780. The personality authority setting rule 1760 registers the correspondence between the user information 1740 of the download terminal 1620 and the personality authority 1770. The personality authority setting rule 1760 will be described in detail with reference to FIG. 21.

音声検索部１７８０は、音声データベース１７２０に含まれる個人性度条件音声データベース１７２１〜１７２３のうち入力された個人性権限１７７０に対応する個人性度条件音声データベースから、ダウンロード要求に一致する音声データを検索する。音声検索部１７８０は、検索された音声データを、ダウンロード要求を送信したダウンロード端末１６２０に通信Ｉ／Ｆ１６０１を介して送信する。 The voice search unit 1780 searches for voice data matching the download request in whichever of the individuality degree condition voice databases 1721 to 1723 included in the voice database 1720 corresponds to the input personality authority 1770. The voice search unit 1780 transmits the retrieved voice data via the communication I / F 1601 to the download terminal 1620 that transmitted the download request.

以上によって、ダウンロード要求を送信したダウンロード端末１６２０の個人性権限に応じて、個人性を秘匿した音声データをダウンロード端末１６２０のユーザに提供できる。 As described above, audio data whose individuality has been concealed in accordance with the personality authority of the download terminal 1620 that transmitted the download request can be provided to the user of the download terminal 1620.
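
The download-side flow (extract user information, resolve the personality authority, search the matching per-condition database) can be sketched in the same toy style. The actual contents of the personality authority setting rule 1760 are described with FIG. 21 and are not reproduced here, so the rule table, the request layout, the authority names, and the fallback to a default authority are all hypothetical assumptions made for illustration.

```python
from typing import Dict, Optional


def resolve_authority(user_info: Dict[str, str],
                      rules: Dict[str, str],
                      default_authority: str) -> str:
    """Role of unit 1750: look up the authority registered for this user (hypothetical rule table)."""
    return rules.get(user_info.get("user_id", ""), default_authority)


def handle_download(request: Dict[str, object],
                    rules: Dict[str, str],
                    authority_to_db: Dict[str, str],
                    database: Dict[str, Dict[str, float]],
                    default_authority: str = "restricted") -> Optional[float]:
    """Return the stored copy that matches the requester's authority, or None if absent."""
    user_info = request["user_info"]                 # role of unit 1730: extract user information
    authority = resolve_authority(user_info, rules, default_authority)
    db_name = authority_to_db[authority]             # pick the per-condition database
    return database.get(db_name, {}).get(request["audio_id"])  # role of unit 1780: search


# Hypothetical data: stored copies are represented by their average F0 value.
database = {"1721": {"clip-1": 230.0}, "1723": {"clip-1": 150.0}}
rules = {"doctor-7": "full"}
authority_to_db = {"full": "1721", "restricted": "1723"}

known = handle_download({"user_info": {"user_id": "doctor-7"}, "audio_id": "clip-1"},
                        rules, authority_to_db, database)
unknown = handle_download({"user_info": {"user_id": "guest"}, "audio_id": "clip-1"},
                          rules, authority_to_db, database)
print(known, unknown)
```

Falling back to the most restrictive authority for unregistered users is a design choice of this sketch, not something stated in the text.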

（音声信号処理パラメータ決定部１７００の動作） (Operation of the audio signal processing parameter determination unit 1700)
次に、音声信号処理パラメータ決定部１７００の動作について図１８を用いて説明する。図１８は、本発明の第２実施形態の音声信号処理パラメータ決定部１７００の機能ブロック図である。図１８に示す音声信号処理パラメータ決定部１７００のうち、図１０に示す音声信号処理パラメータ決定部１２０と同じ機能ブロックは、同じ符号を付与し、説明を省略する。 Next, the operation of the audio signal processing parameter determination unit 1700 will be described with reference to FIG. 18. FIG. 18 is a functional block diagram of the audio signal processing parameter determination unit 1700 according to the second embodiment of this invention. Among the functional blocks of the audio signal processing parameter determination unit 1700 shown in FIG. 18, those that are the same as the functional blocks of the audio signal processing parameter determination unit 120 shown in FIG. 10 are given the same reference numerals, and their description is omitted.

本実施形態の音声信号処理パラメータ決定部１７００は、第１実施形態の音声信号処理パラメータ決定部１２０に、パラメータ調整終了条件１８０１、調整終了判定モジュール１８０２、最小音質劣化予測値出力部１８０３及び音質劣化推定部１８１０が追加されたものである。 The audio signal processing parameter determination unit 1700 according to the present embodiment includes, in addition to the audio signal processing parameter determination unit 120 according to the first embodiment, a parameter adjustment end condition 1801, an adjustment end determination module 1802, a minimum sound quality degradation prediction value output unit 1803, and a sound quality degradation. An estimation unit 1810 is added.

音声信号処理パラメータ決定部１７００は、個人性度条件リスト１７１０に登録された個人性度条件１００１から一つの個人性度条件１００１を処理対象の個人性度条件１００１として選択し、個人性度条件リスト１７１０に登録されたすべての個人性度条件１００１に対して以降の処理が実行されるまで以降の処理を繰り返す。 The audio signal processing parameter determination unit 1700 selects one individuality degree condition 1001 from the individuality degree conditions 1001 registered in the individuality degree condition list 1710 as the individuality degree condition 1001 to be processed, and repeats the processing described below until it has been executed for every individuality degree condition 1001 registered in the individuality degree condition list 1710.

個人性度条件判定部１２１には、処理対象の個人性度条件１００１、及びパラメータ決定用データ群１００２が入力される。パラメータ決定用データ群１００２は、個人性特徴量３２０、個人性度３３０、及び音声信号処理パラメータ１００３を含む。 The personality condition determining unit 121 receives a personality condition 1001 to be processed and a parameter determination data group 1002. The parameter determination data group 1002 includes an individuality feature amount 320, an individuality degree 330, and an audio signal processing parameter 1003.

個人性度条件判定部１２１に処理対象の個人性度条件１００１、及びパラメータ決定用データ群１００２が入力された場合、個人性度条件判定モジュール１００４は、個人性度３３０が処理対象の個人性度条件１００１を満たすか否かを判定する。 When the individuality condition 1001 to be processed and the parameter determination data group 1002 are input to the individuality condition determination unit 121, the individuality condition determination module 1004 has the individuality degree 330 to be processed. It is determined whether or not the condition 1001 is satisfied.

個人性度３３０が処理対象の個人性度条件１００１を満たさないと個人性度条件判定モジュール１００４が判定した場合、調整終了判定モジュール１８０２は、パラメータ調整終了条件１８０１に対応する値がパラメータ調整終了条件１８０１を満たすか否かを判定する。パラメータ調整終了条件１８０１は、音声信号処理パラメータ調整部１２２が音声信号処理パラメータ１００３の調整を終了する条件であり、例えば、平均Ｆ０値が５０Ｈｚ以下であること、又は、調整回数が１０回以上であることなどである。パラメータ調整終了条件１８０１は図１９で詳細を説明する。 If the individuality degree condition determination module 1004 determines that the individuality degree 330 does not satisfy the individuality degree condition 1001 to be processed, the adjustment end determination module 1802 determines whether the value corresponding to the parameter adjustment end condition 1801 satisfies the parameter adjustment end condition 1801. The parameter adjustment end condition 1801 is the condition under which the audio signal processing parameter adjustment unit 122 ends adjustment of the audio signal processing parameter 1003, for example, that the average F0 value is 50 Hz or less, or that the number of adjustments is 10 or more. The parameter adjustment end condition 1801 will be described in detail with reference to FIG. 19.

パラメータ調整終了条件１８０１に対応する値がパラメータ調整終了条件１８０１を満たさないと調整終了判定モジュール１８０２が判定した場合、音声信号処理パラメータ調整部１２２が第１実施形態と同じように音声信号処理パラメータ１００３を調整する。そして、個人性度再算出部１２３は、第１実施形態と同じように、調整後の音声信号処理パラメータ１００３に基づいて、個人性度３３０を再度算出する。個人性度条件判定モジュール１００４は、再度算出された個人性度３３０が処理対象の個人性度条件１００１を満たすか否かを判定する。 When the adjustment end determination module 1802 determines that the value corresponding to the parameter adjustment end condition 1801 does not satisfy the parameter adjustment end condition 1801, the audio signal processing parameter adjustment unit 122 adjusts the audio signal processing parameter 1003 in the same way as in the first embodiment. Then, as in the first embodiment, the individuality degree recalculation unit 123 recalculates the individuality degree 330 based on the adjusted audio signal processing parameter 1003. The individuality degree condition determination module 1004 determines whether the recalculated individuality degree 330 satisfies the individuality degree condition 1001 to be processed.

個人性度３３０が処理対象の個人性度条件１００１を満たすと個人性度条件判定モジュール１００４が判定した場合について説明する。 A case where the individuality degree condition determination module 1004 determines that the individuality degree 330 satisfies the individuality degree condition 1001 to be processed will be described.

この場合、個人性度条件判定モジュール１００４は、パラメータ決定用データ群１００２を音質劣化推定部１８１０に入力する。音質劣化推定部１８１０は、パラメータ決定用データ群１００２が入力された場合、以降の処理を実行する。音質劣化推定部１８１０は、音質劣化推定モジュール１８１１、音質向上モジュール１８１３、及び最小音質劣化予測値記憶モジュール１８１４を含む。 In this case, the personality degree condition determination module 1004 inputs the parameter determination data group 1002 to the sound quality degradation estimation unit 1810. When the parameter determination data group 1002 is input, the sound quality deterioration estimation unit 1810 performs the subsequent processing. The sound quality deterioration estimation unit 1810 includes a sound quality deterioration estimation module 1811, a sound quality improvement module 1813, and a minimum sound quality deterioration predicted value storage module 1814.

First, the sound quality degradation estimation module 1811 estimates a predicted sound quality degradation value 1812 based on the personality feature amount 320 in the parameter determination data group 1002, and inputs the estimated value 1812 to the sound quality improvement module 1813. Methods for estimating the amount of sound quality degradation are well known, so their description is omitted.

The sound quality improvement module 1813 determines whether the input predicted sound quality degradation value 1812 is smaller than the minimum predicted sound quality degradation value. If it is smaller, the sound quality improvement module 1813 determines that the sound quality has improved and inputs the parameter determination data group 1002 and the predicted sound quality degradation value 1812 to the minimum predicted sound quality degradation value storage module 1814.

The minimum predicted sound quality degradation value storage module 1814 stores the input predicted sound quality degradation value 1812 in the memory 104 as the minimum predicted sound quality degradation value, stores the input parameter determination data group 1002 in the memory 104 as the minimum-degradation parameter determination data group, and passes processing to the audio signal processing parameter adjustment unit 122.

On the other hand, if the predicted sound quality degradation value 1812 is equal to or greater than the minimum predicted sound quality degradation value, the sound quality improvement module 1813 determines that the sound quality has not improved and passes processing to the audio signal processing parameter adjustment unit 122.

Next, the case where the adjustment end determination module 1802 determines that the value corresponding to the parameter adjustment end condition 1801 satisfies that condition is described.

In this case, the minimum predicted sound quality degradation value output unit 1803 inputs the minimum predicted sound quality degradation value and the minimum-degradation parameter determination data group stored in the memory 104 to the audio signal processing unit 130. Then, if any personality degree condition 1001 registered in the personality degree condition list 1710 is still unprocessed, the audio signal processing parameter determination unit 1700 selects it and executes the processing described above for the selected condition. If no unprocessed personality degree condition 1001 remains, the processing ends.

As described above, the audio signal processing parameter determination unit 1700 of this embodiment inputs to the audio signal processing unit 130 only the audio signal processing parameter 1003 that minimizes sound quality degradation, chosen from the audio signal processing parameters 1003 whose personality degree satisfies the personality degree condition 1001 being processed. As a result, the download terminal 1620 is provided with audio data in which the desired personality is concealed and sound quality degradation is minimal. The personality of the user who uploaded the audio data is therefore concealed without reducing the appeal of the content.
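
The selection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the parameter is reduced to a single hypothetical F0 shift, the personality degree condition is assumed to be "personality degree below a limit", the end condition is assumed to be an adjustment count, and `personality_of` and `degradation_of` are placeholder callables standing in for the recalculation unit 123 and the estimation module 1811.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Candidate:
    """One tested setting (stands in for the parameter determination data group 1002)."""
    f0_shift_hz: float   # hypothetical audio signal processing parameter
    personality: float   # personality degree after applying the parameter
    degradation: float   # predicted sound quality degradation value


def select_min_degradation(
    shifts: Iterable[float],
    personality_of: Callable[[float], float],
    degradation_of: Callable[[float], float],
    personality_limit: float,
    max_adjustments: int,
) -> Optional[Candidate]:
    """Keep the passing candidate with the smallest predicted degradation."""
    best: Optional[Candidate] = None
    for count, shift in enumerate(shifts, start=1):
        personality = personality_of(shift)
        # Assumed form of the personality degree condition: below the limit.
        if personality < personality_limit:
            degradation = degradation_of(shift)
            # Store only when strictly better than the stored minimum.
            if best is None or degradation < best.degradation:
                best = Candidate(shift, personality, degradation)
        # Assumed end condition: a fixed number of adjustments.
        if count >= max_adjustments:
            break
    return best
```

Unlike a search that stops at the first passing parameter, this loop keeps testing until the end condition holds and returns only the stored minimum, which mirrors the role of the storage module 1814 and the output unit 1803.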

FIG. 19 is an explanatory diagram of specific examples of the parameter adjustment end condition 1801 according to the second embodiment of the present invention.

FIG. 19 shows five specific examples of the parameter adjustment end condition 1801.

The first example of the parameter adjustment end condition 1801 is "all parameter combinations have been tested".

The second example of the parameter adjustment end condition 1801 is "sound quality is sufficient". If the predicted sound quality degradation value 1812 estimated by the sound quality degradation estimation module 1811 is equal to or less than a predetermined value, the sound quality of the audio data provided to the user of the download terminal 1620 is sufficiently assured, so the parameter adjustment ends.

The third example of the parameter adjustment end condition 1801 is "time out". When a predetermined time has elapsed since the audio signal processing parameter determination unit 1700 started processing, the parameter adjustment ends.

The fourth example of the parameter adjustment end condition 1801 is "the specified number of adjustments has been reached". When the number of times the audio signal processing parameter adjustment unit 122 has adjusted the audio signal processing parameter 1003 reaches a predetermined number, the parameter adjustment ends.

The fifth example of the parameter adjustment end condition 1801 is "the parameter limit value has been reached". For example, when the average F0 value is being adjusted downward and reaches 0, the audio signal processing parameter adjustment unit 122 can no longer adjust the average F0 value, so the parameter adjustment ends.
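
The five example conditions can be combined into a single check, sketched below in Python. The field names, default limits, and the order in which the conditions are tested are all assumptions made for illustration; the text does not say whether the conditions are used together or individually.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdjustmentState:
    """Snapshot of the search; every field name here is hypothetical."""
    tested_combinations: int
    total_combinations: int
    predicted_degradation: Optional[float]  # None until a candidate passes
    elapsed_seconds: float
    adjustment_count: int
    average_f0_hz: float


def end_reason(
    state: AdjustmentState,
    sufficient_degradation: float = 0.5,  # assumed "sound quality sufficient" bound
    time_limit_seconds: float = 2.0,      # assumed time budget
    max_adjustments: int = 10,            # assumed adjustment limit
    f0_limit_hz: float = 0.0,             # limit when lowering the average F0
) -> Optional[str]:
    """Return the first end condition that holds, or None to keep adjusting."""
    if state.tested_combinations >= state.total_combinations:
        return "all parameter combinations tested"
    if (state.predicted_degradation is not None
            and state.predicted_degradation <= sufficient_degradation):
        return "sound quality sufficient"
    if state.elapsed_seconds >= time_limit_seconds:
        return "time out"
    if state.adjustment_count >= max_adjustments:
        return "specified number of adjustments reached"
    if state.average_f0_hz <= f0_limit_hz:
        return "parameter limit reached"
    return None
```

Returning the reason rather than a bare boolean makes it easy to log why a search stopped.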

FIG. 20 is a graph showing the relationship between sound quality degradation and the change in the average F0 value according to the second embodiment of the present invention.

As shown in FIG. 20, the amount of sound quality degradation is smallest when the change in the average F0 value is 0, and increases in proportion to the amount of change in the average F0 value.

The specific example described with reference to FIG. 10 is revisited here. In FIG. 10, when the average F0 value is 150 Hz the personality degree becomes 30, the personality degree condition 1001 is satisfied, and adjustment of the audio signal processing parameter 1003 ends. In this embodiment, by contrast, adjustment of the audio signal processing parameter 1003 continues until the parameter adjustment end condition 1801 is satisfied.

Assume that the parameter adjustment end condition 1801 is that the average F0 value is 50 Hz or less.

In this case, audio signal processing parameters 1003 are also calculated for average F0 values from 140 Hz down to 50 Hz. However, as shown in FIG. 20, sound quality degradation increases in proportion to the amount of change in the average F0 value, so the setting that satisfies the personality degree condition 1001 with the least sound quality degradation is an average F0 value of 150 Hz (a change of −80 Hz in the average F0 value). Therefore, the parameter determination data group 1002 and the predicted sound quality degradation value for a change of −80 Hz in the average F0 value are input to the audio signal processing unit 130.
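
The selection in this example can be written as a small constrained minimization. The sketch below assumes more than the text states: a degradation function D that is exactly linear in the magnitude of the change with a positive constant k, a 10 Hz adjustment step (suggested by the 140 Hz value), and that every candidate from 150 Hz down to 50 Hz satisfies the personality degree condition 1001. The symbols D, k, and Δ are introduced only for illustration.

```latex
\[
\begin{aligned}
D(\Delta) &= k\,\lvert \Delta \rvert, \qquad k > 0,\\
\Delta &\in \{-80, -90, \dots, -180\}\ \text{Hz}
  \quad (\text{average } F_0 = 150, 140, \dots, 50\ \text{Hz}),\\
\Delta^{*} &= \arg\min_{\Delta} D(\Delta) = -80\ \text{Hz},
  \qquad D(\Delta^{*}) = 80k .
\end{aligned}
\]
```

Because D grows with the magnitude of the change, the first passing candidate is also the one with the least degradation in this particular example. The continued search matters when degradation is not monotonic in the parameter.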

FIG. 21 is an explanatory diagram of specific examples of the personality authority setting rule 1760 according to the second embodiment of the present invention.

The personality authority setting rule 1760 registers the correspondence between the user information 1740 and the personality authority 1770.

In Example 1 shown in FIG. 21, a download terminal 1620 whose IP address indicates Japan is assigned personality authority 1, and a download terminal 1620 whose IP address indicates China is assigned personality authority 3.

In Example 2 shown in FIG. 21, the user of the download terminal 1620 is assigned personality authority 1 if the user attribute is section manager, personality authority 2 if it is department manager, and personality authority 3 if it is president.

In Example 3 shown in FIG. 21, the user of the download terminal 1620 is assigned personality authority 1 if the user attribute is friend, personality authority 2 if it is colleague, and personality authority 3 if it is family.

In Example 4 shown in FIG. 21, the download terminal 1620 is assigned personality authority 1 if its type is OS1, personality authority 2 if its type is OS2, and personality authority 3 if its type is OS3.
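
The four example rule sets amount to lookup tables from one item of user information to a personality authority. The Python sketch below shows one possible encoding; the dictionary layout, the key names such as `ip_country`, and the choice to return `None` for unregistered values are assumptions, since the text does not specify how unmatched users are handled.

```python
from typing import Optional

# Hypothetical encoding of the four example rule sets of FIG. 21.
PERSONALITY_AUTHORITY_RULES = {
    "ip_country": {"Japan": 1, "China": 3},
    "job_title": {"section manager": 1, "department manager": 2, "president": 3},
    "relationship": {"friend": 1, "colleague": 2, "family": 3},
    "terminal_os": {"OS1": 1, "OS2": 2, "OS3": 3},
}


def personality_authority(rule_set: str, user_value: str) -> Optional[int]:
    """Look up the personality authority for one item of user information.

    Returns None when the rule set or the value is not registered.
    """
    return PERSONALITY_AUTHORITY_RULES.get(rule_set, {}).get(user_value)
```

For instance, `personality_authority("relationship", "colleague")` yields authority 2 under this encoding.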

As described above, the audio signal processing system 100 of this embodiment can provide the user of the download terminal 1620 with the audio data of the best sound quality among the audio data that satisfies the personality degree condition 1001 corresponding to the user information 1740 of the download terminal 1620.

(Third Embodiment)
A third embodiment of the present invention will be described with reference to FIGS. 22 and 23.

This embodiment applies the audio signal processing system 100 to situations in which audio signal processing is executed in real time on spoken voice. Devices assumed to implement the audio signal processing system 100 of this embodiment include mobile phones and landline phones. The execution time (delay time) of audio signal processing based on the audio signal processing parameter 1003 varies with the type of audio signal processing and with the amount of change that the processing makes to the audio data. If the audio signal processing takes too long, an actual conversation may become impossible.

Therefore, this embodiment estimates the execution time of audio signal processing that satisfies the personality degree condition 1001 and executes the audio signal processing with the shortest execution time. This keeps the conversation from breaking down and reduces the processing load of the audio signal processing system 100.

The relationship between the delay time and the amount of change in the audio data caused by audio signal processing is described in detail with reference to FIG. 23.

The hardware configuration of the audio signal processing system 100 of this embodiment is the same as that of the audio signal processing system 100 shown in FIG. 1 of the first embodiment, so its description is omitted.

Like the first and second embodiments, the audio signal processing system 100 of this embodiment includes the voice personality feature analysis unit 110, an audio signal processing parameter determination unit 2200 (see FIG. 22), and the audio signal processing unit 130.

In this embodiment, the processing executed by the audio signal processing parameter determination unit 2200 differs from that of the first and second embodiments, so the audio signal processing parameter determination unit 2200 is described with reference to FIG. 22. FIG. 22 is a functional block diagram of the audio signal processing parameter determination unit 2200 according to the third embodiment of the present invention. Among the functional blocks of the audio signal processing parameter determination unit 2200 shown in FIG. 22, components identical to those of the audio signal processing parameter determination unit 1700 shown in FIG. 18 of the second embodiment are given the same reference numerals, and their description is omitted.

The audio signal processing parameter determination unit 2200 includes a minimum predicted delay time value output unit 2201 in place of the minimum predicted sound quality degradation value output unit 1803, and a delay time estimation unit 2210 in place of the sound quality degradation estimation unit 1810.

The delay time estimation unit 2210 includes a delay time estimation module 2211, a delay time determination module 2213, a minimum predicted delay time value storage module 2214, and a device state acquisition module 2215.

The device state acquisition module 2215 acquires, at a predetermined timing, the state of the device on which the audio signal processing system 100 is implemented. The device state acquired by the device state acquisition module 2215 includes, for example, the usage status of the CPU 103, the usage status of the memory 104, and the communication status. The usage status of the CPU 103 may be, for example, its utilization rate. The usage status of the memory 104 may be, for example, its utilization rate. The communication status may be, for example, the bandwidth.

When the personality degree condition determination module 1004 determines that the personality degree 330 satisfies the personality degree condition 1001, it inputs the parameter determination data group 1002 to the delay time estimation module 2211 of the delay time estimation unit 2210.

When the parameter determination data group 1002 is input, the delay time estimation module 2211 causes the device state acquisition module 2215 to acquire the state of the device on which the audio signal processing system 100 is implemented. Next, based on the acquired device state, the delay time estimation module 2211 estimates a predicted delay time value 2212, which indicates the execution time of audio signal processing based on the audio signal processing parameter 1003 corresponding to the personality degree that satisfies the personality degree condition 1001. The delay time estimation module 2211 then inputs the predicted delay time value 2212 to the delay time determination module 2213.

The delay time determination module 2213 determines whether the input predicted delay time value 2212 is smaller than a predetermined threshold. If it is, the delay time determination module 2213 then determines whether the input predicted delay time value 2212 is smaller than the minimum predicted delay time value.

When the delay time determination module 2213 determines that the input predicted delay time value 2212 is smaller than the minimum predicted delay time value, it stores the input predicted delay time value 2212 as the minimum predicted delay time value, stores the parameter determination data group 1002 input to the delay time estimation unit 2210 as the minimum-delay-time parameter determination data group, and passes processing to the audio signal processing parameter adjustment unit 122.

On the other hand, when the delay time determination module 2213 determines that the input predicted delay time value 2212 is equal to or greater than the predetermined threshold, or that it is equal to or greater than the minimum predicted delay time value, it passes processing to the audio signal processing parameter adjustment unit 122.

When the adjustment end determination module 1802 determines that the value corresponding to the parameter adjustment end condition 1801 satisfies that condition, the minimum predicted delay time value output unit 2201 inputs the minimum predicted delay time value and the minimum-delay-time parameter determination data group to the audio signal processing unit 130, and the processing ends.

As a result, the audio signal processing parameter 1003 input to the audio signal processing unit 130 corresponds to a personality degree that satisfies the personality degree condition 1001, has an audio signal processing execution time smaller than the predetermined threshold, and has the shortest execution time among such parameters. Because the execution time of the audio signal processing in the audio signal processing unit 130 is smaller than the predetermined threshold, the conversation is kept from breaking down. In addition, because that execution time is minimal, the processing load of the audio signal processing system 100 can be minimized.
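
The two-stage test (threshold first, then comparison with the stored minimum) can be sketched in Python as follows. This is an illustration under assumptions: the parameter is reduced to a hypothetical F0 change, `satisfies_condition` and `predict_delay_ms` are placeholder callables for the modules 1004 and 2211, and the device state is modeled as a plain dictionary whose keys are invented for the example.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

DeviceState = Dict[str, float]  # e.g. {"cpu_usage": 0.3}; keys are hypothetical


def select_min_delay(
    f0_changes_hz: Iterable[float],
    satisfies_condition: Callable[[float], bool],
    predict_delay_ms: Callable[[float, DeviceState], float],
    device_state: DeviceState,
    delay_threshold_ms: float,
) -> Optional[Tuple[float, float]]:
    """Return (change, predicted delay) with the smallest delay, or None."""
    best: Optional[Tuple[float, float]] = None
    for change in f0_changes_hz:
        # Only parameters that meet the personality degree condition qualify.
        if not satisfies_condition(change):
            continue
        delay = predict_delay_ms(change, device_state)
        # First test: the predicted delay must be below the threshold.
        if delay >= delay_threshold_ms:
            continue
        # Second test: keep it only if it beats the stored minimum.
        if best is None or delay < best[1]:
            best = (change, delay)
    return best
```

A `None` result means that no tested parameter both conceals the personality and stays under the delay threshold; how the system should react in that case is not covered by the text.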

Like the audio signal processing parameter determination unit 1700 of the second embodiment, the audio signal processing parameter determination unit 2200 shown in FIG. 22 continues to adjust the audio signal processing parameter 1003 until the parameter adjustment end condition is satisfied, even after the personality degree 330 satisfies the personality degree condition 1001. Alternatively, when the personality degree 330 satisfies the personality degree condition 1001 and the predicted delay time value 2212 of the audio signal processing based on the corresponding audio signal processing parameter 1003 is smaller than the predetermined threshold, the audio signal processing parameter determination unit 2200 may input that audio signal processing parameter 1003 and that predicted delay time value 2212 to the audio signal processing unit 130.

FIG. 23 is a graph showing the relationship between the delay time and the amount of change in the audio data caused by audio signal processing according to the third embodiment of the present invention.

As shown in FIG. 23, the delay time increases in proportion to the amount of change in the average F0 value. This is because audio signal processing with a large change in the average F0 value requires many arithmetic operations, which lengthens the processing time and therefore the delay time. Conversely, audio signal processing with a small change in the average F0 value requires few arithmetic operations, so the processing time and the delay time are shorter. Accordingly, when the change in the average F0 value is 0, no processing is performed and the delay time is minimal.

The present invention is not limited to the embodiments described above and includes various modifications. For example, the embodiments have been described in detail to make the present invention easy to understand, and the invention is not necessarily limited to configurations that include every element described.

DESCRIPTION OF SYMBOLS
110 Voice personality feature analysis unit
111 Voice feature amount calculation unit
112 Personality feature amount calculation unit
113 Personality degree calculation unit
120 Audio signal processing parameter determination unit
121 Personality degree condition determination unit
122 Audio signal processing parameter adjustment unit
123 Personality degree recalculation unit
130 Audio signal processing unit
131 Personality feature removal unit
132 Speech synthesis unit

Claims

In an audio signal processing method executed on input audio data,
The method
Executed in a system comprising a CPU, a storage area, and an interface;
A feature amount calculating step of calculating at least one feature amount obtained by quantifying the input voice data;
A personality degree calculating step for calculating a personality degree obtained by quantifying the strength of the individuality of the feature quantity based on the feature quantity calculated in the feature quantity calculation step;
If there is a personality degree that does not satisfy the predetermined personality degree condition, an audio signal processing step for executing voice signal processing on the input voice data such that the personality degree satisfies the personality degree condition; Including
The audio signal process is a process of changing the audio data,
In the method, when there is a personality degree that does not satisfy the predetermined personality degree condition, a change amount determination step of determining a change amount of the audio data such that the personality degree satisfies the personality degree condition. In addition,
In the audio signal processing step, audio signal processing for changing the audio data based on the amount of change determined in the change amount determination step is executed.
In the change amount determining step,
A process of determining whether or not the feature amount of the changed audio data satisfies the personality degree condition, while changing the change amount of the audio data in units of a predetermined amount, is executed repeatedly until the state after the change satisfies a predetermined end condition,
When it is determined that the feature amount of the audio data after the change satisfies the personality degree condition, the amount of deterioration of the sound quality of the audio data due to the change amount is estimated, and the change amount with the least amount of deterioration is stored in the storage area as the minimum deterioration change amount,
When the state after the change satisfies the end condition, the minimum deterioration change amount stored in the storage area is determined as the change amount of the audio data.

  In an audio signal processing method executed on input audio data,
  The method
  Executed in a system comprising a CPU, a storage area, and an interface;
  A feature amount calculating step of calculating at least one feature amount obtained by quantifying the input voice data;
  A personality degree calculating step for calculating a personality degree obtained by quantifying the strength of the individuality of the feature quantity based on the feature quantity calculated in the feature quantity calculation step;
  If there is a personality degree that does not satisfy the predetermined personality degree condition, an audio signal processing step for executing voice signal processing on the input voice data such that the personality degree satisfies the personality degree condition; Including
  The audio signal process is a process of changing the audio data,
  In the method, when there is a personality degree that does not satisfy the predetermined personality degree condition, a change amount determination step of determining a change amount of the audio data such that the personality degree satisfies the personality degree condition. In addition,
  In the audio signal processing step, audio signal processing for changing the audio data based on the amount of change determined in the change amount determination step is executed.
  The storage area stores, for each of a plurality of feature quantities obtained by quantifying the input voice data, personality degree calculation information used for calculating the individuality degree of the feature quantity from the feature quantity,
  The individuality degree calculation information indicates that, for each of the plurality of feature amounts, the individuality degree of the feature amount is smaller as the value of the number distribution of the feature amount is larger,
  The individuality degree indicates that the greater the value, the stronger the individuality of the feature amount used for calculating the individuality degree,
  In the individuality degree calculation step, based on the feature quantity calculated in the feature quantity calculation step and the individuality degree calculation information, the individuality degree of each feature quantity calculated in the feature quantity calculation step is calculated. Calculate
  In the change amount determining step,
  If there is a personality degree that is equal to or greater than the first threshold, it is determined that there is a personality degree that does not satisfy the personality degree condition,
  An audio signal processing method, wherein the change amount of the audio data is determined so that the individuality degree is smaller than the first threshold value.

  The system is connected to an upload terminal for uploading the audio data and a download terminal for downloading the audio data,
  A user who views the audio data via the download terminal is set with a personality authority associated with the personality condition.
  In the change amount determination step, the minimum deterioration change amount is determined as a change amount of the audio data for each individuality degree condition corresponding to an individuality authority that can be set,
  In the audio signal processing step,
  For each amount of change determined by the change amount determination step, execute the audio signal processing on the audio data,
  The audio signal processing method according to claim 1, wherein the audio data after the audio signal processing is stored in the storage area for each individuality authority.

  The method
  When the system receives a download request from the download terminal, identifying the personality authority set for the user of the download terminal that transmitted the download request;
  Retrieving audio data corresponding to the specified individuality authority from the audio data after the audio signal processing registered in the storage area;
  The audio signal processing method according to claim 3, further comprising: transmitting the searched audio data to a download terminal that has transmitted the download request.

  In an audio signal processing method executed on input audio data,
  The method is executed in a system comprising a CPU, a storage area, and an interface, and includes:
  a feature amount calculating step of calculating at least one feature amount obtained by quantifying the input voice data;
  a personality degree calculating step of calculating, based on the feature amount calculated in the feature amount calculating step, a personality degree that quantifies the strength of the individuality of the feature amount; and
  an audio signal processing step of executing, if there is a personality degree that does not satisfy a predetermined personality degree condition, audio signal processing on the input voice data such that the personality degree satisfies the personality degree condition,
  The audio signal process is a process of changing the audio data,
  The method further includes a change amount determination step of determining, when there is a personality degree that does not satisfy the predetermined personality degree condition, a change amount of the audio data such that the personality degree satisfies the personality degree condition,
  In the audio signal processing step, audio signal processing for changing the audio data based on the change amount determined in the change amount determination step is executed,
  In the change amount determining step,
  a process of determining whether the feature amount of the changed audio data satisfies the personality degree condition, while varying the change amount of the audio data in units of a predetermined amount, is executed repeatedly until the state after the change satisfies a predetermined end condition,
  when it is determined that the feature amount of the changed audio data satisfies the personality degree condition, an execution time required to execute the audio signal processing based on that change amount is predicted, and
  when the predicted execution time is smaller than a second threshold value, the change amount for which the execution time was predicted is determined as the change amount of the audio data.
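
  By way of a non-limiting illustration, the search for a change amount bounded by execution time may be sketched as follows. The sketch assumes that the end condition is a fixed maximum number of steps and that the search ends at the first change amount meeting both tests. The callbacks satisfies and predict_time are hypothetical placeholders for the condition check and the execution time prediction.

```python
def find_change_amount(step, max_steps, satisfies, predict_time, second_threshold):
    """Step the change amount upward in fixed units until one is acceptable.

    Returns the first amount whose changed audio satisfies the condition and
    whose predicted execution time is below second_threshold, or None if the
    end condition (max_steps, an assumption) is reached first.
    """
    for k in range(1, max_steps + 1):
        amount = k * step
        if satisfies(amount) and predict_time(amount) < second_threshold:
            return amount
    return None


# Toy callbacks: the condition holds from amount 4 upward, and the predicted
# time grows linearly with the amount.
chosen = find_change_amount(
    step=1,
    max_steps=10,
    satisfies=lambda amount: amount >= 4,
    predict_time=lambda amount: 0.5 * amount,
    second_threshold=3.0,
)
```

  A return value of None in this sketch corresponds to the case in which the end condition is reached without any change amount meeting both the condition and the time limit.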

  An audio signal processing system that executes audio signal processing on input audio data, comprising:
  a feature amount calculation unit that calculates at least one feature amount obtained by quantifying the input voice data;
  a personality degree calculation unit that calculates, based on the feature amount calculated by the feature amount calculation unit, a personality degree that quantifies the strength of the individuality of the feature amount; and
  an audio signal processing unit that performs, if there is a personality degree that does not satisfy a predetermined personality degree condition, audio signal processing on the input voice data such that the personality degree satisfies the personality degree condition,
  The audio signal process is a process of changing the audio data,
  The audio signal processing system further comprises a change amount determination unit that determines, when there is a personality degree that does not satisfy the predetermined personality degree condition, a change amount of the audio data such that the personality degree satisfies the personality degree condition,
  The audio signal processing unit executes audio signal processing for changing the audio data based on a change amount determined by the change amount determining unit,
  The change amount determination unit
  repeatedly executes, until the state after the change satisfies a predetermined end condition, a process of determining whether the feature amount of the changed audio data satisfies the personality degree condition while varying the change amount of the audio data in units of a predetermined amount,
  estimates, when it is determined that the feature amount of the changed audio data satisfies the personality degree condition, the amount of deterioration in the sound quality of the audio data caused by that change amount, and stores the change amount with the least deterioration, as a minimum deterioration change amount, in a storage area provided for the audio signal processing, and
  determines, when the state after the change satisfies the end condition, the minimum deterioration change amount stored in the storage area as the change amount.
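
  By way of a non-limiting illustration, the selection of the minimum deterioration change amount may be sketched as follows. The sketch assumes that the end condition is the exhaustion of a finite list of candidate change amounts. The callbacks satisfies and estimate_deterioration are hypothetical placeholders for the condition check and the sound quality estimate.

```python
def find_min_deterioration_amount(candidates, satisfies, estimate_deterioration):
    """Scan every candidate and keep the acceptable one that degrades least.

    best_amount plays the role of the minimum deterioration change amount
    held in the storage area; it is returned once all candidates are tried.
    """
    best_amount = None
    best_loss = float("inf")
    for amount in candidates:
        if not satisfies(amount):
            continue  # condition not met, so this amount is never stored
        loss = estimate_deterioration(amount)
        if loss < best_loss:
            best_amount, best_loss = amount, loss
    return best_amount


# Toy callbacks: amounts of 3 or more conceal the speaker, and the estimated
# deterioration is smallest at amount 5.
best = find_min_deterioration_amount(
    candidates=range(1, 9),
    satisfies=lambda amount: amount >= 3,
    estimate_deterioration=lambda amount: (amount - 5) ** 2,
)
```

  Unlike a search that stops at the first acceptable amount, this sketch examines every candidate before the end condition is met, so the returned amount is the least degrading among all amounts that satisfy the condition.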

  An audio signal processing system that executes audio signal processing on input audio data, comprising:
  a feature amount calculation unit that calculates at least one feature amount obtained by quantifying the input voice data;
  a personality degree calculation unit that calculates, based on the feature amount calculated by the feature amount calculation unit, a personality degree that quantifies the strength of the individuality of the feature amount; and
  an audio signal processing unit that performs, if there is a personality degree that does not satisfy a predetermined personality degree condition, audio signal processing on the input voice data such that the personality degree satisfies the personality degree condition,
  The audio signal process is a process of changing the audio data,
  The audio signal processing system further comprises a change amount determination unit that determines, when there is a personality degree that does not satisfy the predetermined personality degree condition, a change amount of the audio data such that the personality degree satisfies the personality degree condition,
  The audio signal processing unit executes audio signal processing for changing the audio data based on a change amount determined by the change amount determining unit,
  For each of the plurality of feature quantities obtained by quantifying the input voice data, the storage area stores personality degree calculation information used for calculating the individuality degree of the feature quantity from the feature quantity;
  The individuality degree calculation information indicates, for each of the plurality of feature amounts, that the larger the value of the number distribution of the feature amount, the smaller the individuality degree of the feature amount,
  The individuality degree indicates that the greater the value, the stronger the individuality of the feature amount used for calculating the individuality degree,
  The individuality degree calculation unit calculates the individuality degree of each feature amount calculated by the feature amount calculation unit, based on that feature amount and the individuality degree calculation information,
  The change amount determination unit
  If there is a personality degree equal to or greater than the first threshold, it is determined that there is a personality degree that does not satisfy the personality degree condition,
  An audio signal processing system, wherein the change amount of the audio data is determined so that the individuality degree is smaller than the first threshold value.

  The audio signal processing system is connected to an upload terminal for uploading the audio data and a download terminal for downloading the audio data,
  A personality authority associated with the personality degree condition is set for each user who listens to the audio data via the download terminal,
  The change amount determination unit determines the minimum deterioration change amount as a change amount of the audio data for each individuality condition corresponding to the individuality authority set for the user,
  The audio signal processing unit performs the audio signal processing on the audio data for each change amount determined by the change amount determination unit,
  The audio signal processing system according to claim 6, wherein the audio data after the audio signal processing is stored in the storage area for each individuality authority.

  The audio signal processing system according to claim 8, further comprising:
  a personality authority specifying unit that, when a download request is received from the download terminal, specifies the personality authority set for the user of the download terminal that transmitted the download request; and
  a voice search unit that searches the audio data after the audio signal processing, stored for each individuality authority in the storage area, for the audio data corresponding to the specified individuality authority, and transmits the retrieved audio data to the download terminal that transmitted the download request.

  An audio signal processing system that executes audio signal processing on input audio data, comprising:
  a feature amount calculation unit that calculates at least one feature amount obtained by quantifying the input voice data;
  a personality degree calculation unit that calculates, based on the feature amount calculated by the feature amount calculation unit, a personality degree that quantifies the strength of the individuality of the feature amount; and
  an audio signal processing unit that performs, if there is a personality degree that does not satisfy a predetermined personality degree condition, audio signal processing on the input voice data such that the personality degree satisfies the personality degree condition,
  The audio signal process is a process of changing the audio data,
  The audio signal processing system further comprises a change amount determination unit that determines, when there is a personality degree that does not satisfy the predetermined personality degree condition, a change amount of the audio data such that the personality degree satisfies the personality degree condition,
  The audio signal processing unit executes audio signal processing for changing the audio data based on a change amount determined by the change amount determining unit,
  The change amount determination unit
  repeatedly executes, until the state after the change satisfies a predetermined end condition, a process of determining whether the feature amount of the changed audio data satisfies the personality degree condition while varying the change amount of the audio data in units of a predetermined amount,
  predicts, when it is determined that the feature amount of the changed audio data satisfies the personality degree condition, an execution time required to execute the audio signal processing based on that change amount, and
  determines, when the predicted execution time is smaller than a second threshold value, the change amount for which the execution time was predicted as the change amount of the audio data.