JP2013098869A

JP2013098869A - Voice system

Info

Publication number: JP2013098869A
Application number: JP2011241599A
Authority: JP
Inventors: Minoru Kobata; 稔木幡
Original assignee: Chiba Institute of Technology
Current assignee: Chiba Institute of Technology
Priority date: 2011-11-02
Filing date: 2011-11-02
Publication date: 2013-05-20

Abstract

PROBLEM TO BE SOLVED: In communication, particularly analog communication, various methods deform the voice waveform to increase secrecy. However, because the deformation is limited to a simple rearrangement of blocks of the input voice waveform, once part of the voice is understood, the original input voice may be restored from that clue. SOLUTION: The speech units of the input voice are replaced with other speech units using a cipher key, and the result is transmitted. Because a partially understood voice no longer serves as a clue for restoration, communication secrecy is improved.

Description

The present invention relates to a technique for improving secrecy in communication, particularly analog communication.

Concealment of communication is an essential technology today. Among communication methods, analog voice communication has long-established channels, and using them makes communication simple. However, because analog communication sends the input speech waveform itself over the channel, it is vulnerable to eavesdropping. For this reason, attempts have been made to improve confidentiality by manipulating the speech waveform of analog communication. Non-Patent Documents 1 and 2 describe such attempts as prior art. FIGS. 8 to 10 show communication schemes in which the prior art enhances the confidentiality of analog communication. In Non-Patent Document 1, as shown in FIG. 8, samples are taken within a fixed time window and their order is shuffled along the time axis. In FIG. 9, the speech waveform is divided at fixed time intervals into blocks, and confidentiality is obtained by rearranging the blocks before transmission. In Non-Patent Document 2, as shown in FIG. 10, the frequency spectrum is inverted according to a predetermined rule.

Non-Patent Document 1: IEEE Transactions on Communications, Vol. COM-29, No. 1, January 1981.
Non-Patent Document 2: D. W. Davies (Ed.): Advances in Cryptology - EUROCRYPT '91, LNCS 547, pp. 422-430, 1991.

However, because these methods rearrange or otherwise manipulate the very speech to be encrypted, parts of the original speech may remain audible. An eavesdropper can then use those parts as a clue for rearranging the rest, and may restore the speech relatively easily. In methods that operate on the speech to be encrypted itself, the search space for decryption is limited, so concealment of the communication cannot be ensured. The problem to be solved is therefore to communicate using speech other than the speech to be encrypted, and thereby ensure sufficient concealment.

To solve the above problem, the present invention first provides the following voice system. First, an index sequence of speech units is acquired according to the language input to the system. Next, the index sequence is replaced with another index sequence using an encryption key. Speech synthesis is performed on the replaced index sequence to generate secret speech, which is sent over a transmission path. On the receiving side, an index sequence of speech units is acquired from the received secret speech. Next, the index sequence is restored with a decryption key to the index sequence before substitution. Speech synthesis is performed on the restored index sequence, and the result is output as speech or the like.

Specifically, the voice system consists of an output device and a restoration device. The output device has: a correspondence table holding unit that holds a correspondence table associating speech units with indexes; a language input reception unit that accepts language input; a raw index generation unit that generates, from the correspondence table, a raw index, which is the index of the speech units constituting the language input through the language input reception unit; an encrypted index acquisition unit that acquires an encrypted index, which is the raw index after encryption; a secret speech synthesis unit that obtains from the correspondence table the speech units corresponding to the acquired encrypted index and performs speech synthesis; and an output unit that outputs the synthesized secret speech. The restoration device has: an acquisition unit that acquires the secret speech; a correspondence table holding unit that holds a correspondence table associating speech units with indexes; an encrypted index generation unit that uses the correspondence table to generate the encrypted index, which is the index of the speech units corresponding to the acquired secret speech; a raw index decryption unit that decrypts the encrypted index to obtain the raw index; and a language restoration unit that obtains speech units from the decrypted raw index and restores the corresponding language.

Second, building on the first voice system, the invention provides a voice system in which the language input to the system is speech.

Specifically, this is the voice system according to claim 1, in which the language input reception unit has spoken language input reception means for accepting spoken language input.

Third, building on the first or second voice system, the invention provides a voice system in which the language input to the system is text data.

Specifically, this is the voice system according to claim 1 or 2, in which the language input reception unit has text language input reception means for accepting text language input.

With the first invention configured as above, an index sequence of speech units can be acquired for the language input to the system and substituted using an encryption key. Synthesizing speech from the substituted index sequence makes it possible to send over the transmission path secret speech built from speech units other than those of the language input. The receiving side can acquire the indexes of the speech units from the received secret speech, restore them with the decryption key, and thereby obtain the index sequence of the original speech units. Performing speech synthesis on this index sequence restores the original speech. Because the speech sent over the transmission medium is a phoneme sequence different from the speech input to the system, the confidentiality of communication improves.

With the second invention, because the language input to the system is speech, secret speech can be sent over the transmission path and delivered to the receiving side in near real time. This enables two-way real-time communication, as with a telephone, while ensuring confidentiality.

With the third invention, because the language input to the system is text data, communication is possible after converting the text into speech. If the receiving side converts the speech back into text data, offline communication similar to facsimile can be carried out while ensuring confidentiality.

Embodiments of the present invention are described below with reference to the drawings. The present invention is not limited to these embodiments and can be implemented in various forms without departing from its gist.

<Example>
<Overview>
FIG. 1 is a conceptual diagram explaining an example of the operation of the voice system of this example. As shown in the figure, there is a transmitter and a receiver. The language input "あめになる" (ame ni naru) given to the transmitter is converted into the secret speech "おやせまめ" (oyasemame) and sent to the receiver. The receiver restores the secret speech "おやせまめ" and obtains the output "あめになる".

In more detail, after the transmitter receives the input "あめになる", it acquires indexes from the speech unit database according to the pattern of the input. An actual speech unit database contains a large number of patterns, but for simplicity this example assumes that indexes are assigned from an index table like that of FIG. 2(A). The index sequence "A1 G4 E2 E1 I3" is then obtained (FIG. 2(B)). This index sequence is replaced with another index sequence based on a cipher generated from the encryption key. Suppose the index sequence "A5 H1 C4 G1 G4" is obtained. Looking these indexes up in the speech unit database yields the secret character string "おやせまめ" (FIG. 2(C)). This string is converted into speech by speech synthesis based on the speech unit database and sent to the receiver over an analog channel or the like.
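
As a rough illustration of the index lookup in this example, the sketch below assumes that the index table of FIG. 2(A) is the standard 50-sound grid, with a letter A to J for each kana row and a digit 1 to 5 for each vowel. That layout is an assumption, chosen because it reproduces the index sequences quoted above; the names `GOJUON`, `to_indexes`, and `from_indexes` are hypothetical and do not come from the patent.

```python
# Assumed layout of the FIG. 2(A) index table: letter = kana row, digit = vowel.
# "-" marks cells of the grid that hold no kana.
GOJUON = {
    "A": "あいうえお", "B": "かきくけこ", "C": "さしすせそ", "D": "たちつてと",
    "E": "なにぬねの", "F": "はひふへほ", "G": "まみむめも", "H": "や-ゆ-よ",
    "I": "らりるれろ", "J": "わ---を",
}

# Forward table (kana -> index) and reverse table (index -> kana).
KANA_TO_INDEX = {
    kana: f"{row}{col}"
    for row, kanas in GOJUON.items()
    for col, kana in enumerate(kanas, start=1)
    if kana != "-"
}
INDEX_TO_KANA = {index: kana for kana, index in KANA_TO_INDEX.items()}


def to_indexes(text):
    """Return the index sequence for a kana string."""
    return [KANA_TO_INDEX[ch] for ch in text]


def from_indexes(indexes):
    """Return the kana string for an index sequence."""
    return "".join(INDEX_TO_KANA[i] for i in indexes)


print(to_indexes("あめになる"))                        # ['A1', 'G4', 'E2', 'E1', 'I3']
print(from_indexes(["A5", "H1", "C4", "G1", "G4"]))  # おやせまめ
```

Under this assumed table, the raw and encrypted index sequences of the example map to "あめになる" and "おやせまめ" respectively, which matches FIG. 2(B) and FIG. 2(C) as described.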

On the receiver side, an index sequence is acquired from the speech unit database according to the pattern of the secret speech "おやせまめ". In this example the index sequence "A5 H1 C4 G1 G4" is acquired. Restoring this index sequence with the decryption key yields "A1 G4 E2 E1 I3", the index sequence before substitution at the transmitter. The speech unit database is consulted according to this index sequence, and speech corresponding to the character string "あめになる" is obtained.

In this way, the voice system of this example can acquire an index sequence from the speech unit database for an input character string and use the encryption key to replace it with an index sequence corresponding to different speech units. Consequently, even if the secret speech itself is rearranged unit by unit, the original character string cannot be obtained, so communication is possible while maintaining high confidentiality. In addition, because the secret speech is sent from the transmitter to the receiver as synthesized speech, the system suits simple communication methods such as an analog channel.

<Functional configuration 1>
FIG. 3 is a diagram showing an example of the functional blocks of the voice system of this example. As shown in the figure, the "voice system" of this example has an "output device" (0301) and a "restoration device" (0302). The output device (0301) has a language input reception unit (0303), a raw index generation unit (0304), a correspondence table holding unit (0305), an encrypted index acquisition unit (0306), a secret speech synthesis unit (0307), and an output unit (0308). The restoration device (0302) has an acquisition unit (0309), an encrypted index generation unit (0310), a correspondence table holding unit (0311), a raw index decryption unit (0312), and a language restoration unit (0313).

The language input reception unit (0303) accepts language input in the output device (0301). Language here means data, such as speech or text, that can be output through speech synthesis. Possible inputs include real-time input from a microphone, input as audio data, and input as text data. The accepted language input is passed to the raw index generation unit (0304).

The raw index generation unit (0304) generates, from the correspondence table, the raw index, which is the index of the speech units constituting the language input through the language input reception unit. Specifically, it acquires the index information of the speech units stored in the correspondence table holding unit (0305) according to the language input. For example, when the language input is audio data, it uses a pattern recognition technique such as DP matching to find the speech unit that best matches the input speech waveform, and acquires the index information of that unit from the correspondence table holding unit (0305). The index information obtained here is passed as an index sequence to the encrypted index acquisition unit (0306).
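
DP matching is commonly implemented as dynamic time warping (DTW). The following is a minimal sketch of how the best-matching speech unit might be selected that way; it is not taken from the patent. It assumes each speech unit is represented by a short one-dimensional feature sequence, and the names `dtw_distance` and `best_unit` and the template values are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def best_unit(segment, templates):
    """Return the index whose template is closest to the input segment."""
    return min(templates, key=lambda index: dtw_distance(segment, templates[index]))


# Hypothetical templates keyed by speech unit index.
templates = {"A1": [0, 1, 2, 1, 0], "G4": [3, 3, 0, 0, 3]}
print(best_unit([0, 1, 1, 2, 1, 0], templates))  # A1
```

A practical recognizer would compare multi-dimensional spectral features rather than raw scalars, but the alignment recurrence is the same.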

The correspondence table holding unit (0305) holds a correspondence table associating speech units with indexes. Specifically, it is a table that associates the speech units stored in the speech unit database with their indexes. For example, if the unit data consists of the 50 sounds of Japanese, a correspondence table like that of FIG. 2(A) is conceivable. Speech units may take various forms besides the simple 50 sounds: units formed from pairs of sounds such as vowel-vowel, vowel-consonant, consonant-vowel, and consonant-consonant; units in which one phoneme is made up of several units; and units formed at the word level.

The encrypted index acquisition unit (0306) acquires the encrypted index, which is the raw index after encryption. It takes an index sequence such as that of FIG. 2(B) from the raw index generation unit (0304) and uses the encryption key to replace it with another index sequence (FIG. 2(C)). The encryption takes the raw index as input data and replaces it with one of the indexes in the correspondence table holding unit. Various encryption schemes can be adopted, such as public key cryptography and secret key (common key) cryptography. The substituted, encrypted index sequence is passed to the secret speech synthesis unit (0307).
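
One way such a substitution could be realized with a common key is sketched below. This is an illustrative scheme, not the cipher used in the patent: it assumes indexes are numbered 0 to N-1, and it shifts each index by a position-dependent offset derived from HMAC-SHA-256 of the position under the shared key. The function names and the table size of 46 are hypothetical.

```python
import hashlib
import hmac


def _offset(key, position, table_size):
    """Derive a pseudo-random offset for one position from the shared key."""
    digest = hmac.new(key, position.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % table_size


def encrypt_indexes(raw, key, table_size):
    """Replace each raw index with another valid index of the same table."""
    return [(r + _offset(key, i, table_size)) % table_size for i, r in enumerate(raw)]


def decrypt_indexes(encrypted, key, table_size):
    """Undo encrypt_indexes with the same key."""
    return [(c - _offset(key, i, table_size)) % table_size for i, c in enumerate(encrypted)]


key = b"shared secret"           # assumed to be agreed on beforehand
raw = [0, 33, 21, 20, 41]        # hypothetical numeric raw indexes
encrypted = encrypt_indexes(raw, key, 46)
print(decrypt_indexes(encrypted, key, 46) == raw)  # True
```

Because every output stays within the table, each encrypted index still names a speech unit that can be synthesized. Reusing the same key and positions for many messages would repeat the offsets, so a real design would also need a per-message value mixed into the derivation.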

The secret speech synthesis unit (0307) obtains from the correspondence table the speech units corresponding to the acquired encrypted index and performs speech synthesis. It takes the encrypted index sequence produced by the encrypted index acquisition unit (0306), looks up the correspondence table for each encrypted index, fetches the matching speech units from the speech unit database, and synthesizes speech. The synthesized speech is passed to the output unit (0308).

The output unit (0308) outputs the synthesized secret speech. Output here means sending the speech supplied to the output unit, via a transmission medium, to the acquisition unit (0309) of the restoration device (0302). Various transmission methods are conceivable, such as analog, digital, wireless, and wired communication; any method can be adopted as long as it can carry a speech waveform.

The acquisition unit (0309) acquires the secret speech in the restoration device (0302). It acquires the speech waveform data output from the output unit (0308) of the output device (0301) and passes it to the encrypted index generation unit (0310).

The encrypted index generation unit (0310) uses the correspondence table to generate the encrypted index, which is the index of the speech units corresponding to the acquired secret speech. Specifically, it applies pattern recognition such as DP matching to the speech waveform data from the acquisition unit (0309), selects the best-matching speech units, and acquires the corresponding index information from the correspondence table holding unit (0311). The index information is passed as an encrypted index sequence to the raw index decryption unit (0312).

The correspondence table holding unit (0311) holds, in the restoration device (0302), a correspondence table associating speech units with indexes. The table is consulted during speech recognition in the encrypted index generation unit (0310) and during speech synthesis in the language restoration unit (0313).

The correspondence table in the correspondence table holding unit (0311) must be identical to, or a subset of, the correspondence table in the correspondence table holding unit (0305) of the output device (0301). Alternatively, the voice system as a whole may have a single correspondence table holding unit that both devices share.

The raw index decryption unit (0312) decrypts the encrypted index to obtain the raw index. Using the encryption scheme used by the encrypted index acquisition unit (0306) of the output device (0301), it returns the substituted index sequence to the original raw index. The obtained raw index is passed to the language restoration unit (0313).

The language restoration unit (0313) obtains speech units from the decrypted raw index and restores the corresponding language. From the raw index obtained by the raw index decryption unit (0312), it obtains speech units based on the correspondence table and performs speech synthesis. The output may be speech or text data corresponding to the speech units.

<Functional configuration 2>
FIG. 4 is a diagram showing another example of the functional blocks of the voice system of this example. As shown in the figure, the "voice system" of this example has an "output device" (0401) and a "restoration device" (0402). The output device (0401) has a language input reception unit (0403), a raw index generation unit (0404), a correspondence table holding unit (0405), an encrypted index acquisition unit (0406), a secret speech synthesis unit (0407), and an output unit (0408). The restoration device (0402) has an acquisition unit (0409), an encrypted index generation unit (0410), a correspondence table holding unit (0411), a raw index decryption unit (0412), and a language restoration unit (0413). These components have already been described above, so their description is omitted. The distinguishing feature of this configuration is that the language input reception unit (0403) of the "output device" (0401) has "spoken language input reception means" (0414).

The spoken language input reception means (0414) accepts spoken language input. Spoken language is not limited to natural language; anything that can be produced by voice, such as singing, is acceptable. Which spoken languages can be used depends on the correspondence table of the correspondence table holding unit (0405), which is based on the speech unit database.

<Functional configuration 3>
FIG. 5 is a diagram showing yet another example of the functional blocks of the voice system of this example. As shown in the figure, the "voice system" of this example has an "output device" (0501) and a "restoration device" (0502). The output device (0501) has a language input reception unit (0503), a raw index generation unit (0504), a correspondence table holding unit (0505), an encrypted index acquisition unit (0506), a secret speech synthesis unit (0507), and an output unit (0508). The restoration device (0502) has an acquisition unit (0509), an encrypted index generation unit (0510), a correspondence table holding unit (0511), a raw index decryption unit (0512), and a language restoration unit (0513). These components have already been described above, so their description is omitted. The distinguishing feature of this configuration is that the language input reception unit has "text language input reception means" (0515).

The text language input reception means (0515) accepts text language input. Text language is text data, and any text that can be synthesized into speech is acceptable. As with the spoken language input reception means (0414), which text languages can be used depends on the correspondence table of the correspondence table holding unit (0505), which is based on the speech unit database.

<Process flow>
FIG. 6 is a flowchart showing an example of the process flow in the voice system of this example. First, the output device accepts language input (step S0601). Next, the raw index, which is the index of the speech units constituting the language input in the language input reception step, is generated from the correspondence table associating speech units with indexes (step S0602). Next, the encrypted index, which is the raw index after encryption, is acquired (step S0603). Next, the speech units corresponding to the acquired encrypted index are obtained from the correspondence table and speech synthesis is performed (step S0604). The synthesized secret speech is then output (step S0605). On the restoration device side, the secret speech is acquired (step S0606). Next, the encrypted index, which is the index of the speech units corresponding to the acquired secret speech, is generated using the correspondence table associating speech units with indexes (step S0607). Next, the encrypted index is decrypted to obtain the raw index (step S0608). Finally, speech units are obtained from the decrypted raw index and the corresponding language is restored (step S0609).
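
The flow of steps S0601 to S0609 can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the patent's implementation: speech synthesis and recognition are stood in for by passing the kana string itself, the correspondence table is a plain list of 46 kana, and the cipher is a hypothetical position-dependent shift keyed with HMAC-SHA-256. All names are invented for this sketch.

```python
import hashlib
import hmac

# Assumed correspondence table: list position = index, entry = speech unit.
UNITS = list("あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわをん")


def _shift(key, position):
    """Hypothetical keyed offset for one position in the sequence."""
    digest = hmac.new(key, position.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % len(UNITS)


def output_device(text, key):
    """S0601-S0605: accept input, index it, encrypt, 'synthesize', output."""
    raw = [UNITS.index(ch) for ch in text]                                   # S0602
    enc = [(r + _shift(key, i)) % len(UNITS) for i, r in enumerate(raw)]     # S0603
    return "".join(UNITS[c] for c in enc)                                    # S0604, S0605


def restoration_device(secret, key):
    """S0606-S0609: acquire secret speech, index it, decrypt, restore."""
    enc = [UNITS.index(ch) for ch in secret]                                 # S0606, S0607
    raw = [(c - _shift(key, i)) % len(UNITS) for i, c in enumerate(enc)]     # S0608
    return "".join(UNITS[r] for r in raw)                                    # S0609


key = b"shared secret"
secret = output_device("あめになる", key)
print(restoration_device(secret, key))  # あめになる
```

In a real system the string returned by `output_device` would be a synthesized waveform, and `restoration_device` would first have to recognize the units in it, for example by DP matching.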

<Hardware configuration>
FIG. 7 is a schematic diagram showing an example of the configuration of the voice system when the functional components above are realized as hardware. The role of each hardware component in voice communication is explained with reference to this figure. As shown in the figure, the voice system of this example consists of a transmitter (0701) and a receiver (0702). The transmitter (0701) has a "CPU (central processing unit)" (0703) that performs various arithmetic processing, a "volatile memory" (0704), a "nonvolatile memory" (0705), a D/A converter (0706), a microphone (0707), and a network I/F (0708). These are connected to one another by a data communication path such as a "system bus" (0709) and exchange and process information.

The receiver (0702) has a "CPU (central processing unit)" (0710) that performs various arithmetic processing, a "volatile memory" (0711), a "nonvolatile memory" (0712), a D/A converter (0713), a speaker (0714), and a network I/F (0715). These are connected to one another by a data communication path such as a "system bus" (0716) and exchange and process information.

The "volatile memory" (0704, 0711) holds the programs for the various processes, which are read from the "nonvolatile memory" (0705, 0712) so that the "CPU" (0703, 0710) can execute them, and also provides the work area for those programs.

When speech is input to the microphone (0707), the data is sent to the CPU (0703) via the D/A converter (0706). The CPU (0703) executes A: the raw index generation program held in the volatile memory (0704), which creates the raw index while referring to B: the correspondence table and the unit DB (0717) stored in the nonvolatile memory (0705). Next, C: the encryption program is executed on the CPU (0703), replacing the raw index with the encrypted index of different units. Using this, D: the speech synthesis engine is executed on the CPU (0703) with reference to the unit DB (0717), generating the secret speech. The speech is sent to the receiver (0702) via the network I/F (0708).

On the receiver side, the network I/F (0715) receives the secret speech. The speech is sent to the CPU (0710), where the running E: encrypted index creation program generates an encrypted index from it. In doing so, the program refers to the segment DB (0718) stored in the nonvolatile memory (0712) and to G: the correspondence table held in the volatile memory (0711). Next, F: the decryption program converts the encrypted index back into the raw index. H: the speech synthesis engine then converts this raw index into speech while referring to G: the correspondence table and the segment DB (0718). The speech is output from the speaker (0714) via the D/A converter (0713).
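How the receiver might turn received speech back into an index sequence can be sketched as follows. This is an illustration under stated assumptions, not the method of the embodiment: it assumes fixed-length segments, perfect block alignment, and nearest-neighbour matching by Euclidean distance, none of which are given in the text. `SEGMENT_DB`, `SEGMENT_LEN`, and the function names are hypothetical, and the waveforms are toy values.

```python
import math

# Hypothetical segment DB: index -> short waveform (4 samples each).
SEGMENT_DB = {
    0: [0.0, 0.5, 1.0, 0.5],
    1: [1.0, -1.0, 1.0, -1.0],
    2: [0.2, 0.2, -0.2, -0.2],
    3: [-1.0, -0.5, 0.0, 0.5],
}
SEGMENT_LEN = 4  # assumption: every segment has the same length

def synthesize(indexes):
    """Concatenate the waveforms of the given segment indexes."""
    signal = []
    for i in indexes:
        signal.extend(SEGMENT_DB[i])
    return signal

def nearest_index(block):
    """Return the index whose stored waveform is closest to the block."""
    return min(SEGMENT_DB, key=lambda i: math.dist(SEGMENT_DB[i], block))

def recover_indexes(signal):
    """Split the signal into blocks and look each one up in the segment DB."""
    return [
        nearest_index(signal[pos:pos + SEGMENT_LEN])
        for pos in range(0, len(signal), SEGMENT_LEN)
    ]

sent = [2, 0, 3, 1]                               # encrypted index sequence
received = [s + 0.05 for s in synthesize(sent)]   # small offset as channel distortion
assert recover_indexes(received) == sent
```

The recovered sequence is still the encrypted index sequence. It must be passed through the decryption step before the speech synthesis engine can reproduce the original input.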

<Brief description of effect>
As described above, with the voice system of this embodiment, the output device obtains an index sequence of speech segments for the input language and replaces it with a different index sequence using an encryption key. This is transmitted over the transmission path. The restoration device receives it, recovers the original index sequence using the decryption key, and can thereby restore the input language. This improves the confidentiality of communication.

FIG. 1: Diagram for explaining the outline of processing by the voice system of the embodiment
FIG. 2: Diagram for explaining the outline of processing by the voice system of the embodiment
FIG. 3: Diagram showing an example of the functional blocks of the voice system of the embodiment
FIG. 4: Diagram showing another example of the functional blocks of the voice system of the embodiment
FIG. 5: Diagram showing yet another example of the functional blocks of the voice system of the embodiment
FIG. 6: Flowchart showing an example of the flow of processing in the voice system of the embodiment
FIG. 7: Schematic diagram showing an example of the hardware configuration of the voice system of the embodiment
FIG. 8: Schematic diagram for explaining an encryption method in the prior art
FIG. 9: Schematic diagram for explaining an encryption method in the prior art
FIG. 10: Schematic diagram for explaining an encryption method in the prior art

0301 Output device
0302 Restoration device
0304 Raw index generation unit
0306 Encrypted index acquisition unit
0310 Encrypted index generation unit
0312 Raw index decryption unit

Claims

A voice system comprising:
an output device having:
a correspondence table holding unit that holds a correspondence table in which speech segments and indexes are associated;
a language input receiving unit that receives language input;
a raw index generation unit that generates, from the correspondence table, a raw index, which is the index of a speech segment constituting the language input received by the language input receiving unit;
an encrypted index acquisition unit that acquires an encrypted index, which is an index obtained by encrypting the raw index;
a secret speech synthesis unit that obtains, from the correspondence table, the speech segment corresponding to the acquired encrypted index and performs speech synthesis; and
an output unit that outputs the synthesized secret speech;
and a restoration device having:
an acquisition unit that acquires the secret speech;
a correspondence table holding unit that holds a correspondence table in which speech segments and indexes are associated;
an encrypted index generation unit that generates, using the correspondence table, the encrypted index, which is the index of the speech segment corresponding to the acquired secret speech;
a raw index decryption unit that decrypts the encrypted index to obtain the raw index; and
a language restoration unit that obtains the speech segment from the decrypted raw index and restores the corresponding language.

The voice system according to claim 1, wherein the language input receiving unit includes a speech language input receiving unit that receives input of spoken language.

The voice system according to claim 1, wherein the language input receiving unit includes a text language input receiving unit that receives input of a text language.

A method of operating a voice system, comprising:
a language input receiving step of receiving language input;
a raw index generation step of generating a raw index, which is the index of a speech segment constituting the language input received in the language input receiving step, from a correspondence table in which speech segments and indexes are associated;
an encrypted index acquisition step of acquiring an encrypted index, which is an index obtained by encrypting the raw index;
a secret speech synthesis step of obtaining, from the correspondence table, the speech segment corresponding to the acquired encrypted index and performing speech synthesis;
an output step of outputting the synthesized secret speech;
an acquisition step of acquiring the secret speech;
an encrypted index generation step of generating, using a correspondence table in which speech segments and indexes are associated, the encrypted index, which is the index of the speech segment corresponding to the acquired secret speech;
a raw index decryption step of decrypting the encrypted index to obtain the raw index; and
a language restoration step of obtaining the speech segment from the decrypted raw index and restoring the corresponding language.

6. The method of operating a voice system according to claim 5, wherein the language input receiving step includes a speech language input receiving substep of receiving input of spoken language.

The method of operating a voice system according to claim 5 or 6, wherein the language input receiving step includes a text language input receiving substep of receiving input of a text language.