JP2007041443A

JP2007041443A - Device, program, and method for speech conversion

Info

Publication number: JP2007041443A
Application number: JP2005227617A
Authority: JP
Inventors: Hiromi Kosaku; 浩美小作; Kiyoshi Kogure; 潔小暮; Futoshi Naya; 太納谷; Noriaki Kuwabara; 教彰桑原
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2005-08-05
Filing date: 2005-08-05
Publication date: 2007-02-15

Abstract

PROBLEM TO BE SOLVED: To protect an individual's privacy while converting speech into speech data from which the meaning of a conversation can still be understood.

SOLUTION: A speech conversion device is equipped with a speech data input section 11 that inputs speech data, a replacement target word detection section 15 that detects, in the input speech data, a word part including a predetermined word to be replaced, and a synchronization section 21 that replaces that word part of the input speech data with alternative speech data corresponding to it.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a speech conversion device, a speech conversion program, and a speech conversion method, and more particularly to a device, program, and method that convert speech into speech in which privacy is protected.

In recent years, with the enforcement of the Personal Information Protection Law, there has been growing momentum to protect personal privacy, particularly in the medical field. At the same time, medical institutions have begun recording conversations among doctors, nurses, and patients in order to reduce medical errors. However, recording a conversation held during diagnosis or treatment as it is cannot protect the patient's privacy, because the recorded speech includes the name that identifies the patient and the name of the patient's disease. For this reason, Japanese Patent Application Laid-Open No. 2004-357294 describes an apparatus that suppresses playback of a restricted keyword when the speech contains one. Specific examples of suppression include masking the reproduced sound with an interfering sound and skipping the restricted keyword altogether.

However, if playback of restricted keywords is suppressed when recorded speech is reproduced, the content of the conversation can no longer be understood. For example, given the sentence "Yamada-san wa haigan desu" ("Mr. Yamada has lung cancer"), which contains a patient's name and a disease name, and the restricted keywords "Yamada" and "haigan" (lung cancer), the reproduced sentence becomes "???-san wa ???? desu", where "?" denotes the interfering sound. The honorific "san" tells the listener that a person is being discussed, but not whether that person is a doctor, a nurse, or a patient. Likewise, "desu" shows that something is being affirmed, but not what. Both the subject and the predicate are lost, so reproducing the speech serves no purpose.

Patent Document 1: JP 2004-357294 A

The present invention has been made to solve the problems described above. One object of the present invention is to provide a speech conversion device capable of protecting an individual's privacy while converting speech into speech data from which the meaning of a conversation can be understood.

Another object of the present invention is to provide a speech conversion program and a speech conversion method capable of protecting an individual's privacy while converting speech into speech data from which the meaning of a conversation can be understood.

To achieve the objects described above, according to one aspect of the present invention, a speech conversion device includes speech data input means for inputting speech data, and replacement means for replacing a word part of the input speech data that includes a predetermined replacement target word with alternative speech data corresponding to that word part.

According to the present invention, a word part of the speech data that includes a predetermined replacement target word is replaced with the corresponding alternative speech data. If, for example, the replacement target words are words that can identify an individual, such as a name or an address, the portions of the speech data that identify the individual are replaced with alternative speech data, so the individual's privacy is protected. Furthermore, because a replacement target word is not simply deleted but is replaced with an alternative word, the meaning of the conversation can still be understood. As a result, it is possible to provide a speech conversion device capable of protecting an individual's privacy while converting speech into speech data from which the meaning of a conversation can be understood.

Preferably, the device further includes replacement level input means for inputting a replacement level, and the replacement means includes replacement target word determining means for determining the replacement target words according to the input replacement level.

According to the present invention, the device can accommodate the fact that the range of information that may be disclosed differs from listener to listener.

Preferably, the device further includes alternative word table storage means for storing an alternative word table that defines, for each replacement target word, an alternative word for each replacement level, and the replacement means further includes alternative speech data generation means for generating, using the alternative word table, alternative speech data corresponding to the input replacement level.

According to the present invention, the device can accommodate differences in the range of information that may be disclosed to each listener.

Preferably, the device further includes level input means for inputting a replacement level, and alternative word table storage means for storing an alternative word table that defines, for each replacement target word, an alternative word for each replacement level, and the replacement means includes alternative speech data generation means for generating, using the alternative word table, alternative speech data corresponding to the input replacement level.

Preferably, the replacement means includes: speech recognition means for performing speech recognition on the input speech data and outputting text data; replacement target word detection means for extracting replacement target words from the text data; alternative word table storage means for storing an alternative word table that defines alternative words corresponding to the replacement target words; alternative word determination means for determining the alternative word corresponding to each extracted word; speech synthesis means for synthesizing speech from the determined alternative word to generate alternative speech data; and speech data replacement means for replacing the word part of the input speech data with the alternative speech data.

According to another aspect of the present invention, a speech conversion program causes a computer to execute a step of inputting speech data, and a step of replacing a word part of the input speech data that includes a predetermined replacement target word with alternative speech data corresponding to that word part.

According to the present invention, it is possible to provide a speech conversion program capable of protecting an individual's privacy while converting speech into speech data from which the meaning of a conversation can be understood.

According to still another aspect of the present invention, a speech conversion method includes a step of inputting speech data, and a step of replacing a word part of the input speech data that includes a predetermined replacement target word with alternative speech data corresponding to that word part.

According to the present invention, it is possible to provide a speech conversion method capable of protecting an individual's privacy while converting speech into speech data from which the meaning of a conversation can be understood.

Embodiments of the present invention are described below with reference to the drawings. In the following description, identical parts are given identical reference numerals. Their names and functions are also the same, so detailed descriptions of them are not repeated.

FIG. 1 is a block diagram showing the hardware configuration of a speech conversion device according to one embodiment of the present invention. Referring to FIG. 1, the speech conversion device 100 is implemented on a general-purpose personal computer. The speech conversion device 100 includes the following, each connected to a bus 120: a central processing unit (CPU) 101; a ROM (Read Only Memory) 103 that stores programs and other data to be executed by the CPU 101; a RAM (Random Access Memory) 105 into which the program to be executed is loaded and which holds data during program execution; a hard disk drive (HDD) 107 for nonvolatile data storage; a card interface (I/F) 109 into which an IC card 108 is inserted; an operation unit 111 and a display unit 113 that serve as the interface with the user; a communication I/F 115 for connecting the speech conversion device 100 to a network 118; and an audio processing circuit 116.

The CPU 101 loads the speech conversion program stored in the ROM 103 into the RAM 105 and executes it. Alternatively, the CPU 101 may load into the RAM 105 and execute a speech conversion program recorded on the IC card 108 inserted into the card I/F 109. The speech conversion program may also be downloaded from another computer connected to the network, stored on the HDD 107, and then loaded into the RAM 105 and executed. The speech conversion program referred to here includes not only programs directly executable by the CPU 101 but also programs in source form, compressed programs, encrypted programs, and the like.

The operation unit 111 is an input device such as a keyboard or a mouse. The display unit 113 is a liquid crystal display, a cathode ray tube (CRT), or a plasma display panel. The communication I/F 115 is a communication interface for connecting the speech conversion device 100 to the network 118, which allows the speech conversion device 100 to communicate with other computers connected to the network 118. The network 118 may be a local area network (LAN) or a wide area network (WAN) such as the Internet. It may also be a direct connection to another computer over a serial or parallel line, or a connection over a telephone line through a modem. The network 118 may be wired or wireless.

The audio processing circuit 116 is connected to a microphone 117 and a speaker 119. The microphone 117 picks up human speech and outputs speech data to the audio processing circuit 116. The audio processing circuit 116 includes an amplifier circuit that amplifies analog speech data, a noise removal circuit that removes noise, an A/D (analog/digital) converter that converts the analog signal into a digital signal, and a D/A (digital/analog) converter that converts digital speech data into an analog signal. The audio processing circuit 116 amplifies the analog speech data input from the microphone 117, removes noise, converts the result into a digital signal, and outputs the digital speech data to the CPU 101. The audio processing circuit 116 also converts digital speech data input from the CPU 101 into an analog signal, amplifies it, and outputs the analog speech data to the speaker 119 to produce sound.

The recording medium that stores the speech conversion program is not limited to the IC card 108. It may be any medium that carries the program in a fixed manner, such as a flexible disk, a cassette tape, an optical disc (MO (Magnetic Optical Disc), MD (Mini Disc), or DVD (Digital Versatile Disc)), an optical card, or a semiconductor memory such as a mask ROM or an EPROM (Erasable Programmable ROM).

FIG. 2 is a functional block diagram outlining the functions of the CPU 101 of the speech conversion device. Referring to FIG. 2, the CPU 101 includes: a speech input unit 11 connected to the audio processing circuit 116, which processes speech from the microphone 117; a speech recognition unit 13 that recognizes the speech data input from the speech input unit 11 and converts it into text data; a replacement target word detection unit 15 that detects, in the text data, the replacement target words to be replaced; an alternative word determination unit 17 that determines the alternative word with which each detected replacement target word is to be replaced; a speech synthesis unit 19 that generates speech data for the alternative word; a synchronization unit 21 that replaces the replacement target word portion of the speech data with the speech data of the alternative word; a replacement word list generation unit 25 that generates a replacement word list associating replacement target words with alternative words; and a replacement level input unit 23 that receives instructions from the operation unit 111.

Speech data of a conversation picked up by the microphone 117 is input to the audio processing circuit 116. The audio processing circuit 116 converts the input speech data into digital speech data and outputs it to the speech input unit 11. The speech input unit 11 outputs the speech data received from the audio processing circuit 116 to both the speech recognition unit 13 and the synchronization unit 21. The speech recognition unit 13 is connected to the speech input unit 11 and the replacement target word detection unit 15. It performs speech recognition on the speech data input from the speech input unit 11, converts it into text data, and outputs the text data to the replacement target word detection unit 15.

Although this example shows speech data being input to the speech input unit 11 from the audio processing circuit 116, speech data recorded on the IC card 108 may instead be read by the card I/F 109 and input to the speech input unit 11. In that case, the recording medium holding the speech data is not limited to the IC card 108. It may be a magnetic recording device such as the HDD 107, a magneto-optical disk such as a CD-ROM, or a semiconductor memory such as an EPROM.

The replacement level input unit 23 is connected to the operation unit 111 and acquires a replacement level based on information input from the operation unit 111. The replacement level is a qualification assigned to a listener that distinguishes medical personnel, such as doctors, nurses, and pharmacists, from everyone else, such as patients, family members, and acquaintances. Medical personnel are assigned replacement level "1" and all other people are assigned replacement level "2". The replacement level input unit 23 has a user authentication function so that it can determine who is using the speech conversion device 100. Specifically, the speech conversion device 100 stores in advance on the HDD 107 user information that associates user identification information, a password, and a replacement level. The qualification here distinguishes doctors, nurses, and patients. When the user enters user identification information and a password on the operation unit 111, the replacement level input unit 23 searches the HDD 107 for user information containing that user identification information and password. If such user information is found, the replacement level is taken from it. If no such user information is found, the replacement level is set to "2". Although authentication by user identification information and password is described here, biometric authentication may be used instead. Alternatively, the user may enter the replacement level directly on the operation unit 111 without user authentication. The replacement level input unit 23 outputs the replacement level to the replacement target word detection unit 15 and the alternative word determination unit 17.
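The replacement level lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `UserInfo` record, the sample entries, and the function name `get_replacement_level` are hypothetical, and plain-text passwords are used only to keep the sketch short.

```python
from dataclasses import dataclass

DEFAULT_LEVEL = 2  # level used when no matching user information is found


@dataclass(frozen=True)
class UserInfo:
    user_id: str
    password: str
    level: int  # 1 = medical personnel, 2 = everyone else


# Hypothetical stand-in for the user information stored on the HDD 107.
USER_INFO = [
    UserInfo("dr_tanaka", "secret1", 1),
    UserInfo("visitor", "secret2", 2),
]


def get_replacement_level(user_id, password, records=USER_INFO):
    """Return the level of the record matching both fields, else level 2."""
    for record in records:
        if record.user_id == user_id and record.password == password:
            return record.level
    return DEFAULT_LEVEL
```

Falling back to level "2" means that an unauthenticated listener always receives the most heavily anonymized output.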

The replacement target word detection unit 15 receives text data from the speech recognition unit 13 and the replacement level from the replacement level input unit 23. The replacement target word detection unit 15 is connected to the ROM 103 and reads the alternative word table 35 stored there.

The alternative word table 35 is now described. FIG. 3 shows an example of the alternative word table. The alternative word table 35 defines replacement target words and the alternative words used to replace them. The set of replacement target words depends on the replacement level, and the alternative word corresponding to a given replacement target word also depends on the replacement level. More specifically, the alternative word table 35 defines replacement target words by word category. In this embodiment, the categories are person name, telephone number, disease name (special), drug name (special), disease name (general), drug name (general), examination name, and treatment name.

Person names cover doctors, nurses, patients, and other people. A disease name (special) is the name of a rare disease, whereas a disease name (general) is the name of a disease that many people contract, such as the common cold or diabetes. Because few patients have a rare disease, hearing the disease name may be enough to identify the patient, so disease name (special) is classified separately from disease name (general). A drug name (special) is the name of a drug used to treat a rare disease, whereas a drug name (general) is the name of a drug used to treat diseases that many people contract. Because hearing a drug name (special) may identify the patient, drug name (special) is classified separately from drug name (general).

The alternative word table 35 defines different replacement target words for each replacement level. At replacement level "1", the replacement target words are person names, disease names (special), drug names (special), and telephone numbers. At replacement level "2", they are person names, disease names (special), drug names (special), telephone numbers, disease names (general), drug names (general), examination names, and treatment names.

The alternative word table 35 also defines a different alternative word for a given replacement target word at each replacement level. For example, the alternative word for the replacement target word "doctor" is "Dr" followed by a number at replacement level "1", and simply "Dr" at replacement level "2". At level "1" each doctor has a distinct alternative word, whereas at level "2" all doctors share the same one. The alternative word for "nurse" is "Ns" followed by the nurse's years of experience at level "1", and simply "Ns" at level "2". At level "1", nurses with different years of experience have different alternative words, and the years of experience can be read from the alternative word. At level "2", all nurses share the same alternative word. The alternative word for "patient" is "Pt" followed by the patient's care degree at level "1", and simply "Pt" at level "2". At level "1", a different alternative word is defined for each patient, and the care degree can be read from the alternative word. At level "2", all patients share the same alternative word.
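One way to picture the alternative word table 35 is as two lookups: the set of target categories per replacement level, and the prefix used for each kind of person. The sketch below is an illustration only; the category identifiers and the helper `person_alternative` are hypothetical names, and alternative words for non-person categories are omitted because the text above does not specify them.

```python
# Categories replaced at level 1 (medical personnel).
LEVEL_1_CATEGORIES = frozenset(
    {"person_name", "phone_number", "disease_special", "drug_special"}
)

# Level 2 (everyone else) replaces everything level 1 does, plus more.
LEVEL_2_CATEGORIES = LEVEL_1_CATEGORIES | frozenset(
    {"disease_general", "drug_general", "examination", "treatment"}
)

TARGET_CATEGORIES = {1: LEVEL_1_CATEGORIES, 2: LEVEL_2_CATEGORIES}

# Prefixes for person names, as described in the text.
PERSON_PREFIX = {"doctor": "Dr", "nurse": "Ns", "patient": "Pt"}


def person_alternative(role, level, suffix=""):
    """Level 1 keeps a distinguishing suffix; level 2 uses the bare prefix."""
    prefix = PERSON_PREFIX[role]
    return f"{prefix}{suffix}" if level == 1 else prefix
```

Here the suffix stands for the doctor's number, the nurse's years of experience, or the patient's care degree.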

Returning to FIG. 2, the replacement target word detection unit 15 determines the replacement target words based on the replacement level and the alternative word table. That is, it obtains the replacement target words from the alternative word table and the replacement level, then scans the text data and extracts any replacement target words it contains. As described above, there are fewer replacement target words at replacement level "1" than at replacement level "2". The replacement target word detection unit 15 outputs the position of each replacement target word in the text data to the synchronization unit 21 and outputs the replacement target word itself to the alternative word determination unit 17.
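The detection step can be sketched as a scan of the recognized text against a lexicon filtered by replacement level. This is a minimal sketch under assumptions: the lexicon contents, the classification of "lung cancer" as a general disease name, and the function name `detect_targets` are all hypothetical, and English text stands in for the recognizer output.

```python
import re

TARGET_CATEGORIES = {
    1: {"person_name", "phone_number", "disease_special", "drug_special"},
    2: {"person_name", "phone_number", "disease_special", "drug_special",
        "disease_general", "drug_general", "examination", "treatment"},
}

# Hypothetical lexicon mapping known words to their category.
LEXICON = {
    "Yamada": "person_name",
    "lung cancer": "disease_general",
}


def detect_targets(text, level, lexicon=LEXICON):
    """Return sorted (start, end, word) tuples for every target word found."""
    allowed = TARGET_CATEGORIES[level]
    hits = []
    for word, category in lexicon.items():
        if category not in allowed:
            continue  # this category is not replaced at this level
        for match in re.finditer(re.escape(word), text):
            hits.append((match.start(), match.end(), word))
    return sorted(hits)
```

For the text `"Yamada has lung cancer"`, level 1 yields only the person name, while level 2 also yields the disease name. The character positions play the role of the position information passed to the synchronization unit 21.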

The alternative word determination unit 17 receives the replacement target words of the text data from the replacement target word detection unit 15 and the replacement level from the replacement level input unit 23. The alternative word determination unit 17 is connected to the ROM 103 and reads the alternative word table 35 stored there. Based on the replacement level and the alternative word table, it determines the alternative word corresponding to each replacement target word. As described above, more alternative words correspond to a single replacement target word at replacement level "1" than at replacement level "2".

The speech conversion device 100 of this embodiment stores in advance on the HDD 107 a person name table that associates person names with qualifications. The table additionally associates years of experience with nurses and a care degree with patients. When a person name is input as a replacement target word from the replacement target word detection unit 15, the alternative word determination unit 17 determines from the name whether the person is a doctor, a nurse, or a patient. For a nurse it also determines the years of experience, and for a patient the care degree. Consider, for example, the case where the names "Tanaka" and "Yamada" both have the qualification "doctor" and both have been determined to be replacement target words. At replacement level "1", the alternative word determination unit 17 sets the alternative word for "Tanaka" to "Dr1" and the alternative word for "Yamada" to "Dr2". At replacement level "2", it sets the alternative word for both "Tanaka" and "Yamada" to "Dr". The alternative word determination unit 17 outputs the alternative word to the speech synthesis unit 19 and outputs each pair of replacement target word and alternative word to the replacement word list generation unit 25.
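The "Tanaka"/"Yamada" example can be reproduced with a small sketch. Several details here are assumptions rather than statements from the text: doctor numbers are assigned in order of first appearance, the names "Suzuki" and "Sato" and their attributes are invented sample data, and the class name `AlternativeWordDecider` is hypothetical.

```python
PREFIX = {"doctor": "Dr", "nurse": "Ns", "patient": "Pt"}

# Hypothetical stand-in for the person name table stored on the HDD 107.
PERSON_TABLE = {
    "Tanaka": {"role": "doctor"},
    "Yamada": {"role": "doctor"},
    "Suzuki": {"role": "nurse", "years": 5},
    "Sato": {"role": "patient", "care": 3},
}


class AlternativeWordDecider:
    def __init__(self, level, table=PERSON_TABLE):
        self.level = level
        self.table = table
        self._doctor_numbers = {}  # name -> number, in order of first use

    def decide(self, name):
        entry = self.table[name]
        role = entry["role"]
        prefix = PREFIX[role]
        if self.level != 1:
            return prefix  # level 2: everyone of a role shares one word
        if role == "doctor":
            # Assumed numbering scheme: next free number on first sight.
            number = self._doctor_numbers.setdefault(
                name, len(self._doctor_numbers) + 1
            )
            return f"{prefix}{number}"
        if role == "nurse":
            return f"{prefix}{entry['years']}"
        return f"{prefix}{entry['care']}"
```

Keeping the numbering inside one decider object makes the same doctor map to the same alternative word throughout a conversation, which is what lets a listener follow who is speaking about whom.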

The speech synthesis unit 19 receives the alternative word from the alternative word determination unit 17. It synthesizes speech from the alternative word, which is text data, to produce speech data, and outputs that speech data to the synchronization unit 21.

The synchronization unit 21 receives the speech data of the alternative word from the speech synthesis unit 19, the original speech data from the speech input unit 11, and the position of the replacement target word within the text data from the replacement target word detection unit 15. The synchronization unit 21 replaces the portion of the speech data from the speech input unit 11 that corresponds to the replacement target word with the speech data of the alternative word. To do so, it derives from the position of the replacement target word in the text data the position of the corresponding speech within the speech data, and replaces the speech data at that position with the speech data of the alternative word. The synchronization unit 21 then outputs the resulting speech data to the audio processing circuit 116 and the HDD 107. As a result, speech data in which the replacement target words have been replaced with alternative words is output from the speaker 119, and the same speech data 33 is stored on the HDD 107.
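The splice performed by the synchronization unit 21 can be sketched on a plain list of audio samples. The text does not say how a text position is mapped to a position in the speech data, so this sketch simply assumes that each target word already comes with start and end sample indices (for example from word timings reported by a recognizer); the function name `splice` is hypothetical.

```python
def splice(samples, segments):
    """Replace sample ranges with synthesized audio.

    samples:  sequence of audio samples of the original speech.
    segments: iterable of (start, end, replacement_samples), where
              start/end are sample indices of a target word (end exclusive).
    Returns a new list; the replacement may be longer or shorter than
    the range it replaces.
    """
    out = []
    cursor = 0
    for start, end, replacement in sorted(segments, key=lambda s: s[0]):
        if start < cursor:
            raise ValueError("segments must not overlap")
        out.extend(samples[cursor:start])  # untouched speech before the word
        out.extend(replacement)            # synthesized alternative word
        cursor = end                       # skip the original word
    out.extend(samples[cursor:])           # untouched speech after last word
    return out
```

Because the synthesized word rarely has the same duration as the original, the sketch rebuilds the output sequentially instead of overwriting samples in place.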

The replacement word list generation unit 25 receives pairs of a replacement target word and an alternative word from the alternative word determination unit 17. From these pairs, the replacement word list generation unit 25 generates a replacement word list that associates each replacement target word with its alternative word, and stores the list in the HDD 107. As a result, the replacement word list 31 is stored in the HDD 107.

FIG. 4 is a flowchart showing an example of the flow of the replacement process executed by the speech conversion apparatus 100. Referring to FIG. 4, the replacement process first determines whether a replacement level has been input (step S01). If a replacement level has been input, the process proceeds to step S02; otherwise it waits. In step S02, the process waits until speech data is input (NO in step S02), and proceeds to step S03 once speech data is input. In other words, the replacement process is executed on condition that both a replacement level and speech data have been input. The input speech data is then subjected to speech recognition and converted into text data (step S03), and the text data is acquired (step S04).

In the next step S05, the text in the text data is scanned in order to detect replacement target words. Specifically, the replacement target words are obtained from the replacement level input in step S01 and the alternative word table 35 read from the ROM 103, and it is determined whether the text being scanned is one of the obtained replacement target words. If it is, a replacement target word is regarded as detected and the process proceeds to step S06; if it is not, no replacement target word is regarded as detected and the process proceeds to step S11.

In step S06, the alternative word corresponding to the replacement target word detected in step S05 is determined from the replacement level input in step S01 and the alternative word table read from the ROM 103. It is then determined whether the replacement level input in step S01 is "1" (step S07). If YES, the process proceeds to step S08; if NO, step S08 is skipped and the process proceeds to step S09. In step S08, the pair of the replacement target word and the alternative word is added to the replacement word list 31 stored in the HDD 107, and the process then proceeds to step S09.
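Steps S05 through S08 can be sketched as a single scan over the recognized text. The sketch below is illustrative only. The table layout (word, then level, then alternative word), the table entries, the token list, and the function name `scan_and_decide` are all assumptions; the actual alternative word table of FIG. 3 is not reproduced in this section. The names "Tanaka" and "Kuwahara" are borrowed from the FIG. 5 example, while "drug-x" is a made-up placeholder.

```python
from typing import Dict, List, Tuple

# Hypothetical alternative word table: word -> {replacement level: alternative word}.
# A word with no entry for a level is not a replacement target at that level.
ALTERNATIVE_WORD_TABLE: Dict[str, Dict[int, str]] = {
    "Tanaka": {1: "Doctor A", 2: "a doctor"},
    "Kuwahara": {1: "Patient A", 2: "a patient"},
    "drug-x": {2: "a medicine"},
}


def scan_and_decide(
    tokens: List[str], level: int
) -> Tuple[List[Tuple[int, str, str]], List[Tuple[str, str]]]:
    """Scan tokens in order and decide an alternative word for each target.

    Returns (hits, replacement_word_list), where each hit is
    (token index, replacement target word, alternative word).
    """
    hits: List[Tuple[int, str, str]] = []
    replacement_word_list: List[Tuple[str, str]] = []
    for index, token in enumerate(tokens):
        alternative = ALTERNATIVE_WORD_TABLE.get(token, {}).get(level)
        if alternative is None:
            continue  # S05: not a replacement target at this level
        hits.append((index, token, alternative))  # S06: alternative decided
        # S07/S08: only level 1 records the pair in the replacement word list.
        # Skipping duplicate pairs is a choice made for this sketch.
        if level == 1 and (token, alternative) not in replacement_word_list:
            replacement_word_list.append((token, alternative))
    return hits, replacement_word_list


tokens = ["Tanaka", "gave", "drug-x", "to", "Kuwahara"]
print(scan_and_decide(tokens, 1))
print(scan_and_decide(tokens, 2))
```

With these made-up entries, level 2 replaces more words than level 1, and only level 1 yields a replacement word list, which mirrors the branching at step S07.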

In step S09, the alternative word determined in step S06 is speech-synthesized to generate speech data. Then, in the speech data input in step S02, the portion corresponding to the replacement target word detected in step S05 is replaced with the speech data of the alternative word generated in step S09 (step S10).

Next, it is determined whether the text data contains further text to be scanned (step S11). If so, the process returns to step S05; if not, the process proceeds to step S12. In step S12, it is determined whether the replacement level input in step S01 is "1". If YES, the process proceeds to step S13; if NO, step S13 is skipped and the process proceeds to step S14. In step S13, the replacement word list stored in the HDD 107 is output. The list may be output to the display unit 113, or it may be printed by a printer on a recording medium such as paper.

In step S14, the speech data in which the alternative word speech data was substituted in step S10 is output. When the speech data is output to the audio processing circuit 116, sound is output from the speaker; when it is output to the HDD 107, it is stored in the HDD 107 as the speech data 33.

FIG. 5 is a diagram showing, as text, the speech data input to the speech conversion apparatus. In the figure, the left column indicates the speaker of the conversation shown in the right column. Dr1 denotes a person named "Tanaka" with the qualification "doctor"; Ns5 denotes a person named "Ya" with the qualification "nurse with 5 years of experience"; Ns2 denotes a person named "Meito" with the qualification "2 years of experience"; and PtD3 denotes a person named "Kuwahara" with the qualification "patient with nursing degree D3".

FIG. 6 is a diagram showing, as text, the speech data obtained after the speech data shown in FIG. 5 has been converted at replacement level "1". The left column in the figure is the same as in FIG. 5. FIG. 7 is a diagram showing the replacement word list generated for the speech data shown in FIG. 6. FIG. 8 is a diagram showing, as text, the speech data obtained after the speech data shown in FIG. 5 has been converted at replacement level "2".

Referring to FIGS. 5, 6, and 8, more words are replaced with alternative words in the speech data shown in FIG. 8 than in the speech data shown in FIG. 6. This is because the alternative word table 35 defines different replacement target words depending on the replacement level. In addition, the speech data shown in FIG. 6 contains more kinds of alternative words than the speech data shown in FIG. 8. This is because the alternative word table 35 defines different alternative words for a replacement target word depending on the replacement level. Furthermore, since the replacement word list shown in FIG. 7 is provided for the speech data of FIG. 6, medical personnel who listen to the speech data while consulting the replacement word list can reconstruct the speech data as it was before conversion.

<Modification>
The speech conversion apparatus 100 described above outputs speech data in which the portion of the input speech data corresponding to a replacement target word has been replaced with the speech data of an alternative word. The speech conversion apparatus 100 according to the modification, instead of outputting speech data with the alternative word speech data substituted, outputs text data in which the replacement target word has been replaced with the alternative word.

FIG. 9 is a functional block diagram showing an overview of the functions of the CPU of the speech conversion apparatus according to the modification. Referring to FIG. 9, the difference from the functional block diagram shown in FIG. 2 is that a replacement unit 27 is included in place of the speech synthesizer 19 and the synchronization unit 21. The rest of the configuration is the same as in FIG. 2, so its description is not repeated here. Referring to FIG. 9, the replacement unit 27 receives the alternative word from the alternative word determination unit 17, the text data from the speech recognition unit 13, and the position information of the replacement target word within the text data from the replacement target word detection unit 15. The replacement unit 27 replaces the portion of the text data corresponding to the replacement target word with the alternative word. The replacement unit 27 then stores the text data in which the alternative word has been substituted in the HDD 107. As a result, the post-replacement text data 37 is stored in the HDD 107.

FIG. 10 is a flowchart showing an example of the flow of the replacement process executed by the speech conversion apparatus according to the modification. Referring to FIG. 10, the differences from the replacement process shown in FIG. 4 are that step S21 is executed in place of steps S09 and S10, and that step S22 is executed in place of step S14. The other processing is the same as in FIG. 4, so its description is not repeated here. In step S21, the replacement target word detected in step S05 in the text data acquired in step S04 is replaced with the alternative word determined in step S06.

In step S22, the text data in which the alternative word was substituted in step S21 (the post-replacement text data) is stored in the HDD 107.
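The text-only replacement of step S21 can be sketched as follows. This is an illustration under assumptions rather than the apparatus's actual code: the position information is assumed to be a character range in the recognized text, and the function name `replace_spans` and the sample sentence are hypothetical. Storing the result (step S22) is left out.

```python
from typing import List, Tuple

# (start, end, alternative word); end is exclusive, offsets index the original text.
Span = Tuple[int, int, str]


def replace_spans(text: str, spans: List[Span]) -> str:
    """Rebuild `text` with each [start, end) range replaced by its alternative word.

    The text is walked left to right, copying the untouched parts between
    spans, so every offset refers to the original text throughout.
    """
    pieces: List[str] = []
    cursor = 0
    for start, end, alternative in sorted(spans):
        if start < cursor or start > end or end > len(text):
            raise ValueError(f"span ({start}, {end}) overlaps or is out of range")
        pieces.append(text[cursor:start])
        pieces.append(alternative)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Tanaka examined Kuwahara"
start = text.index("Kuwahara")
spans = [(0, len("Tanaka"), "Doctor A"), (start, start + len("Kuwahara"), "Patient A")]
print(replace_spans(text, spans))  # Doctor A examined Patient A
```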

As described above, the speech conversion apparatus 100 according to the present embodiment replaces the word portion of the speech data that includes a replacement target word with speech data obtained by speech-synthesizing the alternative word defined in the alternative word table 35. Therefore, if, for example, the replacement target words are words that can identify an individual, such as a name or an address, the portions of the speech data that identify the individual are replaced with the speech data of alternative words, so the individual's privacy can be protected. Furthermore, because each replacement target word is replaced with an alternative word rather than simply deleted, the meaning of the conversation can still be understood.

In addition, because the replacement target words differ according to the replacement level, the apparatus can accommodate the fact that the range of information that may be disclosed differs from listener to listener. For example, for medical personnel, the patient's privacy is protected by replacing replacement target words with alternative words, while the number of replacement target words is kept as small as possible so that the conversation between the doctor and the nurse can be heard in as much detail as possible. Conversely, for people not involved in medical care, the patient's privacy can be protected as far as possible by playing speech in which as many replacement target words as possible have been replaced with alternative words.

In addition, because the alternative word table defines an alternative word for each replacement level for a single replacement target word, the apparatus can accommodate the fact that the range of information that may be disclosed differs from listener to listener. For example, for medical personnel, replacing replacement target words with a plurality of different alternative words protects the patient's privacy while letting them hear the conversation between the doctor and the nurse in as much detail as possible. More specifically, differences between drug names can be understood from the conversation alone. Conversely, for people not involved in medical care, the patient's privacy can be protected as far as possible by playing speech in which each replacement target word has been replaced with a single alternative word. Because the alternative words still distinguish between kinds of replacement target words, a listener can tell from the conversation whether a replaced word is a person's name or a drug name.
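The effect of defining an alternative word per replacement level can be sketched with a small lookup table. Everything in the table below is a made-up placeholder: the actual contents of the alternative word table in FIG. 3 are not reproduced in this section, and the assumption that level 1 is the one aimed at medical personnel is an inference from the description of FIGS. 6 to 8, not a statement from the source. The names `alternative_for` and `distinct_alternatives` are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical table: replacement target word -> {replacement level: alternative word}.
TABLE: Dict[str, Dict[int, str]] = {
    "drug-x": {1: "medicine A", 2: "medicine"},
    "drug-y": {1: "medicine B", 2: "medicine"},
    "Kuwahara": {1: "patient A", 2: "person"},
}


def alternative_for(word: str, level: int) -> Optional[str]:
    """Look up the alternative word for `word` at `level`, or None if undefined."""
    return TABLE.get(word, {}).get(level)


def distinct_alternatives(level: int) -> Set[str]:
    """Collect the different alternative words that can occur at `level`."""
    return {row[level] for row in TABLE.values() if level in row}


# At level 1 the two drugs remain distinguishable from each other;
# at level 2 they collapse to one word that still differs from the person word.
print(alternative_for("drug-x", 1), alternative_for("drug-y", 1))
print(sorted(distinct_alternatives(2)))  # ['medicine', 'person']
```

In this sketch a listener at level 2 can still tell a drug from a person, but no longer one drug from another, which is the trade-off the paragraph above describes.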

Although the present embodiment has been described in terms of the speech conversion apparatus 100, it goes without saying that the invention can also be regarded as a speech conversion program or a speech conversion method for causing a computer to execute the replacement process shown in FIG. 4 or FIG. 9.

The embodiment disclosed herein should be considered illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

FIG. 1 is a block diagram showing the hardware configuration of the speech conversion apparatus according to one embodiment of the present invention.
FIG. 2 is a functional block diagram showing an overview of the functions of the CPU of the speech conversion apparatus.
FIG. 3 is a diagram showing an example of the alternative word table.
FIG. 4 is a flowchart showing an example of the flow of the replacement process executed by the speech conversion apparatus.
FIG. 5 is a diagram showing, as text, the speech data input to the speech conversion apparatus.
FIG. 6 is a diagram showing, as text, the speech data obtained after the speech data shown in FIG. 5 has been converted at replacement level "1".
FIG. 7 is a diagram showing the replacement word list generated for the speech data shown in FIG. 6.
FIG. 8 is a diagram showing, as text, the speech data obtained after the speech data shown in FIG. 5 has been converted at replacement level "2".
FIG. 9 is a functional block diagram showing an overview of the functions of the CPU of the speech conversion apparatus according to the modification.
FIG. 10 is a flowchart showing an example of the flow of the replacement process executed by the speech conversion apparatus according to the modification.

Explanation of symbols

11 speech input unit, 13 speech recognition unit, 15 replacement target word detection unit, 17 alternative word determination unit, 19 speech synthesizer, 21 synchronization unit, 23 replacement level input unit, 25 replacement word list generation unit, 27 replacement unit, 31 replacement word list, 33 speech data, 35 alternative word table, 37 post-replacement text data, 100 speech conversion apparatus, 101 CPU, 103 ROM, 105 RAM, 107 HDD, 108 IC card, 109 card I/F, 111 operation unit, 113 display unit, 115 communication I/F, 116 audio processing circuit, 117 microphone, 119 speaker, 120 bus.

Claims

1. A speech conversion apparatus comprising:
speech data input means for inputting speech data; and
replacement means for replacing a word portion including a predetermined replacement target word in the input speech data with alternative speech data corresponding to the word portion.

2. The speech conversion apparatus according to claim 1, further comprising replacement level input means for inputting a replacement level,
wherein the replacement means includes replacement target word determination means for determining the replacement target word so that it differs according to the input replacement level.

3. The speech conversion apparatus according to claim 2, further comprising alternative word table storage means for storing an alternative word table that defines an alternative word for each replacement level corresponding to one replacement target word,
wherein the replacement means further includes alternative speech data generation means for generating alternative speech data according to the input replacement level using the alternative word table.

4. The speech conversion apparatus according to claim 1, further comprising:
replacement level input means for inputting a replacement level; and
alternative word table storage means for storing an alternative word table that defines an alternative word for each replacement level corresponding to one replacement target word,
wherein the replacement means includes alternative speech data generation means for generating alternative speech data according to the input replacement level using the alternative word table.

5. The speech conversion apparatus according to claim 1, wherein the replacement means includes:
speech recognition means for performing speech recognition on the input speech data and outputting text data;
replacement target word detection means for extracting the replacement target word from the text data;
alternative word table storage means for storing an alternative word table defining alternative words corresponding to the replacement target words;
alternative word determination means for determining an alternative word corresponding to the extracted replacement target word;
speech synthesis means for speech-synthesizing the determined alternative word to generate alternative speech data; and
speech data replacement means for replacing the word portion of the input speech data with the alternative speech data.

6. A speech conversion program for causing a computer to execute:
a step of inputting speech data; and
a step of replacing a word portion including a predetermined replacement target word in the input speech data with alternative speech data corresponding to the word portion.

7. A speech conversion method comprising:
a step of inputting speech data; and
a step of replacing a word portion including a predetermined replacement target word in the input speech data with alternative speech data corresponding to the word portion.