JP6342972B2

JP6342972B2 - Communication system and communication method thereof

Info

Publication number: JP6342972B2
Application number: JP2016222505A
Authority: JP
Inventors: 吉田 大輔; 渡邊 大輔; 飯島 雅之; 平光 正尚; 鹿又 信之
Original assignee: Hitachi Information and Telecommunication Engineering Ltd
Current assignee: Hitachi Information and Telecommunication Engineering Ltd
Priority date: 2016-11-15
Filing date: 2016-11-15
Publication date: 2018-06-13
Anticipated expiration: 2036-11-15
Also published as: JP2018082269A

Description

The present invention relates to a communication system and a communication method thereof, and more particularly to a communication system and communication method suitable for receiving speech through a telephone terminal and performing speech translation.

In recent years, communication infrastructure such as the Internet has developed on a global scale, and the globalization of communication is advancing rapidly. Against this background, automatic speech translation is attracting attention as information processing technology advances.

Automatic speech translation is a system that performs speech recognition on input human speech to convert it into text, automatically translates the text in the input language into text in the target language, and then converts the translated text into speech in the target language and outputs it.

Patent Document 1 is an example of a technique that uses such automatic speech translation. Patent Document 1 discloses a technique in which an urgency level and a language type are determined from voice data received from a mobile terminal; when the urgency is high, an interpreter familiar with the relevant language is selected, and when the urgency is low, a machine translation server performs the translation.

Patent Document 1: Japanese Patent Laid-Open No. 2016-66983

When speech is translated by automatic speech translation, the input language (source language) and the output language (target language) must be specified in advance. With a mobile terminal such as a smartphone, as in Patent Document 1, the languages can be specified by tapping the screen. An ordinary fixed-line telephone, however, cannot be used to specify them in the same way.

In face-to-face speech interpretation, one conceivable application is for the speakers to pass a single telephone back and forth and use an automatic speech translation system to listen to a translation of the other party's utterance. In that case, variations in the timing of handing over the telephone may cause the listener to miss the beginning of the output speech.

As for identifying the language type, the interpretation service system described in Patent Document 1 prepares a keyword such as "English" or "French" for each language type, and the language spoken by a foreigner is identified when a security guard or the foreigner utters that keyword (paragraph 0059).

However, the language determination described in Patent Document 1 is roundabout, places a load on the processing system, and adds time to the communication. In an application where a single telephone is passed back and forth during face-to-face speech interpretation, the handover therefore cannot be carried out smoothly.

The present invention has been made to solve the above problems. Its object is to provide a communication system and a communication method in which, when a telephone and an automatic speech translation system exchange speech over a telephone line, the language type can be specified with a simple operation; when a single telephone is passed back and forth in face-to-face speech interpretation, the handover proceeds smoothly; the person who is handed the handset can hear the appropriate speech; and missed speech is prevented.

A communication system according to the present invention is a communication system in which a telephone terminal and a communication server are connected by a telephone line. The telephone terminal has means for exchanging a call with the communication server, and means for generating a DTMF (Dual-Tone Multi-Frequency) signal from input on a key device and transmitting it to the communication server. The communication server has a language processing unit that performs speech translation from a first language into a second language; means for translating speech in the first language, carried by a voice signal received over the telephone line, into the second language and transmitting it to the telephone terminal; and a call data table that stores voice data in the speaker's language together with voice data of the translated speech obtained by translating that voice data. When the communication server receives a first DTMF signal from the telephone terminal, it stores the speech carried by voice signals received thereafter in the call data table as voice data in the language represented by the first DTMF signal, until it receives a second DTMF signal transmitted through operation by another speaker. After the second DTMF signal has been received, the communication server transmits to the telephone terminal a voice signal carrying the translated speech obtained by translating the voice data in the language represented by the first DTMF signal into the language represented by the second DTMF signal.

According to the present invention, it is possible to provide a communication system and a communication method in which, when a telephone and an automatic speech translation system exchange speech over a telephone line, the language type can be specified with a simple operation; when a single telephone is passed back and forth in face-to-face speech interpretation, the handover proceeds smoothly; the person who is handed the handset can hear the appropriate speech; and missed speech is prevented.

FIG. 1 is an overall configuration diagram of the communication system.
FIG. 2 is a functional configuration diagram of the telephone terminal.
FIG. 3 is a configuration diagram of the communication server.
FIG. 4A is a diagram explaining the states recognized by the communication control unit.
FIG. 4B is a diagram explaining the events accepted by the communication control unit.
FIG. 5 is a state transition diagram showing state transitions caused by events.
FIG. 6 is a diagram showing a processing matrix of states and the events that occur.
FIG. 7 is a diagram explaining the specification of the button correspondence table 310.
FIG. 8 is a diagram explaining the specification of the call state table 320.
FIG. 9 is a diagram explaining the specification of the call data table 330.
FIG. 10A is an overview sequence diagram showing the exchanges between the components of the communication system and the state of the system (part 1).
FIG. 10B is an overview sequence diagram showing the exchanges between the components of the communication system and the state of the system (part 2).
FIG. 11 is an overview flowchart showing the processing of the communication server.
FIG. 12 is a flowchart showing the voice analysis processing.
FIG. 13 is a flowchart showing the repeat processing.

Embodiments of the present invention are described below with reference to FIGS. 1 to 13.

First, the configuration of a communication system according to an embodiment of the present invention is described with reference to FIGS. 1 to 3.
FIG. 1 is an overall configuration diagram of the communication system.
FIG. 2 is a functional configuration diagram of the telephone terminal.
FIG. 3 is a configuration diagram of the communication server.

As shown in FIG. 1, the communication system of this embodiment has a telephone terminal 10 and a communication server 100 connected by a telephone line 5.

The telephone line 5 may be a public line or an in-house line switched by a corporate PBX (Private Branch eXchange). It may be either an analog line or a digital line.

The telephone terminal 10 must be a telephone capable of transmitting DTMF (Dual-Tone Multi-Frequency) signals (so-called push-button tones). A DTMF signal transmits one of 16 symbols in total, namely the digits 0 to 9 and the symbols *, #, A, B, C, and D, as a composite tone that combines one frequency from a low group and one from a high group of voice-band frequencies. The line connected to the telephone terminal 10 must also be capable of carrying DTMF signals.
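The low-group and high-group pairing described above can be sketched as a lookup table. The frequencies are the standard DTMF keypad values; the names `DTMF_FREQS` and `dtmf_tone`, the 8000 Hz sample rate, and the 50 ms duration are illustrative assumptions and do not come from the patent.

```python
import math

# Standard DTMF keypad: each key combines one low-group (row) frequency
# with one high-group (column) frequency, both in Hz.
LOW_GROUP = (697, 770, 852, 941)
HIGH_GROUP = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")

# Map each of the 16 symbols to its (low, high) frequency pair.
DTMF_FREQS = {
    key: (LOW_GROUP[r], HIGH_GROUP[c])
    for r, row in enumerate(KEYPAD)
    for c, key in enumerate(row)
}


def dtmf_tone(key, duration_s=0.05, sample_rate=8000):
    """Return samples of the two-sine composite tone for one key.

    The sample rate and duration are assumed values for illustration.
    """
    low, high = DTMF_FREQS[key]
    n = int(duration_s * sample_rate)
    return [
        0.5 * math.sin(2 * math.pi * low * t / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * t / sample_rate)
        for t in range(n)
    ]
```

For instance, `DTMF_FREQS["#"]` gives the pair for the [#] key that prefixes every command in this system.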

As shown in FIG. 2, the telephone terminal 10 comprises a receiver 11, a transmitter 12, a key device 13, a voice signal conversion unit 14, a connection control unit 15, and a DTMF signal generation unit 16. The receiver 11 and the transmitter 12 are the speaker for listening and the microphone for speaking, respectively. The voice signal conversion unit 14 converts electrical signals from the telephone line into sound and converts sound from the transmitter 12 into electrical signals. The connection control unit 15 controls incoming and outgoing calls and connects and disconnects the telephone terminal 10 and the line. The DTMF signal generation unit 16 generates a DTMF signal of the prescribed frequencies according to input from the key device 13.

A person using the telephone terminal 10 can enter his or her language type from the key device 13 by pressing the buttons assigned to each language by convention: for example, [#][1] for Japanese, [#][2] for English, and [#][3] for Chinese. The telephone terminal 10 converts the input into a DTMF signal and transmits it to the communication server 100 over the telephone line 5 as a signal for determining the language of the user of the telephone terminal 10.

The communication server 100 is a device that decodes the voice signal transmitted over the telephone line 5 into speech, translates it into speech in the designated language (hereinafter the "translation language"; the translated speech is hereinafter called "translated speech"), encodes that speech, and transmits it to the telephone terminal 10 over the telephone line 5.

As shown in FIG. 3, the communication server 100 comprises a reception control unit 110, a transmission control unit 120, a communication control unit 150, a database access unit 160, a language processing unit 200, and a database 300.

As shown in FIG. 3, the reception control unit 110 comprises a decoder 111, a control signal analysis unit 112, a DTMF signal analysis unit 113, a voice analysis unit 114, and a call data output unit 115. It discriminates and decodes voice signals and DTMF signals on reception and writes the received information to the database. The reception control unit 110 receives a control signal, a voice signal, or a DTMF signal from the telephone terminal 10 over the telephone line and decodes it with the decoder 111. It then determines whether the signal is a control signal, a DTMF signal, or speech. For a control signal, it recognizes the call setup and reports it to the call data output unit 115. For a DTMF signal, the DTMF signal analysis unit 113 analyzes the frequencies contained in the signal, identifies which button was pressed on the telephone terminal 10, and reports it to the call data output unit 115. For speech, the voice analysis unit 114 delimits one sentence and reports its voice data to the call data output unit 115.
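The patent does not say how the DTMF signal analysis unit 113 analyzes the frequencies. One common approach, shown here purely as an assumed stand-in, is the Goertzel recurrence: measure the signal power at each of the eight DTMF frequencies and take the strongest one in each group. The function names, the 8000 Hz sample rate, and the 400-sample window are illustrative; a practical detector would also need thresholds to reject silence and speech.

```python
import math

LOW_GROUP = (697, 770, 852, 941)        # row frequencies in Hz
HIGH_GROUP = (1209, 1336, 1477, 1633)   # column frequencies in Hz
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, sample_rate):
    """Signal power at one frequency, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_key(samples, sample_rate=8000):
    """Pick the strongest low-group and high-group frequency and map them to a key.

    This sketch always returns a key; it does not check for silence.
    """
    row = max(range(4), key=lambda i: goertzel_power(samples, LOW_GROUP[i], sample_rate))
    col = max(range(4), key=lambda i: goertzel_power(samples, HIGH_GROUP[i], sample_rate))
    return KEYPAD[row][col]


def make_tone(low, high, n=400, sample_rate=8000):
    """Helper for trying the detector: n samples of a two-sine composite tone."""
    return [
        math.sin(2 * math.pi * low * t / sample_rate)
        + math.sin(2 * math.pi * high * t / sample_rate)
        for t in range(n)
    ]
```

Feeding `make_tone(941, 1477)` to `detect_key` exercises the [#] key that starts each language-selection command.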

When a new call arrives, the call data output unit 115 generates a new call ID. It also refers to the button correspondence table 310 to obtain the language ID of the speaker corresponding to the pressed button. When speech is output from the voice analysis unit 114, it creates data to which the following information is attached and stores it in the call data table 330 via the database access unit 160: a call ID that differs for each call, a group ID that is updated each time the speaker changes, a sequential ID that is updated for each sentence of the speaker's utterance, a type flag indicating whether the data is the speaker's speech or text or translated speech or text, and a language ID representing the language of the speech. After a sentence ends, it updates the sequential ID so that the next sentence can be accepted. The call ID, group ID, type, language ID, and sequential ID are described again later in the explanation of the call data table 330.
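The identifier bookkeeping described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `CallRecorder`, the dictionary record layout, and the choice to restart the sequential ID at each change of speaker are all assumptions. The type value 0 for speaker audio follows the example values given later for the call data table.

```python
from dataclasses import dataclass, field
from itertools import count

_call_ids = count(1)  # hypothetical source of a new call ID per incoming call


@dataclass
class CallRecorder:
    """Tracks the IDs attached to each stored sentence of one call."""
    call_id: int = field(default_factory=lambda: next(_call_ids))
    group_id: int = 0
    sequential_id: int = 0
    language_id: int = 0
    records: list = field(default_factory=list)

    def select_language(self, language_id):
        """A language-selection button marks a change of speaker."""
        self.language_id = language_id
        self.group_id += 1
        self.sequential_id = 0  # assumption: numbering restarts per group

    def add_sentence(self, audio):
        """Store one sentence of speaker audio with its identifiers."""
        self.sequential_id += 1  # updated once per sentence
        record = {
            "call_id": self.call_id,
            "group_id": self.group_id,
            "type": 0,  # 0 = speaker audio
            "language_id": self.language_id,
            "sequential_id": self.sequential_id,
            "audio": audio,
        }
        self.records.append(record)
        return record
```

With this sketch, the first sentence after the first language selection is stored with group ID 1 and sequential ID 1.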

As shown in FIG. 3, the transmission control unit 120 comprises an encoder 121 and a call data input unit 122, and controls the return of translated speech to the telephone terminal 10 as a voice signal. Following instructions from the communication control unit 150, the transmission control unit 120 retrieves voice data from the call data table 330 and sends it back to the telephone terminal 10. That is, at the timing indicated by the communication control unit 150, the call data input unit 122 obtains the not-yet-transmitted voice data in the designated language from the call data table 330 via the database access unit 160, using information such as the group ID and sequential ID; the encoder 121 encodes it, and it is transmitted to the telephone terminal 10.

The communication control unit 150 issues instructions to the language processing unit 200 and the transmission control unit 120 according to the DTMF signal analysis information sent from the reception control unit 110 and the state stored in the call data table 330.

As shown in FIG. 3, the language processing unit 200 comprises a speech recognition unit 210, a translation engine 220, and a speech synthesis unit 230. It takes speech as input, translates it according to the designated languages, and outputs it as speech in the translation language. The speech recognition unit 210 recognizes speech in the designated language type and converts it into text. The translation engine 220 translates one language (for example, Japanese) into another (for example, English) based on a translation dictionary. The speech synthesis unit 230 converts the text in the translation language into voice data and outputs it as a single readable piece of voice data.
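The three-stage flow of the language processing unit can be sketched as a function that chains three pluggable stages. The function name, the parameter names, and the toy stand-in stages below are all hypothetical; real speech recognition, translation, and synthesis engines would replace them.

```python
def translate_speech(audio, source_lang, target_lang, recognize, translate, synthesize):
    """Chain recognition, translation, and synthesis, in that order."""
    text = recognize(audio, source_lang)                        # speech -> text
    translated = translate(text, source_lang, target_lang)      # text -> text
    return synthesize(translated, target_lang)                  # text -> speech


# Toy stand-ins so the sketch can be run; they only pass bytes and strings around.
def toy_recognize(audio, lang):
    return audio.decode("utf-8")


def toy_translate(text, source_lang, target_lang):
    dictionary = {("ja", "en"): {"こんにちは": "hello"}}
    return dictionary.get((source_lang, target_lang), {}).get(text, text)


def toy_synthesize(text, lang):
    return text.encode("utf-8")
```

Passing the stages in as arguments keeps the flow independent of any particular engine, which matches the description of the three units as separate parts.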

The database access unit 160 provides other components with functions for reading from and writing to the database 300.

The database 300 holds a button correspondence table 310, a call state table 320, and a call data table 330. Each table is described in detail later.

Each function of the communication server 100 may be implemented in hardware logic such as an FPGA (field-programmable gate array), or may be realized by a general-purpose CPU (Central Processing Unit) executing a program that is loaded into memory and runs on an OS.

Next, the states and events handled on the communication server 100, and the relationship between them, are described with reference to FIGS. 4A to 6.
FIG. 4A is a diagram explaining the states recognized by the communication control unit.
FIG. 4B is a diagram explaining the events accepted by the communication control unit.
FIG. 5 is a state transition diagram showing state transitions caused by events.
FIG. 6 is a diagram showing a processing matrix of states and the events that occur.

The communication control unit 150 recognizes four states: "language not selected", "accepting", "translating", and "transmitting translated speech". The meaning of each state is as shown in FIG. 4A. In later figures, the states are referred to by the numbers assigned there.

The communication control unit 150 also accepts five events: "DTMF signal (language selection)", "voice", "translation complete", "translated speech transmission end", and "DTMF signal (repeat playback)". The meaning of each event is as shown in FIG. 4B.

The states described above change when events occur. FIG. 5 shows the states and the main events that occur in each. For example, in the "2: accepting" state, a "DTMF signal (language selection)" event causes a transition to "3: translating", a "DTMF signal (repeat)" event causes a transition to "4: transmitting translated speech", and a "voice" event leaves the state at "2: accepting".

The processing matrix of states and events shown in FIG. 6 represents states as columns and events as rows. When the event of a given row occurs in the state of a given column, the cell at their intersection applies. Each cell is written in the form "state (associated process)": when that event occurs in that state, the system transitions to the state named in the cell, and the associated process is started or continued in accordance with the transition.

For example, in the "2: accepting" state, a "DTMF signal (language selection)" event causes a transition to "3: translating" and the "translation start" process is performed; a "DTMF signal (repeat)" event causes a transition to "4: transmitting translated speech" and "speech transmission" is started; and a "voice" event leaves the state at "2: accepting" and the "voice data accumulation" process continues.

The processing matrix of FIG. 6 also covers exceptional combinations of states and events that are not shown in the state transition diagram. The description of processing that follows, however, mainly takes up the states shown in the state transition diagram of FIG. 5 and the events that occur in them.
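The state and event handling described above can be sketched as a lookup table keyed by (state, event). Only the transitions that the text itself names are included: the three cells for state 2 quoted above, plus the transitions 1 to 2, 3 to 4, and 4 to 2 that appear in the sequence described for FIGS. 10A and 10B. The contents of the remaining cells of FIG. 6 are not reproduced in the text, so this sketch assumes that any other combination leaves the state unchanged; the names `State`, `Event`, `step`, and `replay` are illustrative.

```python
from enum import Enum, IntEnum


class State(IntEnum):
    LANGUAGE_NOT_SELECTED = 1
    ACCEPTING = 2
    TRANSLATING = 3
    SENDING_TRANSLATED_SPEECH = 4


class Event(Enum):
    DTMF_LANGUAGE = "DTMF signal (language selection)"
    VOICE = "voice"
    TRANSLATION_COMPLETE = "translation complete"
    SEND_END = "translated speech transmission end"
    DTMF_REPEAT = "DTMF signal (repeat playback)"


# (state, event) -> (next state, associated process).
# None means the text does not quote the associated process for that cell.
TRANSITIONS = {
    (State.LANGUAGE_NOT_SELECTED, Event.DTMF_LANGUAGE): (State.ACCEPTING, None),
    (State.ACCEPTING, Event.VOICE): (State.ACCEPTING, "voice data accumulation"),
    (State.ACCEPTING, Event.DTMF_LANGUAGE): (State.TRANSLATING, "translation start"),
    (State.ACCEPTING, Event.DTMF_REPEAT): (State.SENDING_TRANSLATED_SPEECH, "speech transmission"),
    (State.TRANSLATING, Event.TRANSLATION_COMPLETE): (State.SENDING_TRANSLATED_SPEECH, None),
    (State.SENDING_TRANSLATED_SPEECH, Event.SEND_END): (State.ACCEPTING, None),
}


def step(state, event):
    """Return (next state, associated process); unknown pairs keep the state (assumption)."""
    return TRANSITIONS.get((state, event), (state, None))


def replay(events, state=State.LANGUAGE_NOT_SELECTED):
    """Apply a list of events and return every state visited, starting state included."""
    trace = [state]
    for event in events:
        state, _ = step(state, event)
        trace.append(state)
    return trace
```

Keeping the matrix as data rather than nested conditionals mirrors the column-and-row layout of the processing matrix and makes it easy to fill in the exceptional cells once their contents are known.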

Next, the data structures used in the communication system are described with reference to FIGS. 7 to 9.
FIG. 7 is a diagram explaining the specification of the button correspondence table 310.
FIG. 8 is a diagram explaining the specification of the call state table 320.
FIG. 9 is a diagram explaining the specification of the call data table 330.

As shown in FIG. 7, the button correspondence table 310 has the fields button #1, language or function #2, and language ID #3, and associates the button pressed by the user, as determined from the DTMF signal, with various information. The button #1 field stores the button that the user pressed to generate the DTMF signal, obtained by analyzing that signal. The language or function #2 field stores the language or function corresponding to the value of button #1: for example, [#][1] is Japanese and [#][*] is repeat. The language ID #3 field stores the language ID corresponding to each language.
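A minimal sketch of the button correspondence table and of splitting a stream of detected keys into commands follows. The button assignments for Japanese, English, Chinese, and repeat are the examples given in this description, and language IDs 1 (Japanese) and 2 (English) match the values used in the sequence for FIG. 10A. The language ID 3 for Chinese, the use of `None` for the repeat function, and the names `ButtonEntry`, `BUTTON_TABLE`, and `parse_keys` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ButtonEntry:
    button: str                  # button #1, e.g. "#1"
    meaning: str                 # language or function #2
    language_id: Optional[int]   # language ID #3 (None for a function; assumption)


BUTTON_TABLE = {
    "#1": ButtonEntry("#1", "Japanese", 1),
    "#2": ButtonEntry("#2", "English", 2),
    "#3": ButtonEntry("#3", "Chinese", 3),   # ID 3 is an assumed value
    "#*": ButtonEntry("#*", "repeat", None),
}


def parse_keys(keys):
    """Split a stream of pressed keys into the known two-key commands."""
    entries = []
    i = 0
    while i < len(keys):
        entry = BUTTON_TABLE.get(keys[i:i + 2])
        if entry is not None:
            entries.append(entry)
            i += 2
        else:
            i += 1  # skip a key that does not start a known command
    return entries
```

For example, `parse_keys("#1#2#*")` yields the entries for Japanese, English, and repeat in that order.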

As shown in FIG. 8, the call state table 320 has the fields call ID #1, state #2, and language ID #3, and stores the processing state referred to by the communication control unit 150. The call ID #1 field stores a call identifier assigned uniquely to each call. The state #2 field stores an identifier representing the current state of the call identified by the call ID. The meanings of the states are as explained for FIG. 4A. The language ID #3 field stores the language ID of the currently selected language.

As shown in FIG. 9, the call data table 330 has a call ID #1 field, m text #10 structures, and m voice #20 structures (m is an integer of 0 or more), and stores information about a call for each call ID. The call ID #1 field stores a call identifier assigned uniquely to each call.

The text #10 structure has the members type #11, group ID #12, language ID #13, sequential ID #14, and text data #15, and stores information about the speaker's recognized text and its translated text. Type #11 stores an identifier indicating whether the text is the speaker's recognized text or its translation. Group ID #12 stores the group ID assigned uniquely at each change of speaker. Language ID #13 stores the language ID of the language of the text. Sequential ID #14 stores the sequential ID assigned sequentially to each division unit of the voice data. Text data #15 stores the code data of the text.

The voice #20 structure has the members group ID #21, type #22, language ID #23, sequential ID #24, and voice data #25, and stores information about the speaker's voice data and the synthesized speech data after translation.

The contents of group ID #21, type #22, language ID #23, and sequential ID #24 are the same as those of group ID #12, type #11, language ID #13, and sequential ID #14 of the text #10 structure, respectively. Voice data #25 stores the code data of the voice data.
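The call state table and call data table described above can be sketched as plain data classes. Field names follow the members listed in the text; the class names, the use of Python lists for the m structures, the helper `voices_to_send`, and the type value 1 for translated speech are assumptions (the text only gives type 0 for the speaker's own speech).

```python
from dataclasses import dataclass, field


@dataclass
class CallState:
    """One row of the call state table 320."""
    call_id: int
    state: int
    language_id: int


@dataclass
class TextRecord:
    """One text #10 structure."""
    kind: int            # type #11: speaker's recognized text or translation
    group_id: int        # group ID #12
    language_id: int     # language ID #13
    sequential_id: int   # sequential ID #14
    text: str            # text data #15


@dataclass
class VoiceRecord:
    """One voice #20 structure."""
    group_id: int        # group ID #21
    kind: int            # type #22: 0 = speaker; 1 = translated (1 is assumed)
    language_id: int     # language ID #23
    sequential_id: int   # sequential ID #24
    audio: bytes         # voice data #25


@dataclass
class CallData:
    """One entry of the call data table 330, keyed by call ID."""
    call_id: int
    texts: list = field(default_factory=list)
    voices: list = field(default_factory=list)


def voices_to_send(data, group_id, kind, language_id):
    """Select matching voice records in sequential-ID order."""
    matches = [
        v for v in data.voices
        if (v.group_id, v.kind, v.language_id) == (group_id, kind, language_id)
    ]
    return sorted(matches, key=lambda v: v.sequential_id)
```

As a usage example, a row such as `CallState(call_id=1, state=2, language_id=1)` represents a call in the accepting state with Japanese selected, and `VoiceRecord(1, 0, 1, 1, b"...")` represents the first Japanese sentence of the first speaker.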

Next, an overview of the operation of the communication system is described with reference to FIGS. 10A and 10B.
FIGS. 10A and 10B are overview sequence diagrams showing the exchanges between the components of the communication system and the state of the system.

First, speaker A (a Japanese speaker) (SP1) presses the buttons [#][1] on the telephone terminal 10, and a DTMF signal is transmitted to the communication server 100 (A01). The state at this point is the language-not-selected state (state = 1), and the values in the call state table 320 are call ID = 1, state = 1, language ID = 0 (S01, T01). It is assumed here that the call between the telephone terminal 10 and the communication server 100 was connected before A01 and that a call ID has already been assigned. Only those values of the call state table 320 and the call data table 330 that are needed for the explanation are shown in the figures.

The communication control unit 150 of the communication server 100 receives the DTMF signal and updates the state and language (A20; state = 2 (accepting), language ID = 1 (Japanese): T02).

On receiving the DTMF signal (language selection) event, the system state transitions to accepting (state = 2) (S02).
Next, assume that voice data (Japanese) of speaker A is transmitted from the telephone terminal 10 (A02). At this time, values are set in a voice #20 structure of the call data table 330 (A21; group ID = 1, type = 0 (speaker), language ID = 1, sequential ID = 1, voice data: T03).

Next, assume that the telephone is handed from the Japanese speaker A to the English speaker B (SP1 → SP2).

Speaker B (an English speaker) (SP2) then presses the buttons [#][2] on the telephone terminal 10, and a DTMF signal is transmitted to the communication server 100 (A03).

The communication control unit 150 of the communication server 100 receives the DTMF signal and updates the state and language (A22; state = 3 (translating), language ID = 2 (English): T04).

Translation from Japanese into English then starts, and the system enters the translating state (state = 3) (S03).

On instruction from the communication control unit 150, the language processing unit 200 reads the voice data from the call data table 330, translates it, and writes the translated speech as new structure data. When the translation is complete, the communication control unit 150 rewrites the state in the call state table 320 to transmitting translated speech (state = 4) (A23, A24, T05, S04).
Next, the transmission control unit 120 retrieves the translated speech (A25) and transmits it to the telephone terminal 10 as the translation (Japanese → English) of speaker A's speech (A04).

そして、翻訳した音声の送信が完了すると、コミュニケーション制御部１５０は、通話状態テーブル３２０の状態を、受付状態（状態＝２）にする（Ａ２６、Ｓ０５）。 When the transmission of the translated voice is completed, the communication control unit 150 sets the state of the call state table 320 to the reception state (state = 2) (A26, S05).

ここで、話者Ｂが、電話により伝達された音声を聞きもらした、あるいは、理解しがたいなどと感じて、もう一度聞きたいという意思をもったとする。このときには、話者Ｂは、電話端末１０のキー装置１３を操作して、リピートを指示するボタン（［＃］［＊］）を押下する。これにより、電話端末１０からコミュニケーションサーバ１００に、リピート再生を意味するＤＴＭＦ信号が伝えられる（Ａ０５）。 Here, it is assumed that speaker B missed the voice transmitted over the telephone, or found it hard to understand, and wants to hear it again. In that case, speaker B operates the key device 13 of the telephone terminal 10 and presses the buttons that request a repeat ([#] [*]). As a result, a DTMF signal meaning repeat playback is sent from the telephone terminal 10 to the communication server 100 (A05).

そして、コミュニケーション制御部１５０は、通話状態テーブル３２０の状態を、翻訳音声送信中状態（状態＝４）にする（Ａ２７、Ｓ０６）。 Then, the communication control unit 150 changes the state of the call state table 320 to a state in which translated speech is being transmitted (state = 4) (A27, S06).

次に、送信制御部１２０は、コミュニケーション制御部１５０の指示にしたがい、翻訳された音声データを取り出して（Ａ２８）、話者Ａの音声の翻訳結果（日本語→英語）として、再度、電話端末１０に送信する（Ａ０６）。 Next, in accordance with the instruction from the communication control unit 150, the transmission control unit 120 retrieves the translated voice data (A28) and transmits it again to the telephone terminal 10 as the translation result (Japanese → English) of speaker A's voice (A06).

そして、２回目の翻訳した音声の送信が完了すると、コミュニケーション制御部１５０は、通話状態テーブル３２０の状態を、受付状態（状態＝２）にする（Ａ２９、Ｓ０７）。 When the second transmission of the translated voice is completed, the communication control unit 150 sets the state of the call state table 320 to the reception state (state = 2) (A29, S07).

次に、話者Ｂが話して、電話端末１０から音声（英語）が伝えられたものとする（Ａ０７、Ａ０８）。 Next, it is assumed that the speaker B speaks and voice (English) is transmitted from the telephone terminal 10 (A07, A08).

それにより、順次、通話データテーブル３３０に、音声データが書き込まれる（Ａ３０、Ｔ０９、Ａ３１、Ｔ１０）。 Thereby, the voice data is sequentially written in the call data table 330 (A30, T09, A31, T10).

次に、英語の話者Ｂから日本語の話者Ａに電話が受け渡されたものとする（ＳＰ２→ＳＰ３）。 Next, it is assumed that the telephone is delivered from the English speaker B to the Japanese speaker A (SP2 → SP3).

そして、話者Ａ（日本語話者）（ＳＰ３）が、電話端末１０のボタン（［＃］［１］）を押し、コミュニケーションサーバ１００側にＤＴＭＦ信号を送信する（図１０ＢのＡ０９）。 Then, the speaker A (Japanese speaker) (SP3) presses the button ([#] [1]) on the telephone terminal 10, and transmits a DTMF signal to the communication server 100 side (A09 in FIG. 10B).

コミュニケーションサーバ１００のコミュニケーション制御部１５０は、ＤＴＭＦ信号を受けて、状態と言語を更新する（Ａ３２、（状態＝３（翻訳中）、言語ＩＤ＝１（日本語）：Ｔ１１））。 The communication control unit 150 of the communication server 100 receives the DTMF signal and updates the state and language (A32, (state = 3 (under translation), language ID = 1 (Japanese): T11)).

そして、英語から日本語の翻訳が開始され、翻訳中状態（状態＝３）になる（Ｓ０８）。 Then, translation from English into Japanese is started, and a translation state (state = 3) is entered (S08).

言語処理部２００は、コミュニケーション制御部１５０からの指示を受け、通話データテーブル３３０の音声データを読み込み、翻訳して、翻訳音声を新しい構造体データとして書き込み、翻訳が完了すると、コミュニケーション制御部１５０は、通話状態テーブル３２０の状態を翻訳音声送信中（状態＝４）に書き換える（Ａ３３、Ａ３４、Ｔ１２、Ｓ０９）。 Upon receiving an instruction from the communication control unit 150, the language processing unit 200 reads the voice data from the call data table 330, translates it, and writes the translated voice as new structure data. When the translation is complete, the communication control unit 150 rewrites the state in the call state table 320 to "transmitting translated voice" (state = 4) (A33, A34, T12, S09).
次に、送信制御部１２０は、翻訳された音声データを、順次取り出して（Ａ３５）、話者Ｂの音声の翻訳結果（英語→日本語）として、電話端末１０に送信する（Ａ１０、Ａ１１）。 Next, the transmission control unit 120 retrieves the translated voice data in sequence (A35) and transmits it to the telephone terminal 10 as the translation result (English → Japanese) of speaker B's voice (A10, A11).

そして、翻訳した音声の送信が完了すると、コミュニケーション制御部１５０は、通話状態テーブル３２０の状態を、受付状態（状態＝２）にする（Ａ３６、Ｓ１０）。 When transmission of the translated voice is completed, the communication control unit 150 sets the state of the call state table 320 to the reception state (state = 2) (A36, S10).

次に、話者Ａが話して、電話端末１０から音声（日本語）が伝えられたものとする（Ａ１２）。 Next, it is assumed that the speaker A speaks and voice (Japanese) is transmitted from the telephone terminal 10 (A12).

それにより、通話データテーブル３３０に、音声データが書き込まれる（Ａ３７、Ｔ１４）。 Thereby, voice data is written in the call data table 330 (A37, T14).

以下は、通話終了まで同様のシークエンスが繰り返される。 Thereafter, the same sequence is repeated until the end of the call.
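
The sequence above can be read as a small state machine over the call state table. The following is a minimal Python sketch of that reading, not the patent's implementation: the state codes 2, 3, and 4 and the button sequences come from the walkthrough, while the class name `CallState`, the `BUTTONS` dictionary (a stand-in for the button correspondence table 310), and the returned action strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# State codes taken from the walkthrough; the constant names are illustrative.
RECEPTION = 2            # accepting voice
TRANSLATING = 3          # translation in progress
SENDING_TRANSLATION = 4  # translated voice being transmitted

# Hypothetical stand-in for the button correspondence table 310.
BUTTONS = {
    "#1": ("language", 1),  # Japanese
    "#2": ("language", 2),  # English
    "#*": ("repeat", None),
}


@dataclass
class CallState:
    state: int = RECEPTION
    language_id: Optional[int] = None

    def on_dtmf(self, keys: str) -> str:
        """Update the state for a DTMF event and return the action to take."""
        kind, value = BUTTONS[keys]
        if kind == "repeat":
            # Repeat request: go straight back to transmitting (S06).
            self.state = SENDING_TRANSLATION
            return "resend"
        if self.language_id is None:
            # First language selection: start accepting voice (S02).
            self.language_id = value
            self.state = RECEPTION
            return "accept"
        # A later language selection marks a speaker change (S03, S08).
        self.language_id = value
        self.state = TRANSLATING
        return "translate"

    def on_translation_done(self) -> None:
        self.state = SENDING_TRANSLATION  # S04, S09

    def on_send_done(self) -> None:
        self.state = RECEPTION  # S05, S07, S10
```

In this sketch a language button pressed after the first one is treated as the hand-over between speakers, which is what triggers translation of the voice stored so far.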

次に、図１１ないし図１３を用いて、コミュニケーションシステムの処理について説明する。 Next, the processing of the communication system will be described with reference to FIGS. 11 to 13.
図１１は、コミュニケーションサーバの処理を示す概要フローチャートである。 FIG. 11 is a schematic flowchart showing the processing of the communication server.
図１２は、音声解析処理を示すフローチャートである。 FIG. 12 is a flowchart showing the voice analysis process.
図１３は、リピート処理を示すフローチャートである。 FIG. 13 is a flowchart showing the repeat process.

先ず、コミュニケーションサーバ１００は、電話端末１０から電話回線５を介して信号を受け、受信制御部１１０のデコーダ１１１は、送信された信号を解析し（Ｓ１００）、制御信号か、ＤＴＭＦ信号か、音声信号かを判別し、その結果にしたがって、制御信号解析部１１２、ＤＴＭＦ信号解析部１１３、音声解析部１１４に振り分ける（Ｓ１０１）。 First, the communication server 100 receives a signal from the telephone terminal 10 via the telephone line 5. The decoder 111 of the reception control unit 110 analyzes the transmitted signal (S100), determines whether it is a control signal, a DTMF signal, or a voice signal, and routes it accordingly to the control signal analysis unit 112, the DTMF signal analysis unit 113, or the voice analysis unit 114 (S101).

制御信号が、発呼信号のときには、受信制御部１１０の制御信号解析部１１２は、通話データ出力部１１５に連絡し、呼ＩＤを設定する（Ｓ１３０）。 When the control signal is a call signal, the control signal analysis unit 112 of the reception control unit 110 contacts the call data output unit 115 and sets a call ID (S130).

制御信号が、ＤＴＭＦ信号のときには、受信制御部１１０のＤＴＭＦ信号解析部１１３は、その解析情報を通信データ出力部１１５に連絡し、通信データ出力部１１５は、ＤＴＭＦ信号が、ボタン対応テーブル３１０により必要な言語、機能の情報を取り出す（Ｓ１０２）。 When the control signal is a DTMF signal, the DTMF signal analysis unit 113 of the reception control unit 110 passes its analysis result to the call data output unit 115, and the call data output unit 115 uses the button correspondence table 310 to look up the language or function information that corresponds to the DTMF signal (S102).

制御信号が、音声信号のときには、音声解析処理を行う（Ｓ１５０）。なお、音声解析処理については、後に、図１２のフローチャートにより後に詳述する。 When the control signal is a voice signal, the voice analysis process is performed (S150). The voice analysis process is described in detail later with reference to the flowchart of FIG. 12.

制御信号が、ＤＴＭＦ信号のときに、ＤＴＭＦ信号の送信が初回処理のときには（Ｓ１０３：Ｙｅｓ）、通話データ出力部１１５は、グループＩＤの値を初期化し（Ｓ１４０）、言語ＩＤを通話データテーブル３３０に書き込む（Ｓ１４１）。 When the control signal is a DTMF signal and this is the first DTMF signal to be processed (S103: Yes), the call data output unit 115 initializes the value of the group ID (S140) and writes the language ID to the call data table 330 (S141).

ＤＴＭＦ信号の送信が初回ではなく（Ｓ１０３：Ｎｏ）、そのＤＴＭＦ信号がリピートを表す信号のときは（Ｓ１０４：Ｙｅｓ）、ＤＴＭＦ信号解析部１１３は、コミュニケーション制御部１５０に連絡する。送信制御部１２０は、コミュニケーション制御部１５０の指示にしたがって、リピート処理を行ない（Ｓ１６０）、リピート処理で取り出した音声データを、送信制御部１２０のエンコーダ１２１が送信信号にエンコードして、電話端末１０に送信する。なお、リピート処理については、図１３のフローチャートにより後に詳述する。 When the DTMF signal is not the first one (S103: No) and it is a signal representing repeat (S104: Yes), the DTMF signal analysis unit 113 notifies the communication control unit 150. The transmission control unit 120 performs the repeat process in accordance with an instruction from the communication control unit 150 (S160), and the encoder 121 of the transmission control unit 120 encodes the voice data retrieved by the repeat process into a transmission signal and transmits it to the telephone terminal 10. The repeat process is described in detail later with reference to the flowchart of FIG. 13.

そのＤＴＭＦ信号がリピートを表す信号ではないときは（Ｓ１０４：Ｎｏ）、指定された言語の言語ＩＤに設定を切り換える（Ｓ１０５）。 If the DTMF signal is not a signal representing repeat (S104: No), the setting is switched to the language ID of the designated language (S105).

呼ＩＤ、言語ＩＤ、グループＩＤ、種別、シーケンシャルＩＤなどのパラメタが、コミュニケーション制御部１５０経由で、通話データ出力部１１５から言語処理部２００に渡され、言語処理部２００は、コミュニケーション制御部１５０の指示に従い、通話データテーブル３３０から該当する音声データを取得し（Ｓ１０６）、翻訳処理を行う（Ｓ１０７）。また、言語ＩＤを切り換え、種別を翻訳音声として、翻訳音声の音声データを通話データテーブル３３０に格納する（Ｓ１０８）。なお、音声データを認識したテキスト、翻訳テキストも通話データテーブル３３０に書き込まれる。 Parameters such as the call ID, language ID, group ID, type, and sequential ID are passed from the call data output unit 115 to the language processing unit 200 via the communication control unit 150. Following the instruction of the communication control unit 150, the language processing unit 200 acquires the corresponding voice data from the call data table 330 (S106) and performs translation processing (S107). It then switches the language ID, sets the type to translated voice, and stores the voice data of the translated voice in the call data table 330 (S108). The text recognized from the voice data and the translated text are also written to the call data table 330.

そして、全てのシーケンシャルＩＤの音声データ（一文の音声データ）を翻訳済みのときには（Ｓ１０９：Ｙｅｓ）、次のＳ１１０のステップに行き、翻訳済みでないときには（Ｓ１０９：Ｎｏ）、シーケンシャルＩＤを更新し（Ｓ１１３）、Ｓ１０６に戻り、処理を繰り返す。 When the voice data of all sequential IDs (one-sentence voice data) has been translated (S109: Yes), the process proceeds to the next step, S110. When it has not (S109: No), the sequential ID is updated (S113), and the process returns to S106 and repeats.

グループＩＤに属する全ての音声データを翻訳したときには、コミュニケーション制御部１５０から指示を受け、送信制御部１２０の通話データ入力部１２２は、通話データテーブル３３０から翻訳済みかつ未送信の翻訳音声の音声データを取り出し（Ｓ１１０）、かつ、受信制御部１１０の通話データ出力部１１５は、グループＩＤを更新する（Ｓ１１１）。 When all the voice data belonging to the group ID has been translated, the call data input unit 122 of the transmission control unit 120, on instruction from the communication control unit 150, retrieves from the call data table 330 the voice data of translated voice that has been translated but not yet transmitted (S110), and the call data output unit 115 of the reception control unit 110 updates the group ID (S111).

そして、送信制御部１２０のエンコーダが、Ｓ１１０で取り出した音声データを送信信号にエンコードして（Ｓ１１２）、電話端末１０に送信する。 Then, the encoder of the transmission control unit 120 encodes the audio data extracted in S110 into a transmission signal (S112) and transmits it to the telephone terminal 10.
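
The translation loop of steps S106 to S113 and the retrieval in S110 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Record` fields mirror the parameters named in the description, type 0 for speaker voice is taken from the walkthrough, but type 1 for translated voice, the `sent` flag, and the `translate` callable (a placeholder for the language processing unit 200) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

SPEAKER = 0     # type = 0 (speaker), as in the walkthrough
TRANSLATED = 1  # assumed code for translated voice


@dataclass
class Record:
    call_id: int
    group_id: int
    kind: int            # SPEAKER or TRANSLATED
    language_id: int
    sequential_id: int
    audio: bytes
    sent: bool = False   # hypothetical "already transmitted" marker


def translate_group(
    table: List[Record],
    call_id: int,
    group_id: int,
    target_language_id: int,
    translate: Callable[[bytes, int, int], bytes],
) -> List[Record]:
    """Translate every utterance of one group and return the unsent results."""
    sources = sorted(
        (r for r in table
         if r.call_id == call_id and r.group_id == group_id and r.kind == SPEAKER),
        key=lambda r: r.sequential_id,
    )
    for src in sources:  # S106, advancing the sequential ID as in S109/S113
        audio = translate(src.audio, src.language_id, target_language_id)  # S107
        table.append(Record(call_id, group_id, TRANSLATED,
                            target_language_id, src.sequential_id, audio))  # S108
    # S110: pick up translated but not yet transmitted voice data, in order.
    pending = sorted(
        (r for r in table
         if r.call_id == call_id and r.group_id == group_id
         and r.kind == TRANSLATED and not r.sent),
        key=lambda r: r.sequential_id,
    )
    for r in pending:
        r.sent = True
    return pending
```

The caller would then encode the returned records and advance the group ID, corresponding to S112 and S111.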

次に、図１２を用いてＳ１５０の音声解析処理について説明する。 Next, the voice analysis process of S150 will be described with reference to FIG.

先ず、受信制御部１１０の通話データ出力部１１５は、シーケンシャルＩＤを初期化する（Ｓ２００）。 First, the call data output unit 115 of the reception control unit 110 initializes a sequential ID (S200).

次に、音声データの有無を判定し（Ｓ２０１）、音声データがないときには（Ｓ２０１：Ｎｏ）、処理を終了し、音声データがあるときには（Ｓ２０１：Ｙｅｓ）、次に、Ｓ２０２判定に行く（Ｓ２０２）。 Next, it is determined whether there is voice data (S201). When there is none (S201: No), the process ends; when there is (S201: Yes), the process proceeds to the determination in S202.

音声データに区切り（無音部分）があるときには（Ｓ２０２：Ｙｅｓ）、呼ＩＤ、言語ＩＤ、グループＩＤ、シーケンシャルＩＤに基づいて、通話データテーブル３３０に、その区切りの部分までの音声データを格納し（Ｓ２０３）、シーケンシャルＩＤを更新し（Ｓ２０４）、Ｓ２０１の判断に戻る。 When there is a break (silent portion) in the voice data (S202: Yes), the voice data up to that break is stored in the call data table 330 based on the call ID, language ID, group ID, and sequential ID (S203), the sequential ID is updated (S204), and the process returns to the determination in S201.

音声データに区切りがないときには（Ｓ２０２：Ｎｏ）、区切り判断のポインタをインクリメントし（Ｓ２０５）、Ｓ２０２の判断に戻る。 When there is no break in the audio data (S202: No), the break determination pointer is incremented (S205), and the process returns to the determination of S202.
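
The voice analysis flow of FIG. 12 amounts to cutting the incoming audio at silent portions and numbering the pieces with sequential IDs. The following Python sketch shows one way to do that. The silence criterion (a run of `min_silence` samples with amplitude at or below `threshold`), starting the sequential ID at 1, and storing any remaining audio at the end are all assumptions, since the description only says that a break is a silent portion.

```python
from typing import Dict, List


def split_on_silence(
    samples: List[int],
    threshold: int = 0,
    min_silence: int = 3,
) -> Dict[int, List[int]]:
    """Return {sequential_id: samples} for each utterance between breaks."""
    segments: Dict[int, List[int]] = {}
    sequential_id = 1              # S200: initialize the sequential ID
    current: List[int] = []        # utterance being collected
    pending_quiet: List[int] = []  # silent samples not yet known to be a break

    for sample in samples:         # S201: loop while voice data remains
        if abs(sample) <= threshold:
            pending_quiet.append(sample)
            # S202: a long enough silent run counts as a break.
            if len(pending_quiet) == min_silence and current:
                segments[sequential_id] = current  # S203: store up to the break
                sequential_id += 1                 # S204: update the sequential ID
                current = []
            continue
        # Non-silent sample: a short pause stays inside the utterance.
        if current:
            current.extend(pending_quiet)
        pending_quiet = []
        current.append(sample)     # S205: advance the scan position

    if current:                    # assumed: keep the final, unterminated piece
        segments[sequential_id] = current
    return segments
```

For example, with the defaults, `[5, 6, 0, 0, 0, 0, 7, 8, 0, 9]` is split into two utterances, and the single zero between 8 and 9 is kept as a pause inside the second one.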

次に、図１３を用いてＳ１６０のリピート処理について説明する。 Next, the repeat process of S160 will be described with reference to FIG.

先ず、送信制御部１２０の通話データ入力部は、コミュニケーション制御部１５０からの指示を受け、呼ＩＤ、言語ＩＤ、グループＩＤ、種別、シーケンシャルＩＤに基づいて、通話データテーブル３３０から、直前に送信した音声データを取得する（Ｓ３００）。 First, on instruction from the communication control unit 150, the call data input unit of the transmission control unit 120 acquires the most recently transmitted voice data from the call data table 330 based on the call ID, language ID, group ID, type, and sequential ID (S300).

そして、そのグループＩＤ内の全てのシーケンシャルＩＤの音声データを取得したときには（Ｓ３０１：Ｙｅｓ）、処理を終了し、取得していないシーケンシャルＩＤの音声データがあるときには（Ｓ３０１：Ｎｏ）、シーケンシャルＩＤを更新し（Ｓ３０２）、Ｓ３００に戻る。 When the voice data of all sequential IDs in that group ID has been acquired (S301: Yes), the process ends. When voice data of some sequential ID has not yet been acquired (S301: No), the sequential ID is updated (S302) and the process returns to S300.
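
The repeat process of FIG. 13 can be sketched as a lookup loop over sequential IDs. In this minimal Python illustration the call data table is modeled as a dictionary keyed by (call ID, group ID, type, language ID, sequential ID); that layout, the function name `repeat_audio`, and the assumption that sequential IDs start at 1 and are contiguous are all hypothetical.

```python
from typing import Dict, List, Tuple

# (call_id, group_id, kind, language_id, sequential_id) -> voice data
Key = Tuple[int, int, int, int, int]


def repeat_audio(
    table: Dict[Key, bytes],
    call_id: int,
    group_id: int,
    kind: int,
    language_id: int,
) -> List[bytes]:
    """Collect, in order, the voice data of the group that was sent last."""
    collected: List[bytes] = []
    sequential_id = 1  # assumed starting value
    while True:
        audio = table.get((call_id, group_id, kind, language_id, sequential_id))  # S300
        if audio is None:   # S301: Yes, nothing further in this group
            break
        collected.append(audio)
        sequential_id += 1  # S302: update the sequential ID
    return collected
```

The caller would pass the group ID and language ID of the translation that was transmitted immediately before the repeat request, then encode and resend the returned data.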

本実施形態のコミュニケーションシステムは、特殊な信号を生成する装置ではなく、ＤＴＭＦ信号をサポートしている全ての電話端末で利用可能であるという特徴がある。また、電話のボタンをプッシュすることは、広く普及している方法なので、自動翻訳を利用した経験のない者でも、とまどいなく簡便に利用できる The communication system of this embodiment does not require a device that generates special signals; it can be used from any telephone terminal that supports DTMF signals. In addition, because pressing telephone buttons is a widely familiar operation, even people with no experience of automatic translation can use it easily and without confusion.
さらに、話者が言語を明確に指定するので、翻訳側の装置の負荷が少なく、判定も短時間で行なえるという特徴がある。 Furthermore, because the speaker explicitly specifies the language, the load on the translation-side device is small and the determination can be made in a short time.

また、対面音声通訳において、１台の電話機を交互に受け渡しながら、自動翻訳の音声を聞く場合に、利用者にとって、自分の言語種別をキーにより指定した後に、相手の翻訳音声が流されるので、聞き逃しを防止することができるという特徴がある。 Also, in face-to-face speech interpretation, where a single telephone is passed back and forth to listen to the automatically translated voice, the other party's translated voice is played only after the user specifies his or her own language with a key, which helps prevent the user from missing it.

DESCRIPTION OF SYMBOLS
５…電話回線 5 ... Telephone line
１０…電話端末 10 ... Telephone terminal
１１…受話器 11 ... Handset
１２…送話器 12 ... Transmitter
１３…キー装置 13 ... Key device
１４…音声信号変換部 14 ... Voice signal conversion unit
１５…接続制御部 15 ... Connection control unit
１６…ＤＴＭ信号生成部 16 ... DTMF signal generation unit
１００…コミュニケーションサーバ 100 ... Communication server
１１０…受信制御部 110 ... Reception control unit
１２０…送信制御部 120 ... Transmission control unit
１５０…コミュニケーション制御部 150 ... Communication control unit
１６０…データベースアクセス部 160 ... Database access unit
２００…言語処理部 200 ... Language processing unit
３００…データベース 300 ... Database
３１０…ボタン対応テーブル 310 ... Button correspondence table
３２０…通話状態テーブル 320 ... Call state table
３３０…通話データテーブル 330 ... Call data table

Claims

A communication system in which a telephone terminal and a communication server are connected by a telephone line,
wherein the telephone terminal has means for transmitting and receiving a call to and from the communication server, and means for generating a DTMF (Dual-Tone Multi-Frequency) signal in response to input from a key device and transmitting the signal to the communication server via the telephone line;
the communication server has a language processing unit that performs speech translation from a first language to a second language, means for translating the first-language voice carried by a voice signal transmitted through the telephone line and transmitting it to the telephone terminal, and a call data table that stores voice data in the speaker's language and voice data of translated voice obtained by translating that voice data;
when the communication server receives a first DTMF signal from the telephone terminal, it stores, in the call data table, the voice carried by the voice signals received from reception of the first DTMF signal until reception of a second DTMF signal transmitted through operation by another speaker, as voice data in the language represented by the first DTMF signal; and
after the second DTMF signal is transmitted, the communication server transmits to the telephone terminal a voice signal carrying the voice data of the translated voice obtained by translating the voice data in the language represented by the first DTMF signal into the language represented by the second DTMF signal.

The communication system according to claim 1, wherein, upon receiving a DTMF signal representing a repeat function, the communication server retransmits the voice signal carrying the voice data that was transmitted to the telephone terminal after reception of the immediately preceding DTMF signal.

A communication method in a communication system in which a telephone terminal and a communication server are connected by a telephone line,
the method comprising:
a step in which the telephone terminal transmits and receives a call to and from the communication server;
a step in which the telephone terminal generates a DTMF (Dual-Tone Multi-Frequency) signal in response to input from a key device and transmits the signal to the communication server via the telephone line;
a step in which, when the communication server receives a first DTMF signal from the telephone terminal, it stores, in a call data table, the voice carried by the voice signals received from reception of the first DTMF signal until reception of a second DTMF signal transmitted through operation by another speaker, as voice data in the language represented by the first DTMF signal; and
a step in which, after the second DTMF signal is transmitted, the communication server transmits to the telephone terminal a voice signal carrying the voice data of the translated voice obtained by translating the voice data in the language represented by the first DTMF signal into the language represented by the second DTMF signal.

The communication method according to claim 3, further comprising a step in which, upon receiving a DTMF signal representing a repeat function, the communication server retransmits the voice signal carrying the voice data that was transmitted to the telephone terminal after reception of the immediately preceding DTMF signal.