JP2017191959A

JP2017191959A - Multilanguage voice translation system for TV conference system

Info

Publication number: JP2017191959A
Application number: JP2016078568A
Authority: JP
Inventors: 隼大迫; Hayato Osako; ゆりか上村; Yurika Uemura
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2016-04-11
Filing date: 2016-04-11
Publication date: 2017-10-19

Abstract

PROBLEM TO BE SOLVED: To translate between languages so that participants can communicate in a TV conference in which multiple languages are used.
SOLUTION: The TV conference system identifies, for each site, the languages used in a conference on the basis of attribute information of the attendees, translates speech into the languages set for the respective sites using a translation apparatus connected to the system, and outputs the translated content at each site as text information or voice information. The invention also includes a mode in which the translated content is displayed at the speaker's own site and corrections are accepted from an attendee. Performing display at the respective sites and correction in parallel is a further mode of the invention.
SELECTED DRAWING: Figure 1

Description

The present invention relates to speech translation technology, and in particular to translation technology for TV conference systems.

Conferences are now commonly held using TV conference systems. A TV conference lets participants at remote locations meet without travelling, and it is widely used for that reason. It is particularly effective at reducing cost and saving time when the attendees are spread across national borders.

For example, Patent Document 1 discloses a technique that allows connections among the PCs of multiple sites to be established easily and quickly in a TV conference system using a network, even when a large number of site PCs attend.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2008-187659

Because a TV conference system allows conferences between remote locations, TV conferences that span national borders can be expected. When a conference spans borders, the languages differ from country to country. Even when the sites do not span a border, the attendees may use multiple languages.

In the present invention, therefore, the language used in the conference is identified for each site on the basis of attendee attribute information, a translation apparatus connected to the TV conference system translates into the language set for each site, and the translated content is output at each site as text information or voice information. The present invention also includes the following modes: displaying the translated content at the speaker's own site and accepting corrections from a participant, and performing display at the other sites and correction in parallel.

With the above configuration, the present invention makes it possible to hold a TV conference in which multiple languages are used.

FIG. 1 is a diagram showing the configuration and basic concept of an embodiment of the present invention.
FIG. 2 is a flowchart of the per-site processing in an embodiment of the present invention.
FIG. 3 is a system flowchart of an embodiment of the present invention.
FIG. 4 is a diagram showing the system configuration and the databases of an embodiment of the present invention.
FIG. 5 is a diagram showing the system configuration and the database transitions of an embodiment of the present invention.
FIG. 6 is a diagram showing the data transitions at S9, S12, and S13 in an embodiment of the present invention.

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 shows the concept of the TV conference system of this embodiment. In this embodiment, the TV conference system is equipped with a function that automatically translates speech and converts it to text. Subtitles in the local language of each participant are thereby displayed on the TV conference screen, which makes TV conferences with overseas sites easier to use. The text of a participant's own speech is also shown on that participant's own screen, so the text can be corrected easily and the correction is reflected at the other sites immediately. More specifically, each site sends personal information to the registration management server 4 in advance, and the information is registered in the registration management DB 5. The translation languages are identified from this information, the character translation server 6 performs the translation, and the result is displayed on the TV screens 2 and 3 of the sites. The processing is described below with reference to FIG. 2 onward.

FIG. 4 is a system configuration diagram of this embodiment. The devices are connected via the network 1. The own-site TV 2 and the other-site TV 3 are TV conference monitors installed in the respective conference rooms. They are named separately here as own site and other site, but they have the same functions. At least one registration management server 4 and one registration management DB 5 need to be provided in the TV conference system. The registration management server 4 has the registration management DB 5, which stores each individual's voice and language information for holding a TV conference. This makes it possible to identify the languages used at each site. The character translation server 6 has a function for converting speech to text and a function for translating text into text in another language. The character translation DB 7 stores the text results in association with the voice information from before the conversion. The registration management server 4 and the character translation server 6 are implemented as computers, and the flowchart processing described below is executed by a processing unit such as a CPU in accordance with a program held by each server.
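The two databases can be pictured as simple record types. The following is a minimal sketch only: the field names, the language codes ("ja", "en", "zh"), and the helper `languages_by_site` are assumptions made for illustration and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegistrationRecord:
    """One row of the registration management DB 5 (field names are assumed)."""
    ldap: str            # personal identification number
    language: str        # language of use
    site: str            # site of use
    voice_code: str      # reference to the registered voice sample
    participating: bool  # conference participation flag


@dataclass
class TranslationRecord:
    """One row of the character translation DB 7 (field names are assumed)."""
    ldap: str
    language: str
    site: str
    voice_code: str
    # The remaining items stay blank until a participant speaks.
    voice_info: Optional[bytes] = None
    char_data: Optional[str] = None
    char_code: Optional[str] = None
    own_site: Optional[bool] = None


def languages_by_site(rows):
    """Collect the languages of participating users for each site."""
    result = {}
    for row in rows:
        if row.participating:
            result.setdefault(row.site, set()).add(row.language)
    return result
```

Grouping the registered languages by site is one way to realize the statement that the languages used at each site can be identified from the registration data.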

Next, the user procedure executed before a conference is described with reference to FIG. 2. Unless otherwise noted, the processing in FIG. 2 is executed by the registration management server 4. First, in S1, the LDAP of each user who will attend the conference is registered. For this registration, a personal identification number such as an LDAP is accepted via a PC (not shown) belonging to each participant and entered in the "LDAP" item of the registration management DB 5. At this time, each prospective participant's intention to attend the conference is accepted, and the result is recorded in the conference participation flag. For the site-of-use item of the registration management DB 5, information identifying the site is accepted and registered when the LDAP is accepted.

Next, in S2, it is determined whether the language and voice information need to be registered. The registration management DB 5 is searched using the LDAP entered in S1 as the key to determine whether the user has been registered before. This determination may be based on whether the LDAP exists, or on whether information corresponding to the retrieved LDAP exists. When it is based on whether the LDAP exists, the determination is made at the time of registration in S1. If the search shows that the user has never been registered, the process proceeds to S3.
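The S2 decision can be sketched as a small lookup. This is an illustration under assumptions: the registration management DB 5 is modeled as a dictionary keyed by LDAP, and the function name, the `check` parameter, and the field names are hypothetical.

```python
def needs_registration(registration_db, ldap, check="info"):
    """Return True when S3/S4 (language and voice registration) are required.

    check="ldap": decide only by whether the LDAP is already present.
    check="info": decide by whether language and voice data exist for it.
    """
    record = registration_db.get(ldap)
    if record is None:
        return True  # never registered: proceed to S3
    if check == "ldap":
        return False
    # Registered LDAP, but language or voice code may still be missing.
    return not (record.get("language") and record.get("voice_code"))
```

The two values of `check` correspond to the two alternatives the text allows for this determination.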

In S3 and S4, the language-identifying information and the voice information accepted in S2 are registered in the language-of-use item and the voice code item of the registration management DB 5. The language of use may instead be registered by analyzing the voice input by the participant. In S3 and S4, as in S1, information identifying the language each participant uses is accepted from the participant via the PC, and voice input is accepted from the participant via a microphone or the like.

When it is determined in S2 that the user has been registered before, or when the processing up to S4 has finished, the process proceeds to S5. In S5, it is determined whether any conference participant uses a language other than the registered language. This determination concerns whether multiple languages are used at the site, and it is made by the prospective participants at that site. If another language is used, the process returns to S1 and the registration procedure is carried out for the prospective participant who uses that language. If no one uses another language, the process proceeds to S6 and the user information list for the conference is registered. In this registration, the registration management server 4 sends the LDAP, language of use, site of use, and voice code of every entry whose conference participation flag is set to ○ to the character translation server 6, and the character translation server 6 registers them in the character translation DB 7. The above processing is executed for each site.
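The S6 hand-over amounts to filtering by the participation flag and forwarding four items. The sketch below assumes dictionary rows with hypothetical key names and represents the ○ flag as `True`; none of these names come from the specification.

```python
FORWARDED_ITEMS = ("ldap", "language", "site", "voice_code")


def build_user_list(registration_rows):
    """Select participating users and keep only the items sent in S6."""
    return [
        {key: row[key] for key in FORWARDED_ITEMS}
        for row in registration_rows
        if row["participating"]  # conference participation flag is set
    ]
```

The resulting list is what the character translation server 6 would store in the character translation DB 7 before the conference starts.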

Next, the processing during a conference is described with reference to FIG. 3. As described above, when the TV conference starts, the information on the persons whose conference participation flag is set to ○ has already been registered from the registration management DB 5 into the character translation DB 7. The contents are as shown for the registration management DB 5 and the character translation DB 7 in FIG. 5. That is, when the TV conference starts, the voice information, character data, character code, and own-site flag items of the character translation DB 7 are blank. When the conference begins in this state and a participant speaks, the processing of S7 is performed. In S7, the character translation server 6 receives the speaker's voice information via the own-site TV, identifies the speaker from the voice information, and creates character data and a character code. In this example, when the person with LDAP 111 at site A says "ありがとう" ("thank you"), the voice information is converted to text on the basis of the site of use and the voice code, and the character data and character code are created. When text has been converted from voice information, the own-site flag is set. This information is stored in the character translation DB 7. The stored state is shown at the location labeled S7 in FIG. 5.
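Step S7 can be sketched as follows. Speaker identification and speech recognition are passed in as callables because the specification does not say how they are implemented; `identify_speaker`, `transcribe`, the key names, and the "C0001" code format are all assumptions for illustration.

```python
import itertools


def handle_utterance(users, site, voice_info, identify_speaker, transcribe, codes):
    """Create the S7 record for one utterance received from `site`."""
    # Narrow the candidates to the users registered at the sending site.
    candidates = [user for user in users if user["site"] == site]
    speaker = identify_speaker(voice_info, candidates)
    text = transcribe(voice_info, speaker["language"])
    return {
        **speaker,
        "voice_info": voice_info,
        "char_data": text,
        "char_code": f"C{next(codes):04d}",  # one code per utterance
        "own_site": True,  # set because the text came from voice input
    }


# Usage with stand-in recognizers (hypothetical stubs).
users = [{"ldap": "111", "language": "ja", "site": "A", "voice_code": "v111"}]
record = handle_utterance(
    users, "A", b"audio",
    identify_speaker=lambda voice, candidates: candidates[0],
    transcribe=lambda voice, language: "ありがとう",
    codes=itertools.count(1),
)
```

The character code created here is what later lets translations and corrections be tied back to the same utterance.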

In S8, the character data, character code, and own-site flag created in S7 are returned to the own site and displayed as a subtitle in the upper part of the own-site TV screen. Here, site A is determined to be the own site from the own-site flag, and the character data is shown as a subtitle in the upper part of the screen.

In S9, the character data is translated into the other languages and stored as data. In this example, the character data "ありがとう" is translated into character data in the language used at each of the other sites. The translated data is stored in the character translation DB 7 as shown in the S9 table of FIG. 6. In this storage, the character code created in S7 is also registered in each row. The character data "Thank you" and "多謝", together with the character code and the own-site flag, are then returned to site B and site C, respectively.
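The fan-out in S9 can be sketched as below. The translation engine is passed in as a callable, since the specification does not name one; the function name, key names, and language codes are assumptions, and the other-site rows are assumed to carry an unset own-site flag.

```python
def translate_for_other_sites(source, users, translate):
    """Build one translated row per other site, reusing the S7 character code."""
    targets = sorted(
        {(u["site"], u["language"]) for u in users if u["site"] != source["site"]}
    )
    rows = []
    for site, language in targets:
        rows.append({
            "site": site,
            "language": language,
            "char_data": translate(source["char_data"], source["language"], language),
            "char_code": source["char_code"],  # same code as the original text
            "own_site": False,
        })
    return rows


# Usage with a lookup table standing in for the translator.
users = [
    {"site": "A", "language": "ja"},
    {"site": "B", "language": "en"},
    {"site": "C", "language": "zh"},
]
source = {"site": "A", "language": "ja", "char_data": "ありがとう", "char_code": "C0001"}
table = {"en": "Thank you", "zh": "多謝"}
rows = translate_for_other_sites(source, users, lambda text, src, dst: table[dst])
```

Carrying the original character code on every translated row is what allows a later correction to replace the right subtitle at each site.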

In S10, the subtitle is displayed at the other sites, as the counterpart to the own-site display of S8. Here, site B and site C are determined to be other sites from the own-site flag, and the character data is shown as a subtitle in the lower part of the screen. In this embodiment, the own-site subtitle is displayed in the upper part of the screen and the other-site subtitle in the lower part, but it is sufficient that the two are displayed in different positions; other arrangements, such as swapping top and bottom or using left and right, are also included.
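The placement rule of S8 and S10 reduces to a choice driven by the own-site flag. In this minimal sketch the function name and the position labels are assumptions; the two positions are parameters because the text allows other arrangements.

```python
def subtitle_position(own_site_flag, own="top", other="bottom"):
    """Pick the screen region for a subtitle from the own-site flag."""
    return own if own_site_flag else other
```

For example, `subtitle_position(True)` gives the upper region used in S8, and `subtitle_position(False, own="left", other="right")` gives the right-hand region in a left/right layout.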

In S11, input is accepted as to whether the subtitle for the own site's speech is displayed correctly. If input indicating an error is accepted, the process proceeds to S12. Performing the flow of S9 to S10 and the flow of S11 to S12 in parallel allows corrections to be made efficiently.
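Running the two flows side by side can be sketched with a thread pool. The specification does not state how the parallelism is achieved, so the use of threads, the function name, and the two callables are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(record, translate_and_display, check_and_correct):
    """Run the S9-S10 flow and the S11-S12 flow concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        display_future = pool.submit(translate_and_display, record)  # S9 to S10
        correction_future = pool.submit(check_and_correct, record)   # S11 to S12
        # Wait for both flows and return their results together.
        return display_future.result(), correction_future.result()


# Usage with stand-in flows (hypothetical stubs that do not modify the record).
shown, corrected = run_in_parallel(
    {"char_data": "ありがとう"},
    translate_and_display=lambda r: "displayed:" + r["char_data"],
    check_and_correct=lambda r: "ごめんなさい",
)
```

In this arrangement the other sites see the first translation without waiting for the speaker to finish checking the own-site subtitle.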

In S12, the participant's selection of the part to correct and the corrected character data are accepted, and the correction is made accordingly. As a result, the character translation DB 7 is updated to the content shown for S12 in FIG. 6. In this example, the correction is made from the TV screen at site A, and the character data is changed from "ありがとう" ("thank you") to "ごめんなさい" ("I'm sorry").

In S13, the part corrected in S12 is retranslated and displayed at the other sites. As a result, the character translation DB 7 is updated to the content shown for S13 in FIG. 6. In this example, the character data "ごめんなさい" is retranslated into character data in the language used at each of the other sites. The character data "Sorry" and "対不起", together with the character code and the own-site flag, are then returned to site B and site C, respectively. Because character data with the same character code already exists, each site recognizes that the data is a retranslation and corrects its display. The corrected part may be highlighted, for example in color. This completes the processing of this embodiment. S11 and S12 may be performed in parallel with S9 and S10.
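The site-side handling of a retranslation can be sketched as a keyed update. This is an illustration under assumptions: the displayed subtitles are held in a dictionary keyed by character code, and the function and key names are hypothetical.

```python
def apply_subtitle(displayed, message):
    """Show a subtitle, replacing an earlier one that has the same character code.

    Returns True when the message was recognized as a retranslation.
    """
    code = message["char_code"]
    is_retranslation = code in displayed  # same code already on screen
    displayed[code] = {
        "text": message["char_data"],
        "highlight": is_retranslation,  # corrected text may be emphasized
    }
    return is_retranslation


# Usage at site B: first translation, then the corrected one.
screen = {}
first = apply_subtitle(screen, {"char_code": "C0001", "char_data": "Thank you"})
second = apply_subtitle(screen, {"char_code": "C0001", "char_data": "Sorry"})
```

Keying on the character code means the site needs no separate "this is a correction" signal from the server.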

Description of symbols: 1: network; 2: own-site TV screen; 3: other-site TV screen; 4: registration management server; 5: registration management DB; 6: character translation server; 7: character translation DB

Claims

1. A multilingual speech translation system for a TV conference system connected via a network, which translates the speech of each speaker, the system comprising:
means for identifying, for each site, the language used in the conference on the basis of attendee attribute information; and
means for translating into the language set for each site and outputting the translated content on the TV conference screen of each site.

2. The multilingual speech translation system for a TV conference system according to claim 1,
further comprising means for displaying the translated content at the own site of the TV conference system and receiving a correction from a participant.

3. The multilingual speech translation system for a TV conference system according to claim 2,
wherein the means for receiving performs display at the other sites and reception of the correction in parallel.