JP2020119043A

JP2020119043A - Voice translation system and voice translation method

Info

Publication number: JP2020119043A
Application number: JP2019007295A
Authority: JP
Inventors: 真太郎海南; Shintaro Kainan
Original assignee: Daiei Trading Co Ltd
Current assignee: Daiei Trading Co Ltd
Priority date: 2019-01-18
Filing date: 2019-01-18
Publication date: 2020-08-06

Abstract

PROBLEM TO BE SOLVED: To provide a voice translation system and a voice translation method that translate a conversation smoothly without requiring the language to be designated each time the speaker changes, even in a first conversation with no information about the other party.

SOLUTION: A voice translation system 1 includes: an information terminal 10 capable of inputting a voice of a conversation in which each user uses a different language and outputting the content of the input voice of the user in a language different from the language used by the user; and a translation server 20 connected to the information terminal via a network 30 and translating the content of the voice of the user input to the information terminal into the language different from the language used by the user. The translation server includes: language specifying means 21 for specifying a plurality of languages used in the conversation based on the voice of the conversation; and translation means 22 for translating the content of the voice of the user into a language different from the language used by the user among the plurality of languages specified by the language specifying means.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a voice translation system and a voice translation method.

Many translation systems have been proposed that translate a speaker's voice into the listener's native language, so that people who do not understand each other's language can follow a conversation in their own native language, for example a conversation between a taxi driver and a customer who speaks only a language other than Japanese (such as a tourist from overseas).

For example, Patent Document 1 discloses a voice translation system comprising: a voice translation device that is capable of voice input and output and that translates the content of input voice into a different language and outputs it as voice; and a voice input/output device that is capable of voice input and output and is wirelessly connected to the voice translation device. The voice translation device translates the content of voice input to the voice input/output device into a different language and outputs it as voice from the voice translation device, and translates the content of voice input to the voice translation device into a different language and causes the voice input/output device to output it as voice.

Patent Document 1: JP 2018-173910 A

However, the voice translation system described above has the problem that the languages to be used by the system must be designated first.

Moreover, designating the languages requires knowing in advance which language the other party speaks. The system is therefore difficult to use when there is no information at all about the other party and the conversation is a first encounter, as in the relationship between a taxi driver and a customer.

Furthermore, the voice translation system described above has the problem that a change of language must be designated for each turn of the conversation, that is, each time the speaker changes.

In view of the above circumstances, an object of the present invention is to provide a voice translation system and a voice translation method that can translate a conversation smoothly without requiring a change of language to be designated for each turn of the conversation, even in a first conversation with no information about the other party (such as which language the other party speaks).

As a result of intensive research and development on the above problems, the inventor of the present invention has found the following groundbreaking voice translation system and voice translation method.

A first aspect of the present invention for solving the above problems is a voice translation system comprising: an information terminal to which the voice of a conversation in which each user uses a different language can be input, and which outputs the content of an input user's voice in a language different from the language used by that user; and a translation server that is connected to the information terminal via a network and translates the content of the user's voice input to the information terminal into a language different from the language used by that user, wherein the translation server has: language specifying means for identifying, from the voice of the conversation, a plurality of languages used in the conversation; and translation means for translating the content of the user's voice into a language, among the plurality of languages identified by the language specifying means, that is different from the language used by that user.

Here, "output" is a concept that covers not only output as voice but also display of the content of the voice as text. "Voice" is a concept that covers voice data as well as voice itself. However, "voice output" and "output as voice" mean output as audible voice rather than as voice data.

According to the first aspect, a conversation can be translated smoothly without designating the language used for each turn of the conversation, even in a first conversation with no information about the other party.

A second aspect of the present invention is the voice translation system according to the first aspect, wherein the translation means performs translation using artificial intelligence.

According to the second aspect, more accurate translation can be provided.

A third aspect of the present invention is the voice translation system according to the first or second aspect, wherein the output from the information terminal is output as voice.

According to the third aspect, the translation is output as voice, so a conversation in different languages can proceed smoothly.

A fourth aspect of the present invention is the voice translation system according to any one of the first to third aspects, wherein the information terminal has language designating means, and the translation server adds the language designated by the language designating means to the languages identified by the language specifying means.

According to the fourth aspect, at least one of the languages used can be designated in advance, so the content of the input voice can be translated more accurately.

A fifth aspect of the present invention is the voice translation system according to any one of the first to fourth aspects, further comprising a voice output terminal that is worn on or near the user's ear, is connected via a network to at least one of the information terminal and the translation server, and outputs as voice the content of the user's voice translated into a different language by the translation server.

Here, "near the user's ear" means a location where the user can reliably hear the output voice, such as the upper part of the ear or the neck.

According to the fifth aspect, the translated voice can be heard easily, so a conversation in different languages can proceed more smoothly.

A sixth aspect of the present invention is a voice translation method for outputting, in a conversation in which each user uses a different language, the content of an input user's voice in a language different from the language used by that user, the method comprising: a language specifying step of identifying, from the voice of the conversation, a plurality of languages used in the conversation; and a translation step of translating the content of the user's voice into a language, among the plurality of languages identified in the language specifying step, that is different from the language used by that user.

According to the sixth aspect, a conversation can be translated smoothly without designating the language used, even when there is no information about the other party, for example in a first conversation.

In the present invention, "means" and "system" do not refer only to physical means; they also cover cases where the functions of the "means" or "system" are realized by software. The functions of one "means" or "system" may be realized by two or more physical means or devices, and the functions of two or more "means" or "systems" may be realized by a single physical means or device.

FIG. 1 is a schematic conceptual diagram of the voice translation system according to Embodiment 1.
FIG. 2 is a flowchart showing the operation of the voice translation system according to Embodiment 1.
FIG. 3 is a schematic conceptual diagram of the voice translation system according to Embodiment 2.

Embodiments of the voice translation system according to the present invention are described below with reference to the accompanying drawings. The present invention is not limited to the following embodiments.

(Embodiment 1)

FIG. 1 is a schematic conceptual diagram of the voice translation system according to this embodiment. As shown in the figure, the voice translation system 1 comprises an information terminal 10, a translation server 20 connected to the information terminal 10 via a network 30, and a plurality of (three) voice output terminals 40 connected to the information terminal 10.

The information terminal 10 is not particularly limited as long as the voices of users 100a to 100c can be input to it. The information terminal 10 may have a voice output device such as a speaker so that it can output the translated voice described later. Examples of the information terminal 10 include a mobile phone, a smartphone, and a dedicated terminal with a voice input/output function. Here, the users 100a to 100c are persons who each converse in a different language.

The translation server 20 has language specifying means 21, which identifies the language used by each of the users 100a to 100c from their conversation, and translation means 22, which translates the content of a user's voice into a language, among the plurality of languages identified by the language specifying means 21, that is different from the language used by that user.

The language specifying means 21 is not particularly limited as long as it has a function of identifying the language used by each of the users 100a to 100c; one example is a program having such a function. Specifically, when user 100a speaks Japanese, user 100b speaks English, and user 100c speaks French, the language specifying means 21 may be, for example, a program that determines from the voice of user 100a that Japanese is used in the conversation, from the voice of user 100b that English is used, and from the voice of user 100c that French is used. One method of identifying a language is, for example, to detect words contained in the input voice and determine from those words which language is being used.
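
The word-based identification method mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the method of the embodiment: it assumes the input voice has already been converted to text, and the word lists in `MARKER_WORDS`, the language codes, and both function names are hypothetical.

```python
import re

# Hypothetical, deliberately tiny word lists; a practical system would need
# far larger vocabularies or a statistical model.
MARKER_WORDS = {
    "ja": ("です", "ます", "どこ", "ありがとう"),
    "en": ("the", "is", "where", "thank", "you"),
    "fr": ("le", "la", "est", "où", "merci"),
}
# Languages written without spaces are matched by substring, not by token.
UNSEGMENTED = {"ja"}


def identify_language(text):
    """Return the language whose marker words occur most, or None if none occur."""
    lowered = text.lower()
    tokens = set(re.findall(r"\w+", lowered))
    scores = {}
    for lang, words in MARKER_WORDS.items():
        if lang in UNSEGMENTED:
            scores[lang] = sum(1 for w in words if w in lowered)
        else:
            scores[lang] = sum(1 for w in words if w in tokens)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def identify_conversation_languages(utterances):
    """Collect the languages detected in a conversation, in order of first use."""
    found = []
    for text in utterances:
        lang = identify_language(text)
        if lang is not None and lang not in found:
            found.append(lang)
    return found
```

For a conversation such as `["駅はどこですか", "Where is the station?", "Merci, où est la gare ?"]`, the second function collects one entry per language in the order each language first appears.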

The translation means 22 is not particularly limited as long as it has a function of translating into a language, among the plurality of languages identified by the language specifying means 21, that is different from the language used by the user (for example, a function of first converting voice information into text data or the like, then converting that text data into translated text data in another language, and finally converting the translated text data into translated voice information); one example is a program having such a function. Specifically, when user 100a speaks Japanese, user 100b speaks English, and user 100c speaks French, the translation means 22 may be, for example, a program that translates the content of the voice of user 100a (content expressed in Japanese) into English and into French.
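
The three conversions given as an example above (voice to text, text to translated text, translated text to voice) might be chained as in the following sketch. The class name `TranslationPipeline` and its three callables are hypothetical placeholders for whatever recognition, translation, and synthesis engines are actually used; the specification does not name any.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TranslationPipeline:
    # All three callables are assumptions standing in for real engines.
    recognize: Callable[[bytes, str], str]      # (audio, source_lang) -> text
    translate: Callable[[str, str, str], str]   # (text, source_lang, target_lang) -> text
    synthesize: Callable[[str, str], bytes]     # (text, target_lang) -> audio

    def run(self, audio: bytes, source_lang: str, target_lang: str) -> bytes:
        """Convert source-language audio into target-language audio."""
        text = self.recognize(audio, source_lang)                    # voice -> text
        translated = self.translate(text, source_lang, target_lang)  # text -> translated text
        return self.synthesize(translated, target_lang)              # translated text -> voice
```

Keeping the stages as injected callables means that the text-display variant of the output can reuse the first two stages and simply skip synthesis.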

The translation means 22 is not particularly limited as long as it can translate the content of a user's voice into a language different from the language used by that user, but means that translate using artificial intelligence (AI) are preferable. With artificial intelligence, translation becomes more accurate through the learning effect of the artificial intelligence. The artificial intelligence is not particularly limited, and any known technique usable for translation can be employed.

The translation server 20 is not particularly limited as long as it has the language specifying means 21 and the translation means 22 described above. Examples of the translation server 20 include a personal computer and a computer system in which personal computers are connected in parallel.

The network 30 is not particularly limited as long as it can connect the information terminal 10 and the translation server 20 (by wired or wireless connection); examples include the Internet and an intranet.

The voice output terminal 40 is not particularly limited as long as it can be connected to the information terminal 10 (by wired or wireless connection) and can output as voice the content of the voices of the users 100a to 100c translated into different languages by the translation server 20. Examples of the voice output terminal 40 include earphones and speakers connected wirelessly to the information terminal 10, with earphones being preferable. Earphones reduce the influence of noise such as ambient sound, so the translated voice can be delivered clearly to the users 100a to 100c.

Next, the operation of the voice translation system 1 is described. FIG. 2 is a flowchart showing the operation of the voice translation system 1.

As shown in the figure, when the power of the information terminal 10 is turned on or the translation start button is pressed, the voice translation system 1 starts operating, and the voices of the conversation of the users 100a to 100c are first input to the information terminal 10 (S1).

The information terminal 10 then transmits the input voice to the translation server 20 (S2). On receiving the transmitted voice, the translation server 20 uses the language specifying means 21 to identify the languages used in the conversation (S3). Specifically, it determines that Japanese, English, and French are used in the transmitted voice.

Next, the translation server 20 translates the content of the voice into the languages, among the plurality of languages identified by the language specifying means 21, that differ from the language used in that voice (S4). Specifically, it translates the content of Japanese voice into English voice and French voice. In other words, the Japanese voice is translated into two languages (English and French).

The translation server 20 then transmits the voice translated into the other languages to the information terminal 10 (S5). The information terminal 10 in turn transmits the translated voice to the voice output terminals 40 (S6).

The voice output terminal 40 outputs the translated voice as automated voice (S7). Specifically, the voice output terminal 40 outputs, for example, the English voice and then the French voice. Steps S4 to S7 are repeated until the conversation ends (S8), and when the conversation ends, the voice translation system 1 stops.
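
One possible reading of steps S1 to S8 as control flow is sketched below. It is a batch simplification, since it assumes all utterances are already available as a list, whereas the embodiment repeats S4 to S7 while the conversation is in progress. The function `run_session` and its `identify`, `translate`, and `output` parameters are hypothetical names, not part of the specification.

```python
def run_session(utterances, identify, translate, output):
    """Translate every utterance into each other language found in the conversation.

    identify(utterance) -> language code or None      (assumed helper)
    translate(utterance, source, target) -> result    (assumed helper)
    output(target, result) -> None                    (assumed helper)
    """
    # S1-S2: the utterances have been captured by the terminal and sent to the server.
    tagged = [(u, identify(u)) for u in utterances]

    # S3: determine the set of languages used in the conversation.
    languages = []
    for _, lang in tagged:
        if lang is not None and lang not in languages:
            languages.append(lang)

    # S4-S7, repeated until the conversation ends (S8).
    for utterance, source in tagged:
        if source is None:
            continue  # language could not be identified; nothing to translate from
        for target in languages:
            if target != source:  # S4: only languages other than the speaker's own
                output(target, translate(utterance, source, target))  # S5-S7
    return languages
```

With three speakers using Japanese, English, and French, each utterance yields two outputs, matching the example in which Japanese voice is translated into English and French.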

As described above, configuring the voice translation system 1 according to this embodiment allows a conversation to be translated smoothly without designating the languages used in the conversation, even in a first conversation with no information about the other party.

(Embodiment 2)

In Embodiment 1 described above, the languages used in the conversation are not designated. However, when a language to be used is known, the voice translation system may be configured so that the language can be designated in advance.

For example, the voice translation system 1A according to this embodiment, shown in FIG. 3, comprises an information terminal 10A and a translation server 20 connected to the information terminal 10A via a network 30. Unlike the voice translation system according to Embodiment 1, it has no voice output terminal.

The information terminal 10A is not particularly limited as long as, in addition to the functions of accepting input of the voices of users 100a and 100b and outputting translated voice, it has language designating means (not shown) with which the language used by either user 100a or user 100b can be designated.

An example of the language designating means is a program that displays language names in English on a display unit such as a liquid crystal screen and allows a language to be designated by touching its name with a finger.

Examples of the information terminal 10A include a mobile phone, a smartphone, and a dedicated terminal having the functions described above. The translation server 20 and the network 30 are the same as in Embodiment 1.

In this embodiment, it is assumed that, for example, the language used by user 100a is known. Therefore, before user 100a and user 100b start a conversation, the language to be used can be entered into the information terminal 10A in advance with the language designating means. The information on the entered language is transmitted to the translation server 20 and added to the languages identified by the language specifying means 21. As a result, at least one of the languages used in the conversation can no longer be misidentified, which lowers the probability that the translation server 20 mistranslates the voice.
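
Adding a designated language to the identified languages could look like the following sketch. The function name `merge_languages` and the use of short language codes are assumptions made for illustration only.

```python
def merge_languages(designated, identified):
    """Combine user-designated languages with automatically identified ones.

    Designated languages come first and are always kept, so at least those
    languages are present even if automatic identification misses them.
    """
    merged = list(designated)
    for lang in identified:
        if lang not in merged:
            merged.append(lang)
    return merged
```

For example, if Japanese has been designated for user 100a and identification only finds English, the merged result still contains both languages.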

As described above, configuring the voice translation system 1A according to this embodiment provides a voice translation system that can translate more accurately.

(Other embodiments)

The embodiments above describe cases in which two or three users use the voice translation system, but the present invention is not limited to this; the voice translation system according to the present invention can of course also be used for conversations in which more users take part.

In the embodiments above, the voice translation system is configured to output the translated voice as automated voice in the end, but the present invention is not limited to this. For example, a display unit may be provided on the information terminal, and text data of the translated voice may be displayed on that display unit. This configuration provides the same effects as the embodiments described above.

Furthermore, in the embodiments above, the voice translation system is configured with a single information terminal, but it may be configured by connecting a plurality of information terminals to the translation server. This configuration also provides the same effects as the embodiments described above.

1, 1A voice translation system
10, 10A information terminal
20 translation server
21 language specifying means
22 translation means
30 network
40 voice output terminal

Claims

A voice translation system comprising:
an information terminal to which the voice of a conversation in which each user uses a different language can be input, and which outputs the content of an input user's voice in a language different from the language used by that user; and
a translation server that is connected to the information terminal via a network and translates the content of the user's voice input to the information terminal into a language different from the language used by that user,
wherein the translation server has:
language specifying means for identifying, from the voice of the conversation, a plurality of languages used in the conversation; and
translation means for translating the content of the user's voice into a language, among the plurality of languages identified by the language specifying means, that is different from the language used by that user.

The voice translation system according to claim 1, wherein the translation means performs translation using artificial intelligence.

The voice translation system according to claim 1, wherein the output from the information terminal is output as voice.

The voice translation system according to any one of claims 1 to 3, wherein the information terminal has language designating means, and the translation server adds the language designated by the language designating means to the languages identified by the language specifying means.

The voice translation system according to any one of claims 1 to 4, further comprising a voice output terminal that is worn on or near the user's ear, is connected via a network to at least one of the information terminal and the translation server, and outputs as voice the content of the user's voice translated into a different language by the translation server.

A voice translation method for outputting, in a conversation in which each user uses a different language, the content of an input user's voice in a language different from the language used by that user, the method comprising:
a language specifying step of identifying, from the voice of the conversation, a plurality of languages used in the conversation; and
a translation step of translating the content of the user's voice into a language, among the plurality of languages identified in the language specifying step, that is different from the language used by that user.