WO2019225028A1 - Translation device, system, method, program, and learning method - Google Patents


Info

Publication number
WO2019225028A1
WO2019225028A1
Authority
WO
WIPO (PCT)
Prior art keywords
translation
user
sentence
input sentence
information
Prior art date
Application number
PCT/JP2018/038704
Other languages
French (fr)
Japanese (ja)
Inventor
Kaito Mizushima (海都 水嶋)
Original Assignee
Panasonic IP Management Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Panasonic IP Management Co., Ltd.
Publication of WO2019225028A1 publication Critical patent/WO2019225028A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/40: Processing or translation of natural language

Definitions

  • The present disclosure relates to a translation device based on machine translation, a translation system, a translation method, a program, and a learning method.
  • Non-Patent Document 1 proposes a technique that enables translation between multiple languages using a single neural machine translation model.
  • In Non-Patent Document 1, a machine translation model is shared across multiple languages by prepending a token identifying the target language to the beginning of the input sentence. This enables zero-shot translation between language pairs the neural machine translation model has not been trained on.
  • Non-Patent Document 2 proposes a technique for controlling honorifics in a neural machine translation model.
  • Non-Patent Document 2 utilizes an incidental condition for controlling the level of honorifics in the target language when translating from a language, such as English, that lacks the concept of honorifics.
  • The incidental condition is set to one of “polite”, “informal”, and “none”.
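Both prior-art control schemes amount to prepending a control token to the source sentence before it reaches the shared model. The following is a minimal sketch in Python; the exact token syntax is a hypothetical stand-in, as each document defines its own format.

```python
def add_control_token(sentence: str, token: str) -> str:
    """Prepend a control token (a target-language tag or an honorific-level
    tag) to the source sentence before feeding it to the translation model."""
    return f"<{token}> {sentence}"

# Target-language token, in the style of Non-Patent Document 1:
tagged_lang = add_control_token("Hello", "2ja")            # "<2ja> Hello"

# Honorific side constraint, in the style of Non-Patent Document 2:
tagged_polite = add_control_token("How are you?", "polite")
```

In either case the model itself is unchanged; only the input is conditioned by the extra token.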
  • This disclosure provides a translation apparatus, system, method, program, and learning method that can perform translation according to a user in machine translation.
  • The translation device according to an aspect of the present disclosure outputs a machine translation result to the translation destination user in response to input from the translation source user.
  • The translation apparatus includes a first acquisition unit, a second acquisition unit, a control unit, and an output unit.
  • The first acquisition unit acquires an input sentence in the translation source language.
  • The second acquisition unit acquires user information related to the input sentence.
  • The control unit acquires a translated sentence indicating, in the translation destination language, a translation result of the input sentence corresponding to the user information.
  • The output unit outputs the translated sentence.
  • The user information includes information indicating the role, with respect to the related input sentence, of at least one of the translation source user and the translation destination user.
  • A translation system according to an aspect of the present disclosure includes the above translation device and a machine translator.
  • The machine translator performs machine translation based on information acquired by the translation device and generates a translated sentence.
  • The translation method according to an aspect of the present disclosure is a method of executing machine translation so as to generate a translation result output to the translation destination user in response to input from the translation source user.
  • The method includes a step in which a first acquisition unit acquires an input sentence in the translation source language, and a step in which a second acquisition unit acquires user information related to the input sentence.
  • The user information includes information indicating the role, with respect to the related input sentence, of at least one of the translation source user and the translation destination user.
  • The method further includes a step in which a control unit obtains, based on the input sentence and the user information, a translated sentence indicating the translation result of the input sentence according to the user information in the translation destination language, and a step in which an output unit outputs the translated sentence.
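The claimed sequence of steps can be pictured as a simple pipeline in which the machine translator is conditioned on the user's role. This is a rough illustration only; the function and field names are hypothetical, not from the disclosure.

```python
def translate_for_user(input_sentence, user_info, machine_translator):
    """Sketch of the claimed steps: acquire the input sentence and related
    user information, obtain a role-conditioned translation, and return it."""
    # The user information carries the role of the translation source and/or
    # translation destination user with respect to this input sentence.
    request = {"sentence": input_sentence, "role": user_info["role"]}
    return machine_translator(request)  # e.g. a call to a translation server

# Dummy machine translator standing in for the real translation server:
def dummy_translator(request):
    subject = "your" if request["role"] == "host" else "my"
    return f"When is {subject} flight?"

example = translate_for_user("<source sentence>", {"role": "host"}, dummy_translator)
# example == "When is your flight?"
```

The same input sentence yields a different translated sentence when the role in the user information changes, which is the point of the claim.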
  • The program according to an aspect of the present disclosure causes a computer to execute processing of outputting a machine translation result to the translation destination user in response to input from the translation source user.
  • The program includes a step in which the computer acquires an input sentence in the translation source language, and a step in which user information related to the input sentence is acquired.
  • The user information includes information indicating the role, with respect to the related input sentence, of at least one of the translation source user and the translation destination user.
  • The program further includes a step in which the computer acquires, based on the input sentence and the user information, a translated sentence indicating the translation result of the input sentence corresponding to the user information in the translation destination language, and a step of outputting the translated sentence.
  • The learning method according to an aspect of the present disclosure is a method for obtaining, by machine learning on a computer, a translation model that realizes machine translation from the translation source user to the translation destination user.
  • A parameter group that defines the translation model based on machine learning is stored in the storage unit of the computer.
  • The method includes a step in which the computer inputs information associating an input sentence in the translation source language with user information to the translation model being learned, and causes the translation model to generate a translated sentence.
  • The user information includes information indicating the role, with respect to the related input sentence, of at least one of the translation source user and the translation destination user.
  • The method includes a step in which the computer adjusts the parameter group according to the generated translated sentence.
  • According to the present disclosure, translation according to the user can be performed in machine translation.
  • FIG. 1 is a diagram showing an outline of a translation system according to the first embodiment.
  • FIG. 2 is a block diagram illustrating the configuration of a translation apparatus according to the first embodiment.
  • FIG. 3 is a block diagram illustrating the configuration of a translation server according to the first embodiment.
  • FIG. 4 is a diagram illustrating a translation method by the translation system.
  • FIG. 5 is a flowchart illustrating processing of the translation apparatus according to the first embodiment.
  • FIG. 6 is a diagram for explaining a usage example of the translation system.
  • FIG. 7 is a diagram for explaining training data in the learning method of the first embodiment.
  • FIG. 8 is a flowchart illustrating processing of the learning method of the translation model according to the first embodiment.
  • FIG. 9 is a diagram showing a display example of the display unit in the translation apparatus.
  • FIG. 1 is a diagram showing an outline of a translation system 1 according to the present embodiment.
  • The translation system 1 includes a translation device 2 and various servers 3, 11, and 12, as shown in FIG. 1.
  • To enable dialogue between users 5a and 5b who speak different languages, the translation system 1 takes the utterance of one user as the translation source input from the translation device 2 and performs machine translation into the translation destination language for the other user.
  • The translation system 1 of the present embodiment is applicable to customer service scenes, including various types of guidance, in various industries such as airports, hotels, and restaurants.
  • Hereinafter, the user 5a in the role of the host that serves customers is abbreviated as “host 5a”,
  • and the user 5b in the role of the guest that is served is abbreviated as “guest 5b”.
  • The translation system 1 of the present embodiment realizes appropriate machine translation for dialogue between the host 5a and the guest 5b in various scenes.
  • The translation apparatus 2 performs data communication with the various servers 3, 11, and 12 via a communication network 10 such as the Internet.
  • The translation system 1 may include a plurality of translation devices 2.
  • Since each translation device 2 includes its own identification information in the data it transmits, the servers 3, 11, and 12 can appropriately transmit data back to the translation device 2 indicated by the received identification information.
  • The various servers 3, 11, and 12 of the translation system 1 are, for example, ASP servers, and include the translation server 3, the speech recognition server 11, and the speech synthesis server 12.
  • The translation server 3 is an example of a machine translator that executes machine translation in the translation method of the present embodiment.
  • The speech recognition server 11 has a speech recognition function for input sentences to be machine-translated.
  • The speech synthesis server 12 has a speech synthesis function for translated sentences indicating machine translation results. Details of the configuration of the translation system 1 are described below.
  • FIG. 2 is a block diagram illustrating the configuration of the translation apparatus 2.
  • The translation device 2 is composed of an information terminal such as a tablet terminal, a smartphone, or a PC.
  • The translation device 2 illustrated in FIG. 2 includes a control unit 20, a storage unit 21, an operation unit 22, a display unit 23, a device interface 24, and a network interface 25.
  • Hereinafter, the interface is abbreviated as “I/F”.
  • The translation apparatus 2 also includes two microphones 26a and 26b and a speaker 27.
  • Of the two microphones, one is the host microphone 26a used by the host 5a, and the other is the guest microphone 26b used by the guest 5b.
  • Each of the microphones 26a and 26b is an input device that collects sound and inputs sound data.
  • Each microphone 26a, 26b is an example of an acquisition unit in the present embodiment.
  • The speaker 27 is an output device that outputs audio data as sound and is an example of an output unit in the present embodiment. FIGS. 1 and 2 illustrate a case where the speaker 27 is shared between the host 5a and the guest 5b.
  • The translation apparatus 2 may instead include separate host and guest speakers.
  • The microphones 26a and 26b and the speaker 27 may be external to the information terminal that constitutes the translation device 2, or may be incorporated in it.
  • The control unit 20 includes, for example, a CPU or MPU that realizes predetermined functions in cooperation with software, and controls the overall operation of the translation apparatus 2.
  • The control unit 20 reads out data and programs stored in the storage unit 21 and performs various arithmetic processes to realize its functions.
  • For example, the control unit 20 executes a program including an instruction group for realizing the processing of the translation apparatus 2 in the translation method of the present embodiment.
  • The above program may be provided via the communication network 10 or the like, or may be stored in a portable recording medium.
  • The control unit 20 may be a hardware circuit such as a dedicated electronic circuit or a reconfigurable electronic circuit designed to realize a predetermined function.
  • The control unit 20 may be configured with various semiconductor integrated circuits such as a CPU, MPU, GPU, GPGPU, TPU, microcomputer, DSP, FPGA, or ASIC.
  • The storage unit 21 is a storage medium that stores programs and data necessary to realize the functions of the translation apparatus 2. As shown in FIG. 2, the storage unit 21 includes a storage unit 21a and a temporary storage unit 21b.
  • The storage unit 21a stores parameters, data, a control program, and the like for realizing predetermined functions.
  • The storage unit 21a is configured with, for example, an HDD or an SSD.
  • For example, the storage unit 21a stores the above program.
  • The temporary storage unit 21b is configured with a RAM such as a DRAM or an SRAM, for example, and temporarily stores (that is, holds) data.
  • For example, the temporary storage unit 21b holds an input sentence, a translated sentence, user information described later, and the like.
  • The temporary storage unit 21b may function as a work area for the control unit 20, or may be configured as a storage area in the internal memory of the control unit 20.
  • The operation unit 22 is a user interface operated by the user.
  • FIG. 1 shows an example in which the operation unit 22 forms a touch panel together with the display unit 23.
  • The operation unit 22 is not limited to a touch panel, and may be, for example, a keyboard, a touch pad, buttons, or switches.
  • The operation unit 22 is an example of an acquisition unit that acquires various information input by user operations.
  • The display unit 23 is an example of an output unit configured with, for example, a liquid crystal display or an organic EL display.
  • The display unit 23 displays, for example, information for outputting the translated sentence to the user.
  • The display unit 23 may also display various information, such as icons for operating the operation unit 22 and information input from the operation unit 22.
  • The device I/F 24 is a circuit for connecting an external device to the translation device 2.
  • The device I/F 24 is an example of a communication unit that performs communication according to a predetermined communication standard.
  • The predetermined standard includes USB, HDMI (registered trademark), IEEE 1394, Wi-Fi, Bluetooth (registered trademark), and the like.
  • The device I/F 24 may constitute an acquisition unit that receives various information from, or an output unit that transmits information to, an external device of the translation apparatus 2.
  • The network I/F 25 is a circuit for connecting the translation apparatus 2 to the communication network 10 via a wireless or wired communication line.
  • The network I/F 25 is an example of a communication unit that performs communication based on a predetermined communication standard.
  • The predetermined communication standard includes communication standards such as IEEE 802.3 and IEEE 802.11a/11b/11g/11ac.
  • The network I/F 25 may constitute an acquisition unit that receives various information, or an output unit that transmits information, via the communication network 10 in the translation apparatus 2.
  • The configuration of the translation device 2 described above is an example, and the configuration of the translation device 2 is not limited to it.
  • For example, the translation apparatus 2 does not have to include the host microphone 26a and the guest microphone 26b;
  • a microphone shared between the host 5a and the guest 5b may be used instead.
  • The translation apparatus 2 may also be configured with various computers, not limited to information terminals.
  • The acquisition unit in the translation apparatus 2 may be realized through cooperation of the control unit 20 or the like with various software.
  • The acquisition unit in the translation device 2 may acquire various information by reading information stored in various storage media (for example, the storage unit 21a) into the work area of the control unit 20 (for example, the temporary storage unit 21b).
  • Each of the various acquisition units described above may be a first acquisition unit that acquires the input sentence of the translation source, or a second acquisition unit that acquires user information related to the input sentence.
  • The first and second acquisition units may be combined into one hardware element.
  • FIG. 3 is a block diagram illustrating the configuration of the translation server 3 in this embodiment.
  • The translation server 3 illustrated in FIG. 3 includes an arithmetic processing unit 30, a storage unit 31, and a communication unit 32.
  • The translation server 3 is composed of one or a plurality of computers.
  • The arithmetic processing unit 30 includes, for example, a CPU or GPU that realizes predetermined functions in cooperation with software, and controls the operation of the translation server 3.
  • The arithmetic processing unit 30 reads out data and programs stored in the storage unit 31 and performs various arithmetic processes to realize its functions.
  • For example, the arithmetic processing unit 30 executes the program of the translation model 35 that executes machine translation in the translation method of the present embodiment.
  • The translation model 35 is composed of various neural networks, for example.
  • For example, the translation model 35 may be a neural machine translation model shared between multiple languages (see, for example, Non-Patent Document 1).
  • The arithmetic processing unit 30 may also execute a program for performing machine learning of the translation model 35.
  • Each of the above programs may be provided via the communication network 10 or the like, or may be stored in a portable recording medium.
  • The arithmetic processing unit 30 may be a hardware circuit such as a dedicated electronic circuit or a reconfigurable electronic circuit designed to realize a predetermined function.
  • The arithmetic processing unit 30 may be configured with various semiconductor integrated circuits such as a CPU, GPU, TPU, MPU, microcomputer, DSP, FPGA, or ASIC.
  • The storage unit 31 is a storage medium that stores programs and data necessary for realizing the functions of the translation server 3, and includes, for example, an HDD or an SSD.
  • The storage unit 31 may also include a DRAM or SRAM, for example, and may function as a work area for the arithmetic processing unit 30.
  • The storage unit 31 stores, for example, the program of the translation model 35 and various parameter groups that define the translation model 35 based on machine learning.
  • The parameter group includes, for example, various weight parameters of a neural network.
  • The communication unit 32 is an I/F circuit for performing communication according to a predetermined communication standard, and communicatively connects the translation server 3 to the communication network 10 or an external device.
  • The predetermined communication standard includes IEEE 802.3, IEEE 802.11a/11b/11g/11ac, USB, HDMI, IEEE 1394, Wi-Fi, Bluetooth, and the like.
  • The speech recognition server 11 and the speech synthesis server 12 can be configured in the same way as the translation server 3, with a program for the speech recognition function or the speech synthesis function introduced in place of the translation model 35.
  • The various servers 3, 11, and 12 in the translation system 1 are not limited to the above configuration and may have various configurations.
  • For example, the translation method of the present embodiment may be executed in cloud computing. Further, hardware resources that realize the functions of the various servers 3, 11, and 12 may be shared.
  • The speech recognition server 11 and the speech synthesis server 12 may also be omitted.
  • In that case, the translation apparatus 2 may have a speech recognition function.
  • For example, the translation device 2 may perform speech recognition on voice data generated by the microphones 26a and 26b and convert it into text data.
  • Similarly, the translation device 2 may have a speech synthesis function.
  • For example, the translation device 2 may synthesize speech from text data of the machine translation result and output it from the speaker 27.
  • FIG. 4 is a diagram illustrating a translation method by the translation system 1.
  • Each time one of the speakers speaks during the dialogue between the host 5a and the guest 5b, the translation system 1 executes machine translation into the translation destination language, using the speaker's language as the translation source language.
  • The translation source language may be recognized by speech recognition from the speaker's utterance, or may be set by an operation on the translation device 2 or the like.
  • The translation destination language is set appropriately according to the other user, who is not the speaker, for example.
  • The speaker is an example of a translation source user,
  • and the counterpart is an example of a translation destination user.
  • In FIG. 4A, the host 5a, as the speaker, utters the Japanese input sentence 51 meaning “When is the flight?”.
  • The translation device 2 of the present embodiment can recognize the input sentence 51 by using, for example, the speech recognition of the speech recognition server 11 shown in FIG. 1.
  • The translation apparatus 2 of the present embodiment can also acquire information related to the user, such as the speaker, in addition to the input sentence 51.
  • The translation server 3 executes machine translation based on the information acquired by the translation device 2, and generates a translated sentence that indicates the translation result of the input sentence in the translation destination language.
  • The translation system 1 of this embodiment uses user information indicating whether the speaker is the host 5a or the guest 5b in the machine translation of the input sentence, and realizes translations as exemplified in FIGS. 4A and 4B.
  • In FIG. 4A, a translated sentence 61, “When is your flight?”, is output from the speaker 27 based on machine translation from the source Japanese into the target English.
  • At this time, the speech synthesis server 12 can synthesize speech for the translated sentence 61.
  • FIG. 4B shows an example in which an input sentence 51 with the same language and content as in FIG. 4A is spoken by the guest 5b.
  • In this case, the translation device 2 outputs a translated sentence 62, “When is my flight?”, whose content differs from that of the translated sentence 61 of FIG. 4A.
  • In FIGS. 4A and 4B, it is assumed that a dialogue takes place at an airport counter between a host 5a, such as an airport staff member, and a guest 5b scheduled to board a flight. In such a scene, a translation result suited to the guest 5b, the flight passenger, is considered appropriate.
  • In FIG. 4A, since the speaker is the host 5a, “your” in the translated sentence 61 is understood to refer to the guest 5b, and an appropriate translation result is obtained.
  • In contrast, if the speaker is the guest 5b and the same translated sentence as the translated sentence 61 of FIG. 4A were output, “your” would refer to the host 5a, which would be inappropriate.
  • In the translation system 1 of the present embodiment, user information indicating the speaker is sequentially acquired by the translation device 2 during the dialogue between the host 5a and the guest 5b and is used for the machine translation of the corresponding input sentence 51, so that appropriate translation can be realized. Details of the operation of the translation system 1 in the translation method are described below.
  • FIG. 5 is a flowchart illustrating the processing of the translation apparatus 2 according to this embodiment. Each process of the flowchart shown in FIG. 5 is executed by the control unit 20 of the translation apparatus 2. This flowchart is started when, for example, one of the host 5a and the guest 5b utters a desired input sentence.
  • First, the control unit 20 of the translation apparatus 2 inputs voice data of the speech uttered by the speaker from the host microphone 26a or the guest microphone 26b (S1).
  • The voice data of the uttered speech is an example of information indicating an input sentence spoken by the speaker.
  • The control unit 20 may select one of the microphones based on the volume levels at the two microphones 26a and 26b, or according to various operations by the speaker.
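The volume-based microphone selection mentioned above can be pictured as a simple comparison of input levels. This is a hypothetical sketch, not the disclosed implementation; the tie-breaking rule is an arbitrary assumption.

```python
def select_microphone(host_level: float, guest_level: float) -> str:
    """Pick the microphone whose input volume is higher; ties go to the
    host microphone (an arbitrary assumption for this sketch)."""
    return "host" if host_level >= guest_level else "guest"
```

Whichever channel is selected then supplies the voice data for step S1.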
  • Next, the control unit 20 acquires the input sentence indicated by the uttered speech via, for example, the network I/F 25 (S2).
  • Specifically, the translation device 2 transmits the voice data of the uttered speech to the speech recognition server 11 via the communication network 10.
  • The speech recognition server 11 executes speech recognition processing based on the voice data from the translation device 2, generates text data as the speech recognition result, and transmits it to the translation device 2.
  • The network I/F 25 of the translation device 2 receives the input sentence as the generated text data from the speech recognition server 11.
  • Next, the control unit 20 executes processing for specifying the speaker information, using the microphones 26a and 26b as acquisition units, for example (S3 to S5).
  • The speaker information is an example of user information indicating “host” or “guest” as the current speaker. Note that the order of processing between step S2 and steps S3 to S5 is not particularly limited; either may be performed first, or they may be performed in parallel.
  • Specifically, the control unit 20 determines whether the uttered speech was input from the host microphone 26a (S3).
  • If the speech was input from the host microphone 26a (YES in S3), the control unit 20 sets the speaker information to “host” (S4).
  • Otherwise (NO in S3), the control unit 20 sets the speaker information to “guest” (S5).
  • Next, the control unit 20 associates the acquired input sentence and speaker information with each other and transmits them to the translation server 3 (S6).
  • For example, the control unit 20 tags the input sentence with tag information indicating “host” or “guest” from the speaker information, and transmits the tagged input sentence to the translation server 3 from the network I/F 25.
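Steps S3 to S6 can be sketched as follows. The tag syntax and payload fields are hypothetical stand-ins; the disclosure does not specify a concrete format for what the translation server receives.

```python
def make_translation_request(input_sentence: str, from_host_microphone: bool,
                             target_lang: str) -> dict:
    """S3-S5: derive the speaker information from which microphone captured
    the speech; S6: tag the input sentence and build the request payload."""
    speaker = "host" if from_host_microphone else "guest"   # S3-S5
    tagged_sentence = f"<{speaker}> {input_sentence}"       # S6: tagging
    return {"sentence": tagged_sentence, "target_lang": target_lang}

request = make_translation_request("When is the flight?", True, "en")
# request["sentence"] == "<host> When is the flight?"
```

The payload could also carry the designation information for the translation destination language, as noted below.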
  • The information transmitted to the translation server 3 may also include designation information for the translation destination language.
  • When the translation server 3 receives the input sentence associated with the speaker information from the translation device 2, it executes machine translation based on, for example, the learned translation model 35. As a result, the translation server 3 generates a translated sentence for the received input sentence that indicates a translation result corresponding to the “host” or “guest” indicated by the associated speaker information. The translation server 3 transmits the generated translated sentence to the translation apparatus 2 as text data or the like. In the translation apparatus 2, the control unit 20 receives information indicating the translated sentence from the translation server 3 via the network I/F 25 (S7).
  • Next, the control unit 20 outputs the translation result, for example as voice output of the translated sentence (S8).
  • For example, the control unit 20 transmits the text data of the translated sentence to the speech synthesis server 12 and causes the speech synthesis server 12 to perform speech synthesis processing of the translated sentence.
  • The control unit 20 receives the voice data of the processing result from the speech synthesis server 12 and controls the voice output from the speaker 27.
  • The control unit 20 may also display a text image of the translated sentence on the display unit 23.
  • As described above, the host microphone 26a and the guest microphone 26b in the translation apparatus 2 function as acquisition units for the input sentence and the speaker information by receiving the corresponding speaker's speech (S1 to S5).
  • The corresponding speaker information is acquired every time the utterance of an input sentence is input (S1 to S5),
  • so that an appropriate translated sentence can be output for both the case where the speaker is the host 5a and the case where the speaker is the guest 5b (S6 to S8).
  • For example, in FIGS. 4A and 4B, different translated sentences 61 and 62 are output from the translation device 2 depending on whether the speaker information is “host” or “guest”.
  • The translated sentence 62 in the example of FIG. 4B includes “my” instead of “your” in the translated sentence 61 of FIG. 4A; when the guest 5b speaks, the word referring to the guest 5b is thus translated properly.
  • The translation differences in the examples of FIGS. 4A and 4B arise from the ambiguity that the subject of the topic, such as “your” or “my” in the translated sentences 61 and 62, is not explicitly shown in the input sentence 51. Such subject ambiguity can occur frequently when the translation source is Japanese, for example. According to the translation system 1 of the present embodiment, even if the subject is ambiguous in the input sentence, acquiring the speaker information implicitly identifies the subject so that the sentence can be translated appropriately. In particular, in a relationship between users whose roles are clear, such as the host 5a and the guest 5b, the subject of the topic in the dialogue can be estimated comparatively easily from the speaker.
  • FIGS. 6A and 6B show usage examples of the translation system 1 when the translation destination is Japanese.
  • FIG. 6A shows an example in which the speaker is the host 5a,
  • and FIG. 6B shows an example in which the speaker is the guest 5b.
  • In either case, the translation system 1 performs translation based on the speaker information through the processing described above.
  • In FIG. 6A, the translation device 2 outputs a Japanese translated sentence 63 meaning “Do you have a bag?” for the input sentence 52 described above.
  • The translated sentence 63 is considered natural Japanese as an utterance by which the host 5a, being on the side that keeps the baggage, serves the guest 5b.
  • For example, the translated sentence 63 includes the respectful prefix “o”, by which the speaker shows respect for the actions and belongings of the other party, the guest 5b, and is thus appropriately worded.
  • In FIG. 6B, the translation device 2 outputs a Japanese translated sentence 64 meaning “Do you have a bag?”.
  • The translated sentence 64 is considered natural Japanese as a remark by which the guest 5b, being on the side whose baggage has been checked, prompts the host 5a for confirmation.
  • The wording of the translated sentence 64, while polite in tone, does not include excessive honorifics and is considered appropriate when the speaker is the guest 5b.
  • As described above, when the translation destination is Japanese,
  • the appropriate wording of the translated sentence can be ambiguous from the input sentence alone.
  • In the above, translation is described for cases where Japanese is the translation source or destination and English is the corresponding destination or source.
  • However, the translation method of the present embodiment is not limited to Japanese and English, and is applicable to various languages.
  • Machine translation using speaker information can differentiate translations so that each is appropriate, in light of the conventions of the respective languages, for the case where the speaker addresses the content of the input sentence to the other party in the target language.
  • FIG. 7 is a diagram for explaining the training data D1 in the learning method of the present embodiment.
  • FIG. 8 is a flowchart illustrating the processing of the learning method for the translation model 35 according to this embodiment.
  • the training data D1 constitutes a bilingual corpus between the translation source language and the translation destination language, for example.
  • FIG. 7A illustrates a case where the translation source is Japanese and the translation destination is English.
  • FIG. 7B illustrates a case where the translation source is English and the translation destination is Japanese.
  • Training data D1 records “speaker information”, “source language sentence”, and “target language sentence” in association with each other, for example, as shown in FIGS. 7 (a) and 7 (b).
  • the “source language sentence” is an example sentence of an input sentence for causing the translation model 35 to learn, and is described in the language of the translation source.
  • the “target language sentence” indicates the correct answer of the translated sentence based on the “speaker information” when the corresponding “source language sentence” is translated into the language of the translation destination.
  • the speaker information is associated with a set of source language sentences and target language sentences by tagging “host” or “guest”, for example. Each target language sentence includes a natural expression when the content of the input sentence is uttered in the corresponding speaker information.
  • for example, as shown in FIG. 7(a), the Japanese source language sentence corresponding to “When do you leave?” is associated with the target language sentence “When do you start?” when the speaker information is “host”.
  • the source language sentence having the same content as described above is associated with the target language sentence “When do we start?” When the speaker information is “guest”.
  • the above two target language sentences include different subjects depending on the difference in the speaker information.
  • the training data D1 may include a source language sentence associated with only one of “host” and “guest”.
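As an illustrative sketch, and not part of the patent itself, the records of the training data D1 in FIG. 7 could be held in memory as follows; the field names and the helper function are hypothetical:

```python
# Hypothetical in-memory form of the training data D1 (FIG. 7): each record
# associates speaker information with a source language sentence and the
# corresponding target language sentence (the "correct answer").
TRAINING_DATA_D1 = [
    {"speaker": "host",  "source": "いつ出発しますか", "target": "When do you start?"},
    {"speaker": "guest", "source": "いつ出発しますか", "target": "When do we start?"},
]

def targets_by_speaker(source_sentence):
    """Return the correct target sentences for one source sentence,
    keyed by the speaker role ("host" or "guest")."""
    return {rec["speaker"]: rec["target"]
            for rec in TRAINING_DATA_D1
            if rec["source"] == source_sentence}
```

As in FIG. 7(a), the same source sentence maps to different subjects (“you” versus “we”) depending only on the associated speaker information.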
  • the processing of the learning method using the training data D1 described above is illustrated in FIG. 8.
  • Each process of the flowchart shown in FIG. 8 is executed by, for example, the arithmetic processing unit 30 of the translation server 3.
  • the flowchart starts, for example, with the training data D1 stored in the storage unit 31 and the parameter group of the translation model 35 set to initial values; that is, with the translation model 35 to be learned prepared.
  • the arithmetic processing unit 30 refers to the training data D1 in the storage unit 31 and inputs the source language sentence and the speaker information associated with the training data D1 to the translation model 35 to be learned (S11).
  • the speaker information is input to the translation model 35 in association with the source language sentence as tag information, for example.
  • the language of the translation destination is designated in advance, for example.
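One way the speaker information might be attached to the source sentence as tag information can be sketched as follows; the token format is an assumption for illustration, in the spirit of the destination-language token of Non-Patent Document 1:

```python
def build_model_input(source_sentence, speaker, target_lang="en"):
    """Prepend hypothetical control tokens for the translation destination
    language and the speaker role to the source sentence before it is fed
    to the translation model 35 (token names are illustrative)."""
    if speaker not in ("host", "guest"):
        raise ValueError(f"unknown speaker role: {speaker}")
    return f"<2{target_lang}> <{speaker}> {source_sentence}"
```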
  • the arithmetic processing unit 30 executes machine translation based on the input information in the translation model 35 being learned (S12).
  • the arithmetic processing unit 30 causes the translation model 35 to generate a translation sentence according to the current parameter group.
  • the arithmetic processing unit 30 adjusts the parameter group based on the error between the translated sentence obtained by the translation model 35 being learned and the corresponding target language sentence (S13).
  • the processing in step S13 is performed according to the error back propagation method or the like with reference to the target language sentence associated with the source language sentence input to the translation model 35 in the training data D1.
  • the arithmetic processing unit 30 determines whether learning of the translation model 35 is completed based on a predetermined learning end condition (S14).
  • the learning end condition is set in advance according to, for example, the number of learning iterations. If the learning of the translation model 35 has not been completed (NO in S14), the arithmetic processing unit 30 performs the processes from step S11 again. Each time the processes of steps S11 to S13 are repeated, the parameter group of the translation model 35 is updated.
  • when the learning of the translation model 35 has been completed (YES in S14), the arithmetic processing unit 30 records the final values of the parameter group in the storage unit 31, thereby determining the parameter group that defines the learned translation model 35 (S15).
  • after determining the parameter group of the learned translation model 35 (S15), the arithmetic processing unit 30 ends the process according to the flowchart of FIG. 8.
  • as described above, a translation model 35 that has learned to vary the translated sentence according to the speaker information can be obtained by machine learning using the training data D1 including the speaker information. According to the learning method of the present embodiment, it is possible to generate a translation model 35 that has acquired the common sense considered natural in various scenes, so as to perform appropriate translation between the host 5a and the guest 5b.
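The loop of steps S11 to S15 can be caricatured as below. This is a deliberately toy sketch: the “parameter group” is reduced to a lookup table, whereas the actual translation model 35 would adjust neural-network parameters by error backpropagation.

```python
def train_translation_model(params, training_data, num_epochs=3):
    """Toy sketch of steps S11-S15: for each record, input the source
    sentence together with the speaker information (S11), generate a
    translation with the current parameters (S12), and adjust the
    parameters when the output differs from the target sentence (S13)."""
    for _ in range(num_epochs):                    # S14: learning end condition
        for rec in training_data:
            key = (rec["speaker"], rec["source"])  # S11: source sentence + speaker tag
            hypothesis = params.get(key, "")       # S12: current model output
            if hypothesis != rec["target"]:        # S13: error -> adjust parameters
                params[key] = rec["target"]
    return params                                  # S15: final parameter group
```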
  • in the learned translation model 35, processing similar to step S12 above is performed between steps S6 and S7 in FIG. 5. Based on the common sense acquired by the translation model 35 through machine learning, the translated sentence can be worded appropriately.
  • the arithmetic processing unit 30 of the translation server 3 executes the processing of the learning method.
  • the processing of the learning method may be performed on various computers different from the translation server 3.
  • the generated learned translation model 35 can be provided as appropriate.
  • the training data D1 associating the “source language sentence” with the “target language sentence” based on the “speaker information” is exemplified.
  • the training data D1 is not limited to this.
  • example sentences that are translations of one another in various languages may be recorded in association with each other based on the speaker information.
  • a source language sentence to be learned from the training data D1 may be selected by appropriately specifying a translation source language when learning the translation model 35.
  • as described above, the translation device 2 of the present embodiment outputs the result of machine translation to the other party (i.e., the translation destination user) in response to the input of the speaker (i.e., the translation source user), such as the host 5a or the guest 5b.
  • the translation device 2 includes two microphones 26a and 26b as an example of first and second acquisition units, a control unit 20, and a speaker 27 as an example of an output unit.
  • as the first acquisition unit, each of the microphones 26a and 26b acquires an input sentence in the language of the translation source.
  • Each microphone 26a, 26b as the second acquisition unit acquires speaker information, which is an example of user information related to the input sentence.
  • based on the input sentence and the speaker information, the control unit 20 obtains a translated sentence indicating the translation result of the input sentence according to the speaker information in the language of the translation destination.
  • the speaker 27 outputs a translated sentence by voice output.
  • the speaker information includes information indicating the role of the speaker regarding the related input sentence.
  • with the translation device 2, for example, during a dialogue between the host 5a and the guest 5b, translation tailored to the speaker can be carried out in machine translation, based on the speaker information acquired when the speaker utters the input sentence.
  • in the present embodiment, the role of at least one of the speaker and the other party indicated in the speaker information includes at least one of “host” and “guest”. Thereby, appropriate translation for each role can be realized in various scenes involving the host 5a and the guest 5b.
  • the first and second acquisition units in the present embodiment are not limited to the plurality of microphones 26a and 26b, and may each include at least one of the microphones 26a and 26b, the operation unit 22, the network I/F 25, and the device I/F 24. Such an example will be described with reference to FIG. 9.
  • FIG. 9 shows a display example of the display unit 23 in the translation apparatus 2.
  • the display unit 23 that constitutes the touch panel together with the operation unit 22 displays a host utterance icon 23a, a guest utterance icon 23b, an input sentence region 23c, and a translated sentence region 23d.
  • the utterance icons 23a and 23b are icons by which the corresponding user inputs a touch operation to the operation unit 22 to start an utterance.
  • in the input sentence area 23c, an image of the input sentence is displayed according to the utterance.
  • in the translated sentence area 23d, an image of the translated sentence is displayed according to the translation result of the input sentence.
  • the operation unit 22 functions as a second acquisition unit by means of the utterance icons 23a and 23b.
  • the translation device 2 starts processing as shown in the flowchart of FIG. 5 when one of the two utterance icons 23a and 23b is touched.
  • the control unit 20 can set the speaker information to “host” when the operation unit 22 receives an operation of the host utterance icon 23a, and to “guest” when it receives an operation of the guest utterance icon 23b.
  • the acquisition of the speaker information may also be performed based on, for example, information obtained via the network I/F 25 or the device I/F 24, or on the result of speech recognition of the input sentence.
  • for example, the speaker information may be set to “host” or “guest” based on information indicating which of the languages recognized by the speech recognition corresponds to the host 5a and which to the guest 5b.
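The mapping from the touched utterance icon of FIG. 9 to the speaker information can be sketched in a small dispatcher; the function and icon identifiers are illustrative, not from the patent:

```python
def speaker_info_from_icon(icon_id):
    """Map a touched utterance icon (FIG. 9) to the speaker information:
    the host utterance icon 23a yields "host", and the guest utterance
    icon 23b yields "guest"."""
    mapping = {"23a": "host", "23b": "guest"}
    if icon_id not in mapping:
        raise ValueError(f"unknown utterance icon: {icon_id}")
    return mapping[icon_id]
```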
  • the translation apparatus 2 further includes a network I / F 25 as an example of a communication unit that communicates with an external translation server 3.
  • the control unit 20 transmits the input sentence and the speaker information to the translation server 3 via the network I / F 25, and receives the translated sentence of the transmitted input sentence from the translation server 3.
  • the translation server 3 can generate an appropriate translation sentence according to the speaker information from the translation device 2.
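A minimal sketch of the request the translation device 2 might send to the translation server 3 over the network I/F 25 is given below. The JSON field names are assumptions, since the patent does not specify a wire format:

```python
import json

def build_translation_request(input_sentence, speaker, target_lang):
    """Serialize the input sentence together with the speaker information
    so that the translation server 3 can choose the wording of the
    translated sentence accordingly."""
    return json.dumps(
        {"sentence": input_sentence,    # input sentence in the source language
         "speaker": speaker,            # user information: "host" or "guest"
         "target_lang": target_lang},   # translation destination language
        ensure_ascii=False)
```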
  • the translation system 1 in this embodiment includes a translation device 2 and a translation server 3 that is an example of a machine translator.
  • the translation server 3 performs machine translation based on the information acquired by the translation device 2 and generates a translation.
  • the translation method in the present embodiment executes machine translation so as to generate a translation result output to the other party in response to an input from the speaker.
  • the first acquisition unit obtains an input sentence in a source language (S1, S2)
  • the second acquisition unit obtains speaker information related to the input sentence (S3 to S5).
  • the speaker information includes information indicating the role of at least one of the speaker and the other party regarding the related input sentence.
  • the method includes a step (S6, S7) in which the control unit obtains, based on the input sentence and the speaker information, a translated sentence indicating the translation result of the input sentence according to the speaker information in the language of the translation destination, and a step (S8) in which the output unit outputs the translated sentence.
  • according to this method, by using the speaker information, translation tailored to the speaker can be performed in machine translation.
  • the program in the present embodiment causes a computer such as the translation device 2 to execute a process of outputting the result of machine translation to the other party in accordance with the input of the speaker.
  • the program includes a step (S1, S2) in which the computer acquires an input sentence in the language of the translation source, and a step (S3-S5) in which speaker information related to the input sentence is acquired.
  • the speaker information includes information indicating the role of at least one of the speaker and the other party regarding the related input sentence.
  • the program further includes a step (S6, S7) in which the computer acquires, based on the input sentence and the speaker information, a translated sentence indicating the translation result of the input sentence according to the speaker information in the language of the translation destination, and a step (S8) of outputting the translated sentence. According to this program, by using the speaker information, translation tailored to the speaker can be performed in machine translation.
  • the learning method in the present embodiment is a method of obtaining, through machine learning on a computer such as the translation server 3, a translation model 35 that realizes machine translation from a speaker to the other party.
  • the storage unit 31 of the computer stores a parameter group that defines the translation model 35 based on machine learning.
  • the computer inputs information relating the input sentence and the speaker information in the language of the translation source to the translation model 35 being learned, and causes the translation model 35 to generate a translation sentence (S11, S12).
  • the speaker information includes information indicating the role of at least one of the speaker and the other party regarding the related input sentence.
  • the method includes a step (S13) in which the computer adjusts the parameter group according to the generated translation. According to this method, it is possible to generate a translation model 35 learned to perform translation according to the speaker in machine translation.
  • FIG. 10 is a diagram showing an outline of the translation system 1A according to the second embodiment.
  • the translation system 1A according to the present embodiment has the same configuration as in the first embodiment except that it includes a host translation server 3a and a guest translation server 3b in place of the single translation server 3.
  • the two translation servers 3a and 3b are an example of a plurality of machine translators in the present embodiment.
  • Each translation server 3a, 3b is configured in the same manner as the translation server 3 of the first embodiment, for example.
  • the host translation server 3a has a translation model trained by machine learning on utterances of a host.
  • the guest translation server 3b has a translation model trained by machine learning on utterances of a guest.
  • FIG. 11 is a flowchart illustrating the processing of the translation apparatus 2 according to this embodiment.
  • in the present embodiment, the control unit 20 of the translation device 2 performs steps S6A and S6B instead of step S6 of FIG. 5, for example.
  • when the speaker information is “host”, the control unit 20 transmits the acquired input sentence via the network I/F to the host translation server 3a (S6A).
  • when the speaker information is “guest” (S5), the control unit 20 transmits the acquired input sentence via the network I/F to the guest translation server 3b (S6B).
  • the translation device 2 receives the translated sentence of the translation result from whichever of the two translation servers 3a and 3b is selected according to the speaker information (S7).
  • the communication I / F 25 communicates with the translation servers 3a and 3b, which are a plurality of external machine translators.
  • the control unit 20 transmits an input sentence to different machine translators according to user information such as speaker information via the communication I / F 25 (S6A, S6B).
  • the control unit 20 receives the translated sentence of the transmitted input sentence from the machine translator via the communication I/F 25 (S7). According to this as well, translation tailored to the speaker can be performed in machine translation, based on the speaker information acquired when the speaker utters the input sentence.
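The branching of steps S6A/S6B can be sketched as a simple selector; the server identifiers are placeholders, not actual endpoints from the patent:

```python
def select_translation_server(speaker,
                              host_server="translation-server-3a",
                              guest_server="translation-server-3b"):
    """Choose the machine translator matching the speaker information:
    the host translation server 3a for "host" (S6A) and the guest
    translation server 3b for "guest" (S6B)."""
    if speaker == "host":
        return host_server
    if speaker == "guest":
        return guest_server
    raise ValueError(f"unknown speaker role: {speaker}")
```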
  • Embodiments 1 and 2 have been described above as examples of the technology disclosed in the present application. However, the technology of the present disclosure is not limited thereto and is also applicable to embodiments in which changes, substitutions, additions, omissions, and the like are made as appropriate. It is also possible to combine the components described in the above embodiments into a new embodiment. Other embodiments are exemplified below.
  • the speaker information is described as an example of the user information.
  • the user information is not limited to the speaker information, but may be information about the other party of the speaker, for example.
  • the user information may include information about both the speaker and the other party (that is, the translation source and translation destination users).
  • “host” and “guest” are exemplified as user roles indicated by the user information.
  • the role in the user information is not limited to this, and may be various roles such as “teacher” and “student” or “superior” and “subordinate”. This also makes it possible to realize appropriate translation according to the role indicated by the user information.
  • the user information may include additional information related to other users in addition to the above role.
  • the user information may include information indicating at least one of the gender and age of the translation source user, the gender and age of the translation destination user, and the scene of the interaction between the translation source and translation destination users.
  • additional tagging according to such various types of additional information may be performed on the input sentence, the training data D1, and the like. Thereby, wording suited to, for example, an adult or a child, or a man or a woman, can be translated appropriately according to the additional information.
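One way such additional tagging might look is sketched below; the token scheme is purely illustrative, since the patent only states that additional tagging according to the additional information may be performed:

```python
def build_tagged_input(source_sentence, role, gender=None, age_group=None, scene=None):
    """Prepend a role token and optional tokens for gender, age group,
    and dialogue scene to the source sentence, mirroring the additional
    tagging suggested for the input sentence and the training data D1."""
    tokens = [f"<{role}>"]
    for attribute in (gender, age_group, scene):
        if attribute is not None:
            tokens.append(f"<{attribute}>")
    tokens.append(source_sentence)
    return " ".join(tokens)
```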
  • the user information may be used for a process of correcting the input sentence before machine translation of the input sentence.
  • for example, the control unit 20 or the arithmetic processing unit 30 may correct the input sentence so as to supply an ambiguous or omitted subject based on the user information. By translating the corrected input sentence, translation with the proper subject according to the user information can then be realized.
  • the translation systems 1 and 1A in which an input sentence is input by voice have been described.
  • the input sentence need not be input by voice and may be input as text, for example.
  • the translation source user may input a text input sentence into the translation device 2 by operating the operation unit 22 instead of speaking.
  • in this case, the translation system of the present embodiment can omit the speech recognition function.
  • machine translation may be performed inside the translation apparatus 2.
  • for example, a program similar to the translation model 35 may be stored in the storage unit 21 of the translation device 2, and the control unit 20 may execute the program to generate a translated sentence according to the acquired input sentence and speaker information.
  • the example in which the translation model 35 of the machine translator is configured by a neural network has been described.
  • the translation model of the machine translator in the present embodiment is not limited to this, and may be constituted by a probabilistic model, for example. Further, the machine translator and the translation model of the present embodiment are not necessarily based on machine learning.
  • the translation device, system, method, program, and learning method according to the present disclosure can be applied to machine translation in various scenes.


Abstract

A translation device (2) outputs a result of machine translation to a user of a translation destination in response to an input from a user of a translation source. This translation device is provided with: first and second acquisition units (22, 26a, 26b); a control unit (20); and output units (23, 27). The first acquisition unit acquires an inputted statement in a language of the translation source. The second acquisition unit acquires user information related to the inputted statement. The control unit acquires a translated statement indicating the translation result of the inputted statement in accordance with the user information in a language of the translation destination on the basis of the inputted statement and the user information. The output unit outputs the translated statement. The user information includes information indicating the role of the user of the translation source and/or the user of the translation destination for the related inputted statement.

Description

Translation device, system, method, program, and learning method

The present disclosure relates to a translation device based on machine translation, a translation system, a translation method, a program, and a learning method.

Non-Patent Document 1 proposes a technique that enables translation between many languages using a single neural machine translation model. In Non-Patent Document 1, a machine translation model is shared across many languages by introducing a token that identifies the translation destination language at the beginning of the input sentence. This achieves zero-shot translation between pairs of languages not learned by the neural machine translation model.

Non-Patent Document 2 proposes a technique for controlling honorifics in a neural machine translation model. Non-Patent Document 2 uses an incidental condition to control the level of honorifics in the translation destination when performing machine translation from a language, such as English, that has no concept of honorifics. The incidental condition is set to one of "polite", "informal", and "none".

The present disclosure provides a translation device, system, method, program, and learning method capable of performing translation tailored to the user in machine translation.

A translation device according to one aspect of the present disclosure outputs the result of machine translation to a translation destination user in response to an input of a translation source user. The translation device includes a first acquisition unit, a second acquisition unit, a control unit, and an output unit. The first acquisition unit acquires an input sentence in the language of the translation source. The second acquisition unit acquires user information related to the input sentence. Based on the input sentence and the user information, the control unit acquires a translated sentence indicating the translation result of the input sentence according to the user information in the language of the translation destination. The output unit outputs the translated sentence. The user information includes information indicating the role of at least one of the translation source user and the translation destination user with respect to the related input sentence.

A translation system according to one aspect of the present disclosure includes the above translation device and a machine translator. The machine translator performs machine translation based on the information acquired by the translation device and generates the translated sentence.

A translation method according to one aspect of the present disclosure is a method of executing machine translation so as to generate a translation result to be output to a translation destination user in response to an input from a translation source user. The method includes a step in which a first acquisition unit acquires an input sentence in the language of the translation source, and a step in which a second acquisition unit acquires user information related to the input sentence. The user information includes information indicating the role of at least one of the translation source user and the translation destination user with respect to the related input sentence. The method further includes a step in which a control unit acquires, based on the input sentence and the user information, a translated sentence indicating the translation result of the input sentence according to the user information in the language of the translation destination, and a step in which an output unit outputs the translated sentence.

A program according to one aspect of the present disclosure causes a computer to execute a process of outputting the result of machine translation to a translation destination user in response to an input of a translation source user. The program includes a step in which the computer acquires an input sentence in the language of the translation source and a step in which the computer acquires user information related to the input sentence. The user information includes information indicating the role of at least one of the translation source user and the translation destination user with respect to the related input sentence. The program further includes a step in which the computer acquires, based on the input sentence and the user information, a translated sentence indicating the translation result of the input sentence according to the user information in the language of the translation destination, and a step of outputting the translated sentence.

A learning method according to one aspect of the present disclosure is a method of obtaining, through machine learning on a computer, a translation model that realizes machine translation from a translation source user to a translation destination user. A storage unit of the computer stores a parameter group that defines the translation model based on the machine learning. The method includes a step in which the computer inputs information associating an input sentence in the language of the translation source with user information into the translation model being learned, and causes the translation model to generate a translated sentence. The user information includes information indicating the role of at least one of the translation source user and the translation destination user with respect to the related input sentence. The method further includes a step in which the computer adjusts the parameter group according to the generated translated sentence.

According to the translation device, system, method, program, and learning method of the present disclosure, translation tailored to the user can be performed in machine translation.
FIG. 1 is a diagram showing an outline of the translation system according to Embodiment 1 of the present disclosure. FIG. 2 is a block diagram illustrating the configuration of the translation device according to Embodiment 1. FIG. 3 is a block diagram illustrating the configuration of the translation server according to Embodiment 1. FIG. 4 is a diagram illustrating the translation method by the translation system. FIG. 5 is a flowchart illustrating the processing of the translation device according to Embodiment 1. FIG. 6 is a diagram for explaining a usage example of the translation system. FIG. 7 is a diagram for explaining the training data in the learning method of Embodiment 1. FIG. 8 is a flowchart illustrating the processing of the translation model learning method according to Embodiment 1. FIG. 9 is a diagram showing a display example of the display unit in the translation device. FIG. 10 is a diagram showing an outline of the translation system according to Embodiment 2. FIG. 11 is a flowchart illustrating the processing of the translation device according to Embodiment 2.
Hereinafter, embodiments will be described in detail with reference to the drawings as appropriate. However, description more detailed than necessary may be omitted. For example, detailed description of already well-known matters and repeated description of substantially the same configuration may be omitted. This is to avoid the following description becoming unnecessarily redundant and to facilitate understanding by those skilled in the art.

The applicant provides the accompanying drawings and the following description so that those skilled in the art can fully understand the present disclosure, and does not intend thereby to limit the subject matter described in the claims.
(Embodiment 1)

Hereinafter, Embodiment 1 of the present disclosure will be described with reference to the drawings.

1. Configuration
1-1. System overview

A translation system according to Embodiment 1 will be described with reference to FIG. 1. FIG. 1 is a diagram showing an outline of a translation system 1 according to the present embodiment.
As shown in FIG. 1, the translation system 1 according to the present embodiment includes a translation device 2 and various servers 3, 11, and 12. The translation system 1 takes the utterance of one user as the translation source input from the translation device 2 and performs machine translation into the translation destination language for the other user, so as to enable dialogue between users 5a and 5b who use different languages.
The translation system 1 of the present embodiment is applicable to scenes such as customer service, including various types of guidance, in various industries such as airports, hotels, and restaurants. In the following description, the user 5a in the role of a host who serves customers is abbreviated as "host 5a", and the user 5b in the role of a guest who receives the service is abbreviated as "guest 5b". The translation system 1 of the present embodiment realizes machine translation with wording appropriate to the dialogue between the host 5a and the guest 5b in various scenes.
 In the present embodiment, the translation device 2 performs data communication with the various servers 3, 11, and 12 via a communication network 10 such as the Internet. The translation system 1 may include a plurality of translation devices 2. In that case, each translation device 2 may include its own identification information in the data it transmits, so that the various servers 3, 11, and 12 can transmit data to the translation device 2 indicated by the received identification information.
 The various servers 3, 11, and 12 of the translation system 1 are, for example, ASP servers, and include a translation server 3, a speech recognition server 11, and a speech synthesis server 12. The translation server 3 is an example of a machine translator that executes machine translation in the translation method of the present embodiment. The speech recognition server 11 has a speech recognition function for the input sentence to be machine-translated. The speech synthesis server 12 has a speech synthesis function for the translated sentence indicating the result of the machine translation. Details of the configuration of the translation system 1 will be described below.
1-2. Configuration of Translation Device
 The configuration of the translation device 2 in the translation system 1 of the present embodiment will be described with reference to FIGS. 1 and 2. FIG. 2 is a block diagram illustrating the configuration of the translation device 2.
 The translation device 2 is composed of an information terminal such as a tablet terminal, a smartphone, or a PC. The translation device 2 illustrated in FIG. 2 includes a control unit 20, a storage unit 21, an operation unit 22, a display unit 23, a device interface 24, and a network interface 25. Hereinafter, "interface" is abbreviated as "I/F". The translation device 2 also includes, for example, two microphones 26a and 26b and a speaker 27.
 In the translation device 2 of the present embodiment, as shown in FIG. 1, one of the two microphones 26a and 26b is the host microphone 26a used by the host 5a, and the other is the guest microphone 26b used by the guest 5b. Each of the microphones 26a and 26b is an input device that picks up speech and inputs audio data. Each of the microphones 26a and 26b is an example of an acquisition unit in the present embodiment.
 The speaker 27 is an output device that outputs audio data as sound, and is an example of an output unit in the present embodiment. FIGS. 1 and 2 illustrate a case where the speaker 27 is shared between the host 5a and the guest 5b. The translation device 2 may instead include a host speaker and a guest speaker as separate units. The microphones 26a and 26b and the speaker 27 may be provided externally to the information terminal constituting the translation device 2, or may be built into the terminal.
 The control unit 20 includes, for example, a CPU or an MPU that realizes predetermined functions in cooperation with software, and controls the overall operation of the translation device 2. The control unit 20 reads data and programs stored in the storage unit 21 and performs various arithmetic processes to realize various functions. For example, the control unit 20 executes a program including a group of instructions for realizing the processing of the translation device 2 in the translation method of the present embodiment. The program may be provided from the communication network 10 or the like, or may be stored in a portable recording medium.
 Note that the control unit 20 may be a hardware circuit such as a dedicated electronic circuit or a reconfigurable electronic circuit designed to realize predetermined functions. The control unit 20 may be composed of various semiconductor integrated circuits such as a CPU, MPU, GPU, GPGPU, TPU, microcontroller, DSP, FPGA, or ASIC.
 The storage unit 21 is a storage medium that stores the programs and data necessary to realize the functions of the translation device 2. As shown in FIG. 2, the storage unit 21 includes a storage unit 21a and a temporary storage unit 21b.
 The storage unit 21a stores parameters, data, control programs, and the like for realizing predetermined functions, and is composed of, for example, an HDD or an SSD. For example, the storage unit 21a stores the above-described program.
 The temporary storage unit 21b is composed of a RAM such as a DRAM or an SRAM, and temporarily stores (i.e., holds) data. For example, the temporary storage unit 21b holds the input sentence, the translated sentence, and user information described later. The temporary storage unit 21b may also function as a work area for the control unit 20, and may be configured as a storage area in the internal memory of the control unit 20.
 The operation unit 22 is a user interface operated by the user. FIG. 1 shows an example in which the operation unit 22 forms a touch panel together with the display unit 23. The operation unit 22 is not limited to a touch panel, and may be, for example, a keyboard, a touch pad, buttons, or switches. The operation unit 22 is an example of an acquisition unit that acquires various information input by user operations.
 The display unit 23 is an example of an output unit composed of, for example, a liquid crystal display or an organic EL display. The display unit 23 displays information output to the user, such as the translated sentence. The display unit 23 may also display various other information, such as icons for operating the operation unit 22 and information input from the operation unit 22.
 The device I/F 24 is a circuit for connecting external equipment to the translation device 2, and is an example of a communication unit that performs communication in accordance with predetermined communication standards. The predetermined standards include USB, HDMI (registered trademark), IEEE 1394, Wi-Fi, and Bluetooth (registered trademark). The device I/F 24 may constitute, in the translation device 2, an acquisition unit that receives various information from external equipment or an output unit that transmits information to external equipment.
 The network I/F 25 is a circuit for connecting the translation device 2 to the communication network 10 via a wireless or wired communication line, and is an example of a communication unit that performs communication in accordance with predetermined communication standards. The predetermined standards include IEEE 802.3 and IEEE 802.11a/11b/11g/11ac. The network I/F 25 may constitute, in the translation device 2, an acquisition unit that receives various information or an output unit that transmits information via the communication network 10.
 The configuration of the translation device 2 described above is an example, and the configuration of the translation device 2 is not limited to it. For example, the translation device 2 need not include both the host microphone 26a and the guest microphone 26b; a microphone shared between the host 5a and the guest 5b may be used instead. The translation device 2 may also be composed of various computers other than an information terminal.
 The acquisition units in the translation device 2 may also be realized in cooperation with various software in the control unit 20 or the like. An acquisition unit in the translation device 2 may acquire various information by reading information stored in various storage media (e.g., the storage unit 21a) into the work area (e.g., the temporary storage unit 21b) of the control unit 20. Each of the above-described acquisition units may be a first acquisition unit that acquires the source input sentence, or a second acquisition unit that acquires user information related to the input sentence. The first and second acquisition units may share a single hardware element.
1-3. Server Configuration
 As an example of the hardware configuration of the various servers 3, 11, and 12 in the translation system 1 of the present embodiment, the configuration of the translation server 3 will be described with reference to FIG. 3. FIG. 3 is a block diagram illustrating the configuration of the translation server 3 in the present embodiment.
 The translation server 3 illustrated in FIG. 3 includes an arithmetic processing unit 30, a storage unit 31, and a communication unit 32. The translation server 3 is composed of one or more computers.
 The arithmetic processing unit 30 includes, for example, a CPU and a GPU that realize predetermined functions in cooperation with software, and controls the operation of the translation server 3. The arithmetic processing unit 30 reads data and programs stored in the storage unit 31 and performs various arithmetic processes to realize various functions.
 For example, the arithmetic processing unit 30 executes a program of a translation model 35 that performs machine translation in the translation method of the present embodiment. The translation model 35 is composed of, for example, various neural networks. For example, the translation model 35 may be a neural machine translation model shared among multiple languages (see, for example, Non-Patent Document 1). The arithmetic processing unit 30 may also execute a program for performing machine learning of the translation model 35. Each of these programs may be provided from the communication network 10 or the like, or may be stored in a portable recording medium.
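As a rough illustration of how such a shared multilingual model is driven (in the style of Non-Patent Document 1), the target language can be specified by prepending a token to the source text before it is fed to the model. The "<2xx>" token format and the helper below are illustrative assumptions, not part of the disclosed system:

```python
def build_model_input(source_text: str, target_lang: str) -> str:
    """Prepend a target-language token, as in shared multilingual NMT models.

    The "<2xx>" token convention is assumed here for illustration; an
    actual model's vocabulary may use a different marker.
    """
    return f"<2{target_lang}> {source_text}"

# The same machine translator, routed to different target languages:
print(build_model_input("フライトはいつですか。", "en"))  # <2en> フライトはいつですか。
print(build_model_input("Do you have the bag?", "ja"))    # <2ja> Do you have the bag?
```

Because the language choice is carried entirely by the token, one set of model parameters can serve every language pair, which is what enables the zero-shot translation mentioned in the background.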
 Note that the arithmetic processing unit 30 may be a hardware circuit such as a dedicated electronic circuit or a reconfigurable electronic circuit designed to realize predetermined functions. The arithmetic processing unit 30 may be composed of various semiconductor integrated circuits such as a CPU, GPU, TPU, MPU, microcontroller, DSP, FPGA, or ASIC.
 The storage unit 31 is a storage medium that stores the programs and data necessary to realize the functions of the translation server 3, and includes, for example, an HDD or an SSD. The storage unit 31 may also include a DRAM, an SRAM, or the like, and may function as a work area for the arithmetic processing unit 30. The storage unit 31 stores, for example, the program of the translation model 35 and the various parameter groups that define the translation model 35 based on machine learning. The parameter groups include, for example, various weight parameters of a neural network.
 The communication unit 32 is an I/F circuit for performing communication in accordance with predetermined communication standards, and communicatively connects the translation server 3 to the communication network 10 or to external equipment. The predetermined standards include IEEE 802.3, IEEE 802.11a/11b/11g/11ac, USB, HDMI, IEEE 1394, Wi-Fi, and Bluetooth.
 The speech recognition server 11 and the speech synthesis server 12 can be configured in the same way as the translation server 3 described above, for example by installing a program for the speech recognition function or the speech synthesis function, respectively, in place of the translation model 35. The various servers 3, 11, and 12 in the translation system 1 are not limited to the above configuration and may have various configurations. The translation method of the present embodiment may be executed by cloud computing. The hardware resources that realize the functions of the various servers 3, 11, and 12 may also be shared.
 In the translation system 1, the speech recognition server 11 and the speech synthesis server 12 may also be omitted. For example, the translation device 2 may itself have a speech recognition function, recognizing the audio data generated by the microphones 26a and 26b and converting it into text data. The translation device 2 may likewise have a speech synthesis function, synthesizing speech from text data based on the machine translation and outputting it from the speaker 27.
2. Operation
 The operation of the translation system 1 configured as described above will be described below.
2-1. Translation Method
 The translation method performed by the translation system 1 according to the present embodiment will be described with reference to FIGS. 1 and 4. FIG. 4 is a diagram illustrating the translation method performed by the translation system 1.
 During a dialogue between the host 5a and the guest 5b, each time one of them speaks, the translation system 1 according to the present embodiment takes the speaker's language as the source language and executes machine translation into the target language. The source language may be recognized from the speaker's utterance by speech recognition, for example, or may be set by operating the translation device 2. The target language is set appropriately according to the other user, i.e., the one who is not speaking. Of the host 5a and the guest 5b, the speaker is an example of the source-side user, and the other party is an example of the target-side user.
 In the example of FIG. 4(a), the host 5a, as the speaker, utters the Japanese input sentence 51 "フライトはいつですか。" ("When is the flight?"). When the spoken input sentence 51 is input, the translation device 2 of the present embodiment can recognize the input sentence 51 using, for example, the speech recognition of the speech recognition server 11 shown in FIG. 1. At this time, the translation device 2 of the present embodiment can acquire, in addition to the input sentence 51, information about the user who is speaking.
 In the present embodiment, as shown in FIG. 1, the translation server 3 executes machine translation based on the information acquired by the translation device 2, and generates a translated sentence indicating the translation result of the input sentence in the target language. By using user information indicating whether the speaker is the host 5a or the guest 5b in the machine translation of the input sentence, the translation system 1 of the present embodiment differentiates the translations as illustrated in FIGS. 4(a) and 4(b).
 In the example of FIG. 4(a), based on machine translation with Japanese as the source language and English as the target language, the translated sentence 61 "When is your flight?" is output from the speaker 27. In the translation system 1 of the present embodiment, the translated sentence 61 can be converted to speech by the speech synthesis server 12.
 FIG. 4(b) shows an example in which an input sentence 51 with the same language and content as in FIG. 4(a) is uttered by the guest 5b. In the example of FIG. 4(b), the translation device 2 outputs the translated sentence 62 "When is my flight?", whose content differs from the translated sentence 61 of FIG. 4(a).
 The examples of FIGS. 4(a) and 4(b) assume a scene in which a dialogue takes place at an airport counter between a host 5a, such as an airport staff member, and a guest 5b who is about to board. In such a scene, a translation result consistent with the fact that the flight passenger is the guest 5b is considered appropriate. In the example of FIG. 4(a), since the speaker is the host 5a, the "your" in the translated sentence 61 is understood to refer to the guest 5b, and an appropriate translation result is obtained. On the other hand, if the same translated sentence as the translated sentence 61 of FIG. 4(a) were output when the speaker is the guest 5b, the "your" in the translated sentence would refer to the host 5a, which would be inappropriate.
 Here, with conventional machine translation techniques, basically only one translation is produced for an input sentence 51 for which two different translated sentences 61 and 62 are conceivable, as in FIGS. 4(a) and 4(b), and it is not even possible to determine which one is appropriate. It has thus been difficult for the prior art to produce differentiated translations of the same input sentence.
 In contrast, in the translation system 1 of the present embodiment, user information indicating the speaker is acquired by the translation device 2 each time an utterance occurs during the dialogue between the host 5a and the guest 5b, and is used for the machine translation of the corresponding input sentence 51, thereby realizing appropriate differentiation of the translations. Details of the operation of the translation system 1 in this translation method are described below.
2-1-1. Operation of Translation Device
 The operation of the translation device 2 in the translation method described above will be described with reference to FIG. 5.
 FIG. 5 is a flowchart illustrating the processing of the translation device 2 according to the present embodiment. Each process in the flowchart of FIG. 5 is executed by the control unit 20 of the translation device 2. The flowchart starts, for example, when one of the host 5a and the guest 5b utters a desired input sentence.
 First, the control unit 20 of the translation device 2 inputs audio data of the speaker's utterance from the host microphone 26a or the guest microphone 26b (S1). The audio data of the utterance is an example of information indicating the input sentence spoken by the speaker. The control unit 20 may select one of the two microphones 26a and 26b based on their volume levels, or in response to various operations by the speaker.
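As an illustrative sketch of the volume-based microphone selection mentioned for step S1 (the frame representation and the energy measure are assumptions; the embodiment does not specify how volume is compared), the active microphone could be chosen by comparing the RMS level of one captured frame from each microphone:

```python
import math

def rms(samples):
    """Root-mean-square level of one audio frame (a list of PCM samples)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def select_microphone(host_frame, guest_frame):
    """Pick the active microphone by frame energy (illustration of step S1).

    Ties go to the host microphone; a real device would also apply a
    noise floor threshold before deciding that anyone spoke at all.
    """
    return "host" if rms(host_frame) >= rms(guest_frame) else "guest"

# The host speaks, so the host microphone frame carries far more energy:
print(select_microphone([200, -180, 150], [10, -8, 5]))  # host
```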
 Next, the control unit 20 acquires the input sentence indicated by the utterance via, for example, the network I/F 25 (S2). Specifically, the translation device 2 transmits the audio data of the utterance to the speech recognition server 11 via the communication network 10. The speech recognition server 11 executes speech recognition processing on the audio data from the translation device 2, generates text data as the speech recognition result, and transmits it to the translation device 2. The network I/F 25 of the translation device 2 receives the generated text data of the input sentence from the speech recognition server 11.
 The control unit 20 also executes processing for identifying speaker information, for example using the microphones 26a and 26b as acquisition units (S3 to S5). The speaker information is an example of user information indicating "host" or "guest" as the current speaker. Note that the order of step S2 relative to steps S3 to S5 is not particularly limited; either may be performed first, or they may be executed in parallel.
 For example, the control unit 20 determines whether the utterance was input from the host microphone 26a (S3). In the example of FIG. 4(a), based on the utterance being input from the host microphone 26a (YES in S3), the control unit 20 sets the speaker information to "host" (S4). On the other hand, in the example of FIG. 4(b), based on the utterance being input from the guest microphone 26b (NO in S3), the control unit 20 sets the speaker information to "guest" (S5).
 Next, the control unit 20 associates the acquired input sentence and the speaker information with each other and transmits them to the translation server 3 (S6). For example, the control unit 20 tags the input sentence with tag information indicating "host" or "guest" from the speaker information, and transmits it from the network I/F 25 to the translation server 3. The information transmitted to the translation server 3 may also include designation information for the target language.
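One minimal way to realize the tagging of step S6 could look as follows. The "<host>"/"<guest>" tag strings and the JSON payload keys are illustrative assumptions; the embodiment only requires that the input sentence and the speaker information be associated with each other in some form:

```python
import json

def build_translation_request(input_sentence, speaker, target_lang=None):
    """Associate the input sentence with speaker information (step S6).

    The sentence is prefixed with a "<host>" or "<guest>" tag, and the
    optional target-language designation is carried alongside it.
    """
    if speaker not in ("host", "guest"):
        raise ValueError(f"unknown speaker: {speaker}")
    payload = {"text": f"<{speaker}> {input_sentence}"}
    if target_lang is not None:
        payload["target_lang"] = target_lang
    return json.dumps(payload, ensure_ascii=False)

print(build_translation_request("フライトはいつですか。", "guest", "en"))
```

Prefixing a control token to the source sentence mirrors the technique of Non-Patent Document 1, where a token at the beginning of the input conditions a shared model's output; here the token conditions the host/guest differentiation instead of the language pair.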
 When the translation server 3 receives the input sentence associated with the speaker information from the translation device 2, it executes machine translation based on, for example, the trained translation model 35. The translation server 3 thereby generates, for the received input sentence, a translated sentence whose translation result reflects the "host" or "guest" indicated by the associated speaker information. The translation server 3 transmits the generated translated sentence to the translation device 2 as text data or the like. In the translation device 2, the control unit 20 receives the information indicating the translated sentence from the translation server 3 via the network I/F 25 (S7).
 Next, the control unit 20 outputs the translation result, for example as speech output of the translated sentence (S8). For example, the control unit 20 transmits the text data of the translated sentence to the speech synthesis server 12 and causes it to perform speech synthesis of the translated sentence. The control unit 20 receives the resulting audio data from the speech synthesis server 12 and controls speech output from the speaker 27. In addition to or instead of the speech output, the control unit 20 may display a text image or the like of the translated sentence on the display unit 23.
 After outputting the translation result (S8), the control unit 20 ends the processing of this flowchart. Through the processing of step S8, the translation result is output to the user of the translation device 2.
 According to the above processing, the host microphone 26a and the guest microphone 26b of the translation device 2 function as acquisition units for the input sentence and the speaker information by receiving the speech of the corresponding speaker (S1 to S5). In the translation device 2, the corresponding speaker information is acquired each time the speech of an input sentence is input (S1 to S5), so that an appropriate translated sentence can be output that differentiates between the case where the speaker is the host 5a and the case where the speaker is the guest 5b (S6 to S8).
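The device-side flow of steps S1 to S8 can be summarized in a short sketch. The three server interactions are passed in as callables because the actual endpoints, protocols, and data formats are not fixed at this level of the embodiment; the stubs in the usage example are purely illustrative:

```python
def translate_utterance(audio, source_mic, recognize, translate, synthesize):
    """Device-side flow S1-S8, with the three servers passed in as callables.

    recognize(audio)         -> input sentence text   (S2, speech recognition server)
    translate(text, speaker) -> translated text       (S6-S7, translation server)
    synthesize(text)         -> output audio data     (S8, speech synthesis server)
    """
    speaker = "host" if source_mic == "host" else "guest"  # S3-S5
    input_sentence = recognize(audio)                      # S2
    translated = translate(input_sentence, speaker)        # S6-S7
    return synthesize(translated)                          # S8

# Stub servers reproducing the differentiation of FIG. 4(b):
out = translate_utterance(
    audio=b"...",
    source_mic="guest",
    recognize=lambda a: "フライトはいつですか。",
    translate=lambda t, s: "When is my flight?" if s == "guest" else "When is your flight?",
    synthesize=lambda t: t.encode(),
)
print(out.decode())  # When is my flight?
```

Because the speaker information is re-derived on every utterance, the same function serves both directions of the dialogue without any per-session state.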
2-1-2. Differentiated Translation for Host and Guest
 For example, according to the translation system 1 of the present embodiment, as shown in FIGS. 4(a) and 4(b), even when the spoken input sentence 51 is the same, different translated sentences 61 and 62 are output from the translation device 2 depending on whether the speaker information indicates "host" or "guest". The translated sentence 62 in the example of FIG. 4(b) contains "my" instead of the "your" in the translated sentence 61 of FIG. 4(a), so the word referring to the guest 5b when the guest 5b is speaking is translated appropriately.
 The differentiation in the examples of FIGS. 4(a) and 4(b) stems from an ambiguity: the subject of the topic, rendered as "your" or "my" in the translated sentences 61 and 62, is not made explicit in the input sentence 51. Such ambiguity of the subject can arise frequently when, for example, the source language is Japanese. According to the translation system 1 of the present embodiment, even when the subject is ambiguous in the input sentence, acquiring the speaker information makes it possible to identify the implied subject and differentiate the translations appropriately. In particular, in a relationship between users whose mutual roles are clear, such as the host 5a and the guest 5b, the subject of the topic in the dialogue can be inferred relative to the speaker.
 The differentiation using speaker information in the translation system 1 described above is applicable to various ambiguous input sentences. Another example of an ambiguous input sentence will be described with reference to FIGS. 6(a) and 6(b).
 FIGS. 6(a) and 6(b) show usage examples of the translation system 1 when the target language is Japanese. FIG. 6(a) is a usage example in which the speaker is the host 5a, and FIG. 6(b) is a usage example in which the speaker is the guest 5b.
 FIGS. 6A and 6B illustrate a scene in which baggage is handed over between the host 5a and the guest 5b at an airport counter, and the English input sentence 52 "Do you have the bag?" is uttered by the host 5a or the guest 5b. In such a scene, the content of the above input sentence 52 carries different nuances in Japanese depending on whether the host 5a says it to the guest 5b or the guest 5b says it to the host 5a, and from the viewpoint of honorifics it is considered appropriate to use different wording as well. Therefore, the translation system 1 of the present embodiment differentiates the translation based on the speaker information by the processing described above.
 Specifically, for the above input sentence 52, in the example of FIG. 6A the translation device 2 outputs the Japanese translated sentence 63 "おかばんをお持ちでしょうか。" (a respectful phrasing of "Do you have the bag?"). The translated sentence 63 is considered natural Japanese as a remark by which the host 5a, being the side that receives the baggage, treats the guest 5b hospitably. The translated sentence 63 also includes the honorific prefix "お", a respectful expression by which the speaker shows respect for the actions and possessions of the other party such as the guest 5b, so its wording is appropriate.
 On the other hand, in the example of FIG. 6B, the translation device 2 outputs the Japanese translated sentence 64 "かばんはありますか。" (a plain polite phrasing of "Is there a bag?"). The translated sentence 64 is considered natural Japanese as a remark by which the guest 5b, being the side that checked the baggage, prompts the host 5a to confirm. The wording of the translated sentence 64 is polite ("desu/masu" style) yet contains no excessive honorifics, and is considered appropriate when the speaker is the guest 5b.
 Furthermore, since Japanese has a strong concept of honorifics that depends on the relative positions of the users in a dialogue, when the target language is Japanese the appropriate wording for the translated sentence can be ambiguous from the input sentence alone. Even for such wording ambiguity, according to the translation method of the present embodiment, a translated sentence with wording appropriate to the positions of the speaker and the other party can be obtained based on the speaker information.
 In the above examples, differentiated translation has been described for cases where Japanese is the source or target language and English is the target or source language; however, the translation method of the present embodiment is not limited to Japanese and English, and is applicable to various languages. By machine translation using the speaker information, the translated sentence can be differentiated so as to be appropriate when the speaker speaks the content of the input sentence to the other party in the target language, in accordance with the various common-sense conventions of each language.
2-2. Translation Model Learning Method
 Machine translation in the translation method of the translation system 1 as described above can be realized by, for example, machine learning. The learning method in the present embodiment will be described with reference to FIGS. 7 and 8.
 FIG. 7 is a diagram for explaining the training data D1 in the learning method of the present embodiment. FIG. 8 is a flowchart illustrating the processing of the learning method for the translation model 35 according to the present embodiment.
 In the present embodiment, an example is described in which, in the learning method using the training data D1, machine learning is performed on the translation model 35 of the translation server 3, which is an example of a machine translator. The training data D1 constitutes, for example, a bilingual corpus between the source and target languages. FIG. 7A illustrates a case where the source language is Japanese and the target language is English. FIG. 7B illustrates a case where the source language is English and the target language is Japanese.
 The training data D1 records, for example, "speaker information", a "source language sentence", and a "target language sentence" in association with one another, as shown in FIGS. 7A and 7B. The "source language sentence" is an example input sentence for training the translation model 35, described in the source language. The "target language sentence" indicates the correct translation based on the "speaker information" when the corresponding "source language sentence" is translated into the target language. In the training data D1 of the present embodiment, the speaker information is associated with each pair of source language sentence and target language sentence by tagging with, for example, "host" or "guest". Each target language sentence contains an expression that is natural when the content of the input sentence is uttered under the corresponding speaker information.
 For example, as shown in FIG. 7A, in the training data D1 the Japanese source language sentence "いつ出発しますか。" ("When do we/you leave?") is associated with the target language sentence "When do you start?" when the speaker information is "host", and the source language sentence with the same content is associated with the target language sentence "When do we start?" when the speaker information is "guest". These two target language sentences contain different subjects according to the difference in the speaker information. Note that the training data D1 may include source language sentences associated with only one of "host" and "guest".
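As a concrete illustration of the tagging described above, the speaker information can be realized as a token prepended to the source language sentence, in the manner of the language-identifying tokens of Non-Patent Document 1. The following is a minimal sketch; the record layout and the token format `<host>`/`<guest>` are illustrative assumptions, not details taken from the actual training data D1.

```python
# Hypothetical sketch of records in the training data D1: speaker
# information associated with a (source language sentence, target
# language sentence) pair, as in FIG. 7A.
TRAIN_DATA = [
    ("host",  "いつ出発しますか。", "When do you start?"),
    ("guest", "いつ出発しますか。", "When do we start?"),
]

def tag_source(speaker, source_sentence):
    """Prepend a token identifying the speaker information to the source
    language sentence, so that a single translation model can learn to
    differentiate the translation according to the speaker role."""
    return f"<{speaker}> {source_sentence}"

# Build (tagged input, reference translation) pairs for training.
pairs = [(tag_source(spk, src), tgt) for spk, src, tgt in TRAIN_DATA]
```

With this scheme, the same source sentence appears twice in the training pairs, distinguished only by its speaker token, which is what lets the model associate the token with the choice of subject in the target sentence.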
 The processing of the learning method using the training data D1 as described above is illustrated in FIG. 8. Each process of the flowchart shown in FIG. 8 is executed by, for example, the arithmetic processing unit 30 of the translation server 3. The flowchart starts in a state where, for example, the training data D1 is stored in the storage unit 31 and the parameter group of the translation model 35 is set to initial values, that is, a state where the translation model 35 to be trained has been prepared.
 First, the arithmetic processing unit 30 refers to the training data D1 in the storage unit 31, and inputs a source language sentence and the speaker information associated with it in the training data D1 to the translation model 35 being trained (S11). The speaker information is input to the translation model 35 in association with the source language sentence, for example as tag information. The target language is, for example, designated in advance.
 Next, the arithmetic processing unit 30 executes machine translation based on the input information in the translation model 35 being trained (S12). In step S12, the arithmetic processing unit 30 causes the translation model 35 to generate a translated sentence according to the current parameter group.
 Next, the arithmetic processing unit 30 adjusts the parameter group based on the error between the translated sentence produced by the translation model 35 being trained and the corresponding target language sentence (S13). The processing of step S13 is performed according to, for example, the backpropagation method, with reference to the target language sentence associated in the training data D1 with the source language sentence that was input to the translation model 35.
 Next, the arithmetic processing unit 30 determines whether the training of the translation model 35 has been completed, based on a predetermined end-of-training condition (S14). The end-of-training condition is set in advance according to, for example, the number of training iterations. If the training of the translation model 35 has not been completed (NO in S14), the arithmetic processing unit 30 performs the processing from step S11 again. Each time the processing of steps S11 to S13 is repeated, the parameter group of the translation model 35 is updated.
 When the training of the translation model 35 is completed (YES in S14), the arithmetic processing unit 30 records the final values of the parameter group in the storage unit 31, thereby determining the parameter group that defines the trained translation model 35 (S15).
 By determining the parameter group of the trained translation model 35 (S15), the arithmetic processing unit 30 ends the processing according to the flowchart of FIG. 8.
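The loop of steps S11 to S15 above can be sketched as follows. This is schematic only: the toy "parameters", "translation", and "adjustment" functions stand in for the actual translation model 35 and the backpropagation of S13, and the end-of-training condition of S14 is assumed, for illustration, to be a fixed number of iterations.

```python
def train_translation_model(params, train_data, translate, adjust, max_iters=3):
    """Schematic training loop following steps S11-S15 of FIG. 8.

    params     -- parameter group of the translation model 35 (initial values)
    train_data -- iterable of (source sentence tagged with speaker info, target sentence)
    translate  -- fn(params, tagged_src) -> candidate translation (S12)
    adjust     -- fn(params, tagged_src, candidate, target) -> updated params (S13)
    """
    for _ in range(max_iters):                     # S14: fixed-iteration end condition
        for tagged_src, target in train_data:      # S11: source sentence + speaker info
            candidate = translate(params, tagged_src)               # S12
            params = adjust(params, tagged_src, candidate, target)  # S13
    return params                                  # S15: final parameter group


# Toy stand-ins: the "parameters" are a lookup table, and "adjustment"
# memorizes the reference translation whenever the candidate is wrong.
toy_data = [("<host> いつ出発しますか。", "When do you start?")]

def toy_translate(p, src):
    return p.get(src, "")

def toy_adjust(p, src, candidate, target):
    if candidate != target:   # error between candidate and reference (S13)
        p = dict(p)
        p[src] = target
    return p

trained = train_translation_model({}, toy_data, toy_translate, toy_adjust)
```

In a real implementation, `params` would be the neural-network weights and `adjust` a gradient step on the translation loss; only the control flow of FIG. 8 is reproduced here.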
 According to the above processing, machine learning with the training data D1 including the speaker information yields a translation model 35 that has learned to differentiate translated sentences according to the speaker information. According to the learning method of the present embodiment, it is possible to generate a translation model 35 that has acquired the common sense considered natural in various scenes, for performing appropriate differentiated translation between the host 5a and the guest 5b.
 Processing similar to step S12 above is performed with the trained translation model 35 between steps S6 and S7 of FIG. 5. Based on the common sense acquired by the translation model 35 through machine learning, the translated sentence can be differentiated appropriately.
 In the above description, an example has been explained in which the arithmetic processing unit 30 of the translation server 3 executes the processing of the learning method. The processing of the learning method may instead be performed on various computers other than the translation server 3. The generated trained translation model 35 can be provided as appropriate.
 In the above description, as shown in FIGS. 7A and 7B, training data D1 associating a "source language sentence" with a "target language sentence" based on "speaker information" has been exemplified. The training data D1 is not limited to this; for example, instead of distinguishing between source language sentences and target language sentences, example sentences that are mutual translations across various languages may be recorded in association with one another based on the speaker information. When training the translation model 35, the source language may be designated as appropriate, and the source language sentences to be learned may be selected from the training data D1.
3. Summary
 As described above, in the present embodiment, the translation device 2 outputs the result of machine translation to the other party (i.e., the target-side user) in response to an input by the speaker (i.e., the source-side user), for example between the host 5a and the guest 5b. The translation device 2 includes two microphones 26a and 26b as examples of first and second acquisition units, a control unit 20, and a speaker 27 as an example of an output unit. As the first acquisition unit, each microphone 26a, 26b acquires an input sentence in the source language. As the second acquisition unit, each microphone 26a, 26b acquires speaker information, which is an example of user information related to the input sentence. Based on the input sentence and the speaker information, the control unit 20 acquires a translated sentence indicating the translation result of the input sentence according to the user information in the target language. The speaker 27 outputs the translated sentence as voice output. The speaker information includes information indicating the role of the speaker with respect to the related input sentence.
 According to the translation device 2 described above, for example during a dialogue between the host 5a and the guest 5b, differentiated translation according to the speaker can be performed in machine translation, based on the speaker information acquired in response to the utterance of the input sentence by the speaker.
 In the present embodiment, the role of at least one of the speaker and the other party in the speaker information includes at least one of "host" and "guest". Thus, in various scenes involving the host 5a and the guest 5b, differentiated translation appropriate to each role can be realized.
 Furthermore, the first and second acquisition units in the present embodiment are not limited to the plurality of microphones 26a and 26b, and may each include at least one of the microphones 26a and 26b, the operation unit 22, the network I/F 25, and the device I/F 24. One such example will be described with reference to FIG. 9.
 FIG. 9 shows a display example of the display unit 23 of the translation device 2. In this example, the display unit 23, which constitutes a touch panel together with the operation unit 22, displays a host utterance icon 23a, a guest utterance icon 23b, an input sentence area 23c, and a translated sentence area 23d.
 Each of the utterance icons 23a and 23b is an icon by which the corresponding user inputs a touch operation to the operation unit 22 to start speaking. In the input sentence area 23c, an image of the input sentence is displayed according to the utterance. In the translated sentence area 23d, an image of the translated sentence is displayed according to the translation result of the input sentence.
 In this example, the operation unit 22 functions as the second acquisition unit by means of the utterance icons 23a and 23b. For example, the translation device 2 starts processing such as that of the flowchart of FIG. 5 when one of the two utterance icons 23a and 23b is touched. In this case, for example, instead of steps S3 to S5 of FIG. 5, the control unit 20 can set the speaker information to "host" when the operation unit 22 receives an operation on the host utterance icon 23a, and to "guest" when it receives an operation on the guest utterance icon 23b.
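The replacement of steps S3 to S5 by the icon operation can be sketched as follows; the icon identifiers and function name are illustrative assumptions, since the embodiment does not specify them.

```python
# Hypothetical mapping from the touched utterance icon to the speaker
# information set by the control unit 20, replacing steps S3-S5 of FIG. 5
# when the operation unit 22 serves as the second acquisition unit.
ICON_TO_SPEAKER = {
    "utterance_icon_23a": "host",   # host utterance icon 23a
    "utterance_icon_23b": "guest",  # guest utterance icon 23b
}

def speaker_info_from_icon(icon_id):
    """Return the speaker information corresponding to the touched icon."""
    if icon_id not in ICON_TO_SPEAKER:
        raise ValueError(f"unknown utterance icon: {icon_id!r}")
    return ICON_TO_SPEAKER[icon_id]
```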
 The acquisition of the speaker information may also be performed based on, for example, information obtained via the communication I/F 25 or the device I/F 24, or on the result of speech recognition of the input sentence. For example, "host" or "guest" in the speaker information may be set based on information indicating to which of the languages corresponding to the host 5a and the guest 5b the language recognized by speech recognition corresponds.
 In the present embodiment, the translation device 2 further includes the network I/F 25 as an example of a communication unit that communicates with the external translation server 3. The control unit 20 transmits the input sentence and the speaker information to the translation server 3 via the network I/F 25, and receives the translated sentence of the transmitted input sentence from the translation server 3. Thus, for example, the learning result of the translation model 35 based on the speaker information can be applied, and the translation server 3 can generate an appropriate translated sentence according to the speaker information from the translation device 2.
 The translation system 1 in the present embodiment includes the translation device 2 and the translation server 3, which is an example of a machine translator. The translation server 3 performs machine translation based on the information acquired by the translation device 2 and generates a translated sentence. By using the speaker information in the translation system 1, differentiated translation according to the speaker can be performed in machine translation.
 The translation method in the present embodiment executes machine translation so as to generate a translation result output to the other party in response to an input from the speaker. The method includes a step (S1, S2) in which a first acquisition unit acquires an input sentence in the source language, and a step (S3 to S5) in which a second acquisition unit acquires speaker information related to the input sentence. The speaker information includes information indicating the role of at least one of the speaker and the other party with respect to the related input sentence. The method further includes a step (S6, S7) in which a control unit acquires, based on the input sentence and the speaker information, a translated sentence indicating the translation result of the input sentence according to the speaker information in the target language, and a step (S8) in which an output unit outputs the translated sentence. According to this method, by using the speaker information, differentiated translation according to the speaker can be performed in machine translation.
 The program in the present embodiment causes a computer such as the translation device 2 to execute processing for outputting the result of machine translation to the other party in response to an input from the speaker. The program includes a step (S1, S2) in which the computer acquires an input sentence in the source language, and a step (S3 to S5) in which the computer acquires speaker information related to the input sentence. The speaker information includes information indicating the role of at least one of the speaker and the other party with respect to the related input sentence. The program further includes a step (S6, S7) in which the computer acquires, based on the input sentence and the speaker information, a translated sentence indicating the translation result of the input sentence according to the speaker information in the target language, and a step (S8) of outputting the translated sentence. According to this program, by using the speaker information, differentiated translation according to the speaker can be performed in machine translation.
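The sequence of steps executed by the program (S1 through S8) can be sketched end-to-end as follows. The function names are illustrative assumptions; the acquisition, translation, and output functions stand in for the microphones 26a and 26b, the translation server 3, and the speaker 27, respectively.

```python
def run_translation(acquire_input, acquire_speaker_info, machine_translate, output):
    """Schematic flow of the program: acquire the input sentence (S1, S2),
    acquire the related speaker information (S3-S5), obtain the translated
    sentence according to the speaker information (S6, S7), and output it (S8)."""
    input_sentence = acquire_input()                              # S1, S2
    speaker_info = acquire_speaker_info()                         # S3-S5
    translated = machine_translate(input_sentence, speaker_info)  # S6, S7
    output(translated)                                            # S8
    return translated

# Usage with trivial stubs in place of the real acquisition units,
# machine translator, and output unit.
outputs = []
result = run_translation(
    lambda: "いつ出発しますか。",
    lambda: "host",
    lambda sentence, speaker: f"<{speaker}> translation of {sentence}",
    outputs.append,
)
```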
 The learning method in the present embodiment is a method for generating the translation model 35, in which a translation model 35 realizing machine translation from the speaker to the other party is obtained through machine learning on a computer such as the translation server 3. The storage unit 31 of the computer stores a parameter group that defines the translation model 35 based on machine learning. The method includes a step (S11, S12) in which the computer inputs information associating an input sentence in the source language with speaker information to the translation model 35 being trained, and causes the translation model 35 to generate a translated sentence. The speaker information includes information indicating the role of at least one of the speaker and the other party with respect to the related input sentence. The method further includes a step (S13) in which the computer adjusts the parameter group according to the generated translated sentence. According to this method, a translation model 35 can be generated that has learned to perform differentiated translation according to the speaker in machine translation.
(Embodiment 2)
 Embodiment 2 will be described below with reference to the drawings. In Embodiment 1, an example using a machine translator trained on the utterances of both the host and the guest has been described. In Embodiment 2, an example using a plurality of machine translators, one for the host and one for the guest, will be described.
 Hereinafter, descriptions of configurations and operations similar to those of the translation system 1 according to Embodiment 1 will be omitted as appropriate, and the translation system according to the present embodiment will be described.
 FIG. 10 is a diagram showing an outline of the translation system 1A according to Embodiment 2. As shown in FIG. 10, the translation system 1A according to the present embodiment has the same configuration as Embodiment 1, except that it includes a host translation server 3a and a guest translation server 3b instead of the single translation server 3. The two translation servers 3a and 3b are an example of a plurality of machine translators in the present embodiment.
 Each of the translation servers 3a and 3b is configured, for example, in the same manner as the translation server 3 of Embodiment 1. For example, the host translation server 3a has a translation model trained by machine learning on host utterances, and the guest translation server 3b has a translation model trained by machine learning on guest utterances.
 FIG. 11 is a flowchart illustrating the processing of the translation device 2 according to the present embodiment. In the present embodiment, the control unit 20 of the translation device 2 performs, for example, steps S6A and S6B instead of step S6 of FIG. 5.
 Specifically, when the speaker information is "host" (S4), the control unit 20 transmits the acquired input sentence from the network I/F to the host translation server 3a (S6A). On the other hand, when the speaker information is "guest" (S5), the translation device 2 transmits the acquired input sentence from the network I/F to the guest translation server 3b (S6B). The translation device 2 then receives the translated sentence of the translation result from the translation server selected from the two translation servers 3a and 3b according to the speaker information (S7).
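The routing of steps S6A and S6B can be sketched as follows; the server addresses are illustrative placeholders, since the embodiment does not specify how the translation servers 3a and 3b are addressed.

```python
# Hypothetical endpoints for the host translation server 3a and the
# guest translation server 3b.
SERVER_BY_SPEAKER = {
    "host":  "https://translation-server-3a.example/translate",   # S6A
    "guest": "https://translation-server-3b.example/translate",   # S6B
}

def select_translation_server(speaker_info):
    """Select the translation server to which the input sentence is sent,
    according to the speaker information (S6A or S6B of FIG. 11)."""
    if speaker_info not in SERVER_BY_SPEAKER:
        raise ValueError(f"unsupported speaker information: {speaker_info!r}")
    return SERVER_BY_SPEAKER[speaker_info]
```

Because the speaker role is encoded in the choice of server rather than in a tag on the input sentence, each server's model only needs to be trained on utterances of its own role.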
 As described above, in the translation device 2 of the present embodiment, the communication I/F 25 communicates with the translation servers 3a and 3b, which are a plurality of external machine translators. The control unit 20 transmits the input sentence, via the communication I/F 25, to a different machine translator according to user information such as the speaker information (S6A, S6B), and receives the translated sentence of the transmitted input sentence from that machine translator via the communication I/F 25 (S7). This also enables differentiated translation according to the speaker in machine translation, based on the speaker information acquired in response to the utterance of the input sentence by the speaker.
(Other Embodiments)
 As described above, Embodiments 1 and 2 have been described as examples of the technology disclosed in the present application. However, the technology in the present disclosure is not limited thereto, and is also applicable to embodiments in which changes, substitutions, additions, omissions, and the like are made as appropriate. It is also possible to combine the components described in the above embodiments to form a new embodiment. Accordingly, other embodiments are exemplified below.
 In Embodiments 1 and 2 above, speaker information has been described as an example of user information. The user information is not limited to speaker information, and may be, for example, information about the other party of the speaker. For example, when the speaker is the host 5a or the guest 5b, appropriate differentiated translation can be realized as in Embodiment 1 by acquiring the other party's user information, "guest" or "host" respectively. The user information may also include information about both the speaker and the other party (i.e., the source-side and target-side users).
 In each of the above embodiments, "host" and "guest" have been exemplified as the user roles indicated by the user information. The roles in the user information are not limited to these, and may be various mutually related roles such as "teacher" and "student", or "supervisor" and "subordinate". This also makes it possible to realize appropriate differentiated translation according to the roles indicated by the user information.
 In addition to the roles described above, the user information may further include additional information about the users. For example, the user information may include information indicating at least one of the gender and age of the source-side user, the gender and age of the target-side user, and the scene of the dialogue between the source-side and target-side users. For example, additional tagging according to such additional information may be applied to the input sentence, the training data D1, and the like. This enables appropriate differentiated translation according to the various additional information, for example wording suited to adult/child or male/female users.
 The user information may also be used in a process of correcting the input sentence before machine translation of the input sentence. For example, when the input sentence has subject ambiguity, the control unit 20 or the arithmetic processing unit 30 may correct the input sentence so as to complement the ambiguous subject based on the user information. In this case as well, by machine-translating the corrected input sentence, differentiated translation including the appropriate subject according to the user information can be realized.
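One possible sketch of such pre-translation correction is shown below. The detection rule is purely illustrative (a real implementation would use syntactic analysis of the source language), and the subject-by-role mapping is an assumption, not a detail of the embodiment.

```python
# Illustrative markers of an explicit subject in a Japanese input sentence.
EXPLICIT_SUBJECT_MARKERS = ("私", "あなた")

def complete_subject(sentence, role, subject_by_role):
    """Complement an ambiguous subject in the input sentence based on user
    information before machine translation (illustrative rule only)."""
    if any(marker in sentence for marker in EXPLICIT_SUBJECT_MARKERS):
        return sentence                       # subject already explicit
    return subject_by_role[role] + sentence   # prepend role-derived subject

# Usage: with no explicit subject, the sentence is completed according to
# the speaker's role before being sent to the machine translator.
corrected = complete_subject(
    "いつ出発しますか。", "guest",
    {"host": "あなたは", "guest": "私たちは"},
)
```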
 In each of the above embodiments, the translation systems 1 and 1A in which the input sentence is entered by voice have been described. In the present embodiment, the input sentence need not be entered by voice; it may, for example, be entered as text. For example, instead of speaking, the translation source user may enter a text input sentence into the translation device 2 by operating the operation unit 22. The translation system of the present embodiment may omit the speech recognition function. The translation system of the present embodiment may also omit the speech synthesis function and may, for example, output the translated sentence on the display unit 23.
 In each of the above embodiments, an example was described in which the machine translation of the input sentence is performed by a machine translator external to the translation device 2. In the present embodiment, the machine translation may be performed inside the translation device 2. For example, a program similar to the translation model 35 may be stored in the storage unit 21 of the translation device 2, and the control unit 20 may execute the program to generate a translated sentence according to the acquired input sentence and speaker information.
 In each of the above embodiments, an example was described in which the translation model 35 of the machine translator is configured as a neural network. The translation model of the machine translator in the present embodiment is not limited to this and may, for example, be configured as a probabilistic model. Furthermore, the machine translator and the translation model of the present embodiment need not necessarily be based on machine learning.
 As described above, embodiments have been described as examples of the technology in the present disclosure, and the accompanying drawings and detailed description have been provided for that purpose.
 Accordingly, the components described in the accompanying drawings and the detailed description may include not only components essential for solving the problem, but also components that are not essential for solving the problem and are included to illustrate the above technology. Therefore, the mere fact that such non-essential components appear in the accompanying drawings or the detailed description should not be taken as an immediate determination that they are essential.
 Since the above-described embodiments are intended to illustrate the technology in the present disclosure, various modifications, substitutions, additions, omissions, and the like can be made within the scope of the claims or equivalents thereof.
 The translation device, system, method, program, and learning method according to the present disclosure are applicable to machine translation in various situations.

Claims (10)

  1.  A translation device that outputs a machine translation result to a translation destination user in response to input from a translation source user, the translation device comprising:
     a first acquisition unit that acquires an input sentence in a translation source language;
     a second acquisition unit that acquires user information related to the input sentence;
     a control unit that acquires, based on the input sentence and the user information, a translated sentence in a translation destination language indicating a translation result of the input sentence according to the user information; and
     an output unit that outputs the translated sentence,
     wherein the user information includes information indicating a role of at least one of the translation source user and the translation destination user with respect to the related input sentence.
  2.  The translation device according to claim 1, wherein, of the roles of the translation source user and the translation destination user in the user information, one role is a host and the other role is a guest.
  3.  The translation device according to claim 1 or 2, wherein the user information includes information indicating at least one of the gender and age of the translation source user, the gender and age of the translation destination user, and the scene of the dialogue between the translation source and translation destination users.
  4.  The translation device according to any one of claims 1 to 3, wherein each of the first and second acquisition units includes at least one of one or more microphones, an operation unit, a network interface, and a device interface.
  5.  The translation device according to any one of claims 1 to 4, further comprising a communication unit that communicates with an external machine translator,
     wherein the control unit transmits the input sentence and the user information to the machine translator via the communication unit, and receives a translated sentence of the input sentence from the machine translator.
  6.  The translation device according to any one of claims 1 to 4, further comprising a communication unit that communicates with a plurality of external machine translators,
     wherein the control unit transmits, via the communication unit, the input sentence to a machine translator selected from the plurality of machine translators according to the user information, and receives a translated sentence of the input sentence from the selected machine translator.
  7.  A translation system comprising:
     the translation device according to any one of claims 1 to 6; and
     a machine translator that performs machine translation based on information acquired by the translation device to generate the translated sentence.
  8.  A translation method for executing machine translation so as to generate a translation result that is output to a translation destination user in response to input from a translation source user, the method comprising:
     acquiring, by a first acquisition unit, an input sentence in a translation source language;
     acquiring, by a second acquisition unit, user information related to the input sentence, the user information including information indicating a role of at least one of the translation source user and the translation destination user with respect to the related input sentence;
     acquiring, by a control unit, based on the input sentence and the user information, a translated sentence in a translation destination language indicating a translation result of the input sentence according to the user information; and
     outputting, by an output unit, the translated sentence.
  9.  A program that causes a computer to execute a process of outputting a machine translation result to a translation destination user in response to input from a translation source user, the process comprising:
     acquiring an input sentence in a translation source language;
     acquiring user information related to the input sentence, the user information including information indicating a role of at least one of the translation source user and the translation destination user with respect to the related input sentence;
     acquiring, based on the input sentence and the user information, a translated sentence in a translation destination language indicating a translation result of the input sentence according to the user information; and
     outputting the translated sentence.
  10.  A learning method for obtaining, through machine learning on a computer, a translation model that realizes machine translation from a translation source user to a translation destination user,
     wherein a storage unit of the computer stores a parameter group that defines the translation model based on machine learning, the method comprising:
     inputting, by the computer, information associating an input sentence in a translation source language with user information into the translation model being trained, to cause the translation model to generate a translated sentence, the user information including information indicating a role of at least one of the translation source user and the translation destination user with respect to the related input sentence; and
     adjusting, by the computer, the parameter group according to the generated translated sentence.
PCT/JP2018/038704 2018-05-25 2018-10-17 Translation device, system, method, program, and learning method WO2019225028A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018-100795 2018-05-25
JP2018100795A JP2021144256A (en) 2018-05-25 2018-05-25 Translation device, system, method, program, and learning method

Publications (1)

Publication Number Publication Date
WO2019225028A1

Family

ID=68617267

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/038704 WO2019225028A1 (en) 2018-05-25 2018-10-17 Translation device, system, method, program, and learning method

Country Status (2)

Country Link
JP (1) JP2021144256A (en)
WO (1) WO2019225028A1 (en)


Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
JP7333377B2 (en) 2021-12-14 2023-08-24 楽天グループ株式会社 Information processing device, information processing method and program

Citations (3)

Publication number Priority date Publication date Assignee Title
WO2011040056A1 (en) * 2009-10-02 2011-04-07 独立行政法人情報通信研究機構 Speech translation system, first terminal device, speech recognition server device, translation server device, and speech synthesis server device
JP2013164515A (en) * 2012-02-10 2013-08-22 Toshiba Corp Voice translation device, voice translation method, and voice translation program
JP2017199363A (en) * 2016-04-21 2017-11-02 国立研究開発法人情報通信研究機構 Machine translation device and computer program for machine translation


Cited By (4)

Publication number Priority date Publication date Assignee Title
CN111539199A (en) * 2020-04-17 2020-08-14 中移(杭州)信息技术有限公司 Text error correction method, device, terminal and storage medium
CN111539199B (en) * 2020-04-17 2023-08-18 中移(杭州)信息技术有限公司 Text error correction method, device, terminal and storage medium
US20210365644A1 (en) * 2020-05-21 2021-11-25 International Business Machines Corporation Adaptive language translation using context features
US11947925B2 (en) * 2020-05-21 2024-04-02 International Business Machines Corporation Adaptive language translation using context features

Also Published As

Publication number Publication date
JP2021144256A (en) 2021-09-24

Similar Documents

Publication Publication Date Title
CN105493027B (en) User interface for real-time language translation
JP6058039B2 (en) Device and method for extracting information from dialogue
US10860289B2 (en) Flexible voice-based information retrieval system for virtual assistant
US20140316764A1 (en) Clarifying natural language input using targeted questions
KR102429407B1 (en) User-configured and customized interactive dialog application
WO2019225028A1 (en) Translation device, system, method, program, and learning method
US11430425B2 (en) Speech generation using crosslingual phoneme mapping
JPWO2018055983A1 (en) Translation apparatus, translation system, and evaluation server
US11538476B2 (en) Terminal device, server and controlling method thereof
US11227116B2 (en) Translation device, translation method, and program
US11403470B2 (en) Translation device
JP2023007369A (en) Translation method, classification model training method, apparatus, device and storage medium
KR20190074508A (en) Method for crowdsourcing data of chat model for chatbot
WO2024069978A1 (en) Generation device, learning device, generation method, training method, and program
JP6110539B1 (en) Speech translation device, speech translation method, and speech translation program
JP6383748B2 (en) Speech translation device, speech translation method, and speech translation program
JP2021125164A (en) Information processing apparatus, chat bot assisting program, and chat bot assisting method
JP6334589B2 (en) Fixed phrase creation device and program, and conversation support device and program
JP6198879B1 (en) Speech translation device, speech translation method, and speech translation program
JP6985311B2 (en) Dialogue implementation programs, devices and methods that control response utterance generation by aizuchi determination
US11842206B2 (en) Generating content endorsements using machine learning nominator(s)
WO2021161856A1 (en) Information processing device and information processing method
CN112334974B (en) Speech generation using cross-language phoneme mapping
Hovde et al. Aural Language Translation with Augmented Reality Glasses
KR20220110408A (en) Method for providing multi-language translation through multimedia application

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18920148

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18920148

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP