WO2019225028A1 - Translation device, system, method, program, and learning method - Google Patents


Info

Publication number
WO2019225028A1
WO2019225028A1
Authority
WO
WIPO (PCT)
Prior art keywords
translation
user
sentence
input sentence
information
Prior art date
Application number
PCT/JP2018/038704
Other languages
French (fr)
Japanese (ja)
Inventor
Kaito Mizushima (海都 水嶋)
Original Assignee
Panasonic IP Management Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Panasonic IP Management Co., Ltd.
Publication of WO2019225028A1 publication Critical patent/WO2019225028A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/40: Processing or translation of natural language

Definitions

  • The present disclosure relates to a translation device based on machine translation, a translation system, a translation method, a program, and a learning method.
  • Non-Patent Document 1 proposes a technique that enables translation between multiple languages using a single neural machine translation model.
  • In Non-Patent Document 1, a machine translation model is shared across multiple languages by prepending a token identifying the target language to the beginning of the input sentence. This enables zero-shot translation between language pairs the neural machine translation model has not been trained on.
  • Non-Patent Document 2 proposes a technique for controlling honorifics in a neural machine translation model.
  • Non-Patent Document 2 utilizes an incidental condition for controlling the level of honorifics in the target language when translating from a language, such as English, that lacks the concept of honorifics.
  • The incidental condition is set to one of “polite”, “informal”, and “none”.
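Both prior-art control schemes amount to prepending a control token to the source sentence before it reaches the shared model. The following is a minimal sketch in Python; the exact token syntax is a hypothetical stand-in, as each document defines its own format.

```python
def add_control_token(sentence: str, token: str) -> str:
    """Prepend a control token (a target-language tag or an honorific-level
    tag) to the source sentence before feeding it to the translation model."""
    return f"<{token}> {sentence}"

# Target-language token, in the style of Non-Patent Document 1:
tagged_lang = add_control_token("Hello", "2ja")            # "<2ja> Hello"

# Honorific side constraint, in the style of Non-Patent Document 2:
tagged_polite = add_control_token("How are you?", "polite")
```

In either case the model itself is unchanged; only the input is conditioned by the extra token.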
  • This disclosure provides a translation apparatus, system, method, program, and learning method that can perform translation according to a user in machine translation.
  • The translation device according to an aspect of the present disclosure outputs a machine translation result to the translation destination user in response to input from the translation source user.
  • The translation apparatus includes a first acquisition unit, a second acquisition unit, a control unit, and an output unit.
  • The first acquisition unit acquires an input sentence in the translation source language.
  • The second acquisition unit acquires user information related to the input sentence.
  • The control unit acquires a translated sentence indicating, in the translation destination language, a translation result of the input sentence corresponding to the user information.
  • The output unit outputs the translated sentence.
  • The user information includes information indicating the role, with respect to the related input sentence, of at least one of the translation source user and the translation destination user.
  • A translation system according to an aspect of the present disclosure includes the above translation device and a machine translator.
  • The machine translator performs machine translation based on information acquired by the translation device and generates a translated sentence.
  • The translation method according to an aspect of the present disclosure is a method of executing machine translation so as to generate a translation result output to the translation destination user in response to input from the translation source user.
  • The method includes a step in which a first acquisition unit acquires an input sentence in the translation source language, and a step in which a second acquisition unit acquires user information related to the input sentence.
  • The user information includes information indicating the role, with respect to the related input sentence, of at least one of the translation source user and the translation destination user.
  • The method further includes a step in which a control unit obtains, based on the input sentence and the user information, a translated sentence indicating the translation result of the input sentence according to the user information in the translation destination language, and a step in which an output unit outputs the translated sentence.
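The claimed sequence of steps can be pictured as a simple pipeline in which the machine translator is conditioned on the user's role. This is a rough illustration only; the function and field names are hypothetical, not from the disclosure.

```python
def translate_for_user(input_sentence, user_info, machine_translator):
    """Sketch of the claimed steps: acquire the input sentence and related
    user information, obtain a role-conditioned translation, and return it."""
    # The user information carries the role of the translation source and/or
    # translation destination user with respect to this input sentence.
    request = {"sentence": input_sentence, "role": user_info["role"]}
    return machine_translator(request)  # e.g. a call to a translation server

# Dummy machine translator standing in for the real translation server:
def dummy_translator(request):
    subject = "your" if request["role"] == "host" else "my"
    return f"When is {subject} flight?"

example = translate_for_user("<source sentence>", {"role": "host"}, dummy_translator)
# example == "When is your flight?"
```

The same input sentence yields a different translated sentence when the role in the user information changes, which is the point of the claim.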
  • The program according to an aspect of the present disclosure causes a computer to execute processing of outputting a machine translation result to the translation destination user in response to input from the translation source user.
  • The program includes a step in which the computer acquires an input sentence in the translation source language, and a step in which user information related to the input sentence is acquired.
  • The user information includes information indicating the role, with respect to the related input sentence, of at least one of the translation source user and the translation destination user.
  • The program further includes a step in which the computer acquires, based on the input sentence and the user information, a translated sentence indicating the translation result of the input sentence corresponding to the user information in the translation destination language, and a step of outputting the translated sentence.
  • The learning method according to an aspect of the present disclosure is a method for obtaining, by machine learning on a computer, a translation model that realizes machine translation from the translation source user to the translation destination user.
  • A parameter group that defines the translation model based on machine learning is stored in the storage unit of the computer.
  • The method includes a step in which the computer inputs information associating an input sentence in the translation source language with user information to the translation model being learned, and causes the translation model to generate a translated sentence.
  • The user information includes information indicating the role, with respect to the related input sentence, of at least one of the translation source user and the translation destination user.
  • The method includes a step in which the computer adjusts the parameter group according to the generated translated sentence.
  • According to the present disclosure, translation according to the user can be performed in machine translation.
  • FIG. 1 is a diagram showing an outline of a translation system according to the first embodiment.
  • FIG. 2 is a block diagram illustrating the configuration of a translation apparatus according to the first embodiment.
  • FIG. 3 is a block diagram illustrating the configuration of a translation server according to the first embodiment.
  • FIG. 4 is a diagram illustrating a translation method by the translation system.
  • FIG. 5 is a flowchart illustrating processing of the translation apparatus according to the first embodiment.
  • FIG. 6 is a diagram for explaining a usage example of the translation system.
  • FIG. 7 is a diagram for explaining training data in the learning method of the first embodiment.
  • FIG. 8 is a flowchart illustrating processing of the learning method of the translation model according to the first embodiment.
  • FIG. 9 is a diagram showing a display example of the display unit in the translation apparatus.
  • FIG. 1 is a diagram showing an outline of a translation system 1 according to the present embodiment.
  • The translation system 1 includes a translation device 2 and various servers 3, 11, and 12, as shown in FIG. 1.
  • To enable dialogue between users 5a and 5b who speak different languages, the translation system 1 takes the utterance of one user as the translation source input from the translation device 2 and performs machine translation into the translation destination language for the other user.
  • The translation system 1 of the present embodiment is applicable to customer service scenes, including various types of guidance, in various industries such as airports, hotels, and restaurants.
  • Hereinafter, the user 5a in the role of the host that serves customers is abbreviated as “host 5a”,
  • and the user 5b in the role of the guest that is served is abbreviated as “guest 5b”.
  • The translation system 1 of the present embodiment realizes appropriate machine translation for dialogue between the host 5a and the guest 5b in various scenes.
  • The translation apparatus 2 performs data communication with the various servers 3, 11, and 12 via a communication network 10 such as the Internet.
  • The translation system 1 may include a plurality of translation devices 2.
  • Since each translation device 2 includes its own identification information in the data it transmits, the servers 3, 11, and 12 can appropriately transmit data back to the translation device 2 indicated by the received identification information.
  • The various servers 3, 11, and 12 of the translation system 1 are, for example, ASP servers, and include the translation server 3, the speech recognition server 11, and the speech synthesis server 12.
  • The translation server 3 is an example of a machine translator that executes machine translation in the translation method of the present embodiment.
  • The speech recognition server 11 has a speech recognition function for input sentences to be machine-translated.
  • The speech synthesis server 12 has a speech synthesis function for translated sentences indicating machine translation results. Details of the configuration of the translation system 1 are described below.
  • FIG. 2 is a block diagram illustrating the configuration of the translation apparatus 2.
  • The translation device 2 is composed of an information terminal such as a tablet terminal, a smartphone, or a PC.
  • The translation device 2 illustrated in FIG. 2 includes a control unit 20, a storage unit 21, an operation unit 22, a display unit 23, a device interface 24, and a network interface 25.
  • Hereinafter, the interface is abbreviated as “I/F”.
  • The translation apparatus 2 also includes two microphones 26a and 26b and a speaker 27.
  • Of the two microphones, one is the host microphone 26a used by the host 5a, and the other is the guest microphone 26b used by the guest 5b.
  • Each of the microphones 26a and 26b is an input device that collects sound and inputs sound data.
  • Each microphone 26a, 26b is an example of an acquisition unit in the present embodiment.
  • The speaker 27 is an output device that outputs audio data as sound and is an example of an output unit in the present embodiment. FIGS. 1 and 2 illustrate a case where the speaker 27 is shared between the host 5a and the guest 5b.
  • The translation apparatus 2 may instead include separate host and guest speakers.
  • The microphones 26a and 26b and the speaker 27 may be external to the information terminal that constitutes the translation device 2, or may be incorporated in it.
  • The control unit 20 includes, for example, a CPU or MPU that realizes predetermined functions in cooperation with software, and controls the overall operation of the translation apparatus 2.
  • The control unit 20 reads out data and programs stored in the storage unit 21 and performs various arithmetic processes to realize its functions.
  • For example, the control unit 20 executes a program including an instruction group for realizing the processing of the translation apparatus 2 in the translation method of the present embodiment.
  • The above program may be provided via the communication network 10 or the like, or may be stored in a portable recording medium.
  • The control unit 20 may be a hardware circuit such as a dedicated electronic circuit or a reconfigurable electronic circuit designed to realize a predetermined function.
  • The control unit 20 may be configured with various semiconductor integrated circuits such as a CPU, MPU, GPU, GPGPU, TPU, microcomputer, DSP, FPGA, or ASIC.
  • The storage unit 21 is a storage medium that stores programs and data necessary to realize the functions of the translation apparatus 2. As shown in FIG. 2, the storage unit 21 includes a storage unit 21a and a temporary storage unit 21b.
  • The storage unit 21a stores parameters, data, a control program, and the like for realizing predetermined functions.
  • The storage unit 21a is configured with, for example, an HDD or an SSD.
  • For example, the storage unit 21a stores the above program.
  • The temporary storage unit 21b is configured with a RAM such as a DRAM or an SRAM, for example, and temporarily stores (that is, holds) data.
  • For example, the temporary storage unit 21b holds an input sentence, a translated sentence, user information described later, and the like.
  • The temporary storage unit 21b may function as a work area for the control unit 20, or may be configured as a storage area in the internal memory of the control unit 20.
  • The operation unit 22 is a user interface operated by the user.
  • FIG. 1 shows an example in which the operation unit 22 forms a touch panel together with the display unit 23.
  • The operation unit 22 is not limited to a touch panel, and may be, for example, a keyboard, a touch pad, buttons, or switches.
  • The operation unit 22 is an example of an acquisition unit that acquires various information input by user operations.
  • The display unit 23 is an example of an output unit configured with, for example, a liquid crystal display or an organic EL display.
  • The display unit 23 displays, for example, information for outputting the translated sentence to the user.
  • The display unit 23 may also display various information, such as icons for operating the operation unit 22 and information input from the operation unit 22.
  • The device I/F 24 is a circuit for connecting an external device to the translation device 2.
  • The device I/F 24 is an example of a communication unit that performs communication according to a predetermined communication standard.
  • The predetermined standard includes USB, HDMI (registered trademark), IEEE 1394, Wi-Fi, Bluetooth (registered trademark), and the like.
  • The device I/F 24 may constitute an acquisition unit that receives various information from, or an output unit that transmits information to, an external device of the translation apparatus 2.
  • The network I/F 25 is a circuit for connecting the translation apparatus 2 to the communication network 10 via a wireless or wired communication line.
  • The network I/F 25 is an example of a communication unit that performs communication based on a predetermined communication standard.
  • The predetermined communication standard includes communication standards such as IEEE 802.3 and IEEE 802.11a/11b/11g/11ac.
  • The network I/F 25 may constitute an acquisition unit that receives various information, or an output unit that transmits information, via the communication network 10 in the translation apparatus 2.
  • The configuration of the translation device 2 described above is an example, and the configuration of the translation device 2 is not limited to it.
  • For example, the translation apparatus 2 does not have to include the host microphone 26a and the guest microphone 26b;
  • a microphone shared between the host 5a and the guest 5b may be used instead.
  • The translation apparatus 2 may also be configured with various computers, not limited to information terminals.
  • The acquisition unit in the translation apparatus 2 may be realized through cooperation of the control unit 20 or the like with various software.
  • The acquisition unit in the translation device 2 may acquire various information by reading information stored in various storage media (for example, the storage unit 21a) into the work area of the control unit 20 (for example, the temporary storage unit 21b).
  • Each of the various acquisition units described above may be a first acquisition unit that acquires the input sentence of the translation source, or a second acquisition unit that acquires user information related to the input sentence.
  • The first and second acquisition units may be combined into one hardware element.
  • FIG. 3 is a block diagram illustrating the configuration of the translation server 3 in this embodiment.
  • The translation server 3 illustrated in FIG. 3 includes an arithmetic processing unit 30, a storage unit 31, and a communication unit 32.
  • The translation server 3 is composed of one or a plurality of computers.
  • The arithmetic processing unit 30 includes, for example, a CPU or GPU that realizes predetermined functions in cooperation with software, and controls the operation of the translation server 3.
  • The arithmetic processing unit 30 reads out data and programs stored in the storage unit 31 and performs various arithmetic processes to realize its functions.
  • For example, the arithmetic processing unit 30 executes the program of the translation model 35 that executes machine translation in the translation method of the present embodiment.
  • The translation model 35 is composed of various neural networks, for example.
  • For example, the translation model 35 may be a neural machine translation model shared between multiple languages (see, for example, Non-Patent Document 1).
  • The arithmetic processing unit 30 may also execute a program for performing machine learning of the translation model 35.
  • Each of the above programs may be provided via the communication network 10 or the like, or may be stored in a portable recording medium.
  • The arithmetic processing unit 30 may be a hardware circuit such as a dedicated electronic circuit or a reconfigurable electronic circuit designed to realize a predetermined function.
  • The arithmetic processing unit 30 may be configured with various semiconductor integrated circuits such as a CPU, GPU, TPU, MPU, microcomputer, DSP, FPGA, or ASIC.
  • The storage unit 31 is a storage medium that stores programs and data necessary for realizing the functions of the translation server 3, and includes, for example, an HDD or an SSD.
  • The storage unit 31 may also include a DRAM or SRAM, for example, and may function as a work area for the arithmetic processing unit 30.
  • The storage unit 31 stores, for example, the program of the translation model 35 and various parameter groups that define the translation model 35 based on machine learning.
  • The parameter group includes, for example, various weight parameters of a neural network.
  • The communication unit 32 is an I/F circuit for performing communication according to a predetermined communication standard, and communicatively connects the translation server 3 to the communication network 10 or an external device.
  • The predetermined communication standard includes IEEE 802.3, IEEE 802.11a/11b/11g/11ac, USB, HDMI, IEEE 1394, Wi-Fi, Bluetooth, and the like.
  • The speech recognition server 11 and the speech synthesis server 12 can be configured in the same way as the translation server 3, with a program for the speech recognition function or the speech synthesis function introduced in place of the translation model 35.
  • The various servers 3, 11, and 12 in the translation system 1 are not limited to the above configuration and may have various configurations.
  • For example, the translation method of the present embodiment may be executed in cloud computing. Further, hardware resources that realize the functions of the various servers 3, 11, and 12 may be shared.
  • The speech recognition server 11 and the speech synthesis server 12 may also be omitted.
  • In that case, the translation apparatus 2 may have a speech recognition function.
  • For example, the translation device 2 may perform speech recognition on voice data generated by the microphones 26a and 26b and convert it into text data.
  • Similarly, the translation device 2 may have a speech synthesis function.
  • For example, the translation device 2 may synthesize speech from text data of the machine translation result and output it from the speaker 27.
  • FIG. 4 is a diagram illustrating a translation method by the translation system 1.
  • Each time one of the speakers speaks during the dialogue between the host 5a and the guest 5b, the translation system 1 executes machine translation into the translation destination language, using the speaker's language as the translation source language.
  • The translation source language may be recognized by speech recognition from the speaker's utterance, or may be set by an operation on the translation device 2 or the like.
  • The translation destination language is set appropriately according to the other user, who is not the speaker, for example.
  • The speaker is an example of a translation source user,
  • and the counterpart is an example of a translation destination user.
  • In FIG. 4A, the host 5a, as the speaker, utters the Japanese input sentence 51 meaning “When is the flight?”.
  • The translation device 2 of the present embodiment can recognize the input sentence 51 by using, for example, the speech recognition of the speech recognition server 11 shown in FIG. 1.
  • The translation apparatus 2 of the present embodiment can also acquire information related to the user, such as the speaker, in addition to the input sentence 51.
  • The translation server 3 executes machine translation based on the information acquired by the translation device 2, and generates a translated sentence that indicates the translation result of the input sentence in the translation destination language.
  • The translation system 1 of this embodiment uses user information indicating whether the speaker is the host 5a or the guest 5b in the machine translation of the input sentence, and realizes translations as exemplified in FIGS. 4A and 4B.
  • In FIG. 4A, a translated sentence 61, “When is your flight?”, is output from the speaker 27 based on machine translation from the source Japanese into the target English.
  • At this time, the speech synthesis server 12 can synthesize speech for the translated sentence 61.
  • FIG. 4B shows an example in which an input sentence 51 with the same language and content as in FIG. 4A is spoken by the guest 5b.
  • In this case, the translation device 2 outputs a translated sentence 62, “When is my flight?”, whose content differs from that of the translated sentence 61 of FIG. 4A.
  • In FIGS. 4A and 4B, it is assumed that a dialogue takes place at an airport counter between a host 5a, such as an airport staff member, and a guest 5b scheduled to board a flight. In such a scene, a translation result suited to the guest 5b, the flight passenger, is considered appropriate.
  • In FIG. 4A, since the speaker is the host 5a, “your” in the translated sentence 61 is understood to refer to the guest 5b, and an appropriate translation result is obtained.
  • In contrast, if the speaker is the guest 5b and the same translated sentence as the translated sentence 61 of FIG. 4A were output, “your” would refer to the host 5a, which would be inappropriate.
  • In the translation system 1 of the present embodiment, user information indicating the speaker is sequentially acquired by the translation device 2 during the dialogue between the host 5a and the guest 5b and is used for the machine translation of the corresponding input sentence 51, so that appropriate translation can be realized. Details of the operation of the translation system 1 in the translation method are described below.
  • FIG. 5 is a flowchart illustrating the processing of the translation apparatus 2 according to this embodiment. Each process of the flowchart shown in FIG. 5 is executed by the control unit 20 of the translation apparatus 2. This flowchart is started when, for example, one of the host 5a and the guest 5b utters a desired input sentence.
  • First, the control unit 20 of the translation apparatus 2 inputs voice data of the speech uttered by the speaker from the host microphone 26a or the guest microphone 26b (S1).
  • The voice data of the uttered speech is an example of information indicating an input sentence spoken by the speaker.
  • The control unit 20 may select one of the microphones based on the volume levels at the two microphones 26a and 26b, or according to various operations by the speaker.
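The volume-based microphone selection mentioned above can be pictured as a simple comparison of input levels. This is a hypothetical sketch, not the disclosed implementation; the tie-breaking rule is an arbitrary assumption.

```python
def select_microphone(host_level: float, guest_level: float) -> str:
    """Pick the microphone whose input volume is higher; ties go to the
    host microphone (an arbitrary assumption for this sketch)."""
    return "host" if host_level >= guest_level else "guest"
```

Whichever channel is selected then supplies the voice data for step S1.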
  • Next, the control unit 20 acquires the input sentence indicated by the uttered speech via, for example, the network I/F 25 (S2).
  • Specifically, the translation device 2 transmits the voice data of the uttered speech to the speech recognition server 11 via the communication network 10.
  • The speech recognition server 11 executes speech recognition processing based on the voice data from the translation device 2, generates text data as the speech recognition result, and transmits it to the translation device 2.
  • The network I/F 25 of the translation device 2 receives the input sentence as the generated text data from the speech recognition server 11.
  • Next, the control unit 20 executes processing for specifying the speaker information, using the microphones 26a and 26b as acquisition units, for example (S3 to S5).
  • The speaker information is an example of user information indicating “host” or “guest” as the current speaker. Note that the order of processing between step S2 and steps S3 to S5 is not particularly limited; either may be performed first, or they may be performed in parallel.
  • Specifically, the control unit 20 determines whether the uttered speech was input from the host microphone 26a (S3).
  • If the speech was input from the host microphone 26a (YES in S3), the control unit 20 sets the speaker information to “host” (S4).
  • Otherwise (NO in S3), the control unit 20 sets the speaker information to “guest” (S5).
  • Next, the control unit 20 associates the acquired input sentence and speaker information with each other and transmits them to the translation server 3 (S6).
  • For example, the control unit 20 tags the input sentence with tag information indicating “host” or “guest” from the speaker information, and transmits the tagged input sentence to the translation server 3 from the network I/F 25.
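Steps S3 to S6 can be sketched as follows. The tag syntax and payload fields are hypothetical stand-ins; the disclosure does not specify a concrete format for what the translation server receives.

```python
def make_translation_request(input_sentence: str, from_host_microphone: bool,
                             target_lang: str) -> dict:
    """S3-S5: derive the speaker information from which microphone captured
    the speech; S6: tag the input sentence and build the request payload."""
    speaker = "host" if from_host_microphone else "guest"   # S3-S5
    tagged_sentence = f"<{speaker}> {input_sentence}"       # S6: tagging
    return {"sentence": tagged_sentence, "target_lang": target_lang}

request = make_translation_request("When is the flight?", True, "en")
# request["sentence"] == "<host> When is the flight?"
```

The payload could also carry the designation information for the translation destination language, as noted below.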
  • The information transmitted to the translation server 3 may also include designation information for the translation destination language.
  • When the translation server 3 receives the input sentence associated with the speaker information from the translation device 2, it executes machine translation based on, for example, the learned translation model 35. As a result, the translation server 3 generates a translated sentence for the received input sentence that indicates a translation result corresponding to the “host” or “guest” indicated by the associated speaker information. The translation server 3 transmits the generated translated sentence to the translation apparatus 2 as text data or the like. In the translation apparatus 2, the control unit 20 receives information indicating the translated sentence from the translation server 3 via the network I/F 25 (S7).
  • Next, the control unit 20 outputs the translation result, for example as voice output of the translated sentence (S8).
  • For example, the control unit 20 transmits the text data of the translated sentence to the speech synthesis server 12 and causes the speech synthesis server 12 to perform speech synthesis processing of the translated sentence.
  • The control unit 20 receives the voice data of the processing result from the speech synthesis server 12 and controls the voice output from the speaker 27.
  • The control unit 20 may also display a text image of the translated sentence on the display unit 23.
  • As described above, the host microphone 26a and the guest microphone 26b in the translation apparatus 2 function as acquisition units for the input sentence and the speaker information by receiving the corresponding speaker's speech (S1 to S5).
  • The corresponding speaker information is acquired every time the utterance of an input sentence is input (S1 to S5),
  • so that an appropriate translated sentence can be output for both the case where the speaker is the host 5a and the case where the speaker is the guest 5b (S6 to S8).
  • For example, in FIGS. 4A and 4B, different translated sentences 61 and 62 are output from the translation device 2 depending on whether the speaker information is “host” or “guest”.
  • The translated sentence 62 in the example of FIG. 4B includes “my” instead of “your” in the translated sentence 61 of FIG. 4A; when the guest 5b speaks, the word referring to the guest 5b is thus translated properly.
  • The translation differences in the examples of FIGS. 4A and 4B arise from the ambiguity that the subject of the topic, such as “your” or “my” in the translated sentences 61 and 62, is not explicitly shown in the input sentence 51. Such subject ambiguity can occur frequently when the translation source is Japanese, for example. According to the translation system 1 of the present embodiment, even if the subject is ambiguous in the input sentence, acquiring the speaker information implicitly identifies the subject so that the sentence can be translated appropriately. In particular, in a relationship between users whose roles are clear, such as the host 5a and the guest 5b, the subject of the topic in the dialogue can be estimated comparatively easily from the speaker.
  • FIGS. 6A and 6B show usage examples of the translation system 1 when the translation destination is Japanese.
  • FIG. 6A shows an example in which the speaker is the host 5a,
  • and FIG. 6B shows an example in which the speaker is the guest 5b.
  • In either case, the translation system 1 performs translation based on the speaker information through the processing described above.
  • In FIG. 6A, the translation device 2 outputs a Japanese translated sentence 63 meaning “Do you have a bag?” for the input sentence 52 described above.
  • The translated sentence 63 is considered natural Japanese as an utterance by which the host 5a, being on the side that keeps the baggage, serves the guest 5b.
  • For example, the translated sentence 63 includes the respectful prefix “o”, by which the speaker shows respect for the actions and belongings of the other party, the guest 5b, and is thus appropriately worded.
  • In FIG. 6B, the translation device 2 outputs a Japanese translated sentence 64 meaning “Do you have a bag?”.
  • The translated sentence 64 is considered natural Japanese as a remark by which the guest 5b, being on the side whose baggage has been checked, prompts the host 5a for confirmation.
  • The wording of the translated sentence 64, while polite in tone, does not include excessive honorifics and is considered appropriate when the speaker is the guest 5b.
  • As described above, when the translation destination is Japanese,
  • the appropriate wording of the translated sentence can be ambiguous from the input sentence alone.
  • In the above, translation is described for cases where Japanese is the translation source or destination and English is the corresponding destination or source.
  • However, the translation method of the present embodiment is not limited to Japanese and English, and is applicable to various languages.
  • Machine translation using speaker information can differentiate translations so that each is appropriate, in light of the conventions of the respective languages, for the case where the speaker addresses the content of the input sentence to the other party in the target language.
  • FIG. 7 is a diagram for explaining the training data D1 in the learning method of the present embodiment.
  • FIG. 8 is a flowchart illustrating the processing of the learning method for the translation model 35 according to this embodiment.
  • the training data D1 constitutes a bilingual corpus between the translation source language and the translation destination language, for example.
  • FIG. 7A illustrates a case where the translation source is Japanese and the translation destination is English.
  • FIG. 7B illustrates a case where the translation source is English and the translation destination is Japanese.
  • Training data D1 records “speaker information”, “source language sentence”, and “target language sentence” in association with each other, for example, as shown in FIGS. 7 (a) and 7 (b).
  • the “source language sentence” is an example sentence of an input sentence for causing the translation model 35 to learn, and is described in the language of the translation source.
  • the “target language sentence” indicates the correct answer of the translated sentence based on the “speaker information” when the corresponding “source language sentence” is translated into the language of the translation destination.
  • the speaker information is associated with a set of source language sentences and target language sentences by tagging “host” or “guest”, for example. Each target language sentence includes a natural expression when the content of the input sentence is uttered in the corresponding speaker information.
  • for example, as shown in FIG. 7(a), the Japanese source language sentence corresponding to “When do you leave?” is associated with the target language sentence “When do you start?” when the speaker information is “host”.
  • the source language sentence having the same content as described above is associated with the target language sentence “When do we start?” When the speaker information is “guest”.
  • the above two target language sentences include different subjects depending on the difference in the speaker information.
  • the training data D1 may include a source language sentence associated with only one of “host” and “guest”.
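As an illustrative sketch, and not part of the patent itself, the records of the training data D1 in FIG. 7 could be held in memory as follows; the field names and the helper function are hypothetical:

```python
# Hypothetical in-memory form of the training data D1 (FIG. 7): each record
# associates speaker information with a source language sentence and the
# corresponding target language sentence (the "correct answer").
TRAINING_DATA_D1 = [
    {"speaker": "host",  "source": "いつ出発しますか", "target": "When do you start?"},
    {"speaker": "guest", "source": "いつ出発しますか", "target": "When do we start?"},
]

def targets_by_speaker(source_sentence):
    """Return the correct target sentences for one source sentence,
    keyed by the speaker role ("host" or "guest")."""
    return {rec["speaker"]: rec["target"]
            for rec in TRAINING_DATA_D1
            if rec["source"] == source_sentence}
```

As in FIG. 7(a), the same source sentence maps to different subjects (“you” versus “we”) depending only on the associated speaker information.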
  • the processing of the learning method using the training data D1 described above is illustrated in FIG. 8.
  • Each process of the flowchart shown in FIG. 8 is executed by, for example, the arithmetic processing unit 30 of the translation server 3.
  • the flowchart starts, for example, with the training data D1 stored in the storage unit 31 and the parameter group of the translation model 35 set to initial values; that is, with the translation model 35 to be learned prepared.
  • the arithmetic processing unit 30 refers to the training data D1 in the storage unit 31 and inputs the source language sentence and the speaker information associated with the training data D1 to the translation model 35 to be learned (S11).
  • the speaker information is input to the translation model 35 in association with the source language sentence as tag information, for example.
  • the language of the translation destination is designated in advance, for example.
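One way the speaker information might be attached to the source sentence as tag information can be sketched as follows; the token format is an assumption for illustration, in the spirit of the destination-language token of Non-Patent Document 1:

```python
def build_model_input(source_sentence, speaker, target_lang="en"):
    """Prepend hypothetical control tokens for the translation destination
    language and the speaker role to the source sentence before it is fed
    to the translation model 35 (token names are illustrative)."""
    if speaker not in ("host", "guest"):
        raise ValueError(f"unknown speaker role: {speaker}")
    return f"<2{target_lang}> <{speaker}> {source_sentence}"
```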
  • the arithmetic processing unit 30 executes machine translation based on the input information in the translation model 35 being learned (S12).
  • the arithmetic processing unit 30 causes the translation model 35 to generate a translation sentence according to the current parameter group.
  • the arithmetic processing unit 30 adjusts the parameter group based on the error between the translated sentence obtained by the translation model 35 being learned and the corresponding target language sentence (S13).
  • the processing in step S13 is performed according to the error back propagation method or the like with reference to the target language sentence associated with the source language sentence input to the translation model 35 in the training data D1.
  • the arithmetic processing unit 30 determines whether learning of the translation model 35 is completed based on a predetermined learning end condition (S14).
  • the learning end condition is set in advance according to, for example, the number of learning iterations. If the learning of the translation model 35 has not been completed (NO in S14), the arithmetic processing unit 30 performs the processes from step S11 again. Each time the processes of steps S11 to S13 are repeated, the parameter group of the translation model 35 is updated.
  • when the learning of the translation model 35 has been completed (YES in S14), the arithmetic processing unit 30 records the final values of the parameter group in the storage unit 31, thereby determining the parameter group that defines the learned translation model 35 (S15).
  • after determining the parameter group of the learned translation model 35 (S15), the arithmetic processing unit 30 ends the process according to the flowchart of FIG. 8.
  • as described above, a translation model 35 that has learned to vary the translated sentence according to the speaker information can be obtained by machine learning using the training data D1 including the speaker information. According to the learning method of the present embodiment, it is possible to generate a translation model 35 that has acquired the common sense considered natural in various scenes, so as to perform appropriate translation between the host 5a and the guest 5b.
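The loop of steps S11 to S15 can be caricatured as below. This is a deliberately toy sketch: the “parameter group” is reduced to a lookup table, whereas the actual translation model 35 would adjust neural-network parameters by error backpropagation.

```python
def train_translation_model(params, training_data, num_epochs=3):
    """Toy sketch of steps S11-S15: for each record, input the source
    sentence together with the speaker information (S11), generate a
    translation with the current parameters (S12), and adjust the
    parameters when the output differs from the target sentence (S13)."""
    for _ in range(num_epochs):                    # S14: learning end condition
        for rec in training_data:
            key = (rec["speaker"], rec["source"])  # S11: source sentence + speaker tag
            hypothesis = params.get(key, "")       # S12: current model output
            if hypothesis != rec["target"]:        # S13: error -> adjust parameters
                params[key] = rec["target"]
    return params                                  # S15: final parameter group
```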
  • in the learned translation model 35, processing similar to step S12 above is performed between steps S6 and S7 in FIG. 5. Based on the common sense acquired by the translation model 35 through machine learning, the translated sentence can be worded appropriately.
  • the arithmetic processing unit 30 of the translation server 3 executes the processing of the learning method.
  • the processing of the learning method may be performed on various computers different from the translation server 3.
  • the generated learned translation model 35 can be provided as appropriate.
  • the training data D1 associating the “source language sentence” with the “target language sentence” based on the “speaker information” is exemplified.
  • the training data D1 is not limited to this.
  • example sentences that are translations of one another in various languages may be recorded in association with each other based on the speaker information.
  • a source language sentence to be learned from the training data D1 may be selected by appropriately specifying a translation source language when learning the translation model 35.
  • as described above, the translation device 2 of the present embodiment outputs the result of machine translation to the other party (i.e., the translation destination user) in response to the input of the speaker (i.e., the translation source user), such as the host 5a or the guest 5b.
  • the translation device 2 includes two microphones 26a and 26b as an example of first and second acquisition units, a control unit 20, and a speaker 27 as an example of an output unit.
  • as the first acquisition unit, each of the microphones 26a and 26b acquires an input sentence in the language of the translation source.
  • Each microphone 26a, 26b as the second acquisition unit acquires speaker information, which is an example of user information related to the input sentence.
  • based on the input sentence and the speaker information, the control unit 20 obtains a translated sentence indicating the translation result of the input sentence according to the speaker information in the language of the translation destination.
  • the speaker 27 outputs a translated sentence by voice output.
  • the speaker information includes information indicating the role of the speaker regarding the related input sentence.
  • with the translation device 2, for example, during a dialogue between the host 5a and the guest 5b, translation tailored to the speaker can be carried out in machine translation, based on the speaker information acquired when the speaker utters the input sentence.
  • in the present embodiment, the role of at least one of the speaker and the other party indicated in the speaker information includes at least one of “host” and “guest”. Thereby, appropriate translation for each role can be realized in various scenes involving the host 5a and the guest 5b.
  • the first and second acquisition units in the present embodiment are not limited to the plurality of microphones 26a and 26b, and may each include at least one of the microphones 26a and 26b, the operation unit 22, the network I/F 25, and the device I/F 24. Such an example will be described with reference to FIG. 9.
  • FIG. 9 shows a display example of the display unit 23 in the translation apparatus 2.
  • the display unit 23 that constitutes the touch panel together with the operation unit 22 displays a host utterance icon 23a, a guest utterance icon 23b, an input sentence region 23c, and a translated sentence region 23d.
  • the utterance icons 23a and 23b are icons by which the corresponding user inputs a touch operation to the operation unit 22 to start an utterance.
  • in the input sentence area 23c, an image of the input sentence is displayed according to the utterance.
  • in the translated sentence area 23d, an image of the translated sentence is displayed according to the translation result of the input sentence.
  • the operation unit 22 functions as a second acquisition unit by means of the utterance icons 23a and 23b.
  • the translation device 2 starts processing as shown in the flowchart of FIG. 5 when one of the two utterance icons 23a and 23b is touched.
  • the control unit 20 can set the speaker information to “host” when the operation unit 22 receives an operation of the host utterance icon 23a, and to “guest” when it receives an operation of the guest utterance icon 23b.
  • the acquisition of the speaker information may also be performed based on, for example, information obtained via the network I/F 25 or the device I/F 24, or on the result of speech recognition of the input sentence.
  • for example, the speaker information may be set to “host” or “guest” based on information indicating which of the languages recognized by the speech recognition corresponds to the host 5a and which to the guest 5b.
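The mapping from the touched utterance icon of FIG. 9 to the speaker information can be sketched in a small dispatcher; the function and icon identifiers are illustrative, not from the patent:

```python
def speaker_info_from_icon(icon_id):
    """Map a touched utterance icon (FIG. 9) to the speaker information:
    the host utterance icon 23a yields "host", and the guest utterance
    icon 23b yields "guest"."""
    mapping = {"23a": "host", "23b": "guest"}
    if icon_id not in mapping:
        raise ValueError(f"unknown utterance icon: {icon_id}")
    return mapping[icon_id]
```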
  • the translation apparatus 2 further includes a network I / F 25 as an example of a communication unit that communicates with an external translation server 3.
  • the control unit 20 transmits the input sentence and the speaker information to the translation server 3 via the network I / F 25, and receives the translated sentence of the transmitted input sentence from the translation server 3.
  • the translation server 3 can generate an appropriate translation sentence according to the speaker information from the translation device 2.
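A minimal sketch of the request the translation device 2 might send to the translation server 3 over the network I/F 25 is given below. The JSON field names are assumptions, since the patent does not specify a wire format:

```python
import json

def build_translation_request(input_sentence, speaker, target_lang):
    """Serialize the input sentence together with the speaker information
    so that the translation server 3 can choose the wording of the
    translated sentence accordingly."""
    return json.dumps(
        {"sentence": input_sentence,    # input sentence in the source language
         "speaker": speaker,            # user information: "host" or "guest"
         "target_lang": target_lang},   # translation destination language
        ensure_ascii=False)
```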
  • the translation system 1 in this embodiment includes a translation device 2 and a translation server 3 that is an example of a machine translator.
  • the translation server 3 performs machine translation based on the information acquired by the translation device 2 and generates a translation.
  • the translation method in the present embodiment executes machine translation so as to generate a translation result output to the other party in response to an input from the speaker.
  • the first acquisition unit obtains an input sentence in a source language (S1, S2)
  • the second acquisition unit obtains speaker information related to the input sentence (S3 to S5).
  • the speaker information includes information indicating the role of at least one of the speaker and the other party regarding the related input sentence.
  • the method includes a step (S6, S7) in which the control unit obtains, based on the input sentence and the speaker information, a translated sentence indicating the translation result of the input sentence according to the speaker information in the language of the translation destination, and a step (S8) in which the output unit outputs the translated sentence.
  • according to this method, by using the speaker information, translation tailored to the speaker can be performed in machine translation.
  • the program in the present embodiment causes a computer such as the translation device 2 to execute a process of outputting the result of machine translation to the other party in accordance with the input of the speaker.
  • the program includes a step (S1, S2) in which the computer acquires an input sentence in the language of the translation source, and a step (S3-S5) in which speaker information related to the input sentence is acquired.
  • the speaker information includes information indicating the role of at least one of the speaker and the other party regarding the related input sentence.
  • the program further includes a step (S6, S7) in which the computer acquires, based on the input sentence and the speaker information, a translated sentence indicating the translation result of the input sentence according to the speaker information in the language of the translation destination, and a step (S8) of outputting the translated sentence. According to this program, by using the speaker information, translation tailored to the speaker can be performed in machine translation.
  • the learning method in the present embodiment is a method of obtaining, through machine learning on a computer such as the translation server 3, a translation model 35 that realizes machine translation from a speaker to the other party.
  • the storage unit 31 of the computer stores a parameter group that defines the translation model 35 based on machine learning.
  • the computer inputs information relating the input sentence and the speaker information in the language of the translation source to the translation model 35 being learned, and causes the translation model 35 to generate a translation sentence (S11, S12).
  • the speaker information includes information indicating the role of at least one of the speaker and the other party regarding the related input sentence.
  • the method includes a step (S13) in which the computer adjusts the parameter group according to the generated translation. According to this method, it is possible to generate a translation model 35 learned to perform translation according to the speaker in machine translation.
  • FIG. 10 is a diagram showing an outline of the translation system 1A according to the second embodiment.
  • the translation system 1A according to the present embodiment has the same configuration as in the first embodiment except that it includes a host translation server 3a and a guest translation server 3b in place of the single translation server 3.
  • the two translation servers 3a and 3b are an example of a plurality of machine translators in the present embodiment.
  • Each translation server 3a, 3b is configured in the same manner as the translation server 3 of the first embodiment, for example.
  • the host translation server 3a has a translation model trained by machine learning on utterances of a host.
  • the guest translation server 3b has a translation model trained by machine learning on utterances of a guest.
  • FIG. 11 is a flowchart illustrating the processing of the translation apparatus 2 according to this embodiment.
  • in the present embodiment, the control unit 20 of the translation device 2 performs steps S6A and S6B instead of step S6 of FIG. 5, for example.
  • when the speaker information is “host”, the control unit 20 transmits the acquired input sentence via the network I/F to the host translation server 3a (S6A).
  • when the speaker information is “guest” (S5), the control unit 20 transmits the acquired input sentence via the network I/F to the guest translation server 3b (S6B).
  • the translation device 2 receives the translated sentence of the translation result from whichever of the two translation servers 3a and 3b is selected according to the speaker information (S7).
  • the communication I / F 25 communicates with the translation servers 3a and 3b, which are a plurality of external machine translators.
  • the control unit 20 transmits an input sentence to different machine translators according to user information such as speaker information via the communication I / F 25 (S6A, S6B).
  • the control unit 20 receives the translated sentence of the transmitted input sentence from the machine translator via the communication I/F 25 (S7). According to this as well, translation tailored to the speaker can be performed in machine translation, based on the speaker information acquired when the speaker utters the input sentence.
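The branching of steps S6A/S6B can be sketched as a simple selector; the server identifiers are placeholders, not actual endpoints from the patent:

```python
def select_translation_server(speaker,
                              host_server="translation-server-3a",
                              guest_server="translation-server-3b"):
    """Choose the machine translator matching the speaker information:
    the host translation server 3a for "host" (S6A) and the guest
    translation server 3b for "guest" (S6B)."""
    if speaker == "host":
        return host_server
    if speaker == "guest":
        return guest_server
    raise ValueError(f"unknown speaker role: {speaker}")
```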
  • Embodiments 1 and 2 have been described above as examples of the technology disclosed in the present application. However, the technology of the present disclosure is not limited thereto and is also applicable to embodiments in which changes, substitutions, additions, omissions, and the like are made as appropriate. It is also possible to combine the components described in the above embodiments into a new embodiment. Other embodiments are exemplified below.
  • the speaker information is described as an example of the user information.
  • the user information is not limited to the speaker information, but may be information about the other party of the speaker, for example.
  • the user information may include information about both the speaker and the other party (that is, the translation source and translation destination users).
  • “host” and “guest” are exemplified as user roles indicated by the user information.
  • the role in the user information is not limited to this, and may be various roles such as “teacher” and “student” or “superior” and “subordinate”. This also makes it possible to realize appropriate translation according to the role indicated by the user information.
  • the user information may include additional information related to other users in addition to the above role.
  • the user information may include information indicating at least one of the gender and age of the translation source user, the gender and age of the translation destination user, and the scene of the interaction between the translation source and translation destination users.
  • additional tagging according to such various types of additional information may be performed on the input sentence, the training data D1, and the like. Thereby, wording suited to, for example, an adult or a child, or a man or a woman, can be translated appropriately according to the additional information.
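One way such additional tagging might look is sketched below; the token scheme is purely illustrative, since the patent only states that additional tagging according to the additional information may be performed:

```python
def build_tagged_input(source_sentence, role, gender=None, age_group=None, scene=None):
    """Prepend a role token and optional tokens for gender, age group,
    and dialogue scene to the source sentence, mirroring the additional
    tagging suggested for the input sentence and the training data D1."""
    tokens = [f"<{role}>"]
    for attribute in (gender, age_group, scene):
        if attribute is not None:
            tokens.append(f"<{attribute}>")
    tokens.append(source_sentence)
    return " ".join(tokens)
```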
  • the user information may be used for a process of correcting the input sentence before machine translation of the input sentence.
  • for example, the control unit 20 or the arithmetic processing unit 30 may correct the input sentence so as to supply an ambiguous or omitted subject based on the user information. By translating the corrected input sentence, translation with the proper subject according to the user information can then be realized.
  • the translation systems 1 and 1A in which an input sentence is input by voice have been described.
  • the input sentence need not be input by voice and may be input as text, for example.
  • the translation source user may input a text input sentence into the translation device 2 by operating the operation unit 22 instead of speaking.
  • in this case, the translation system of the present embodiment can omit the speech recognition function.
  • machine translation may be performed inside the translation apparatus 2.
  • for example, a program similar to the translation model 35 may be stored in the storage unit 21 of the translation device 2, and the control unit 20 may execute the program to generate a translated sentence according to the acquired input sentence and speaker information.
  • the example in which the translation model 35 of the machine translator is configured by a neural network has been described.
  • the translation model of the machine translator in the present embodiment is not limited to this, and may be constituted by a probabilistic model, for example. Further, the machine translator and the translation model of the present embodiment are not necessarily based on machine learning.
  • the translation device, system, method, program, and learning method according to the present disclosure can be applied to machine translation in various scenes.


Abstract

A translation device (2) outputs a result of machine translation to a user of a translation destination in response to an input from a user of a translation source. This translation device is provided with: first and second acquisition units (22, 26a, 26b); a control unit (20); and output units (23, 27). The first acquisition unit acquires an inputted statement in a language of the translation source. The second acquisition unit acquires user information related to the inputted statement. The control unit acquires a translated statement indicating the translation result of the inputted statement in accordance with the user information in a language of the translation destination on the basis of the inputted statement and the user information. The output unit outputs the translated statement. The user information includes information indicating the role of the user of the translation source and/or the user of the translation destination for the related inputted statement.

Description

Translation device, system, method, program, and learning method

The present disclosure relates to a translation device based on machine translation, a translation system, a translation method, a program, and a learning method.

Non-Patent Document 1 proposes a technique that enables translation between many languages using a single neural machine translation model. In Non-Patent Document 1, a machine translation model is shared across many languages by introducing a token that identifies the translation destination language at the beginning of the input sentence. This achieves zero-shot translation between pairs of languages not learned by the neural machine translation model.

Non-Patent Document 2 proposes a technique for controlling honorifics in a neural machine translation model. Non-Patent Document 2 uses an incidental condition to control the level of honorifics in the translation destination when performing machine translation from a language, such as English, that has no concept of honorifics. The incidental condition is set to one of "polite", "informal", and "none".

The present disclosure provides a translation device, system, method, program, and learning method capable of performing translation tailored to the user in machine translation.

A translation device according to one aspect of the present disclosure outputs the result of machine translation to a translation destination user in response to an input of a translation source user. The translation device includes a first acquisition unit, a second acquisition unit, a control unit, and an output unit. The first acquisition unit acquires an input sentence in the language of the translation source. The second acquisition unit acquires user information related to the input sentence. Based on the input sentence and the user information, the control unit acquires a translated sentence indicating the translation result of the input sentence according to the user information in the language of the translation destination. The output unit outputs the translated sentence. The user information includes information indicating the role of at least one of the translation source user and the translation destination user with respect to the related input sentence.

A translation system according to one aspect of the present disclosure includes the above translation device and a machine translator. The machine translator performs machine translation based on the information acquired by the translation device and generates the translated sentence.

A translation method according to one aspect of the present disclosure is a method of executing machine translation so as to generate a translation result to be output to a translation destination user in response to an input from a translation source user. The method includes a step in which a first acquisition unit acquires an input sentence in the language of the translation source, and a step in which a second acquisition unit acquires user information related to the input sentence. The user information includes information indicating the role of at least one of the translation source user and the translation destination user with respect to the related input sentence. The method further includes a step in which a control unit acquires, based on the input sentence and the user information, a translated sentence indicating the translation result of the input sentence according to the user information in the language of the translation destination, and a step in which an output unit outputs the translated sentence.

A program according to one aspect of the present disclosure causes a computer to execute a process of outputting the result of machine translation to a translation destination user in response to an input of a translation source user. The program includes a step in which the computer acquires an input sentence in the language of the translation source and a step in which the computer acquires user information related to the input sentence. The user information includes information indicating the role of at least one of the translation source user and the translation destination user with respect to the related input sentence. The program further includes a step in which the computer acquires, based on the input sentence and the user information, a translated sentence indicating the translation result of the input sentence according to the user information in the language of the translation destination, and a step of outputting the translated sentence.

A learning method according to one aspect of the present disclosure is a method of obtaining, through machine learning on a computer, a translation model that realizes machine translation from a translation source user to a translation destination user. A storage unit of the computer stores a parameter group that defines the translation model based on the machine learning. The method includes a step in which the computer inputs information associating an input sentence in the language of the translation source with user information into the translation model being learned, and causes the translation model to generate a translated sentence. The user information includes information indicating the role of at least one of the translation source user and the translation destination user with respect to the related input sentence. The method further includes a step in which the computer adjusts the parameter group according to the generated translated sentence.

According to the translation device, system, method, program, and learning method of the present disclosure, translation tailored to the user can be performed in machine translation.
FIG. 1 is a diagram showing an outline of the translation system according to Embodiment 1 of the present disclosure. FIG. 2 is a block diagram illustrating the configuration of the translation device according to Embodiment 1. FIG. 3 is a block diagram illustrating the configuration of the translation server according to Embodiment 1. FIG. 4 is a diagram illustrating the translation method by the translation system. FIG. 5 is a flowchart illustrating the processing of the translation device according to Embodiment 1. FIG. 6 is a diagram for explaining a usage example of the translation system. FIG. 7 is a diagram for explaining the training data in the learning method of Embodiment 1. FIG. 8 is a flowchart illustrating the processing of the translation model learning method according to Embodiment 1. FIG. 9 is a diagram showing a display example of the display unit in the translation device. FIG. 10 is a diagram showing an outline of the translation system according to Embodiment 2. FIG. 11 is a flowchart illustrating the processing of the translation device according to Embodiment 2.
Hereinafter, embodiments will be described in detail with reference to the drawings as appropriate. However, description more detailed than necessary may be omitted. For example, detailed description of already well-known matters and repeated description of substantially the same configuration may be omitted. This is to avoid the following description becoming unnecessarily redundant and to facilitate understanding by those skilled in the art.

The applicant provides the accompanying drawings and the following description so that those skilled in the art can fully understand the present disclosure, and does not intend thereby to limit the subject matter described in the claims.
(Embodiment 1)

Hereinafter, Embodiment 1 of the present disclosure will be described with reference to the drawings.

1. Configuration
1-1. System overview

A translation system according to Embodiment 1 will be described with reference to FIG. 1. FIG. 1 is a diagram showing an outline of a translation system 1 according to the present embodiment.
As shown in FIG. 1, the translation system 1 according to the present embodiment includes a translation device 2 and various servers 3, 11, and 12. The translation system 1 takes the utterance of one user as the translation source input from the translation device 2 and performs machine translation into the translation destination language for the other user, so as to enable dialogue between users 5a and 5b who use different languages.
The translation system 1 of the present embodiment is applicable to scenes such as customer service, including various types of guidance, in various industries such as airports, hotels, and restaurants. In the following description, the user 5a in the role of a host who serves customers is abbreviated as "host 5a", and the user 5b in the role of a guest who receives the service is abbreviated as "guest 5b". The translation system 1 of the present embodiment realizes machine translation with wording appropriate to the dialogue between the host 5a and the guest 5b in various scenes.
 In the present embodiment, the translation device 2 performs data communication with the various servers 3, 11, and 12 via a communication network 10 such as the Internet. The translation system 1 may include a plurality of translation devices 2. In that case, each translation device 2 may include its own identification information in the data it transmits, so that the various servers 3, 11, and 12 can transmit data to the translation device 2 indicated by the received identification information.
 The various servers 3, 11, and 12 of the translation system 1 are, for example, ASP servers, and include a translation server 3, a speech recognition server 11, and a speech synthesis server 12. The translation server 3 is an example of a machine translator that executes machine translation in the translation method of the present embodiment. The speech recognition server 11 has a speech recognition function for the input sentence to be machine-translated. The speech synthesis server 12 has a speech synthesis function for the translated sentence indicating the result of the machine translation. Details of the configuration of the translation system 1 will be described below.
1-2. Configuration of Translation Device
 The configuration of the translation device 2 in the translation system 1 of the present embodiment will be described with reference to FIGS. 1 and 2. FIG. 2 is a block diagram illustrating the configuration of the translation device 2.
 The translation device 2 is composed of an information terminal such as a tablet terminal, a smartphone, or a PC. The translation device 2 illustrated in FIG. 2 includes a control unit 20, a storage unit 21, an operation unit 22, a display unit 23, a device interface 24, and a network interface 25. Hereinafter, "interface" is abbreviated as "I/F". The translation device 2 also includes, for example, two microphones 26a and 26b and a speaker 27.
 In the translation device 2 of the present embodiment, as shown in FIG. 1, one of the two microphones 26a and 26b is the host microphone 26a used by the host 5a, and the other is the guest microphone 26b used by the guest 5b. Each of the microphones 26a and 26b is an input device that picks up speech and inputs audio data. Each of the microphones 26a and 26b is an example of an acquisition unit in the present embodiment.
 The speaker 27 is an output device that outputs audio data as sound, and is an example of an output unit in the present embodiment. FIGS. 1 and 2 illustrate a case where the speaker 27 is shared between the host 5a and the guest 5b. The translation device 2 may instead include a host speaker and a guest speaker as separate units. The microphones 26a and 26b and the speaker 27 may be provided externally to the information terminal constituting the translation device 2, or may be built into the terminal.
 The control unit 20 includes, for example, a CPU or an MPU that realizes predetermined functions in cooperation with software, and controls the overall operation of the translation device 2. The control unit 20 reads data and programs stored in the storage unit 21 and performs various arithmetic processes to realize various functions. For example, the control unit 20 executes a program including a group of instructions for realizing the processing of the translation device 2 in the translation method of the present embodiment. The program may be provided from the communication network 10 or the like, or may be stored in a portable recording medium.
 Note that the control unit 20 may be a hardware circuit such as a dedicated electronic circuit or a reconfigurable electronic circuit designed to realize predetermined functions. The control unit 20 may be composed of various semiconductor integrated circuits such as a CPU, MPU, GPU, GPGPU, TPU, microcontroller, DSP, FPGA, or ASIC.
 The storage unit 21 is a storage medium that stores the programs and data necessary to realize the functions of the translation device 2. As shown in FIG. 2, the storage unit 21 includes a storage unit 21a and a temporary storage unit 21b.
 The storage unit 21a stores parameters, data, control programs, and the like for realizing predetermined functions, and is composed of, for example, an HDD or an SSD. For example, the storage unit 21a stores the above-described program.
 The temporary storage unit 21b is composed of a RAM such as a DRAM or an SRAM, and temporarily stores (i.e., holds) data. For example, the temporary storage unit 21b holds the input sentence, the translated sentence, and user information described later. The temporary storage unit 21b may also function as a work area for the control unit 20, and may be configured as a storage area in the internal memory of the control unit 20.
 The operation unit 22 is a user interface operated by the user. FIG. 1 shows an example in which the operation unit 22 forms a touch panel together with the display unit 23. The operation unit 22 is not limited to a touch panel, and may be, for example, a keyboard, a touch pad, buttons, or switches. The operation unit 22 is an example of an acquisition unit that acquires various information input by user operations.
 The display unit 23 is an example of an output unit composed of, for example, a liquid crystal display or an organic EL display. The display unit 23 displays information output to the user, such as the translated sentence. The display unit 23 may also display various other information, such as icons for operating the operation unit 22 and information input from the operation unit 22.
 The device I/F 24 is a circuit for connecting external equipment to the translation device 2, and is an example of a communication unit that performs communication in accordance with predetermined communication standards. The predetermined standards include USB, HDMI (registered trademark), IEEE 1394, Wi-Fi, and Bluetooth (registered trademark). The device I/F 24 may constitute, in the translation device 2, an acquisition unit that receives various information from external equipment or an output unit that transmits information to external equipment.
 The network I/F 25 is a circuit for connecting the translation device 2 to the communication network 10 via a wireless or wired communication line, and is an example of a communication unit that performs communication in accordance with predetermined communication standards. The predetermined standards include IEEE 802.3 and IEEE 802.11a/11b/11g/11ac. The network I/F 25 may constitute, in the translation device 2, an acquisition unit that receives various information or an output unit that transmits information via the communication network 10.
 The configuration of the translation device 2 described above is an example, and the configuration of the translation device 2 is not limited to it. For example, the translation device 2 need not include both the host microphone 26a and the guest microphone 26b; a microphone shared between the host 5a and the guest 5b may be used instead. The translation device 2 may also be composed of various computers other than an information terminal.
 The acquisition units in the translation device 2 may also be realized in cooperation with various software in the control unit 20 or the like. An acquisition unit in the translation device 2 may acquire various information by reading information stored in various storage media (e.g., the storage unit 21a) into the work area (e.g., the temporary storage unit 21b) of the control unit 20. Each of the above-described acquisition units may be a first acquisition unit that acquires the source input sentence, or a second acquisition unit that acquires user information related to the input sentence. The first and second acquisition units may share a single hardware element.
1-3. Server Configuration
 As an example of the hardware configuration of the various servers 3, 11, and 12 in the translation system 1 of the present embodiment, the configuration of the translation server 3 will be described with reference to FIG. 3. FIG. 3 is a block diagram illustrating the configuration of the translation server 3 in the present embodiment.
 The translation server 3 illustrated in FIG. 3 includes an arithmetic processing unit 30, a storage unit 31, and a communication unit 32. The translation server 3 is composed of one or more computers.
 The arithmetic processing unit 30 includes, for example, a CPU and a GPU that realize predetermined functions in cooperation with software, and controls the operation of the translation server 3. The arithmetic processing unit 30 reads data and programs stored in the storage unit 31 and performs various arithmetic processes to realize various functions.
 For example, the arithmetic processing unit 30 executes a program of a translation model 35 that performs machine translation in the translation method of the present embodiment. The translation model 35 is composed of, for example, various neural networks. For example, the translation model 35 may be a neural machine translation model shared among multiple languages (see, for example, Non-Patent Document 1). The arithmetic processing unit 30 may also execute a program for performing machine learning of the translation model 35. Each of these programs may be provided from the communication network 10 or the like, or may be stored in a portable recording medium.
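As a rough illustration of how such a shared multilingual model is driven (in the style of Non-Patent Document 1), the target language can be specified by prepending a token to the source text before it is fed to the model. The "<2xx>" token format and the helper below are illustrative assumptions, not part of the disclosed system:

```python
def build_model_input(source_text: str, target_lang: str) -> str:
    """Prepend a target-language token, as in shared multilingual NMT models.

    The "<2xx>" token convention is assumed here for illustration; an
    actual model's vocabulary may use a different marker.
    """
    return f"<2{target_lang}> {source_text}"

# The same machine translator, routed to different target languages:
print(build_model_input("フライトはいつですか。", "en"))  # <2en> フライトはいつですか。
print(build_model_input("Do you have the bag?", "ja"))    # <2ja> Do you have the bag?
```

Because the language choice is carried entirely by the token, one set of model parameters can serve every language pair, which is what enables the zero-shot translation mentioned in the background.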
 Note that the arithmetic processing unit 30 may be a hardware circuit such as a dedicated electronic circuit or a reconfigurable electronic circuit designed to realize predetermined functions. The arithmetic processing unit 30 may be composed of various semiconductor integrated circuits such as a CPU, GPU, TPU, MPU, microcontroller, DSP, FPGA, or ASIC.
 The storage unit 31 is a storage medium that stores the programs and data necessary to realize the functions of the translation server 3, and includes, for example, an HDD or an SSD. The storage unit 31 may also include a DRAM, an SRAM, or the like, and may function as a work area for the arithmetic processing unit 30. The storage unit 31 stores, for example, the program of the translation model 35 and the various parameter groups that define the translation model 35 based on machine learning. The parameter groups include, for example, various weight parameters of a neural network.
 The communication unit 32 is an I/F circuit for performing communication in accordance with predetermined communication standards, and communicatively connects the translation server 3 to the communication network 10 or to external equipment. The predetermined standards include IEEE 802.3, IEEE 802.11a/11b/11g/11ac, USB, HDMI, IEEE 1394, Wi-Fi, and Bluetooth.
 The speech recognition server 11 and the speech synthesis server 12 can be configured in the same way as the translation server 3 described above, for example by installing a program for the speech recognition function or the speech synthesis function, respectively, in place of the translation model 35. The various servers 3, 11, and 12 in the translation system 1 are not limited to the above configuration and may have various configurations. The translation method of the present embodiment may be executed by cloud computing. The hardware resources that realize the functions of the various servers 3, 11, and 12 may also be shared.
 In the translation system 1, the speech recognition server 11 and the speech synthesis server 12 may also be omitted. For example, the translation device 2 may itself have a speech recognition function, recognizing the audio data generated by the microphones 26a and 26b and converting it into text data. The translation device 2 may likewise have a speech synthesis function, synthesizing speech from text data based on the machine translation and outputting it from the speaker 27.
2. Operation
 The operation of the translation system 1 configured as described above will be described below.
2-1. Translation Method
 The translation method performed by the translation system 1 according to the present embodiment will be described with reference to FIGS. 1 and 4. FIG. 4 is a diagram illustrating the translation method performed by the translation system 1.
 During a dialogue between the host 5a and the guest 5b, each time one of them speaks, the translation system 1 according to the present embodiment takes the speaker's language as the source language and executes machine translation into the target language. The source language may be recognized from the speaker's utterance by speech recognition, for example, or may be set by operating the translation device 2. The target language is set appropriately according to the other user, i.e., the one who is not speaking. Of the host 5a and the guest 5b, the speaker is an example of the source-side user, and the other party is an example of the target-side user.
 In the example of FIG. 4(a), the host 5a, as the speaker, utters the Japanese input sentence 51 "フライトはいつですか。" ("When is the flight?"). When the spoken input sentence 51 is input, the translation device 2 of the present embodiment can recognize the input sentence 51 using, for example, the speech recognition of the speech recognition server 11 shown in FIG. 1. At this time, the translation device 2 of the present embodiment can acquire, in addition to the input sentence 51, information about the user who is speaking.
 In the present embodiment, as shown in FIG. 1, the translation server 3 executes machine translation based on the information acquired by the translation device 2, and generates a translated sentence indicating the translation result of the input sentence in the target language. By using user information indicating whether the speaker is the host 5a or the guest 5b in the machine translation of the input sentence, the translation system 1 of the present embodiment differentiates the translations as illustrated in FIGS. 4(a) and 4(b).
 In the example of FIG. 4(a), based on machine translation with Japanese as the source language and English as the target language, the translated sentence 61 "When is your flight?" is output from the speaker 27. In the translation system 1 of the present embodiment, the translated sentence 61 can be converted to speech by the speech synthesis server 12.
 FIG. 4(b) shows an example in which an input sentence 51 with the same language and content as in FIG. 4(a) is uttered by the guest 5b. In the example of FIG. 4(b), the translation device 2 outputs the translated sentence 62 "When is my flight?", whose content differs from the translated sentence 61 of FIG. 4(a).
 The examples of FIGS. 4(a) and 4(b) assume a scene in which a dialogue takes place at an airport counter between a host 5a, such as an airport staff member, and a guest 5b who is about to board. In such a scene, a translation result consistent with the fact that the flight passenger is the guest 5b is considered appropriate. In the example of FIG. 4(a), since the speaker is the host 5a, the "your" in the translated sentence 61 is understood to refer to the guest 5b, and an appropriate translation result is obtained. On the other hand, if the same translated sentence as the translated sentence 61 of FIG. 4(a) were output when the speaker is the guest 5b, the "your" in the translated sentence would refer to the host 5a, which would be inappropriate.
 Here, with conventional machine translation techniques, basically only one translation is produced for an input sentence 51 for which two different translated sentences 61 and 62 are conceivable, as in FIGS. 4(a) and 4(b), and it is not even possible to determine which one is appropriate. It has thus been difficult for the prior art to produce differentiated translations of the same input sentence.
 In contrast, in the translation system 1 of the present embodiment, user information indicating the speaker is acquired by the translation device 2 each time an utterance occurs during the dialogue between the host 5a and the guest 5b, and is used for the machine translation of the corresponding input sentence 51, thereby realizing appropriate differentiation of the translations. Details of the operation of the translation system 1 in this translation method are described below.
2-1-1. Operation of Translation Device
 The operation of the translation device 2 in the translation method described above will be described with reference to FIG. 5.
 FIG. 5 is a flowchart illustrating the processing of the translation device 2 according to the present embodiment. Each process in the flowchart of FIG. 5 is executed by the control unit 20 of the translation device 2. The flowchart starts, for example, when one of the host 5a and the guest 5b utters a desired input sentence.
 First, the control unit 20 of the translation device 2 inputs audio data of the speaker's utterance from the host microphone 26a or the guest microphone 26b (S1). The audio data of the utterance is an example of information indicating the input sentence spoken by the speaker. The control unit 20 may select one of the two microphones 26a and 26b based on their volume levels, or in response to various operations by the speaker.
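As an illustrative sketch of the volume-based microphone selection mentioned for step S1 (the frame representation and the energy measure are assumptions; the embodiment does not specify how volume is compared), the active microphone could be chosen by comparing the RMS level of one captured frame from each microphone:

```python
import math

def rms(samples):
    """Root-mean-square level of one audio frame (a list of PCM samples)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def select_microphone(host_frame, guest_frame):
    """Pick the active microphone by frame energy (illustration of step S1).

    Ties go to the host microphone; a real device would also apply a
    noise floor threshold before deciding that anyone spoke at all.
    """
    return "host" if rms(host_frame) >= rms(guest_frame) else "guest"

# The host speaks, so the host microphone frame carries far more energy:
print(select_microphone([200, -180, 150], [10, -8, 5]))  # host
```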
 Next, the control unit 20 acquires the input sentence indicated by the utterance via, for example, the network I/F 25 (S2). Specifically, the translation device 2 transmits the audio data of the utterance to the speech recognition server 11 via the communication network 10. The speech recognition server 11 executes speech recognition processing on the audio data from the translation device 2, generates text data as the speech recognition result, and transmits it to the translation device 2. The network I/F 25 of the translation device 2 receives the generated text data of the input sentence from the speech recognition server 11.
 The control unit 20 also executes processing for identifying speaker information, for example using the microphones 26a and 26b as acquisition units (S3 to S5). The speaker information is an example of user information indicating "host" or "guest" as the current speaker. Note that the order of step S2 relative to steps S3 to S5 is not particularly limited; either may be performed first, or they may be executed in parallel.
 For example, the control unit 20 determines whether the utterance was input from the host microphone 26a (S3). In the example of FIG. 4(a), based on the utterance being input from the host microphone 26a (YES in S3), the control unit 20 sets the speaker information to "host" (S4). On the other hand, in the example of FIG. 4(b), based on the utterance being input from the guest microphone 26b (NO in S3), the control unit 20 sets the speaker information to "guest" (S5).
 Next, the control unit 20 associates the acquired input sentence and the speaker information with each other and transmits them to the translation server 3 (S6). For example, the control unit 20 tags the input sentence with tag information indicating "host" or "guest" from the speaker information, and transmits it from the network I/F 25 to the translation server 3. The information transmitted to the translation server 3 may also include designation information for the target language.
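One minimal way to realize the tagging of step S6 could look as follows. The "<host>"/"<guest>" tag strings and the JSON payload keys are illustrative assumptions; the embodiment only requires that the input sentence and the speaker information be associated with each other in some form:

```python
import json

def build_translation_request(input_sentence, speaker, target_lang=None):
    """Associate the input sentence with speaker information (step S6).

    The sentence is prefixed with a "<host>" or "<guest>" tag, and the
    optional target-language designation is carried alongside it.
    """
    if speaker not in ("host", "guest"):
        raise ValueError(f"unknown speaker: {speaker}")
    payload = {"text": f"<{speaker}> {input_sentence}"}
    if target_lang is not None:
        payload["target_lang"] = target_lang
    return json.dumps(payload, ensure_ascii=False)

print(build_translation_request("フライトはいつですか。", "guest", "en"))
```

Prefixing a control token to the source sentence mirrors the technique of Non-Patent Document 1, where a token at the beginning of the input conditions a shared model's output; here the token conditions the host/guest differentiation instead of the language pair.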
 When the translation server 3 receives the input sentence associated with the speaker information from the translation device 2, it executes machine translation based on, for example, the trained translation model 35. The translation server 3 thereby generates, for the received input sentence, a translated sentence whose translation result reflects the "host" or "guest" indicated by the associated speaker information. The translation server 3 transmits the generated translated sentence to the translation device 2 as text data or the like. In the translation device 2, the control unit 20 receives the information indicating the translated sentence from the translation server 3 via the network I/F 25 (S7).
 Next, the control unit 20 outputs the translation result, for example as speech output of the translated sentence (S8). For example, the control unit 20 transmits the text data of the translated sentence to the speech synthesis server 12 and causes it to perform speech synthesis of the translated sentence. The control unit 20 receives the resulting audio data from the speech synthesis server 12 and controls speech output from the speaker 27. In addition to or instead of the speech output, the control unit 20 may display a text image or the like of the translated sentence on the display unit 23.
 After outputting the translation result (S8), the control unit 20 ends the processing of this flowchart. Through the processing of step S8, the translation result is output to the user of the translation device 2.
 According to the above processing, the host microphone 26a and the guest microphone 26b of the translation device 2 function as acquisition units for the input sentence and the speaker information by receiving the speech of the corresponding speaker (S1 to S5). In the translation device 2, the corresponding speaker information is acquired each time the speech of an input sentence is input (S1 to S5), so that an appropriate translated sentence can be output that differentiates between the case where the speaker is the host 5a and the case where the speaker is the guest 5b (S6 to S8).
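The device-side flow of steps S1 to S8 can be summarized in a short sketch. The three server interactions are passed in as callables because the actual endpoints, protocols, and data formats are not fixed at this level of the embodiment; the stubs in the usage example are purely illustrative:

```python
def translate_utterance(audio, source_mic, recognize, translate, synthesize):
    """Device-side flow S1-S8, with the three servers passed in as callables.

    recognize(audio)         -> input sentence text   (S2, speech recognition server)
    translate(text, speaker) -> translated text       (S6-S7, translation server)
    synthesize(text)         -> output audio data     (S8, speech synthesis server)
    """
    speaker = "host" if source_mic == "host" else "guest"  # S3-S5
    input_sentence = recognize(audio)                      # S2
    translated = translate(input_sentence, speaker)        # S6-S7
    return synthesize(translated)                          # S8

# Stub servers reproducing the differentiation of FIG. 4(b):
out = translate_utterance(
    audio=b"...",
    source_mic="guest",
    recognize=lambda a: "フライトはいつですか。",
    translate=lambda t, s: "When is my flight?" if s == "guest" else "When is your flight?",
    synthesize=lambda t: t.encode(),
)
print(out.decode())  # When is my flight?
```

Because the speaker information is re-derived on every utterance, the same function serves both directions of the dialogue without any per-session state.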
2-1-2. Differentiated Translation for Host and Guest
 For example, according to the translation system 1 of the present embodiment, as shown in FIGS. 4(a) and 4(b), even when the spoken input sentence 51 is the same, different translated sentences 61 and 62 are output from the translation device 2 depending on whether the speaker information indicates "host" or "guest". The translated sentence 62 in the example of FIG. 4(b) contains "my" instead of the "your" in the translated sentence 61 of FIG. 4(a), so the word referring to the guest 5b when the guest 5b is speaking is translated appropriately.
 The differentiation in the examples of FIGS. 4(a) and 4(b) stems from an ambiguity: the subject of the topic, rendered as "your" or "my" in the translated sentences 61 and 62, is not made explicit in the input sentence 51. Such ambiguity of the subject can arise frequently when, for example, the source language is Japanese. According to the translation system 1 of the present embodiment, even when the subject is ambiguous in the input sentence, acquiring the speaker information makes it possible to identify the implied subject and differentiate the translations appropriately. In particular, in a relationship between users whose mutual roles are clear, such as the host 5a and the guest 5b, the subject of the topic in the dialogue can be inferred relative to the speaker.
 The differentiation using speaker information in the translation system 1 described above is applicable to various ambiguous input sentences. Another example of an ambiguous input sentence will be described with reference to FIGS. 6(a) and 6(b).
 FIGS. 6(a) and 6(b) show usage examples of the translation system 1 when the target language is Japanese. FIG. 6(a) is a usage example in which the speaker is the host 5a, and FIG. 6(b) is a usage example in which the speaker is the guest 5b.
 FIGS. 6A and 6B illustrate a scene in which baggage is handed over between the host 5a and the guest 5b at an airport counter, and the English input sentence 52 "Do you have the bag?" is uttered by the host 5a or the guest 5b. In such a scene, the content of the above input sentence 52 carries different nuances in Japanese depending on whether the host 5a says it to the guest 5b or the guest 5b says it to the host 5a, and from the viewpoint of honorifics it is considered appropriate to use different wording as well. Therefore, the translation system 1 of the present embodiment differentiates the translation based on the speaker information by the processing described above.
 Specifically, for the above input sentence 52, in the example of FIG. 6A the translation device 2 outputs the Japanese translated sentence 63 "おかばんをお持ちでしょうか。" (a respectful phrasing of "Do you have the bag?"). The translated sentence 63 is considered natural Japanese as a remark by which the host 5a, being the side that receives the baggage, treats the guest 5b hospitably. The translated sentence 63 also includes the honorific prefix "お", a respectful expression by which the speaker shows respect for the actions and possessions of the other party such as the guest 5b, so its wording is appropriate.
 On the other hand, in the example of FIG. 6B, the translation device 2 outputs the Japanese translated sentence 64 "かばんはありますか。" (a plain polite phrasing of "Is there a bag?"). The translated sentence 64 is considered natural Japanese as a remark by which the guest 5b, being the side that checked the baggage, prompts the host 5a to confirm. The wording of the translated sentence 64 is polite ("desu/masu" style) yet contains no excessive honorifics, and is considered appropriate when the speaker is the guest 5b.
 Furthermore, since Japanese has a strong concept of honorifics that depends on the relative positions of the users in a dialogue, when the target language is Japanese the appropriate wording for the translated sentence can be ambiguous from the input sentence alone. Even for such wording ambiguity, according to the translation method of the present embodiment, a translated sentence with wording appropriate to the positions of the speaker and the other party can be obtained based on the speaker information.
 In the above examples, differentiated translation has been described for cases where Japanese is the source or target language and English is the target or source language; however, the translation method of the present embodiment is not limited to Japanese and English, and is applicable to various languages. By machine translation using the speaker information, the translated sentence can be differentiated so as to be appropriate when the speaker speaks the content of the input sentence to the other party in the target language, in accordance with the various common-sense conventions of each language.
2-2. Translation Model Learning Method
 Machine translation in the translation method of the translation system 1 as described above can be realized by, for example, machine learning. The learning method in the present embodiment will be described with reference to FIGS. 7 and 8.
 FIG. 7 is a diagram for explaining the training data D1 in the learning method of the present embodiment. FIG. 8 is a flowchart illustrating the processing of the learning method for the translation model 35 according to the present embodiment.
 In the present embodiment, an example is described in which, in the learning method using the training data D1, machine learning is performed on the translation model 35 of the translation server 3, which is an example of a machine translator. The training data D1 constitutes, for example, a bilingual corpus between the source and target languages. FIG. 7A illustrates a case where the source language is Japanese and the target language is English. FIG. 7B illustrates a case where the source language is English and the target language is Japanese.
 The training data D1 records, for example, "speaker information", a "source language sentence", and a "target language sentence" in association with one another, as shown in FIGS. 7A and 7B. The "source language sentence" is an example input sentence for training the translation model 35, described in the source language. The "target language sentence" indicates the correct translation based on the "speaker information" when the corresponding "source language sentence" is translated into the target language. In the training data D1 of the present embodiment, the speaker information is associated with each pair of source language sentence and target language sentence by tagging with, for example, "host" or "guest". Each target language sentence contains an expression that is natural when the content of the input sentence is uttered under the corresponding speaker information.
 For example, as shown in FIG. 7A, in the training data D1 the Japanese source language sentence "いつ出発しますか。" ("When do we/you leave?") is associated with the target language sentence "When do you start?" when the speaker information is "host", and the source language sentence with the same content is associated with the target language sentence "When do we start?" when the speaker information is "guest". These two target language sentences contain different subjects according to the difference in the speaker information. Note that the training data D1 may include source language sentences associated with only one of "host" and "guest".
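As a concrete illustration of the tagging described above, the speaker information can be realized as a token prepended to the source language sentence, in the manner of the language-identifying tokens of Non-Patent Document 1. The following is a minimal sketch; the record layout and the token format `<host>`/`<guest>` are illustrative assumptions, not details taken from the actual training data D1.

```python
# Hypothetical sketch of records in the training data D1: speaker
# information associated with a (source language sentence, target
# language sentence) pair, as in FIG. 7A.
TRAIN_DATA = [
    ("host",  "いつ出発しますか。", "When do you start?"),
    ("guest", "いつ出発しますか。", "When do we start?"),
]

def tag_source(speaker, source_sentence):
    """Prepend a token identifying the speaker information to the source
    language sentence, so that a single translation model can learn to
    differentiate the translation according to the speaker role."""
    return f"<{speaker}> {source_sentence}"

# Build (tagged input, reference translation) pairs for training.
pairs = [(tag_source(spk, src), tgt) for spk, src, tgt in TRAIN_DATA]
```

With this scheme, the same source sentence appears twice in the training pairs, distinguished only by its speaker token, which is what lets the model associate the token with the choice of subject in the target sentence.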
 The processing of the learning method using the training data D1 as described above is illustrated in FIG. 8. Each process of the flowchart shown in FIG. 8 is executed by, for example, the arithmetic processing unit 30 of the translation server 3. The flowchart starts in a state where, for example, the training data D1 is stored in the storage unit 31 and the parameter group of the translation model 35 is set to initial values, that is, a state where the translation model 35 to be trained has been prepared.
 First, the arithmetic processing unit 30 refers to the training data D1 in the storage unit 31, and inputs a source language sentence and the speaker information associated with it in the training data D1 to the translation model 35 being trained (S11). The speaker information is input to the translation model 35 in association with the source language sentence, for example as tag information. The target language is, for example, designated in advance.
 Next, the arithmetic processing unit 30 executes machine translation based on the input information in the translation model 35 being trained (S12). In step S12, the arithmetic processing unit 30 causes the translation model 35 to generate a translated sentence according to the current parameter group.
 Next, the arithmetic processing unit 30 adjusts the parameter group based on the error between the translated sentence produced by the translation model 35 being trained and the corresponding target language sentence (S13). The processing of step S13 is performed according to, for example, the backpropagation method, with reference to the target language sentence associated in the training data D1 with the source language sentence that was input to the translation model 35.
 Next, the arithmetic processing unit 30 determines whether the training of the translation model 35 has been completed, based on a predetermined end-of-training condition (S14). The end-of-training condition is set in advance according to, for example, the number of training iterations. If the training of the translation model 35 has not been completed (NO in S14), the arithmetic processing unit 30 performs the processing from step S11 again. Each time the processing of steps S11 to S13 is repeated, the parameter group of the translation model 35 is updated.
 When the training of the translation model 35 is completed (YES in S14), the arithmetic processing unit 30 records the final values of the parameter group in the storage unit 31, thereby determining the parameter group that defines the trained translation model 35 (S15).
 By determining the parameter group of the trained translation model 35 (S15), the arithmetic processing unit 30 ends the processing according to the flowchart of FIG. 8.
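The loop of steps S11 to S15 above can be sketched as follows. This is schematic only: the toy "parameters", "translation", and "adjustment" functions stand in for the actual translation model 35 and the backpropagation of S13, and the end-of-training condition of S14 is assumed, for illustration, to be a fixed number of iterations.

```python
def train_translation_model(params, train_data, translate, adjust, max_iters=3):
    """Schematic training loop following steps S11-S15 of FIG. 8.

    params     -- parameter group of the translation model 35 (initial values)
    train_data -- iterable of (source sentence tagged with speaker info, target sentence)
    translate  -- fn(params, tagged_src) -> candidate translation (S12)
    adjust     -- fn(params, tagged_src, candidate, target) -> updated params (S13)
    """
    for _ in range(max_iters):                     # S14: fixed-iteration end condition
        for tagged_src, target in train_data:      # S11: source sentence + speaker info
            candidate = translate(params, tagged_src)               # S12
            params = adjust(params, tagged_src, candidate, target)  # S13
    return params                                  # S15: final parameter group


# Toy stand-ins: the "parameters" are a lookup table, and "adjustment"
# memorizes the reference translation whenever the candidate is wrong.
toy_data = [("<host> いつ出発しますか。", "When do you start?")]

def toy_translate(p, src):
    return p.get(src, "")

def toy_adjust(p, src, candidate, target):
    if candidate != target:   # error between candidate and reference (S13)
        p = dict(p)
        p[src] = target
    return p

trained = train_translation_model({}, toy_data, toy_translate, toy_adjust)
```

In a real implementation, `params` would be the neural-network weights and `adjust` a gradient step on the translation loss; only the control flow of FIG. 8 is reproduced here.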
 According to the above processing, machine learning with the training data D1 including the speaker information yields a translation model 35 that has learned to differentiate translated sentences according to the speaker information. According to the learning method of the present embodiment, it is possible to generate a translation model 35 that has acquired the common sense considered natural in various scenes, for performing appropriate differentiated translation between the host 5a and the guest 5b.
 Processing similar to step S12 above is performed with the trained translation model 35 between steps S6 and S7 of FIG. 5. Based on the common sense acquired by the translation model 35 through machine learning, the translated sentence can be differentiated appropriately.
 In the above description, an example has been explained in which the arithmetic processing unit 30 of the translation server 3 executes the processing of the learning method. The processing of the learning method may instead be performed on various computers other than the translation server 3. The generated trained translation model 35 can be provided as appropriate.
 In the above description, as shown in FIGS. 7A and 7B, training data D1 associating a "source language sentence" with a "target language sentence" based on "speaker information" has been exemplified. The training data D1 is not limited to this; for example, instead of distinguishing between source language sentences and target language sentences, example sentences that are mutual translations across various languages may be recorded in association with one another based on the speaker information. When training the translation model 35, the source language may be designated as appropriate, and the source language sentences to be learned may be selected from the training data D1.
3. Summary
 As described above, in the present embodiment, the translation device 2 outputs the result of machine translation to the other party (i.e., the target-side user) in response to an input by the speaker (i.e., the source-side user), for example between the host 5a and the guest 5b. The translation device 2 includes two microphones 26a and 26b as examples of first and second acquisition units, a control unit 20, and a speaker 27 as an example of an output unit. As the first acquisition unit, each microphone 26a, 26b acquires an input sentence in the source language. As the second acquisition unit, each microphone 26a, 26b acquires speaker information, which is an example of user information related to the input sentence. Based on the input sentence and the speaker information, the control unit 20 acquires a translated sentence indicating the translation result of the input sentence according to the user information in the target language. The speaker 27 outputs the translated sentence as voice output. The speaker information includes information indicating the role of the speaker with respect to the related input sentence.
 According to the translation device 2 described above, for example during a dialogue between the host 5a and the guest 5b, differentiated translation according to the speaker can be performed in machine translation, based on the speaker information acquired in response to the utterance of the input sentence by the speaker.
 In the present embodiment, the role of at least one of the speaker and the other party in the speaker information includes at least one of "host" and "guest". Thus, in various scenes involving the host 5a and the guest 5b, differentiated translation appropriate to each role can be realized.
 Furthermore, the first and second acquisition units in the present embodiment are not limited to the plurality of microphones 26a and 26b, and may each include at least one of the microphones 26a and 26b, the operation unit 22, the network I/F 25, and the device I/F 24. One such example will be described with reference to FIG. 9.
 FIG. 9 shows a display example of the display unit 23 of the translation device 2. In this example, the display unit 23, which constitutes a touch panel together with the operation unit 22, displays a host utterance icon 23a, a guest utterance icon 23b, an input sentence area 23c, and a translated sentence area 23d.
 Each of the utterance icons 23a and 23b is an icon by which the corresponding user inputs a touch operation to the operation unit 22 to start speaking. In the input sentence area 23c, an image of the input sentence is displayed according to the utterance. In the translated sentence area 23d, an image of the translated sentence is displayed according to the translation result of the input sentence.
 In this example, the operation unit 22 functions as the second acquisition unit by means of the utterance icons 23a and 23b. For example, the translation device 2 starts processing such as that of the flowchart of FIG. 5 when one of the two utterance icons 23a and 23b is touched. In this case, for example, instead of steps S3 to S5 of FIG. 5, the control unit 20 can set the speaker information to "host" when the operation unit 22 receives an operation on the host utterance icon 23a, and to "guest" when it receives an operation on the guest utterance icon 23b.
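The replacement of steps S3 to S5 by the icon operation can be sketched as follows; the icon identifiers and function name are illustrative assumptions, since the embodiment does not specify them.

```python
# Hypothetical mapping from the touched utterance icon to the speaker
# information set by the control unit 20, replacing steps S3-S5 of FIG. 5
# when the operation unit 22 serves as the second acquisition unit.
ICON_TO_SPEAKER = {
    "utterance_icon_23a": "host",   # host utterance icon 23a
    "utterance_icon_23b": "guest",  # guest utterance icon 23b
}

def speaker_info_from_icon(icon_id):
    """Return the speaker information corresponding to the touched icon."""
    if icon_id not in ICON_TO_SPEAKER:
        raise ValueError(f"unknown utterance icon: {icon_id!r}")
    return ICON_TO_SPEAKER[icon_id]
```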
 The acquisition of the speaker information may also be performed based on, for example, information obtained via the communication I/F 25 or the device I/F 24, or on the result of speech recognition of the input sentence. For example, "host" or "guest" in the speaker information may be set based on information indicating to which of the languages corresponding to the host 5a and the guest 5b the language recognized by speech recognition corresponds.
 In the present embodiment, the translation device 2 further includes the network I/F 25 as an example of a communication unit that communicates with the external translation server 3. The control unit 20 transmits the input sentence and the speaker information to the translation server 3 via the network I/F 25, and receives the translated sentence of the transmitted input sentence from the translation server 3. Thus, for example, the learning result of the translation model 35 based on the speaker information can be applied, and the translation server 3 can generate an appropriate translated sentence according to the speaker information from the translation device 2.
 The translation system 1 in the present embodiment includes the translation device 2 and the translation server 3, which is an example of a machine translator. The translation server 3 performs machine translation based on the information acquired by the translation device 2 and generates a translated sentence. By using the speaker information in the translation system 1, differentiated translation according to the speaker can be performed in machine translation.
 The translation method in the present embodiment executes machine translation so as to generate a translation result output to the other party in response to an input from the speaker. The method includes a step (S1, S2) in which a first acquisition unit acquires an input sentence in the source language, and a step (S3 to S5) in which a second acquisition unit acquires speaker information related to the input sentence. The speaker information includes information indicating the role of at least one of the speaker and the other party with respect to the related input sentence. The method further includes a step (S6, S7) in which a control unit acquires, based on the input sentence and the speaker information, a translated sentence indicating the translation result of the input sentence according to the speaker information in the target language, and a step (S8) in which an output unit outputs the translated sentence. According to this method, by using the speaker information, differentiated translation according to the speaker can be performed in machine translation.
 The program in the present embodiment causes a computer such as the translation device 2 to execute processing for outputting the result of machine translation to the other party in response to an input from the speaker. The program includes a step (S1, S2) in which the computer acquires an input sentence in the source language, and a step (S3 to S5) in which the computer acquires speaker information related to the input sentence. The speaker information includes information indicating the role of at least one of the speaker and the other party with respect to the related input sentence. The program further includes a step (S6, S7) in which the computer acquires, based on the input sentence and the speaker information, a translated sentence indicating the translation result of the input sentence according to the speaker information in the target language, and a step (S8) of outputting the translated sentence. According to this program, by using the speaker information, differentiated translation according to the speaker can be performed in machine translation.
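The sequence of steps executed by the program (S1 through S8) can be sketched end-to-end as follows. The function names are illustrative assumptions; the acquisition, translation, and output functions stand in for the microphones 26a and 26b, the translation server 3, and the speaker 27, respectively.

```python
def run_translation(acquire_input, acquire_speaker_info, machine_translate, output):
    """Schematic flow of the program: acquire the input sentence (S1, S2),
    acquire the related speaker information (S3-S5), obtain the translated
    sentence according to the speaker information (S6, S7), and output it (S8)."""
    input_sentence = acquire_input()                              # S1, S2
    speaker_info = acquire_speaker_info()                         # S3-S5
    translated = machine_translate(input_sentence, speaker_info)  # S6, S7
    output(translated)                                            # S8
    return translated

# Usage with trivial stubs in place of the real acquisition units,
# machine translator, and output unit.
outputs = []
result = run_translation(
    lambda: "いつ出発しますか。",
    lambda: "host",
    lambda sentence, speaker: f"<{speaker}> translation of {sentence}",
    outputs.append,
)
```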
 The learning method in the present embodiment is a method for generating the translation model 35, in which a translation model 35 realizing machine translation from the speaker to the other party is obtained through machine learning on a computer such as the translation server 3. The storage unit 31 of the computer stores a parameter group that defines the translation model 35 based on machine learning. The method includes a step (S11, S12) in which the computer inputs information associating an input sentence in the source language with speaker information to the translation model 35 being trained, and causes the translation model 35 to generate a translated sentence. The speaker information includes information indicating the role of at least one of the speaker and the other party with respect to the related input sentence. The method further includes a step (S13) in which the computer adjusts the parameter group according to the generated translated sentence. According to this method, a translation model 35 can be generated that has learned to perform differentiated translation according to the speaker in machine translation.
(Embodiment 2)
 Embodiment 2 will be described below with reference to the drawings. In Embodiment 1, an example using a machine translator trained on the utterances of both the host and the guest has been described. In Embodiment 2, an example using a plurality of machine translators, one for the host and one for the guest, will be described.
 Hereinafter, descriptions of configurations and operations similar to those of the translation system 1 according to Embodiment 1 will be omitted as appropriate, and the translation system according to the present embodiment will be described.
 FIG. 10 is a diagram showing an outline of the translation system 1A according to Embodiment 2. As shown in FIG. 10, the translation system 1A according to the present embodiment has the same configuration as Embodiment 1, except that it includes a host translation server 3a and a guest translation server 3b instead of the single translation server 3. The two translation servers 3a and 3b are an example of a plurality of machine translators in the present embodiment.
 Each of the translation servers 3a and 3b is configured, for example, in the same manner as the translation server 3 of Embodiment 1. For example, the host translation server 3a has a translation model trained by machine learning on host utterances, and the guest translation server 3b has a translation model trained by machine learning on guest utterances.
 FIG. 11 is a flowchart illustrating the processing of the translation device 2 according to the present embodiment. In the present embodiment, the control unit 20 of the translation device 2 performs, for example, steps S6A and S6B instead of step S6 of FIG. 5.
 Specifically, when the speaker information is "host" (S4), the control unit 20 transmits the acquired input sentence from the network I/F to the host translation server 3a (S6A). On the other hand, when the speaker information is "guest" (S5), the translation device 2 transmits the acquired input sentence from the network I/F to the guest translation server 3b (S6B). The translation device 2 then receives the translated sentence of the translation result from the translation server selected from the two translation servers 3a and 3b according to the speaker information (S7).
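The routing of steps S6A and S6B can be sketched as follows; the server addresses are illustrative placeholders, since the embodiment does not specify how the translation servers 3a and 3b are addressed.

```python
# Hypothetical endpoints for the host translation server 3a and the
# guest translation server 3b.
SERVER_BY_SPEAKER = {
    "host":  "https://translation-server-3a.example/translate",   # S6A
    "guest": "https://translation-server-3b.example/translate",   # S6B
}

def select_translation_server(speaker_info):
    """Select the translation server to which the input sentence is sent,
    according to the speaker information (S6A or S6B of FIG. 11)."""
    if speaker_info not in SERVER_BY_SPEAKER:
        raise ValueError(f"unsupported speaker information: {speaker_info!r}")
    return SERVER_BY_SPEAKER[speaker_info]
```

Because the speaker role is encoded in the choice of server rather than in a tag on the input sentence, each server's model only needs to be trained on utterances of its own role.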
 As described above, in the translation device 2 of the present embodiment, the communication I/F 25 communicates with the translation servers 3a and 3b, which are a plurality of external machine translators. The control unit 20 transmits the input sentence, via the communication I/F 25, to a different machine translator according to user information such as the speaker information (S6A, S6B), and receives the translated sentence of the transmitted input sentence from that machine translator via the communication I/F 25 (S7). This also enables differentiated translation according to the speaker in machine translation, based on the speaker information acquired in response to the utterance of the input sentence by the speaker.
(Other Embodiments)
 As described above, Embodiments 1 and 2 have been described as examples of the technology disclosed in the present application. However, the technology in the present disclosure is not limited thereto, and is also applicable to embodiments in which changes, substitutions, additions, omissions, and the like are made as appropriate. It is also possible to combine the components described in the above embodiments to form a new embodiment. Accordingly, other embodiments are exemplified below.
 In Embodiments 1 and 2 above, speaker information has been described as an example of user information. The user information is not limited to speaker information, and may be, for example, information about the other party of the speaker. For example, when the speaker is the host 5a or the guest 5b, appropriate differentiated translation can be realized as in Embodiment 1 by acquiring the other party's user information, "guest" or "host" respectively. The user information may also include information about both the speaker and the other party (i.e., the source-side and target-side users).
 In each of the above embodiments, "host" and "guest" have been exemplified as the user roles indicated by the user information. The roles in the user information are not limited to these, and may be various mutually related roles such as "teacher" and "student", or "supervisor" and "subordinate". This also makes it possible to realize appropriate differentiated translation according to the roles indicated by the user information.
 In addition to the roles described above, the user information may further include additional information about the users. For example, the user information may include information indicating at least one of the gender and age of the source-side user, the gender and age of the target-side user, and the scene of the dialogue between the source-side and target-side users. For example, additional tagging according to such additional information may be applied to the input sentence, the training data D1, and the like. This enables appropriate differentiated translation according to the various additional information, for example wording suited to adult/child or male/female users.
 The user information may also be used in a process of correcting the input sentence before machine translation of the input sentence. For example, when the input sentence has subject ambiguity, the control unit 20 or the arithmetic processing unit 30 may correct the input sentence so as to complement the ambiguous subject based on the user information. In this case as well, by machine-translating the corrected input sentence, differentiated translation including the appropriate subject according to the user information can be realized.
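One possible sketch of such pre-translation correction is shown below. The detection rule is purely illustrative (a real implementation would use syntactic analysis of the source language), and the subject-by-role mapping is an assumption, not a detail of the embodiment.

```python
# Illustrative markers of an explicit subject in a Japanese input sentence.
EXPLICIT_SUBJECT_MARKERS = ("私", "あなた")

def complete_subject(sentence, role, subject_by_role):
    """Complement an ambiguous subject in the input sentence based on user
    information before machine translation (illustrative rule only)."""
    if any(marker in sentence for marker in EXPLICIT_SUBJECT_MARKERS):
        return sentence                       # subject already explicit
    return subject_by_role[role] + sentence   # prepend role-derived subject

# Usage: with no explicit subject, the sentence is completed according to
# the speaker's role before being sent to the machine translator.
corrected = complete_subject(
    "いつ出発しますか。", "guest",
    {"host": "あなたは", "guest": "私たちは"},
)
```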
 In each of the above embodiments, the translation systems 1 and 1A in which the input sentence is entered by voice have been described. In the present embodiment, the input sentence need not be entered by voice; it may, for example, be entered as text. For example, instead of speaking, the translation source user may enter a text input sentence into the translation device 2 by operating the operation unit 22. The translation system of the present embodiment may omit the speech recognition function. The translation system of the present embodiment may also omit the speech synthesis function and may, for example, output the translated sentence on the display unit 23.
 In each of the above embodiments, an example was described in which the machine translation of the input sentence is performed by a machine translator external to the translation device 2. In the present embodiment, the machine translation may be performed inside the translation device 2. For example, a program similar to the translation model 35 may be stored in the storage unit 21 of the translation device 2, and the control unit 20 may execute the program to generate a translated sentence according to the acquired input sentence and speaker information.
 In each of the above embodiments, an example was described in which the translation model 35 of the machine translator is configured as a neural network. The translation model of the machine translator in the present embodiment is not limited to this and may, for example, be configured as a probabilistic model. Furthermore, the machine translator and the translation model of the present embodiment need not necessarily be based on machine learning.
 As described above, embodiments have been described as examples of the technology in the present disclosure, and the accompanying drawings and detailed description have been provided for that purpose.
 Accordingly, the components described in the accompanying drawings and the detailed description may include not only components essential for solving the problem, but also components that are not essential for solving the problem and are included to illustrate the above technology. Therefore, the mere fact that such non-essential components appear in the accompanying drawings or the detailed description should not be taken as an immediate determination that they are essential.
 Since the above-described embodiments are intended to illustrate the technology in the present disclosure, various modifications, substitutions, additions, omissions, and the like can be made within the scope of the claims or equivalents thereof.
 The translation device, system, method, program, and learning method according to the present disclosure are applicable to machine translation in various situations.

Claims (10)

  1.  A translation device that outputs a machine translation result to a translation destination user in response to input from a translation source user, the translation device comprising:
     a first acquisition unit that acquires an input sentence in a translation source language;
     a second acquisition unit that acquires user information related to the input sentence;
     a control unit that acquires, based on the input sentence and the user information, a translated sentence in a translation destination language indicating a translation result of the input sentence according to the user information; and
     an output unit that outputs the translated sentence,
     wherein the user information includes information indicating a role of at least one of the translation source user and the translation destination user with respect to the related input sentence.
  2.  The translation device according to claim 1, wherein, of the roles of the translation source user and the translation destination user in the user information, one role is a host and the other role is a guest.
  3.  The translation device according to claim 1 or 2, wherein the user information includes information indicating at least one of the gender and age of the translation source user, the gender and age of the translation destination user, and the scene of the dialogue between the translation source and translation destination users.
  4.  The translation device according to any one of claims 1 to 3, wherein each of the first and second acquisition units includes at least one of one or more microphones, an operation unit, a network interface, and a device interface.
  5.  The translation device according to any one of claims 1 to 4, further comprising a communication unit that communicates with an external machine translator,
     wherein the control unit transmits the input sentence and the user information to the machine translator via the communication unit, and receives a translated sentence of the input sentence from the machine translator.
  6.  The translation device according to any one of claims 1 to 4, further comprising a communication unit that communicates with a plurality of external machine translators,
     wherein the control unit transmits, via the communication unit, the input sentence to a machine translator selected from the plurality of machine translators according to the user information, and receives a translated sentence of the input sentence from the selected machine translator.
  7.  A translation system comprising:
     the translation device according to any one of claims 1 to 6; and
     a machine translator that performs machine translation based on information acquired by the translation device to generate the translated sentence.
  8.  A translation method for executing machine translation so as to generate a translation result that is output to a translation destination user in response to input from a translation source user, the method comprising:
     acquiring, by a first acquisition unit, an input sentence in a translation source language;
     acquiring, by a second acquisition unit, user information related to the input sentence, the user information including information indicating a role of at least one of the translation source user and the translation destination user with respect to the related input sentence;
     acquiring, by a control unit, based on the input sentence and the user information, a translated sentence in a translation destination language indicating a translation result of the input sentence according to the user information; and
     outputting, by an output unit, the translated sentence.
  9.  A program that causes a computer to execute a process of outputting a machine translation result to a translation destination user in response to input from a translation source user, the process comprising:
     acquiring an input sentence in a translation source language;
     acquiring user information related to the input sentence, the user information including information indicating a role of at least one of the translation source user and the translation destination user with respect to the related input sentence;
     acquiring, based on the input sentence and the user information, a translated sentence in a translation destination language indicating a translation result of the input sentence according to the user information; and
     outputting the translated sentence.
  10.  A learning method for obtaining, through machine learning on a computer, a translation model that realizes machine translation from a translation source user to a translation destination user,
     wherein a storage unit of the computer stores a parameter group that defines the translation model based on machine learning, the method comprising:
     inputting, by the computer, information associating an input sentence in a translation source language with user information into the translation model being trained, to cause the translation model to generate a translated sentence, the user information including information indicating a role of at least one of the translation source user and the translation destination user with respect to the related input sentence; and
     adjusting, by the computer, the parameter group according to the generated translated sentence.
PCT/JP2018/038704 2018-05-25 2018-10-17 Translation device, system, method, program, and learning method WO2019225028A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018-100795 2018-05-25
JP2018100795A JP2021144256A (en) 2018-05-25 2018-05-25 Translation device, system, method, program, and learning method

Publications (1)

Publication Number Publication Date
WO2019225028A1

Family

ID=68617267

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/038704 WO2019225028A1 (en) 2018-05-25 2018-10-17 Translation device, system, method, program, and learning method

Country Status (2)

Country Link
JP (1) JP2021144256A (en)
WO (1) WO2019225028A1 (en)


Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
JP7333377B2 (en) 2021-12-14 2023-08-24 楽天グループ株式会社 Information processing device, information processing method and program

Citations (3)

Publication number Priority date Publication date Assignee Title
WO2011040056A1 (en) * 2009-10-02 2011-04-07 独立行政法人情報通信研究機構 Speech translation system, first terminal device, speech recognition server device, translation server device, and speech synthesis server device
JP2013164515A (en) * 2012-02-10 2013-08-22 Toshiba Corp Voice translation device, voice translation method, and voice translation program
JP2017199363A (en) * 2016-04-21 2017-11-02 国立研究開発法人情報通信研究機構 Machine translation device and computer program for machine translation


Cited By (4)

Publication number Priority date Publication date Assignee Title
CN111539199A (en) * 2020-04-17 2020-08-14 中移(杭州)信息技术有限公司 Text error correction method, device, terminal and storage medium
CN111539199B (en) * 2020-04-17 2023-08-18 中移(杭州)信息技术有限公司 Text error correction method, device, terminal and storage medium
US20210365644A1 (en) * 2020-05-21 2021-11-25 International Business Machines Corporation Adaptive language translation using context features
US11947925B2 (en) * 2020-05-21 2024-04-02 International Business Machines Corporation Adaptive language translation using context features

Also Published As

Publication number Publication date
JP2021144256A (en) 2021-09-24

Similar Documents

Publication Publication Date Title
CN105493027B (en) User interface for real-time language translation
JP6058039B2 (en) Device and method for extracting information from dialogue
US10860289B2 (en) Flexible voice-based information retrieval system for virtual assistant
US20140316764A1 (en) Clarifying natural language input using targeted questions
KR102429407B1 (en) User-configured and customized interactive dialog application
WO2019225028A1 (en) Translation device, system, method, program, and learning method
US11430425B2 (en) Speech generation using crosslingual phoneme mapping
JPWO2018055983A1 (en) Translation apparatus, translation system, and evaluation server
US11538476B2 (en) Terminal device, server and controlling method thereof
US11227116B2 (en) Translation device, translation method, and program
US11403470B2 (en) Translation device
JP2023007369A (en) Translation method, classification model training method, apparatus, device and storage medium
KR20190074508A (en) Method for crowdsourcing data of chat model for chatbot
WO2024069978A1 (en) Generation device, learning device, generation method, training method, and program
JP6110539B1 (en) Speech translation device, speech translation method, and speech translation program
JP6383748B2 (en) Speech translation device, speech translation method, and speech translation program
JP2021125164A (en) Information processing apparatus, chat bot assisting program, and chat bot assisting method
JP6334589B2 (en) Fixed phrase creation device and program, and conversation support device and program
JP6198879B1 (en) Speech translation device, speech translation method, and speech translation program
JP6985311B2 (en) Dialogue implementation programs, devices and methods that control response utterance generation by aizuchi determination
US11842206B2 (en) Generating content endorsements using machine learning nominator(s)
WO2021161856A1 (en) Information processing device and information processing method
CN112334974B (en) Speech generation using cross-language phoneme mapping
Hovde et al. Aural Language Translation with Augmented Reality Glasses
KR20220110408A (en) Method for providing multi-language translation through multimedia application

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18920148

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18920148

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP