WO2022130940A1 - Presentation device - Google Patents

Presentation device

Info

Publication number
WO2022130940A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
sentence
words
language
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2021/043451
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
聡一朗 村上
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NTT Docomo Inc
Original Assignee
NTT Docomo Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NTT Docomo Inc
Priority to JP2022569827A (patent/JPWO2022130940A1/ja)
Publication of WO2022130940A1 (patent/WO2022130940A1/ja)
Legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/44 Statistical methods, e.g. probability models
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/47 Machine-assisted translation, e.g. using translation memory
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Definitions

  • The present invention relates to a presentation device that presents candidate words for composing a second-language translation of an original sentence written in a first language.
  • The present invention has been made in view of the above problems. Its purpose is to present word candidates that enable composition with various expressions in a system that creates a second-language translation corresponding to a first-language original text by accepting designated inputs for presented words.
  • The presentation device presents word candidates constituting a second-language translation of a first-language original sentence. It includes a generation unit and a presentation unit. The generation unit inputs the original sentence into a machine translation engine, which outputs second-language translations based on an input first-language sentence, and generates a plurality of second-language translated sentences based on the translations output from the engine. The presentation unit arranges and presents the words included in the plurality of translated sentences by a predetermined arrangement method.
  • a plurality of translations of the second language corresponding to the original text of the first language are generated by using the machine translation engine, and words included in the generated plurality of translations are presented. Therefore, words other than the words contained in one specific translated sentence that can constitute a model answer can be presented as word candidates. By accepting the designated input for the word candidates presented in this way, it becomes possible to compose a composition with various expressions.
  • FIG. 1 is a diagram showing an apparatus configuration of a system including a presenting apparatus and a functional configuration of the presenting apparatus according to the present embodiment.
  • the presentation device 10 is a device that presents word candidates constituting a translated sentence composed of a second language of an original sentence composed of a first language.
  • the presentation device 10 is configured to be accessible to a storage means such as the original text storage unit 20.
  • The original text storage unit 20 may be configured inside the presentation device 10, or may be configured as a separate device outside the presentation device 10 that is accessible from the presentation device 10, as shown in FIG.
  • the presentation device 10 is configured to be able to communicate with the terminal 30 via a network.
  • the terminal 30 is a device operated by the user.
  • the terminal 30 displays the information transmitted from the presentation device 10, accepts the input by the user, and transmits the input information to the presentation device 10.
  • Although the presentation device 10 and the terminal 30 are shown as separate devices, the terminal 30 may itself serve as the presentation device 10 by having all the functional units of the presentation device 10 configured in the terminal 30. Alternatively, only some of the functional units of the presentation device 10 may be configured in the terminal 30.
  • The presentation device 10 functionally includes an acquisition unit 11, a generation unit 12, a presentation unit 13, a designated input reception unit 14, a question sentence presentation unit 15, and a composition unit 16.
  • Each of these functional units 11 to 16 may be configured in one device (computer), or may be distributed and configured in a plurality of devices.
  • The block diagram shown in FIG. 1 shows blocks of functional units. These functional blocks (components) are realized by any combination of at least one of hardware and software. Further, the method of realizing each functional block is not particularly limited. That is, each functional block may be realized using one physically or logically coupled device, or may be realized using two or more physically or logically separated devices that are connected directly or indirectly (for example, by wire or wirelessly). A functional block may also be realized by combining software with the one device or the plurality of devices.
  • Functions include judging, deciding, determining, calculating, computing, processing, deriving, investigating, searching, confirming, receiving, transmitting, outputting, accessing, resolving, selecting, choosing, establishing, comparing, assuming, expecting, considering, broadcasting, notifying, communicating, forwarding, configuring, reconfiguring, allocating, mapping, and assigning, but are not limited to these.
  • For example, a functional block (component) that performs transmission is called a transmitting unit or a transmitter.
  • In either case, as described above, the realization method is not particularly limited.
  • the presentation device 10 in the embodiment of the present invention may function as a computer.
  • FIG. 2 is a diagram showing an example of the hardware configuration of the presentation device 10 according to the present embodiment.
  • the presentation device 10 may be physically configured as a computer device including a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like.
  • the word “device” can be read as a circuit, device, unit, etc.
  • the hardware configuration of the presentation device 10 may be configured to include one or more of the devices shown in the figure, or may be configured not to include some of the devices.
  • Each function of the presentation device 10 is realized by loading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, so that the processor 1001 performs calculations and controls communication by the communication device 1004 and the reading and/or writing of data in the memory 1002 and the storage 1003.
  • the processor 1001 operates, for example, an operating system to control the entire computer.
  • the processor 1001 may be configured by a central processing unit (CPU: Central Processing Unit) including an interface with peripheral devices, a control device, an arithmetic unit, a register, and the like.
  • each of the functional units 11 to 16 shown in FIG. 1 may be realized by the processor 1001.
  • the processor 1001 reads a program (program code), a software module and data from the storage 1003 and / or the communication device 1004 into the memory 1002, and executes various processes according to these.
  • a program that causes a computer to execute at least a part of the operations described in the above-described embodiment is used.
  • each functional unit 11 to 16 of the presentation device 10 may be realized by a control program stored in the memory 1002 and operated by the processor 1001.
  • Processor 1001 may be mounted on one or more chips.
  • the program may be transmitted from the network via a telecommunication line.
  • The memory 1002 is a computer-readable recording medium, and may be composed of at least one of, for example, a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), and a RAM (Random Access Memory).
  • the memory 1002 may be referred to as a register, a cache, a main memory (main storage device), or the like.
  • the memory 1002 can store a program (program code), a software module, or the like that can be executed to carry out the presentation method according to the embodiment of the present invention.
  • The storage 1003 is a computer-readable recording medium, and may consist of at least one of, for example, an optical disk such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, a magneto-optical disk (for example, a compact disk, a digital versatile disk, or a Blu-ray (registered trademark) disk), a smart card, a flash memory (for example, a card, stick, or key drive), a floppy (registered trademark) disk, a magnetic strip, and the like.
  • the storage 1003 may be referred to as an auxiliary storage device.
  • the storage medium described above may be, for example, a database, server or other suitable medium containing memory 1002 and / or storage 1003.
  • the communication device 1004 is hardware (transmission / reception device) for communicating between computers via a wired and / or wireless network, and is also referred to as, for example, a network device, a network controller, a network card, a communication module, or the like.
  • the input device 1005 is an input device (for example, a keyboard, a mouse, a microphone, a switch, a button, a sensor, etc.) that accepts an input from the outside.
  • the output device 1006 is an output device (for example, a display, a speaker, an LED lamp, etc.) that outputs to the outside.
  • the input device 1005 and the output device 1006 may have an integrated configuration (for example, a touch panel).
  • each device such as the processor 1001 and the memory 1002 is connected by the bus 1007 for communicating information.
  • the bus 1007 may be composed of a single bus or may be composed of different buses between the devices.
  • The presentation device 10 may be configured to include hardware such as a microprocessor, a digital signal processor (DSP: Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), and an FPGA (Field Programmable Gate Array), and some or all of each functional block may be realized by that hardware.
  • the processor 1001 may be implemented on at least one of these hardware.
  • the acquisition unit 11 acquires the original text from the original text storage unit 20.
  • the original sentence is a sentence composed of a first language, and is a Japanese sentence when the presentation device 10 is applied to Japanese-English translation.
  • FIG. 3 is a diagram showing an example of the original text stored in the original text storage unit 20.
  • The original text storage unit 20 stores a plurality of original texts, each associated with an ID that identifies it. For example, it stores first-language sentences such as "She wants some milk. (Kanojo ha gyunyu wo ikuraka hoshigatteimasu.)" and "I am running in the park. (Watashi ha koen no naka wo hasshitteimasu.)".
  • the generation unit 12 generates a plurality of translations consisting of a second language based on the parallel translation output from the machine translation engine by inputting the original text into the machine translation engine.
  • the machine translation engine outputs a bilingual sentence consisting of a second language based on the input sentence of the first language.
  • the presentation unit 13 arranges and presents words included in a plurality of translated sentences by a predetermined arrangement method. Specifically, the presentation unit 13 displays the words included in the plurality of translated sentences on the display of the terminal 30.
  • FIG. 4 is a diagram showing an outline of the process leading to the presentation of word candidates based on the original text in the presentation device 10.
  • The generation unit 12 inputs the original text sf0 "She wants some milk. (Kanojo ha gyunyu wo ikuraka hoshigatteimasu.)" into the machine translation engine mt0.
  • The generation unit 12 generates a plurality of translated sentences tf based on the bilingual sentences output from the machine translation engine mt0.
  • The machine translation engine mt0 is a neural machine translation engine including a neural network, and outputs each translation by a beam search method together with a likelihood (second probability) indicating its plausibility as a parallel translation.
  • The generation unit 12 generates the plurality of translated sentences tf based on, for example, the four parallel translations with the highest likelihoods.
  • The presentation unit 13 arranges and presents the words ws included in the plurality of translated sentences tf by a predetermined arrangement method. As shown in FIG. 4, the presentation unit 13 displays the words ws (needs / . / She / some / get / to / wants / is / milk / wanting) included in the four translations of the plurality of translated sentences tf on the display of the terminal 30.
  • the presentation unit 13 may randomly arrange and present the words included in the plurality of translated sentences tf.
  • By causing the terminal 30 to randomly arrange and present the words included in the plurality of translated sentences tf, the order of the words in a translated sentence can be prevented from being recognized from the order of the presented words.
  • Conventionally, only the words used in the model-answer translation of the original sentence sf0 (the translation with the highest likelihood, "She wants some milk.") could be presented.
  • In the present embodiment, the words included in the plurality of translated sentences tf are presented as the words ws, so that word candidates enabling composition with various expressions can be presented.
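The collection and random arrangement of the words ws can be sketched as follows. This is a minimal sketch under stated assumptions: the four translated sentences are hypothetical (the source lists only the resulting words of FIG. 4, not the sentences themselves), the function name `collect_word_candidates` is invented for illustration, and presenting each distinct word only once is an assumption drawn from the word list shown in FIG. 4.

```python
import random

def collect_word_candidates(translations, seed=None):
    """Return the distinct words of all translations in random order."""
    words = []
    for sentence in translations:
        for word in sentence.split():
            if word not in words:  # assumption: each word is presented once
                words.append(word)
    rng = random.Random(seed)
    rng.shuffle(words)  # hide the word order of any single translation
    return words

# Hypothetical top-four translations of the original text sf0.
translations = [
    "She wants some milk .",
    "She needs some milk .",
    "She is wanting some milk .",
    "She wants to get some milk .",
]
candidates = collect_word_candidates(translations, seed=0)
print(" / ".join(candidates))
```

With these hypothetical inputs, the ten distinct words match the word list of FIG. 4, while their order depends on the random seed.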
  • the machine translation engine of the present embodiment may be a neural machine translation engine including a trained language model.
  • The language model includes a neural network. Based on the input of a sentence in the first language, it sequentially outputs, at each time step corresponding to each word in the word string constituting the bilingual sentence, one or more words together with a probability indicating the certainty of each as a word used in the bilingual sentence.
  • the generation unit 12 acquires a bilingual sentence consisting of a word string of sequentially output words.
  • the machine translation engine may be configured by an encoder / decoder model including a language model.
  • FIG. 5 is an example of a machine translation engine and is a diagram showing an example of a neural machine translation engine including an encoder / decoder model.
  • the machine translation engine mt1 includes an encoder decoder model including an encoder en1 and a decoder de1.
  • the neural network constituting the encoder-decoder model is not limited, but may be, for example, a recurrent neural network (RNN) or a neural network called a transformer.
  • the neural network constituting the machine translation engine mt1 has been learned by machine learning using learning data consisting of a pair of a sentence in the first language and a bilingual translation in the second language.
  • The generation unit 12 inputs the original text sf1 of the first language into the encoder en1. Specifically, the generation unit 12 divides the original text sf1 into words by morphological analysis or the like, embeds the divided words into corresponding word vectors, and sequentially inputs the word vectors to the encoder en1.
  • the encoder en1 outputs a vector (for example, output of an intermediate layer, source target attention, etc.) indicating a calculation result based on the first data a to the decoder de1.
  • Based on the input of the vector from the encoder en1 and the predetermined start symbol ss (vector) indicating the start of the output, the decoder de1 sequentially outputs the word groups wt1, wt2, wt3, ..., wtn of the word group wt, one in each time step ts. More specifically, the decoder de1 outputs a plurality of words cw at each time step ts together with a probability wp indicating the certainty (likelihood) of each as a word used in a bilingual sentence.
  • a bilingual translation is composed of a series of words sequentially output in each time step up to that point.
  • the word cw having the highest probability wp in each time step is used as the input of the decoder de1 in the next time step.
  • the word cw11 has the highest probability wp11, so that the word cw11 is input to the decoder de1 in the time step ts2.
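The step-by-step output described above, in which the word with the highest probability wp in each time step becomes the decoder input of the next time step, can be sketched as follows. This is a minimal sketch under stated assumptions: `TOY_DECODER` is a hypothetical lookup table standing in for the decoder de1, its probabilities are invented for illustration, the empty prefix plays the role of the start symbol ss, and the terminal symbol is written "EOS".

```python
# Hypothetical stand-in for the decoder de1: maps the words output so far
# to candidate next words and their probabilities wp (values invented).
TOY_DECODER = {
    (): {"She": 0.7, "He": 0.3},
    ("She",): {"wants": 0.6, "needs": 0.4},
    ("She", "wants"): {"some": 0.8, "milk": 0.2},
    ("She", "wants", "some"): {"milk": 0.9, "water": 0.1},
    ("She", "wants", "some", "milk"): {"EOS": 1.0},
}

def greedy_decode(decoder, end_symbol="EOS", max_steps=10):
    """Pick the most probable word at each time step and feed it back."""
    prefix = ()  # the empty prefix stands for the start symbol ss
    for _ in range(max_steps):
        probs = decoder[prefix]           # word group wt of this time step
        best = max(probs, key=probs.get)  # word cw with the highest wp
        if best == end_symbol:
            break
        prefix += (best,)                 # input for the next time step
    return list(prefix)

print(greedy_decode(TOY_DECODER))
```

Because only one word is kept per time step, this procedure yields a single bilingual sentence.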
  • With a known method such as beam search or Top-K sampling, a plurality of words are extracted and retained from the word group wt output in a certain time step ts, based on the first probability indicating the certainty of each as a word constituting a bilingual sentence. Each of the retained words is used as an input to the decoder de1 in the next time step ts, and a word group wt is obtained for each of the input words.
  • FIG. 6 is a diagram schematically showing the concept of a known beam search method.
  • Beam search is a known search algorithm for extracting plausible word combinations as bilingual translations.
  • In FIG. 6, the beam width bw that defines the search width is 3, circles represent words, and the numbers in the circles represent the word likelihood scores (first probability wp).
  • the generation unit 12 extracts and holds the top three words cw of the score from the output word group wt1 in the time step ts1 of the decoder de1.
  • the generation unit 12 inputs each of the words cw1 held in the time step ts1 to the decoder de1 and acquires the word group wt2.
  • the generation unit 12 extracts and holds the words constituting the top three ranks of the likelihood score (second probability) of the word string consisting of the words extracted and held in the time step up to that point.
  • the generation unit 12 inputs each of the words held in the previous time step to the decoder de1, and similarly extracts and holds the words.
  • The generation unit 12 extracts and holds, in descending order of the likelihood score of the word string, the number of words defined by the beam width bw. It repeats the extraction and retention of words output from the decoder de1 based on the words held in the previous time step, and continues extracting words for each word string until the terminal symbol se is output in the search of that word string.
  • The generation unit 12 outputs, as the plurality of translated sentences, the bilingual sentences consisting of the word strings whose likelihood scores (the second probability, indicating the certainty of the word string as a bilingual sentence), obtained by applying the beam search method to the encoder-decoder model, rank within a predetermined number from the top.
  • In this way, the second probability is calculated for each word string composed of words sequentially extracted from the beginning of the sentence, and a predetermined number of bilingual sentences with the highest second probabilities are output. Therefore, word strings with a high probability of being a translation of the original sentence can be output as translated sentences. Further, since the words included in the output plurality of translated sentences are presented, word candidates corresponding to translated sentences of various expressions can be presented.
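The beam search procedure described above can be sketched as follows. This is a minimal sketch under stated assumptions: `BIGRAM` is a hypothetical toy decoder whose output depends only on the previous word, its probabilities are invented, "BOS" and "EOS" stand for the start symbol ss and the terminal symbol se, and the second probability of a word string is assumed to be the product of the first probabilities of its words (the source does not state how the score is computed).

```python
import math

# Hypothetical toy decoder: previous word -> {next word: first probability}.
BIGRAM = {
    "BOS": {"She": 0.9, "Her": 0.1},
    "She": {"wants": 0.5, "needs": 0.3, "is": 0.2},
    "Her": {"wants": 1.0},
    "wants": {"some": 0.7, "milk": 0.3},
    "needs": {"some": 0.8, "milk": 0.2},
    "is": {"wanting": 1.0},
    "wanting": {"some": 1.0},
    "some": {"milk": 1.0},
    "milk": {".": 1.0},
    ".": {"EOS": 1.0},
}

def beam_search(next_probs, beam_width=3, max_steps=10):
    """Return up to beam_width (second probability, sentence) pairs."""
    beams = [(0.0, ["BOS"])]  # (log of second probability, word string)
    finished = []
    for _ in range(max_steps):
        # Expand every held word string by every word the decoder outputs.
        expanded = [
            (score + math.log(p), words + [word])
            for score, words in beams
            for word, p in next_probs[words[-1]].items()
        ]
        expanded.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, words in expanded[:beam_width]:  # keep the top bw strings
            if words[-1] == "EOS":
                finished.append((score, words[1:-1]))
            else:
                beams.append((score, words))
        if not beams:
            break
    finished.sort(key=lambda c: c[0], reverse=True)
    return [(math.exp(s), " ".join(w)) for s, w in finished[:beam_width]]

results = beam_search(BIGRAM, beam_width=3)
for probability, sentence in results:
    print(f"{probability:.3f}  {sentence}")
```

Scores are accumulated as log probabilities, a common choice that avoids numerical underflow for long word strings.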
  • The generation unit 12 may generate and output, as a translated sentence, a bilingual sentence consisting of a word string obtained by applying a known Top-K sampling method to the encoder-decoder model. That is, in a certain time step ts, the generation unit 12 performs weighted random sampling, based on the probabilities, from among a predetermined number of top-ranked words by the probability wp (first probability) indicating the certainty of each as a word constituting the bilingual sentence, and retains the extracted word. In the next time step ts, the generation unit 12 inputs the word retained in the previous time step ts into the decoder de1, and extracts and retains a word from among the predetermined number of top-ranked words by first probability in the word group wt output from the decoder de1.
  • the generation unit 12 sequentially extracts a predetermined number of words having a higher probability of the first probability, and outputs a plurality of bilingual sentences composed of word strings of the extracted words as a plurality of translated sentences. Therefore, a word string having a high likelihood as a translated sentence corresponding to the original sentence can be output as a translated sentence. Further, by adjusting (increasing) a predetermined number of words to be extracted in each time step, it is possible to obtain a translated sentence having a low likelihood, and the words included in such a translated sentence are presented. This makes it possible to present candidate words containing errors.
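The Top-K sampling described above can be sketched as follows. This is a minimal sketch under stated assumptions: `BIGRAM` is a hypothetical toy decoder whose output depends only on the previous word, its probabilities are invented, and "BOS" and "EOS" stand for the start and terminal symbols. The parameter `k` corresponds to the predetermined number of top-ranked words.

```python
import random

# Hypothetical toy decoder: previous word -> {next word: first probability}.
BIGRAM = {
    "BOS": {"She": 0.9, "Her": 0.1},
    "She": {"wants": 0.5, "needs": 0.3, "is": 0.2},
    "Her": {"wants": 1.0},
    "wants": {"some": 0.7, "milk": 0.3},
    "needs": {"some": 0.8, "milk": 0.2},
    "is": {"wanting": 1.0},
    "wanting": {"some": 1.0},
    "some": {"milk": 1.0},
    "milk": {".": 1.0},
    ".": {"EOS": 1.0},
}

def top_k_sample(probs, k, rng):
    """Weighted random sampling among the k most probable words."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    words = [word for word, _ in top]
    weights = [p for _, p in top]
    return rng.choices(words, weights=weights, k=1)[0]

def sample_translation(next_probs, k, rng, max_steps=10):
    """Generate one translated sentence by Top-K sampling."""
    words = ["BOS"]
    for _ in range(max_steps):
        word = top_k_sample(next_probs[words[-1]], k, rng)
        if word == "EOS":
            break
        words.append(word)  # retained word becomes the next decoder input
    return " ".join(words[1:])

rng = random.Random(0)
samples = [sample_translation(BIGRAM, k=3, rng=rng) for _ in range(4)]
print(samples)
```

Increasing `k` admits lower-probability words into the sampling pool, which corresponds to the adjustment of the predetermined number described above.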
  • the designated input receiving unit 14 accepts the designated input for designating at least one of the plurality of words presented by the presenting unit 13.
  • The generation unit 12 inputs a designated word, which is a word accepted by the designated input receiving unit 14, into the decoder of the encoder-decoder model. Following the input of the designated word, it sequentially extracts a plurality of words output in each time step of the decoder based on the first probability, and generates a plurality of translated sentences based on the word strings consisting of the sequentially extracted words.
  • FIG. 7 is a diagram showing an outline of a process in which a proposed word candidate is dynamically changed and output according to a designated input word by a neural machine translation engine including an encoder / decoder model.
  • The generation unit 12 inputs the original text sf2 (he was injured in a soccer match.) into the encoder en2, and generates a plurality of translated sentences based on the bilingual text output from the machine translation engine mt2.
  • the presentation unit 13 displays the words included in the plurality of translated sentences generated by the generation unit 12 on, for example, the display of the terminal 30.
  • The designated input receiving unit 14 accepts designated inputs that designate the words "He" and "got".
  • The generation unit 12 inputs the designated word iw ("He", "got"), which is a word received by the designated input reception unit 14, to the decoder de2 in the predetermined time step ts corresponding to the designated word iw, in place of the word output in the previous time step ts.
  • Following the input of the designated word iw, the generation unit 12 sequentially extracts a plurality of output words in each time step ts of the decoder de2 based on the first probability (the probability of each as a word constituting a bilingual sentence), and generates a plurality of translated sentences tf2 based on the word strings consisting of the sequentially extracted words.
  • each translated sentence included in the translated sentence tf2 is a part of the sentence following the designated word iw (“He”, “got”) in the bilingual sentence of the original sentence sf2.
  • the presentation unit 13 presents a word included in a plurality of translations generated by the generation unit 12 each time the designated word iw is received by the designated input reception unit 14. In the example shown in FIG. 7, the presentation unit 13 causes the word ws2 included in each translated sentence included in the translated sentence tf2 to be displayed on the display of the terminal 30, for example.
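The regeneration of translated sentences following the designated words can be sketched as follows. This is a minimal sketch under stated assumptions: `BIGRAM` is a hypothetical toy decoder whose output depends only on the previous word, all probabilities and alternative words are invented, "BOS" and "EOS" stand for the start and terminal symbols, and beam search is used for the continuation. The designated words are fed to the decoder as a fixed prefix, and only the words that follow them are collected for presentation.

```python
import math

# Hypothetical toy decoder: previous word -> {next word: first probability}.
BIGRAM = {
    "BOS": {"He": 1.0},
    "He": {"was": 0.6, "got": 0.4},
    "was": {"injured": 0.7, "hurt": 0.3},
    "got": {"injured": 0.6, "hurt": 0.4},
    "injured": {"in": 1.0},
    "hurt": {"in": 1.0},
    "in": {"a": 1.0},
    "a": {"soccer": 0.7, "football": 0.3},
    "soccer": {"match": 0.55, "game": 0.45},
    "football": {"match": 1.0},
    "match": {".": 1.0},
    "game": {".": 1.0},
    ".": {"EOS": 1.0},
}

def continuations(next_probs, designated, beam_width=3, max_steps=10):
    """Beam-search the word strings that follow the designated words."""
    start = ["BOS"] + list(designated)  # designated words replace decoder output
    beams = [(0.0, start)]
    finished = []
    for _ in range(max_steps):
        expanded = [
            (score + math.log(p), words + [word])
            for score, words in beams
            for word, p in next_probs[words[-1]].items()
        ]
        expanded.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, words in expanded[:beam_width]:
            if words[-1] == "EOS":
                finished.append(words[len(start):-1])  # drop prefix and EOS
            else:
                beams.append((score, words))
        if not beams:
            break
    return finished

def words_to_present(sentences):
    """Distinct words of the regenerated sentences, in first-seen order."""
    seen = []
    for sentence in sentences:
        for word in sentence:
            if word not in seen:
                seen.append(word)
    return seen

tf2 = continuations(BIGRAM, ["He", "got"])
ws2 = words_to_present(tf2)
print(ws2)
```

Calling `continuations` again with a longer list of designated words yields a new set of candidate words, which is how the presented words change with each designated input in this sketch.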
  • the machine translation engine mt0 may be a statistical machine translation engine that outputs a bilingual text corresponding to the original text sf0 by a known statistical machine translation.
  • the statistical machine translation engine includes a translation model and a language model.
  • The translation model in the statistical machine translation engine is a trained model machine-learned from a corpus consisting of words and phrases in the first language and their bilingual translations in the second language. For the words that make up a sentence in the first language, it outputs word translation candidates, which are candidates for translation in the second language, together with a third probability indicating the certainty of the translation.
  • The language model in the statistical machine translation engine is a trained model whose context is machine-learned from a corpus consisting of sentences in the second language. Based on the word translation candidates output from the translation model, it generates a bilingual sentence while selecting words that are natural as a sentence. The language model outputs, together with each word, the likelihood (fourth probability) of each word selected in the generated bilingual sentence as a word constituting the bilingual sentence.
  • the generation unit 12 inputs the original text sf0 into the machine translation engine mt0 composed of a statistical machine translation engine, and generates a plurality of translations based on the third probability and the fourth probability.
  • a unique translation can be obtained by using the word having the highest score in the third and fourth probabilities and the sequence of words.
  • In the process of searching each of the translation model and the language model, the generation unit 12 may extract not only the word with the highest score of the third and fourth probabilities but also a predetermined number of top-ranked words, and thereby generate a plurality of translated sentences.
  • The generation unit 12 may also generate a plurality of translated sentences by intentionally replacing a word or the like extracted based on the score with another word or the like (for example, a preset similar word) in the process of searching each of the translation model and the language model.
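Selecting a plurality of translated sentences from the third and fourth probabilities can be sketched as follows. This is a minimal sketch under strong assumptions that the source does not confirm: each candidate sentence is taken to already carry one aggregate third probability (from the translation model) and one aggregate fourth probability (from the language model), the two are combined by multiplication, and all sentences and values are hypothetical.

```python
def top_translations(candidates, n):
    """Rank candidates by third * fourth probability and keep the top n.

    candidates: list of (sentence, third_probability, fourth_probability).
    Combining the two scores by multiplication is an assumption.
    """
    ranked = sorted(candidates, key=lambda c: c[1] * c[2], reverse=True)
    return [sentence for sentence, _, _ in ranked[:n]]

# Hypothetical candidates with invented scores.
candidates = [
    ("She wants some milk .", 0.6, 0.9),
    ("She needs some milk .", 0.5, 0.8),
    ("She is wanting some milk .", 0.4, 0.3),
    ("She wants to get some milk .", 0.3, 0.7),
]
selected = top_translations(candidates, n=3)
print(selected)
```

Setting `n` to 1 corresponds to obtaining the unique translation with the highest score, while a larger `n` yields the plurality of translated sentences.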
  • the question sentence presentation unit 15 presents a question sentence composed of the original sentence of the first language.
  • the composition unit 16 accepts the designated input for the word presented by the presentation unit 13, generates a composition sentence in which the designated input words are arranged in the order of the designated input, and presents the composition sentence.
  • FIGS. 8 to 10 are diagrams showing an example of a display screen that presents a created sentence generated based on a designated input received in response to the presentation of a word candidate.
  • The question sentence presentation unit 15 presents a question sentence qf composed of the original sentence of the first language ("I want you to continue to be happy (watashi ha anata ni shiawasenamamade itehoshii.)").
  • The question sentence presentation unit 15 causes the question sentence qf to be displayed on the display screen D1 on the display of the terminal 30.
  • The presentation unit 13 causes the display screen D1 to display the word ws31 included in the plurality of translated sentences generated by the generation unit 12 based on the original sentence presented as the question sentence qf.
  • the designated input receiving unit 14 can receive the designated input for the presented word ws31. In the example shown in FIG. 8, the designated input receiving unit 14 can receive the designated input for the word “I” in the word ws31.
  • the display screen D2 shown in FIG. 9 is an example in which the designated input receiving unit 14 receives the designated input for the word “want” following the designated input for the word “I”.
  • The composition unit 16 generates a composition sentence cf2 in which the designated input words "I" and "want" are arranged in the order of the designated input, and displays the composition sentence cf2 on the display screen D2.
  • the presentation unit 13 presents the word ws32 included in the plurality of translated sentences regenerated by the generation unit 12 in response to the designated input of the words “I” and “want”.
  • the display screen D3 shown in FIG. 10 is an example in which the designated input receiving unit 14 accepts designated inputs for the words “I”, “want”, “you”, “to”, and “keep”.
  • The composition unit 16 generates a composition sentence cf3 in which the designated input words "I", "want", "you", "to", and "keep" are arranged in the order of the designated input, and displays it on the display screen D3.
  • The presentation unit 13 presents the word ws33 included in the plurality of translated sentences regenerated by the generation unit 12 in response to the designated input of the words "I", "want", "you", "to", and "keep".
  • In this way, a plausible translation following the word string is regenerated with the word string consisting of the designated input words as a constraint, and words are presented based on the regenerated translation, so that the displayed words can be dynamically changed according to the designated input of words.
  • FIG. 11 is a flowchart showing the processing contents of the presentation method in the presentation device 10.
  • In step S1, the acquisition unit 11 acquires the original text from, for example, the original text storage unit 20.
  • In step S2, the generation unit 12 inputs the original sentence into the machine translation engine and generates a plurality of translated sentences in the second language based on the bilingual sentences output from the machine translation engine.
  • In step S3, the presentation unit 13 arranges the words included in the plurality of translated sentences by a predetermined arrangement method and presents them. Specifically, the presentation unit 13 displays the words included in the plurality of translated sentences on the display of the terminal 30.
  • In step S4, the designated input receiving unit 14 receives a designated input that designates at least one of the plurality of words presented by the presentation unit 13.
  • In step S5, a created sentence formed by arranging the designated input words in the order of the designated input is generated, and the generated created sentence is presented.
  • In step S6, the presentation device 10 determines whether or not the sentence creation is completed.
  • The end of sentence creation may be determined, for example, based on input by the user, or based on the presence of a predetermined designated input (e.g., a period). If it is determined that the sentence creation is completed, the process ends. Otherwise, the process returns to step S4.
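The flow of steps S1 to S6 can be sketched roughly as follows. This is only an illustrative sketch, not the implementation described in the specification: the function names `present_words` and `compose`, the hard-coded `translations` list (standing in for the output of steps S1 and S2), and the use of a seeded shuffle as the "predetermined arrangement method" are all assumptions made for the example.

```python
import random


def present_words(translations, seed=0):
    """Step S3 (sketch): collect the distinct words of all translated sentences.

    A seeded shuffle is used here as one possible arrangement method.
    """
    words = sorted({w for sentence in translations for w in sentence.split()})
    random.Random(seed).shuffle(words)
    return words


def compose(presented, designated_inputs, end_mark="."):
    """Steps S4 to S6 (sketch): arrange designated words in input order."""
    created = []
    for word in designated_inputs:      # S4: receive one designated input
        if word not in presented:
            continue                    # ignore words that were not presented
        created.append(word)            # S5: extend the created sentence
        if word == end_mark:            # S6: a period ends sentence creation
            break
    return created


# Hypothetical output of steps S1 and S2 (original text -> several translations).
translations = ["I want you to keep it .", "I would like you to hold it ."]
presented = present_words(translations)
created = compose(presented, ["I", "want", "you", "to", "keep", "it", ".", "extra"])
print(" ".join(created))
```

The loop stops as soon as the end mark is designated, which corresponds to the "predetermined designated input" variant of the end condition; a user-triggered end could be modelled by simply ending the input sequence.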
  • FIG. 12 is a flowchart showing another example of the processing content of the presentation method in the presentation device. Since the processing of steps S11 to S15 in the flowchart of FIG. 12 is the same as the processing of steps S1 to S5 in the flowchart of FIG. 11, the description thereof will be omitted.
  • In step S16, the presentation device 10 determines whether or not the sentence creation is completed. If it is determined that the sentence creation is completed, the process ends. Otherwise, the process proceeds to step S17.
  • In step S17, at the time step corresponding to the designated word received in step S14, the generation unit 12 inputs the designated word to the decoder in place of the word output in the previous time step ts. The generation unit 12 then sequentially acquires the words output in each time step of the decoder following the input of the designated word, and regenerates a plurality of translated sentences based on the word string consisting of the sequentially acquired words.
  • In step S18, the presentation unit 13 presents the words included in the plurality of translated sentences regenerated by the generation unit 12. Then, the process returns to step S14.
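The regeneration in steps S17 and S18 can be illustrated with a toy example. The sketch below assumes a greatly simplified "decoder" (`TOY_DECODER`, a hypothetical table of next-word probabilities that depends only on the previous word) in place of a real encoder-decoder model; `regenerate` and `next_candidates` are illustrative names, not names from the specification.

```python
# Hypothetical toy decoder: next-word probabilities given only the previous word.
TOY_DECODER = {
    "<s>": {"I": 0.6, "Please": 0.4},
    "I": {"want": 0.7, "would": 0.3},
    "want": {"you": 0.9, "to": 0.1},
    "would": {"like": 1.0},
    "like": {"you": 1.0},
    "you": {"to": 1.0},
    "to": {"keep": 0.6, "hold": 0.4},
    "keep": {"it": 1.0},
    "hold": {"it": 1.0},
    "it": {"</s>": 1.0},
    "Please": {"keep": 0.5, "hold": 0.5},
}


def regenerate(designated, num_sentences=2, max_steps=10):
    """Step S17 (sketch): continue decoding with the designated words as a fixed prefix."""
    hyps = [(1.0, list(designated))]    # (probability of the continuation, words)
    finished = []
    for _ in range(max_steps):
        expanded = []
        for prob, words in hyps:
            # The designated word is fed in place of the decoder's own previous output.
            prev = words[-1] if words else "<s>"
            for nxt, p in TOY_DECODER.get(prev, {}).items():
                if nxt == "</s>":
                    finished.append((prob * p, words))
                else:
                    expanded.append((prob * p, words + [nxt]))
        if not expanded:
            break
        hyps = sorted(expanded, key=lambda h: h[0], reverse=True)[:num_sentences]
    finished.sort(key=lambda h: h[0], reverse=True)
    return [words for _, words in finished[:num_sentences]]


def next_candidates(sentences, designated):
    """Step S18 (sketch): words of the regenerated sentences that follow the prefix."""
    return sorted({w for words in sentences for w in words[len(designated):]})


sentences = regenerate(["I", "want"])
print(next_candidates(sentences, ["I", "want"]))
```

Because every hypothesis starts from the designated prefix, the candidate set changes whenever the user designates another word, which is the dynamic behaviour described for the display screens.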
  • FIG. 13 is a diagram showing the structure of the presentation program.
  • The presentation program P1 comprises a main module m10 that comprehensively controls the presentation processing in the presentation device 10, an acquisition module m11, a generation module m12, a presentation module m13, a designated input reception module m14, a problem sentence presentation module m15, and a composition module m16. The modules m11 to m16 realize the functions of the acquisition unit 11, the generation unit 12, the presentation unit 13, the designated input receiving unit 14, the problem sentence presentation unit 15, and the composition unit 16, respectively.
  • the presentation program P1 may be transmitted via a transmission medium such as a communication line, or may be stored in the recording medium M1 as shown in FIG.
  • A plurality of translated sentences in the second language corresponding to the original text in the first language are generated using the machine translation engine, and the words contained in the generated translated sentences are presented. Therefore, words other than those contained in one specific translated sentence that can constitute a model answer can also be presented as word candidates. By accepting designated inputs for the word candidates presented in this way, a composition using various expressions becomes possible.
  • The machine translation engine may be a neural machine translation engine that is configured to include a neural network and includes a language model which, based on the input of a sentence in the first language, sequentially outputs, for each time step corresponding to each word in the word string constituting the bilingual sentence, one or more words together with a probability indicating the certainty of each word as a word used in the bilingual sentence. The generation unit may generate a plurality of translated sentences based on the one or more words output in each time step when the original sentence is input to the machine translation engine and on a first probability indicating the certainty of each word as a word constituting the bilingual sentence.
  • Since the translated sentences are generated by referring to the first probability of each of the plurality of words output in each time step, a plurality of translated sentences having a high likelihood as translations of the acquired original sentence can be generated.
  • Using a beam search method, the generation unit may sequentially extract, from the beginning of the sentence, each word in the word string constituting a bilingual sentence based on a second probability, which is calculated from the first probability of each word and indicates the certainty of the word string as a bilingual sentence, and may output a predetermined number of bilingual sentences ranked highest in the second probability as the plurality of translated sentences.
  • A second probability is calculated for a word string composed of words sequentially extracted from the beginning of the sentence, and a predetermined number of bilingual sentences ranked highest in the second probability are output.
  • a word string with high probability as a sentence can be output as a translated sentence. Further, since the words included in the output plurality of translated sentences are presented, it is possible to present the word candidates corresponding to the translated sentences of various expressions.
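A minimal beam search over a toy model may make the relationship between the first and second probabilities concrete. In this sketch the second probability of a word string is assumed to be the product of the first probabilities of its words (accumulated as a sum of logarithms); the table `FIRST_PROB` and the function `beam_search` are hypothetical and stand in for the output of a real neural machine translation engine.

```python
import math

# Hypothetical first probabilities P(next word | previous word).
FIRST_PROB = {
    "<s>": {"I": 0.6, "Please": 0.4},
    "I": {"want": 0.7, "would": 0.3},
    "want": {"it": 1.0},
    "would": {"like": 1.0},
    "like": {"it": 1.0},
    "Please": {"keep": 1.0},
    "keep": {"it": 1.0},
    "it": {"</s>": 1.0},
}


def beam_search(beam_width=2, max_steps=10):
    """Return up to beam_width (sentence, second probability) pairs."""
    beams = [(0.0, ["<s>"])]            # (log of second probability, words so far)
    finished = []
    for _ in range(max_steps):
        candidates = []
        for logp, words in beams:
            for nxt, p in FIRST_PROB.get(words[-1], {}).items():
                candidates.append((logp + math.log(p), words + [nxt]))
        if not candidates:
            break
        # Keep only the beam_width word strings with the highest second probability.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for logp, words in candidates[:beam_width]:
            if words[-1] == "</s>":
                finished.append((logp, words))
            else:
                beams.append((logp, words))
        if not beams:
            break
    finished.sort(key=lambda c: c[0], reverse=True)
    return [(" ".join(w[1:-1]), math.exp(lp)) for lp, w in finished[:beam_width]]


results = beam_search()
for sentence, second_prob in results:
    print(f"{second_prob:.2f}  {sentence}")
```

The beam width plays the role of the "predetermined number": widening it yields more translated sentences and therefore a larger pool of word candidates.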
  • Using a Top-K sampling method, the generation unit may sequentially extract, in each time step, a predetermined number of words ranked highest in the first probability, and may output a plurality of bilingual sentences composed of word strings consisting of the extracted words as the plurality of translated sentences.
  • A predetermined number of words ranked highest in the first probability are extracted in each time step, and bilingual sentences composed of the extracted words are output. Therefore, a word string having a high likelihood as a translation of the original sentence can be output as a translated sentence. Further, by adjusting (increasing) the predetermined number of words extracted in each time step, candidate words that include erroneous words can also be presented.
  • The machine translation engine may include an encoder-decoder model, and the presentation device may further include a designated input receiving unit that accepts a designated input designating at least one of the plurality of words presented by the presentation unit. The generation unit inputs a designated word, which is a word accepted by the designated input receiving unit, into the decoder of the encoder-decoder model, sequentially extracts, following the input of the designated word, the words output in each time step of the decoder based on the first probability, and generates a plurality of translated sentences based on the word string consisting of the sequentially extracted words. The presentation unit may present the words contained in the plurality of translated sentences each time they are generated by the generation unit.
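The effect of the predetermined number K can be seen in a small Top-K sampling sketch. The probability table `FIRST_PROB` (which deliberately contains the low-probability, ungrammatical candidate "wants"), the function names, and the renormalised weighted draw are assumptions chosen for illustration; they are not taken from the specification.

```python
import random

# Hypothetical first probabilities; "wants" is a deliberately erroneous candidate.
FIRST_PROB = {
    "<s>": {"I": 0.7, "Please": 0.3},
    "I": {"want": 0.6, "would": 0.3, "wants": 0.1},
    "want": {"it": 1.0},
    "wants": {"it": 1.0},
    "would": {"like": 1.0},
    "like": {"it": 1.0},
    "Please": {"keep": 1.0},
    "keep": {"it": 1.0},
    "it": {"</s>": 1.0},
}


def top_k_sample(dist, k, rng):
    """Draw one word from the k most probable words of dist, weighted by probability."""
    top = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:k]
    words = [w for w, _ in top]
    weights = [p for _, p in top]
    return rng.choices(words, weights=weights, k=1)[0]


def sample_sentence(k, rng, max_steps=10):
    """Generate one sentence by Top-K sampling at every time step."""
    words, prev = [], "<s>"
    for _ in range(max_steps):
        nxt = top_k_sample(FIRST_PROB[prev], k, rng)
        if nxt == "</s>":
            break
        words.append(nxt)
        prev = nxt
    return " ".join(words)


rng = random.Random(0)
greedy = sample_sentence(1, rng)                       # K = 1 is greedy decoding
samples_k2 = {sample_sentence(2, rng) for _ in range(50)}
print(greedy, samples_k2)
```

With K = 2 the word "wants" can never be drawn in this toy model, whereas K = 3 makes it reachable, which mirrors the remark that increasing the predetermined number allows erroneous words to appear among the candidates.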
  • a plurality of plausible translation sentences following the word string are generated with the word string consisting of the designated input words as a constraint. Therefore, the word candidates following the word string can be dynamically changed and presented according to the designated input.
  • The machine translation engine may be a statistical machine translation engine that includes a translation model, which outputs word translation candidates (candidate translations in the second language of the words constituting a sentence in the first language) together with a third probability indicating the certainty of each translation, and a language model, which outputs a fourth probability indicating the certainty of each word translation candidate as a word constituting a translated sentence. The generation unit may generate a plurality of translated sentences based on the third probability and the fourth probability.
  • a plurality of translated sentences corresponding to the original text are generated by statistical machine translation. Therefore, it is possible to present a word included in a plurality of translated sentences having a high likelihood as a translated sentence corresponding to the acquired original sentence.
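How the third and fourth probabilities might be combined can be sketched with a toy example. Everything here is an assumption for illustration: the romanised source words, the tables `TRANSLATION_MODEL` and `LANGUAGE_MODEL`, a bigram language model, word-by-word translation without reordering, and scoring by the simple product of the two probabilities. A real statistical machine translation engine is considerably more elaborate.

```python
from itertools import product

# Hypothetical translation model: third probability P(candidate | source word).
TRANSLATION_MODEL = {
    "motsu": {"keep": 0.6, "hold": 0.4},
    "sore": {"it": 0.9, "that": 0.1},
}

# Hypothetical bigram language model: fourth probability P(word | previous word).
LANGUAGE_MODEL = {
    ("keep", "it"): 0.5,
    ("hold", "it"): 0.4,
    ("keep", "that"): 0.1,
    ("hold", "that"): 0.2,
}


def generate_translations(source_words, n_best=2):
    """Score every combination of word translation candidates and keep the n best."""
    options = [TRANSLATION_MODEL[s].items() for s in source_words]
    scored = []
    for combo in product(*options):
        words = [w for w, _ in combo]
        third = 1.0
        for _, p in combo:
            third *= p                                  # translation model score
        fourth = 1.0
        for prev, cur in zip(words, words[1:]):
            fourth *= LANGUAGE_MODEL.get((prev, cur), 1e-6)  # language model score
        scored.append((third * fourth, " ".join(words)))
    scored.sort(reverse=True)
    return scored[:n_best]


result = generate_translations(["motsu", "sore"])
for score, sentence in result:
    print(f"{score:.3f}  {sentence}")
```

The n best sentences then supply the words to be presented, in the same way as the translated sentences obtained from a neural engine.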
  • The presentation device may further include a problem sentence presentation unit that presents a problem sentence consisting of the original text in the first language, and a composition unit that accepts designated inputs for the words presented by the presentation unit, generates a created sentence in which the designated input words are arranged in the order of the designated input, and presents the created sentence.
  • the presentation unit may present words included in a plurality of translated sentences in a random arrangement.
  • Each aspect / embodiment described in the present specification may be applied to systems that use LTE (Long Term Evolution), LTE-A (LTE-Advanced), SUPER 3G, IMT-Advanced, 4G, 5G, FRA (Future Radio Access), W-CDMA (registered trademark), GSM (registered trademark), CDMA2000, UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi), IEEE 802.16 (WiMAX), IEEE 802.20, UWB (Ultra-WideBand), Bluetooth (registered trademark), or other suitable systems, and / or to next-generation systems extended based on them.
  • the input / output information and the like may be saved in a specific location (for example, memory) or may be managed by a management table. Information to be input / output may be overwritten, updated, or added. The output information and the like may be deleted. The input information or the like may be transmitted to another device.
  • The determination may be made by a value represented by 1 bit (0 or 1), by a Boolean value (true or false), or by a comparison of numerical values (for example, comparison with a predetermined value).
  • The notification of predetermined information (for example, the notification of "being X") is not limited to explicit notification and may be performed implicitly (for example, by not notifying the predetermined information).
  • software, instructions, etc. may be transmitted and received via a transmission medium.
  • the software may use wired technology such as coaxial cable, fiber optic cable, twist pair and digital subscriber line (DSL) and / or wireless technology such as infrared, wireless and microwave to website, server, or other.
  • the information, signals, etc. described in this disclosure may be represented using any of a variety of different techniques.
  • data, instructions, commands, information, signals, bits, symbols, chips, etc. that may be referred to throughout the above description are voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, light fields or photons, or any of these. It may be represented by a combination of.
  • The terms “system” and “network” used herein are used interchangeably.
  • The information, parameters, etc. described in the present specification may be represented by an absolute value, by a relative value from a predetermined value, or by other corresponding information.
  • The terms “judgment” (determining) and “decision” (determining) used in this disclosure may include a wide variety of actions.
  • “Judgment” and “decision” may include, for example, regarding judging, calculating, computing, processing, deriving, investigating, looking up (for example, searching in a table, database, or another data structure), or ascertaining as having made a “judgment” or “decision”.
  • “Judgment” and “decision” may also include regarding receiving (for example, receiving information), transmitting (for example, transmitting information), input, output, or accessing (for example, accessing data in memory) as having made a “judgment” or “decision”.
  • “Judgment” and “decision” may further include regarding resolving, selecting, choosing, establishing, comparing, and the like as having made a “judgment” or “decision”. That is, “judgment” and “decision” may include regarding some action as having made a “judgment” or “decision”. Further, “judgment (decision)” may be read as “assuming”, “expecting”, “considering”, and the like.
  • Any reference to elements using designations such as “first” and “second” as used herein does not generally limit the quantity or order of those elements. These designations can be used herein as a convenient way to distinguish between two or more elements. Thus, references to the first and second elements do not mean that only two elements can be adopted there, or that the first element must somehow precede the second element.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
PCT/JP2021/043451 2020-12-15 2021-11-26 Presentation device Ceased WO2022130940A1 (ja)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2022569827A JPWO2022130940A1 2020-12-15 2021-11-26

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020207474 2020-12-15
JP2020-207474 2020-12-15

Publications (1)

Publication Number Publication Date
WO2022130940A1 (ja) 2022-06-23

Family

ID=82059042

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/043451 Ceased WO2022130940A1 (ja) 2020-12-15 2021-11-26 Presentation device

Country Status (2)

Country Link
JP (1) JPWO2022130940A1
WO (1) WO2022130940A1

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
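CN1770137A (zh) * 2004-11-01 2006-05-10 英业达股份有限公司 Language learning system and method
JP2008197604A (ja) * 2007-02-08 2008-08-28 Shoichi Watanabe System or program for automatic correctness judgment and guidance for English composition
JP2012118883A (ja) * 2010-12-02 2012-06-21 Nec Corp Translation device, translation system, translation method, and translation program
JP2018005218A (ja) * 2016-07-07 2018-01-11 三星電子株式会社Samsung Electronics Co.,Ltd. Automatic interpretation method and apparatus
WO2019225154A1 (ja) * 2018-05-23 2019-11-28 株式会社Nttドコモ Created text evaluation device
CN111507113A (zh) * 2020-03-18 2020-08-07 北京捷通华声科技股份有限公司 Method and device for machine-assisted human translation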

Also Published As

Publication number Publication date
JPWO2022130940A1 2022-06-23

Similar Documents

Publication Publication Date Title
US10796105B2 (en) Device and method for converting dialect into standard language
US10061769B2 (en) Machine translation method for performing translation between languages
US9824085B2 (en) Personal language model for input method editor
US11507746B2 (en) Method and apparatus for generating context information
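JP7222082B2 (ja) Recognition error correction device and correction model
KR101326354B1 (ko) Character conversion processing device, recording medium, and method
JP7062056B2 (ja) Created text evaluation device
JPWO2020021845A1 (ja) Document classification device and trained model
US11507549B2 (en) Data normalization system
US20190317993A1 (en) Effective classification of text data based on a word appearance frequency
CN116579327A (zh) Text error correction model training method, text error correction method, device, and storage medium
JP6817690B2 (ja) Extraction device, extraction method and program therefor, support device, and display control device
US12190073B2 (en) Internal state modifying device
JP7836795B2 (ja) Question generation device
WO2022130940A1 (ja) Presentation device
JP2020177387A (ja) Sentence output device
JP7575894B2 (ja) Created text evaluation device
WO2023079911A1 (ja) Sentence generation model generating device, sentence generation model, and sentence generating device
US20230015324A1 (en) Retrieval device
US20210142010A1 (en) Learning method, translation method, information processing apparatus, and recording medium
JP6895580B2 (ja) Dialogue system
US12333267B2 (en) Text generation model generating device, text generation model, and text generating device
JP7722457B2 (ja) Document creation support device, document creation support method, and program
JP2024163496A (ja) Relationship determination device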
US10546061B2 (en) Predicting terms by using model chunks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21906303

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022569827

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21906303

Country of ref document: EP

Kind code of ref document: A1