WO2022264404A1 - Translation method, translation program, and information processing device - Google Patents

Translation method, translation program, and information processing device

Info

Publication number
WO2022264404A1
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
partial
sentences
information
translation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2021/023207
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
正弘 片岡
清司 大倉
浩太 夏目
量 松村
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to PCT/JP2021/023207
Priority to JP2023528915A
Publication of WO2022264404A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/44 Statistical methods, e.g. probability models

Definitions

  • the present invention relates to translation methods and the like.
  • machine translation technology has been developed for translating sentences in a first language into sentences in a second language different from the first language.
  • Machine translation using a neural network (NN) is being put into practical use.
  • machine translation using NN is referred to as neural machine translation.
  • a learning model is generated by machine learning using learning data that defines the relationship between the information of the sentences in the first language and the information of the sentences in the second language.
  • machine translation of a sentence in a second language is performed by inputting information on a sentence in the first language to be translated into a learning model that has undergone machine learning.
  • However, the order in which the terms appear in the second-language translation result may be poor, and the translated text may be difficult to understand.
  • an object of the present invention is to provide a translation method, a translation program, and an information processing device capable of generating an easy-to-understand translated text.
  • the computer executes the following processing.
  • The computer stores a translation learning model trained on the relationship between (i) information on a plurality of partial sentences included in a sentence to be translated and the order of those partial sentences, and (ii) information on a plurality of partial translated sentences included in the sentence resulting from the translation and the order of those partial translated sentences.
  • When the computer receives a new third sentence to be translated, the computer identifies information on a plurality of partial sentences included in the third sentence.
  • the computer sequentially inputs information on the identified partial sentences to the translation learning model, thereby controlling the order of the information on the plurality of partial translated sentences corresponding to the information on the identified partial sentences.
  • FIG. 1 is a diagram for explaining processing in the learning phase of the information processing apparatus according to the first embodiment.
  • FIG. 2 is a diagram for explaining an example of the structure of a target sentence.
  • FIG. 3 is a diagram showing the relationship between Japanese sentences and English sentences.
  • FIG. 4 is a diagram for explaining analysis phase processing of the information processing apparatus according to the first embodiment.
  • FIG. 5 is a functional block diagram showing the configuration of the information processing apparatus according to the first embodiment.
  • FIG. 6 is a diagram showing an example of the data structure of the parallel translation table.
  • FIG. 7 is a diagram showing an example of the data structure of a compressed file table.
  • FIG. 8 is a diagram (1) showing an example of the transposed index table.
  • FIG. 9 is a diagram showing an example of the data structure of a Japanese sentence transposed index.
  • FIG. 10 is a diagram showing an example of the data structure of dictionary information.
  • FIG. 11 is a flow chart showing processing of the learning phase of the information processing apparatus according to the first embodiment.
  • FIG. 12 is a flowchart illustrating analysis phase processing of the information processing apparatus according to the first embodiment.
  • FIG. 13 is a diagram for explaining processing in the learning phase of the information processing apparatus according to the second embodiment.
  • FIG. 14 is a diagram for explaining analysis phase processing of the information processing apparatus according to the second embodiment.
  • FIG. 15 is a diagram illustrating an example of the configuration of an information processing apparatus according to the second embodiment.
  • FIG. 16 is a diagram (2) showing an example of the transposed index table.
  • FIG. 17 is a diagram showing an example of the data structure of an alternative term vector table.
  • FIG. 18 is a flow chart showing processing of the learning phase of the information processing apparatus according to the second embodiment.
  • FIG. 19 is a flowchart illustrating analysis phase processing of the information processing apparatus according to the second embodiment.
  • FIG. 20 is a diagram illustrating an example of a hardware configuration of a computer that implements functions similar to those of the information processing apparatus of the embodiment.
  • FIG. 1 is a diagram for explaining the processing of the learning phase of the information processing apparatus of the first embodiment.
  • the information processing device performs machine learning of the first learning model 70a using the first learning data 65a.
  • the information processing device also uses the second learning data 65b to perform machine learning of the second learning model 70b.
  • the first learning model 70a and the second learning model 70b correspond to CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), autoencoder, and the like.
  • the first learning data 65a defines the relationship between the vector of the target sentence and the vector of the Japanese term.
  • the target sentence vector corresponds to the input data
  • the Japanese term vector corresponds to the correct label.
  • A text in a language such as Japanese or English (referred to here as a target sentence) includes multiple terms; each term includes multiple sentences, and each sentence includes multiple words.
  • FIG. 2 is a diagram for explaining an example of the structure of a target sentence.
  • The target sentence 20 includes terms 21, 22, 23, and 24.
  • Term 21 includes sentences 21a and 21b.
  • Term 22 includes sentences 22a and 22b.
  • Term 23 includes sentences 23a and 23b.
  • Term 24 includes sentences 24a and 24b. Although illustration is omitted, each sentence 21a, 21b, 22a, 22b, 23a, 23b, 24a, 24b includes a plurality of words.
  • the information processing device assigns a vector to each word included in the target sentence 20 and multiplies the vectors of the words included in the sentence to calculate the vector of the sentence.
  • the information processing device calculates the vector of the term by multiplying the vectors of the sentences included in the term.
  • the information processing device calculates the vector of the target sentence 20 by integrating the vectors of the terms included in the target sentence 20 .
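  • The word, sentence, term, and target-sentence hierarchy described above can be sketched as follows. This is only an illustrative sketch: the source speaks of "multiplying" and "integrating" vectors without giving a formula, so the element-wise sum used in `combine` is an assumption, and the toy word vectors and all function names are hypothetical.

```python
from typing import Dict, List

Vector = List[float]
Sentence = List[str]          # a sentence is a list of words
Term = List[Sentence]         # a term is a list of sentences
TargetSentence = List[Term]   # a target sentence is a list of terms


def combine(vectors: List[Vector]) -> Vector:
    """Aggregate vectors element-wise (ASSUMPTION: a plain sum)."""
    if not vectors:
        raise ValueError("need at least one vector")
    return [sum(components) for components in zip(*vectors)]


def sentence_vector(words: Sentence, word_vectors: Dict[str, Vector]) -> Vector:
    """Sentence vector from the vectors of its words."""
    return combine([word_vectors[w] for w in words])


def term_vector(term: Term, word_vectors: Dict[str, Vector]) -> Vector:
    """Term vector from the vectors of its sentences."""
    return combine([sentence_vector(s, word_vectors) for s in term])


def target_sentence_vector(text: TargetSentence, word_vectors: Dict[str, Vector]) -> Vector:
    """Target-sentence vector from the vectors of its terms."""
    return combine([term_vector(t, word_vectors) for t in text])


# Hypothetical two-dimensional word vectors for a toy vocabulary.
word_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
toy_text = [
    [["a", "b"], ["c"]],  # first term: two sentences
    [["b"]],              # second term: one sentence
]
toy_vector = target_sentence_vector(toy_text, word_vectors)
```

  • Because each level reuses the same `combine` step, swapping in a different aggregation (for example, a product or an average) only requires changing that one function.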
  • the vector of the target sentence of the first learning data 65a shown in FIG. For other target sentences, the relationship between the vector of the target sentence and the vector of the Japanese term (a plurality of terms) is registered in the first learning data 65a.
  • the information processing device executes learning by error backpropagation so that the output when the vector of the target sentence is input to the first learning model 70a approaches the vector of each Japanese term.
  • The information processing device repeats the above process based on the relationship between the vector of the target sentence and the vectors of the plurality of Japanese terms included in the first learning data 65a, thereby adjusting the parameters of the first learning model 70a (that is, performing machine learning).
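  • The idea of repeatedly adjusting parameters so that the model output approaches the correct label can be illustrated with a deliberately small stand-in. The source names CNN, RNN, and autoencoder models; the single linear layer, the mean squared error, the learning rate, and the toy vectors below are all assumptions chosen to keep the sketch self-contained, and gradient descent on one layer is the simplest case of error backpropagation.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def predict(weights: Matrix, x: Vector) -> Vector:
    """Linear stand-in for a learning model: output = weights @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mean_squared_error(weights: Matrix, x: Vector, target: Vector) -> float:
    """Error between the model output and the correct label."""
    output = predict(weights, x)
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(target)


def train_step(weights: Matrix, x: Vector, target: Vector, learning_rate: float) -> Matrix:
    """One gradient-descent update of the weights on a single example."""
    output = predict(weights, x)
    scale = 2.0 * learning_rate / len(target)
    return [
        [w - scale * (o - t) * xi for w, xi in zip(row, x)]
        for row, o, t in zip(weights, output, target)
    ]


# Hypothetical toy data: one target-sentence vector and one Japanese term vector.
input_vector = [1.0, 2.0]
correct_label = [0.5, -1.0]

weights = [[0.0, 0.0], [0.0, 0.0]]
initial_error = mean_squared_error(weights, input_vector, correct_label)
for _ in range(200):
    weights = train_step(weights, input_vector, correct_label, learning_rate=0.05)
final_error = mean_squared_error(weights, input_vector, correct_label)
```

  • In a real setting the same loop would run over every record of the learning data rather than a single pair.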
  • the second learning data 65b defines the relationship between the vector of Japanese term(s) and the vector of English term(s).
  • a vector of Japanese terms (a plurality of Japanese terms) corresponds to input data, and a vector of English terms (a plurality of English terms) corresponds to correct labels.
  • Japanese papers (equivalent to sentences) often have translated English papers, but when a Japanese paper is compared with the English paper that is its parallel translation, the order of the terms is sometimes changed.
  • FIG. 3 is a diagram showing the relationship between Japanese sentences and English sentences.
  • Japanese sentence 30 includes terms 30a, 30b, and 30c.
  • the English sentence 35 includes terms 35a, 35b and 35c.
  • Term 30a starts with a word such as "first" and describes the content corresponding to "reason 1".
  • Term 30b starts with a word such as "finally" and describes the content corresponding to "reason n".
  • Term 30c describes the content corresponding to the "conclusion".
  • The English sentence 35 is a translation of the Japanese sentence 30, but the order of its terms 35a, 35b, and 35c differs from the order of the terms in the Japanese sentence 30.
  • Term 35a starts with a word such as "Generally" and describes the content corresponding to the "conclusion".
  • Term 35b starts with a word such as "Because" and describes the content corresponding to "reason 1".
  • Term 35c starts with a word such as "Finally" and describes the content corresponding to "reason n".
  • The terms 30a and 35b correspond, the terms 30b and 35c correspond, and the terms 30c and 35a correspond; the Japanese sentence 30 and the English sentence 35 are parallel translations whose terms appear in different orders.
  • For this pair, the vectors of the Japanese terms, in order, are:
  • the vector of term 30a (reason 1)
  • the vector of term 30b (reason n)
  • the vector of term 30c (conclusion)
  • The vectors of the English terms, in order, are:
  • the vector of term 35a (conclusion)
  • the vector of term 35b (reason 1)
  • the vector of term 35c (reason n)
  • The information processing device performs learning by error backpropagation so that, when the vectors of the Japanese terms are sequentially input to the second learning model 70b from the top, the outputs appear in the order of the vectors of the English terms set as the correct labels.
  • The information processing device repeats the above process based on the relationship between the vectors of the Japanese terms and the vectors of the English terms included in the second learning data 65b, thereby adjusting the parameters of the second learning model 70b (that is, performing machine learning).
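  • The FIG. 3 example can be written down as one training record for the second learning model. In this sketch the string labels merely stand in for term vectors, and `make_training_pair` and `reorder_indices` are hypothetical helper names, not part of the source.

```python
from typing import List, Tuple

# Labels stand in for term vectors (ASSUMPTION: real records hold vectors).
japanese_terms = ["reason 1", "reason n", "conclusion"]  # terms 30a, 30b, 30c
english_terms = ["conclusion", "reason 1", "reason n"]   # terms 35a, 35b, 35c


def make_training_pair(source: List[str], target: List[str]) -> Tuple[List[str], List[str]]:
    """Return (input sequence, correct-label sequence) for one parallel text."""
    if sorted(source) != sorted(target):
        raise ValueError("source and target must contain the same terms")
    return list(source), list(target)


def reorder_indices(source: List[str], target: List[str]) -> List[int]:
    """For each output position, the index of the matching input term."""
    return [source.index(term) for term in target]


inputs, labels = make_training_pair(japanese_terms, english_terms)
permutation = reorder_indices(japanese_terms, english_terms)
```

  • Here `permutation` expresses the reordering the model is expected to learn: the conclusion, which comes last in the Japanese text, moves to the front of the English text.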
  • FIG. 4 is a diagram for explaining the analysis phase processing of the information processing apparatus of the first embodiment.
  • the information processing device acquires an analysis query 80.
  • The analysis query 80 includes a Japanese sentence to be translated.
  • The information processing device converts the analysis query 80 into a vector "Vob80".
  • For example, the information processing device calculates each sentence vector by multiplying the word vectors, calculates each term vector by multiplying the sentence vectors, and calculates the vector of the Japanese sentence of the analysis query 80 by multiplying the term vectors.
  • The information processing device inputs the vector "Vob80" of the analysis query 80 to the first learning model 70a, thereby specifying the term vectors "Vsb80-r1", "Vsb80-r2", ..., "Vsb80-rn", and "Vsb80-con".
  • The vector "Vsb80-r1" is the vector of the term corresponding to reason 1 in the Japanese sentence.
  • The vector "Vsb80-rn" is the vector of the term corresponding to reason n in the Japanese sentence.
  • The vector "Vsb80-con" is the vector of the term corresponding to the conclusion of the Japanese sentence.
  • The information processing device sequentially inputs the term vectors specified using the first learning model 70a to the second learning model 70b, thereby specifying the vectors "Vsb90-con", "Vsb90-r1", ..., "Vsb90-rn" in that order.
  • The vector "Vsb90-con" is the vector of the term corresponding to the conclusion of the English sentence.
  • The vector "Vsb90-r1" is the vector of the term corresponding to reason 1 in the English sentence.
  • The vector "Vsb90-rn" is the vector of the term corresponding to reason n in the English sentence.
  • the information processing device extracts the English sentences corresponding to "Vsb90-con”, “Vsb90-r1”, ..., “Vsb90-rn” from the parallel translation table 141 and outputs them as translations.
  • the information processing apparatus learns the first learning model 70a and the second learning model 70b in advance.
  • Upon receiving the analysis query 80, the information processing device inputs the vector of the analysis query 80 to the first learning model 70a, thereby calculating vectors corresponding to a plurality of terms in the Japanese sentence.
  • The information processing device sequentially inputs the vectors corresponding to the terms of the Japanese sentence to the second learning model 70b, thereby calculating a plurality of vectors that are the vectors of the English terms corresponding to the Japanese terms and whose output order is controlled.
  • The information processing device acquires the terms of the English sentence corresponding to the calculated vectors from the parallel translation table 141 and outputs them as the translation result. In this way, the order of the terms in the Japanese sentence is first adjusted to the term order characteristic of English, and the translated content itself is then taken from the terms included in the parallel translation table 141, so that an easy-to-understand translation can be generated.
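  • The analysis phase summarized above can be sketched as a two-stage pipeline. Everything concrete here is hypothetical: the two stub functions only imitate what trained models 70a and 70b might return, the vectors are toy values, and the exact-match dictionary lookup stands in for the lookup against the parallel translation table 141.

```python
from typing import Callable, Dict, List, Tuple

Vector = Tuple[float, ...]


def translate(
    query_vector: Vector,
    first_model: Callable[[Vector], List[Vector]],
    second_model: Callable[[List[Vector]], List[Vector]],
    english_terms_by_vector: Dict[Vector, str],
) -> List[str]:
    """Query vector -> Japanese term vectors -> ordered English term vectors -> text."""
    japanese_term_vectors = first_model(query_vector)
    english_term_vectors = second_model(japanese_term_vectors)
    return [english_terms_by_vector[v] for v in english_term_vectors]


def stub_first_model(query_vector: Vector) -> List[Vector]:
    """Stub: pretends the query splits into reason 1, reason n, conclusion."""
    return [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]


# Stub mapping from Japanese term vectors to English term vectors.
JA_TO_EN: Dict[Vector, Vector] = {
    (1.0, 0.0): (0.0, 1.0),
    (2.0, 0.0): (0.0, 2.0),
    (3.0, 0.0): (0.0, 3.0),
}


def stub_second_model(term_vectors: List[Vector]) -> List[Vector]:
    """Stub: maps each term and emits the conclusion (last input) first."""
    english = [JA_TO_EN[v] for v in term_vectors]
    return [english[-1]] + english[:-1]


english_terms_by_vector = {
    (0.0, 3.0): "Generally, ...",
    (0.0, 1.0): "Because ...",
    (0.0, 2.0): "Finally, ...",
}

translation = translate(
    (6.0, 0.0), stub_first_model, stub_second_model, english_terms_by_vector
)
```

  • Passing the models in as callables keeps the control flow visible without committing to any particular network architecture.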
  • FIG. 5 is a functional block diagram showing the configuration of the information processing apparatus according to the first embodiment.
  • The information processing apparatus 100 has a communication unit 110, an input unit 120, a display unit 130, a storage unit 140, and a control unit 150.
  • the communication unit 110 is connected to an external device or the like by wire or wirelessly, and transmits and receives information to and from the external device or the like.
  • the communication unit 110 is implemented by a NIC (Network Interface Card) or the like.
  • the communication unit 110 may be connected to a network (not shown).
  • the input unit 120 is an input device that inputs various types of information to the information processing device 100 .
  • the input unit 120 corresponds to a keyboard, mouse, touch panel, or the like.
  • the user may operate the input unit 120 to input an analysis query or the like.
  • the display unit 130 is a display device that displays information output from the control unit 150 .
  • the display unit 130 corresponds to a liquid crystal display, an organic EL (Electro Luminescence) display, a touch panel, or the like. For example, a translation result corresponding to the analysis query is displayed on display unit 130 .
  • The storage unit 140 has a parallel translation table 141, a compressed file table 142, a transposed (inverted) index table 143, and dictionary information 144.
  • The storage unit 140 also has the first learning data 65a, the second learning data 65b, the first learning model 70a, the second learning model 70b, and the analysis query 80.
  • the storage unit 140 is implemented by, for example, a semiconductor memory device such as RAM (Random Access Memory) or flash memory, or a storage device such as a hard disk or optical disk.
  • The bilingual table 141 (the parallel translation table) holds a plurality of sets, each consisting of a Japanese sentence and the English sentence that is its translation.
  • FIG. 6 is a diagram showing an example of the data structure of the parallel translation table. As shown in FIG. 6, the bilingual table 141 has item numbers, Japanese sentences, and English sentences. The item number identifies a record of the bilingual table 141.
  • a Japanese sentence is Japanese text data and has a plurality of terms.
  • An English sentence is English text data and has a plurality of terms.
  • the compressed file table 142 has compressed files of Japanese sentences and compressed files of English sentences.
  • FIG. 7 is a diagram showing an example of the data structure of a compressed file table. As shown in FIG. 7, the compressed file table 142 has a Japanese sentence compressed file 142a and an English sentence compressed file 142b.
  • the compressed Japanese text file 142a is a file in which each word contained in each Japanese text is converted into a code.
  • the compressed English sentence file 142b is a file in which each English word contained in each English sentence is converted into a code.
  • the transposed index table 143 has transposed indexes for Japanese sentences and transposed indexes for English sentences.
  • FIG. 8 is a diagram (1) showing an example of the transposed index table.
  • The transposed index table 143 includes a Japanese text transposed index 143a, an English text transposed index 143b, a Japanese term transposed index 143c, an English term transposed index 143d, a Japanese sentence transposed index 143e, an English sentence transposed index 143f, a Japanese word transposed index 143g, and an English word transposed index 143h.
  • the Japanese sentence transposed index 143a associates a Japanese sentence vector (hereinafter referred to as a Japanese sentence vector) with an offset indicating the position of the encoded sentence, which is the sentence corresponding to the Japanese sentence vector.
  • the encoded Japanese text is registered in the compressed Japanese text file 142a.
  • An encoded sentence has multiple encoded words, and the offset of the first word of the encoded sentence is the position of the encoded sentence.
  • the offset corresponds to the position from the beginning of the compressed Japanese text file 142a. Assume that the offset of the top word of the compressed Japanese text file 142a is "0".
  • FIG. 9 is a diagram showing an example of the data structure of the Japanese text transposed index.
  • the horizontal axis of the Japanese text transposed index 143a is the axis corresponding to the offset.
  • The vertical axis of the Japanese text transposed index 143a is the axis corresponding to the Japanese sentence vector.
  • The Japanese text transposed index 143a is represented as a bitmap of "0" and "1" values.
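  • A bitmap with one row per vector and one column per offset, as described above, might be modeled as follows. This is a sketch only: the class name, the use of tuples as row keys, and the rule that a "1" marks an offset where the vector's encoded text begins are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


class BitmapTransposedIndex:
    """Rows correspond to vectors; columns correspond to offsets in a compressed file."""

    def __init__(self, length: int) -> None:
        self.length = length                    # number of offsets (columns)
        self.rows: Dict[Vector, List[int]] = {}  # vector -> bitmap of 0/1

    def add(self, vector: Vector, offset: int) -> None:
        """Set the bit at `offset` in the row for `vector`."""
        if not 0 <= offset < self.length:
            raise IndexError("offset outside the compressed file")
        row = self.rows.setdefault(vector, [0] * self.length)
        row[offset] = 1

    def offsets(self, vector: Vector) -> List[int]:
        """Return every offset whose bit is set for `vector`."""
        return [i for i, bit in enumerate(self.rows.get(vector, [])) if bit]


# Toy usage with hypothetical vectors and offsets.
index = BitmapTransposedIndex(length=8)
index.add((0.1, 0.2), 0)
index.add((0.3, 0.4), 3)
index.add((0.1, 0.2), 5)
```

  • Looking up a vector then reduces to scanning one row, which is what makes the structure convenient for retrieving the position of an encoded text from its vector.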
  • the English sentence transposed index 143b associates an English sentence vector (hereinafter referred to as an English sentence vector) with an offset indicating the position of the encoded sentence, which is the sentence corresponding to the English sentence vector.
  • the encoded English text is registered in the compressed English text file 142b.
  • An encoded sentence has multiple encoded words, and the offset of the first word of the encoded sentence is the position of the encoded sentence. The offset corresponds to the position from the beginning of the compressed English text file 142b. Assume that the offset of the top word of the compressed English text file 142b is "0".
  • the English sentence transposition index 143b has a horizontal axis corresponding to the offset and a vertical axis corresponding to the English sentence vector. Illustration of the English sentence transposition index 143b is omitted.
  • The Japanese term transposed index 143c associates a vector of a term included in a Japanese sentence (hereinafter referred to as a Japanese term vector) with an offset indicating the position of the encoded term corresponding to the Japanese term vector.
  • The encoded Japanese terms are registered in the compressed Japanese text file 142a.
  • An encoded term has multiple encoded words, and the offset of the first word of the encoded term is the position of the encoded term. The offset corresponds to the position from the beginning of the compressed Japanese text file 142a.
  • the Japanese term transposed index 143c has a horizontal axis corresponding to the offset and a vertical axis corresponding to the Japanese term vector. Illustration of the Japanese term transposition index 143c is omitted.
  • the English term transposed index 143d associates an English term vector (hereinafter referred to as an English term vector) with an offset indicating the position of the encoded term, which is the term corresponding to the English term vector.
  • the encoded English terms are registered in the compressed English text file 142b.
  • An encoded term has multiple encoded words, and the offset of the first word of the encoded term is the position of the encoded term. The offset corresponds to the position from the beginning of the compressed English text file 142b.
  • the English term transposed index 143d has a horizontal axis corresponding to the offset and a vertical axis corresponding to the English term vector. Illustration of the English term transposed index 143d is omitted.
  • the Japanese sentence transposition index 143e associates a sentence vector included in a Japanese sentence (hereinafter referred to as a Japanese sentence vector) with an offset indicating the position of the encoded sentence corresponding to the Japanese sentence vector.
  • the encoded Japanese text is registered in the compressed Japanese text file 142a.
  • An encoded Japanese sentence has multiple encoded words, and the offset of the first word of the encoded sentence is the position of the encoded sentence. The offset corresponds to the position from the beginning of the compressed Japanese text file 142a.
  • the Japanese sentence transposed index 143e has a horizontal axis corresponding to the offset and a vertical axis corresponding to the Japanese sentence vector. Illustration of the Japanese sentence transposition index 143e is omitted.
  • the English sentence transposition index 143f associates an English sentence vector (hereinafter referred to as an English sentence vector) with an offset indicating the position of the encoded sentence corresponding to the English sentence vector.
  • the encoded English sentences are registered in the compressed English sentence file 142b.
  • An encoded sentence has multiple encoded words, and the offset of the first word of the encoded sentence is the position of the encoded sentence. The offset corresponds to the position from the beginning of the compressed English text file 142b.
  • the English sentence transposition index 143f has a horizontal axis corresponding to the offset and a vertical axis corresponding to the English sentence vector. Illustration of the English sentence transposition index 143f is omitted.
  • The Japanese word transposed index 143g associates a vector of a word included in a Japanese sentence (hereinafter referred to as a Japanese word vector) with an offset indicating the position of the encoded word corresponding to the Japanese word vector.
  • the encoded Japanese text is registered in the compressed Japanese text file 142a.
  • the offset corresponds to the position from the beginning of the compressed Japanese text file 142a.
  • the Japanese word transposed index 143g has a horizontal axis corresponding to the offset and a vertical axis corresponding to the Japanese word vector. Illustration of the Japanese word transposition index 143g is omitted.
  • The English word transposed index 143h associates an English word vector with an offset indicating the position of the encoded word corresponding to the English word vector.
  • the encoded English words are registered in the compressed English text file 142b.
  • the offset corresponds to the position from the beginning of the compressed English text file 142b.
  • the English word transposed index 143h has a horizontal axis corresponding to the offset and a vertical axis corresponding to the English word vector. Illustration of the English word transposition index 143h is omitted.
  • the dictionary information 144 is dictionary information that defines compression codes corresponding to Japanese words.
  • FIG. 10 is a diagram showing an example of the data structure of dictionary information. As shown in FIG. 10, the dictionary information 144 associates words (Japanese words or English words) with compression codes and vectors. It is assumed that vectors corresponding to compression codes are assigned in advance by Poincare embedding or the like. Note that the compression code vector may be specified based on other conventional techniques.
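  • The association of words with compression codes and vectors could be represented as a simple lookup table. The entries, code values, and vectors below are invented for illustration (the source does not list any), and `DictionaryEntry`, `encode`, and `vectors_for` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Tuple


class DictionaryEntry(NamedTuple):
    code: int                  # compression code assigned to the word
    vector: Tuple[float, ...]  # vector assigned in advance to that code


# Hypothetical dictionary information; real codes and vectors are not given in the source.
dictionary_info: Dict[str, DictionaryEntry] = {
    "first": DictionaryEntry(1, (0.1, 0.2)),
    "machine": DictionaryEntry(2, (0.3, 0.1)),
    "translation": DictionaryEntry(3, (0.2, 0.4)),
}


def encode(words: List[str], dictionary: Dict[str, DictionaryEntry]) -> List[int]:
    """Replace each word with its compression code."""
    return [dictionary[w].code for w in words]


def vectors_for(words: List[str], dictionary: Dict[str, DictionaryEntry]) -> List[Tuple[float, ...]]:
    """Look up the vector assigned to each word."""
    return [dictionary[w].vector for w in words]


codes = encode(["machine", "translation"], dictionary_info)
```

  • A single lookup thus yields both the code written to the compressed file and the vector used to build sentence and term vectors.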
  • the first learning data 65a corresponds to the first learning data 65a described in FIG.
  • the first learning data 65a defines the relationship between the vector of the target sentence and the vector of the Japanese term.
  • the target sentence vector corresponds to the input data
  • the Japanese term vector corresponds to the correct label.
  • the second learning data 65b corresponds to the second learning data 65b described in FIG.
  • the second learning data 65b defines the relationship between the vector of Japanese term(s) and the vector of English term(s).
  • a vector of Japanese terms (a plurality of Japanese terms) corresponds to input data, and a vector of English terms (a plurality of English terms) corresponds to correct labels.
  • the first learning model 70a is a learning model that undergoes machine learning based on the first learning data 65a.
  • the second learning model 70b is a learning model that undergoes machine learning based on the second learning data 65b.
  • the analysis query 80 is a query specified from the outside.
  • the analysis query 80 is set with a Japanese sentence to be translated.
  • the control unit 150 has a preprocessing unit 151 , a learning unit 152 and a translation unit 153 .
  • The control unit 150 is implemented by, for example, a CPU (Central Processing Unit) or an MPU (Micro Processing Unit). The control unit 150 may also be implemented by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
  • Based on the bilingual table 141 and the dictionary information 144, the preprocessing unit 151 generates the compressed file table 142 and, in the course of doing so, generates the transposed index table 143, the first learning data 65a, and the second learning data 65b. An example of the processing of the preprocessing unit 151 will be described below.
  • the preprocessing unit 151 refers to the bilingual table 141 and acquires a set of Japanese text data and English text data from the unselected item number record.
  • The preprocessing unit 151 executes morphological analysis on the text data of the Japanese sentence to identify the words, periods, line breaks, and the like, and thereby identifies the terms and sentences included in the text data. For example, the preprocessing unit 151 identifies a group of words from one line feed to the next line feed in the text data of the Japanese sentence as a Japanese term. The preprocessing unit 151 identifies a group of words from one period to the next as a Japanese sentence.
  • The preprocessing unit 151 may further use predetermined conjunctions to identify the starting position of a term. For example, conjunctions such as "first", "finally", and "that is," or character strings corresponding to such conjunctions, are identified as the starting position of a Japanese term. Likewise, the preprocessing unit 151 identifies conjunctions such as "generally", "because", and "finally", or character strings corresponding to such conjunctions, as the starting position of an English term.
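  • The segmentation rules above (terms delimited by line feeds, sentences delimited by periods, and conjunctions marking the start of a term) can be sketched as follows. This simplified sketch skips real morphological analysis, treats every "." or "。" as a sentence end, and uses a hypothetical marker list and function names.

```python
from typing import List, Sequence

TERM_MARKERS = ("first", "finally", "that is", "generally", "because")


def split_terms(text: str) -> List[str]:
    """A term is the text between one line feed and the next."""
    return [line.strip() for line in text.split("\n") if line.strip()]


def split_sentences(term: str) -> List[str]:
    """A sentence runs from one period to the next (period kept)."""
    sentences: List[str] = []
    current: List[str] = []
    for ch in term:
        current.append(ch)
        if ch in ".。":
            sentences.append("".join(current).strip())
            current = []
    tail = "".join(current).strip()
    if tail:  # keep trailing text that lacks a closing period
        sentences.append(tail)
    return sentences


def starts_term(sentence: str, markers: Sequence[str] = TERM_MARKERS) -> bool:
    """True if the sentence begins with one of the conjunction markers."""
    return sentence.lower().startswith(tuple(markers))


sample = "First, A is fast. It is cheap.\nFinally, B is small.\nThat is, use A."
structure = [split_sentences(term) for term in split_terms(sample)]
```

  • A production implementation would rely on a morphological analyzer and would need to handle abbreviations and decimal points, which this sketch would split incorrectly.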
  • the preprocessing unit 151 identifies the compression code of the word and the vector assigned to the word based on the word of the text data of the Japanese sentence and the dictionary information 144 .
  • the preprocessing unit 151 calculates a Japanese sentence vector of each sentence, a Japanese term vector of each term, and a Japanese sentence vector of the text data (sentence) based on the word vector of the text data of the Japanese sentence.
  • the preprocessing unit 151 also converts the words of the text data of the Japanese sentences into compression codes and registers them in the compressed Japanese sentence file 142a.
  • the preprocessing unit 151 sets the relationship between the Japanese sentence vector and the offset of the Japanese sentence on the compressed Japanese sentence file 142a in the Japanese sentence transposition index 143a.
  • the preprocessing unit 151 sets the relationship between the Japanese term vector and the offset of each term on the compressed Japanese text file 142a in the Japanese term transposition index 143c.
  • the preprocessing unit 151 sets the relationship between the Japanese sentence vector and the offset of each sentence on the compressed Japanese sentence file 142a in the Japanese sentence transposition index 143e.
  • The preprocessing unit 151 sets the relationship between the Japanese word vector and the offset of each word in the compressed Japanese text file 142a in the Japanese word transposed index 143g.
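  • The offsets registered in the term, sentence, and word indexes can be derived in a single pass over the encoded words, using the rule that the offset of a unit is the offset of its first word and that the first word of the file has offset 0. The sketch below assumes one encoded word occupies one offset position; `collect_offsets` and the toy text are hypothetical.

```python
from typing import Dict, List

Sentence = List[str]
Term = List[Sentence]


def collect_offsets(terms: List[Term]) -> Dict[str, List[int]]:
    """Return the offset of the first word of every term, sentence, and word."""
    term_offsets: List[int] = []
    sentence_offsets: List[int] = []
    word_offsets: List[int] = []
    position = 0  # offset of the next encoded word; the file starts at 0
    for term in terms:
        term_offsets.append(position)
        for sentence in term:
            sentence_offsets.append(position)
            for _ in sentence:
                word_offsets.append(position)
                position += 1
    return {"term": term_offsets, "sentence": sentence_offsets, "word": word_offsets}


# Toy text: two terms, three sentences, six words.
toy_terms = [
    [["a", "b"], ["c"]],
    [["d", "e", "f"]],
]
offsets = collect_offsets(toy_terms)
```

  • The text-level offset is simply the offset of the first term, so it needs no separate bookkeeping when each text is appended to the compressed file in turn.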
  • the preprocessing unit 151 registers, in the first learning data 65a, the relationship between the Japanese sentence vector specified from the text data of the Japanese sentence and the multiple Japanese term vectors.
  • The preprocessing unit 151 executes morphological analysis on the text data of the English sentence to identify the words, periods, line breaks, and the like, and thereby identifies the terms and sentences included in the text data. For example, the preprocessing unit 151 identifies a group of words from one line feed to the next line feed in the text data of the English sentence as an English term. The preprocessing unit 151 identifies a group of words from one period to the next as an English sentence.
  • the preprocessing unit 151 identifies the compression codes of the words and the vectors assigned to the words based on the words of the English text data and the dictionary information 144 .
  • the preprocessing unit 151 calculates an English sentence vector of each sentence, an English term vector of each term, and an English sentence vector of the text data (sentence) based on the word vector of the text data of the English sentence.
  • the preprocessing unit 151 also converts the words of the text data of the English sentences into compressed codes and registers them in the compressed English sentence file 142b.
  • the preprocessing unit 151 sets the relationship between the English sentence vector and the offset of the English sentence on the compressed English sentence file 142b in the English sentence transposed index 143b.
  • the preprocessing unit 151 sets the relationship between the English term vector and the offset of each term on the compressed English text file 142b in the English term transposed index 143d.
  • the preprocessing unit 151 sets the relationship between the English sentence vector and the offset of each sentence in the compressed English sentence file 142b in the English sentence transposed index 143f.
  • the preprocessing unit 151 sets the relationship between the English word vector and the offset of each sentence on the compressed English sentence file 142b in the English word transposed index 143h.
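As a rough illustration of how compression codes, offsets, and a transposed index relate to one another, the sketch below encodes tokenized sentences into one code array and records the offset at which each sentence starts. The code values, the helper names `encode_sentences` and `build_index`, and the use of a Python dict keyed by the vector are all assumptions made for this example.

```python
def encode_sentences(sentences, codes):
    """Concatenate the codes of all sentences; return the code array and start offsets."""
    code_array = []
    offsets = []
    for words in sentences:
        offsets.append(len(code_array))  # offset of this sentence in the compressed data
        code_array.extend(codes[word] for word in words)
    return code_array, offsets


def build_index(vectors, offsets):
    """Map each vector (as a tuple) to the offsets at which it occurs."""
    index = {}
    for vector, offset in zip(vectors, offsets):
        index.setdefault(tuple(vector), []).append(offset)
    return index


# Hypothetical dictionary information: word -> compression code.
codes = {".": 0, "the": 1, "unit": 2, "stores": 3, "data": 4}
sentences = [["the", "unit", "stores", "data", "."], ["the", "data", "."]]
code_array, offsets = encode_sentences(sentences, codes)

# Hypothetical sentence vectors, one per sentence.
sentence_vectors = [(0.1, 0.2), (0.3, 0.4)]
index = build_index(sentence_vectors, offsets)
```

With such an index, a vector produced later in the analysis phase can be traced back to the position of the matching sentence in the compressed file without decoding the whole file.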
  • the preprocessing unit 151 registers each Japanese term vector specified from the text data of the Japanese sentence in the second learning data 65b in order from the beginning.
  • the preprocessing unit 151 registers each English term vector specified from the text data of the English sentence in the second learning data 65b in order from the beginning.
  • The preprocessing unit 151 repeats the above process based on the text data of the Japanese sentences and the text data of the English sentences in the record of each item number included in the bilingual table 141, thereby generating the first learning data 65a and the second learning data 65b.
  • Although the case where the preprocessing unit 151 generates the first learning data 65a and the second learning data 65b has been described, the first learning data 65a and the second learning data 65b may instead be received from an external device or the like and used.
  • the learning unit 152 performs machine learning of the first learning model 70a based on the first learning data 65a.
  • the learning unit 152 performs machine learning of the second learning model 70b based on the second learning data 65b.
  • The learning unit 152 executes learning by error backpropagation so that the output when the vector of the target text (Japanese text vector) is input to the first learning model 70a approaches the vector of each Japanese term (Japanese term vector).
  • The learning unit 152 repeats the above process based on the relationship between the vector of the target sentence and the vectors of the plurality of Japanese terms included in the first learning data 65a, thereby adjusting the parameters of the first learning model 70a (performing machine learning).
  • The learning unit 152 inputs each Japanese term vector to the second learning model 70b in order from the beginning, and executes learning by error backpropagation so that the English term vectors set as the correct labels are output in order.
  • The learning unit 152 repeats the above process based on the relationship between the Japanese term vectors and the English term vectors included in the second learning data 65b, thereby adjusting the parameters of the second learning model 70b (performing machine learning).
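The idea of adjusting parameters so that the model output approaches the target vectors can be illustrated with a deliberately small stand-in. The sketch below trains a single linear layer by gradient descent on the squared error, which is the simplest case of error backpropagation; it is not the network of the first learning model 70a, and the toy data, learning rate, and function names are assumptions. Each training pair maps one text vector to the concatenation of two 2-dimensional term vectors.

```python
def predict(W, x):
    """Linear model: one output per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def squared_error(W, pairs):
    """Sum of squared differences between predictions and targets."""
    return sum((p - t) ** 2 for x, y in pairs for p, t in zip(predict(W, x), y))


def train(pairs, dim_in, dim_out, lr=0.05, epochs=200):
    """Adjust W so that predict(W, x) approaches y for every (x, y) pair."""
    W = [[0.0] * dim_in for _ in range(dim_out)]
    for _ in range(epochs):
        for x, y in pairs:
            err = [p - t for p, t in zip(predict(W, x), y)]
            for i in range(dim_out):
                for j in range(dim_in):
                    # Gradient of the squared error with respect to W[i][j] (up to a factor of 2).
                    W[i][j] -= lr * err[i] * x[j]
    return W


# Toy data: text vector -> two concatenated 2-dimensional term vectors.
pairs = [
    ([1.0, 0.0], [1.0, 0.0, 0.0, 1.0]),
    ([0.0, 1.0], [0.0, 1.0, 1.0, 0.0]),
]
untrained = [[0.0] * 2 for _ in range(4)]
W = train(pairs, dim_in=2, dim_out=4)
loss_before = squared_error(untrained, pairs)
loss_after = squared_error(W, pairs)
```

The same loop shape applies to the second learning model, with Japanese term vectors as inputs and English term vectors as targets, although a sequence model would be needed to control the output order.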
  • When the analysis query 80 is received, the translation unit 153 translates the Japanese sentences included in the analysis query 80.
  • the translation unit 153 receives the analysis query 80 from the input unit 120 or the communication unit 110 and stores it in the storage unit 140 . An example of the processing of the translation unit 153 will be described below.
  • the translation unit 153 performs morphological analysis on the text data of the Japanese sentences included in the analysis query 80, and divides the text data into a plurality of words.
  • the translation unit 153 identifies the vector of the word based on the word included in the text data and the dictionary information 144 .
  • the translation unit 153 calculates the vector of each sentence by integrating the vector of each word.
  • the translation unit 153 calculates the vector of each term by integrating the vectors of each sentence.
  • the translation unit 153 calculates the vector of the analysis query 80 by accumulating the vectors of each term.
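The bottom-up calculation above (word vectors into sentence vectors, sentence vectors into term vectors, term vectors into the query vector) can be sketched as follows. The sketch assumes that "integrating" means element-wise summation; the word vectors and the function names are hypothetical.

```python
def add_vectors(vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(components) for components in zip(*vectors)]


def text_vectors(text, word_vectors):
    """text is a list of terms, a term is a list of sentences, a sentence is a list of words."""
    sentence_vecs = [
        [add_vectors([word_vectors[word] for word in sentence]) for sentence in term]
        for term in text
    ]
    term_vecs = [add_vectors(vecs) for vecs in sentence_vecs]
    text_vec = add_vectors(term_vecs)
    return sentence_vecs, term_vecs, text_vec


# Hypothetical dictionary information: word -> vector.
word_vectors = {"a": [1, 0], "b": [0, 2], "c": [3, 1]}
text = [[["a", "b"], ["c"]], [["b", "c"]]]
sentence_vecs, term_vecs, text_vec = text_vectors(text, word_vectors)
```

Because every level is a sum of the level below, the query vector is determined entirely by the word vectors in the dictionary information.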
  • the translation unit 153 calculates a plurality of Japanese term vectors included in the Japanese text of the analysis query 80 by inputting the vector of the analysis query 80 into the first learning model 70a.
  • The translation unit 153 inputs the vector “Vob80” of the analysis query 80 to the first learning model 70a, thereby obtaining the Japanese term vectors “Vsb80-1”, “Vsb80-2”, . . . , “Vsb80-n”.
  • The translation unit 153 sequentially calculates English term vectors by sequentially inputting the plurality of Japanese term vectors calculated based on the first learning model 70a to the second learning model 70b. In the example illustrated in FIG. 4, the translation unit 153 sequentially inputs the Japanese term vectors “Vsb80-1”, “Vsb80-2”, . . . , “Vsb80-n” to the second learning model 70b, thereby obtaining the English term vectors “Vsb90-1”, “Vsb90-2”, . . . , “Vsb90-n”.
  • The translation unit 153 identifies the offset of the encoded term corresponding to each English term vector based on each English term vector obtained from the second learning model 70b and the English term transposed index 143d. Based on the identified offset of each term, the translation unit 153 acquires the encoded term information (code array) from the compressed English text file 142b.
  • The translation unit 153 decodes the encoded term information based on the encoded term information (code array) and the dictionary information 144. For example, the translation unit 153 generates the information of the translation result by arranging the decoding result of the term of the English term vector “Vsb90-1”, the decoding result of the term of “Vsb90-2”, . . . , and the decoding result of the term of “Vsb90-n” in this order. The translation unit 153 causes the display unit 130 to display the information of the translation result. The information of the translation result may also be transmitted to the external device that is the transmission source of the analysis query 80.
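The lookup and decoding steps can be illustrated with a small sketch. It assumes that a model output is matched to the index entry with the highest cosine similarity and that a term ends at a hypothetical end marker code 0; the publication does not state either detail in this passage, and all names and values below are invented for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lookup_offset(vector, index):
    """index is a list of (term vector, offset); return the offset of the closest entry."""
    best_vector, best_offset = max(index, key=lambda entry: cosine(vector, entry[0]))
    return best_offset


def decode_term(code_array, offset, code_to_word, end_code=0):
    """Decode codes from offset up to the (assumed) end-of-term marker."""
    words = []
    for code in code_array[offset:]:
        if code == end_code:
            break
        words.append(code_to_word[code])
    return " ".join(words)


# Hypothetical compressed English data, dictionary, and term index.
code_array = [1, 2, 0, 3, 4, 0]
code_to_word = {1: "first", 2: "claim", 3: "second", 4: "claim"}
term_index = [([1.0, 0.0], 0), ([0.0, 1.0], 3)]

# Hypothetical English term vectors, in the order output by the second model.
model_outputs = [[0.1, 0.9], [0.8, 0.2]]
translation = [
    decode_term(code_array, lookup_offset(vector, term_index), code_to_word)
    for vector in model_outputs
]
```

The order of `translation` follows the order of the model outputs, which is how the reordering of terms carries through to the final result.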
  • FIG. 11 is a flow chart showing processing of the learning phase of the information processing apparatus according to the first embodiment.
  • The preprocessing unit 151 of the information processing apparatus 100 executes preprocessing based on the bilingual table 141 and generates the compressed file table 142, the transposed index table 143, the first learning data 65a, and the second learning data 65b (step S101).
  • the learning unit 152 of the information processing device 100 executes machine learning of the first learning model 70a based on the first learning data 65a (step S102).
  • the learning unit 152 of the information processing device 100 executes machine learning of the second learning model 70b based on the second learning data 65b (step S103).
  • FIG. 12 is a flowchart showing analysis phase processing of the information processing apparatus according to the first embodiment.
  • the translation unit 153 of the information processing device 100 receives an analysis query 80 and stores it in the storage unit 140 (step S201).
  • the translation unit 153 calculates the vector of the analysis query 80 (step S202).
  • the translation unit 153 calculates a plurality of Japanese term vectors by inputting the vector of the analysis query 80 to the first learning model 70a (step S203).
  • the translation unit 153 sequentially inputs a plurality of Japanese term vectors to the second learning model 70b, and acquires a plurality of English term vectors and their order (step S204).
  • the translation unit 153 identifies the term offset based on the English term vector and the English term transposed index 143d (step S205).
  • the translation unit 153 acquires the code array of each term from the compressed English text file 142b (step S206). The translation unit 153 decodes the code array of each term based on the code array of each term and the dictionary information 144 (step S207).
  • the translation unit 153 generates a translation result by arranging the decoded results in order (step S208).
  • the translation unit 153 outputs the translation result (step S209).
  • the information processing apparatus 100 learns the first learning model 70a and the second learning model 70b in advance.
  • the information processing apparatus 100 inputs the vector of the analysis query 80 to the first learning model 70a, thereby calculating vectors corresponding to a plurality of terms in the Japanese sentence.
  • The information processing apparatus 100 sequentially inputs the vectors corresponding to the plurality of terms of the Japanese sentence to the second learning model 70b, thereby calculating a plurality of vectors of English terms that correspond to the Japanese terms and whose output order is controlled.
  • The information processing apparatus 100 acquires the terms of the English sentence corresponding to the calculated vectors from the storage unit 140 and outputs them as a translation result. In this way, the order of the terms in the Japanese sentences is first adjusted to the term order specific to English, and the translated content itself can then be generated by using the terms included in the bilingual table 141.
  • Although the information processing apparatus 100 has been described as having the first learning model 70a machine-learn the relationship between the target sentence vector (Japanese sentence vector) and the Japanese term vectors, the embodiment is not limited to this.
  • The information processing apparatus 100 may cluster the sentence vectors and the term vectors to group similar Japanese sentence vectors and similar Japanese term vectors.
  • The function that decomposes a target sentence into multiple terms based on machine learning can also be applied to decomposing newspaper editorials, magazine articles, papers, and the like into multiple terms.
  • For example, suppose that the Japanese sentence vectors Vob1 and Vob2 belong to the same cluster, and that the Japanese term vectors (Vsb1-1, Vsb1-2, . . . , Vsb1-n) and (Vsb2-1, Vsb2-2, . . . , Vsb2-n) belong to the same cluster.
  • In this case, the preprocessing unit 151 may combine the first row record and the second row record of the first learning data 65a into one record. For example, one of the records may be deleted, or the average of the vectors belonging to the same cluster may be used as a new vector. Reducing the vectors to 100,000 or fewer types through such clustering makes it possible to reduce the amount of calculation when performing machine learning on the first learning model 70a.
  • the preprocessing unit 151, the learning unit 152, and the translation unit 153 may use a plurality of vectors belonging to the same cluster as one vector (average vector, etc.) corresponding to the same cluster.
  • the preprocessing unit 151 similarly processes each record of the second learning data 65b, thereby reducing the amount of computation when performing machine learning on the second learning model 70b.
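Merging the records of one cluster into a single averaged record could look like the sketch below. The publication does not name a clustering method in this passage, so a simple greedy scheme with a cosine similarity threshold is used here purely as an assumption; the threshold value and function names are likewise hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, threshold=0.95):
    """Greedy clustering: join the first cluster whose first member is similar enough."""
    clusters = []
    for vector in vectors:
        for members in clusters:
            if cosine(vector, members[0]) >= threshold:
                members.append(vector)
                break
        else:
            clusters.append([vector])
    return clusters


def mean_vector(vectors):
    """Average vector of a cluster, used as its single representative."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]


vectors = [[2.0, 0.0], [1.0, 0.25], [0.0, 1.0]]
clusters = cluster(vectors)
representatives = [mean_vector(members) for members in clusters]
```

Training on the representatives instead of on every original vector is what reduces the number of records and therefore the amount of computation.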
  • the target of clustering is not limited to text character strings, but can be expanded to character strings such as source programs, chemical structural formulas of organic compounds, nucleotide sequences of genomes, and image outline PostScript.
  • FIG. 13 is a diagram for explaining processing in the learning phase of the information processing apparatus according to the second embodiment.
  • the information processing device performs machine learning of the learning model 96 using the learning data 95 .
  • the learning model 96 corresponds to CNN, RNN, and the like.
  • the learning data 95 defines relationships between vectors of terms and vectors of common sentences.
  • a vector of terms indicates a vector of terms included in the target sentence.
  • the description of the terms is the same as the description of the terms described in the first embodiment.
  • A common sentence is a sentence that is shared among the sentences included in a plurality of terms.
  • a vector of common sentences is a vector of such common sentences.
  • the information processing device executes learning by error backpropagation so that the output when the term vector is input to the learning model 96 approaches the common sentence vector.
  • The information processing device adjusts the parameters of the learning model 96 (performs machine learning) by repeatedly executing the above processing based on the relationship between the vectors of the terms included in the learning data 95 and the vectors of the common sentences.
  • FIG. 14 is a diagram for explaining the analysis phase processing of the information processing apparatus of the second embodiment.
  • the information processing device acquires the analysis query 97 .
  • The analysis query 97 includes the Japanese term to be translated.
  • When the information processing device receives the analysis query 97, it calculates the vector “Vsb97-1” of the analysis query 97 using the dictionary information.
  • The information processing device inputs the vector “Vsb97-1” to the learning model 96 to calculate the common sentence vector “Vco97-1”.
  • the information processing device compares the vector "Vsb97-1" of the analysis query 97 (term) with the vectors of multiple alternative terms contained in the alternative term vector table T1.
  • the alternative term vector table T1 is a table that holds alternative term vectors.
  • the information processing device identifies a vector of similar alternative terms for the vector "Vsb97-1" of the analysis query 97.
  • Let Vsb1-1 be the vector of the alternative term similar to the vector “Vsb97-1” of the analysis query 97.
  • The vector Vco97-1 output from the learning model 96 is the vector of the common sentence shared by the term of the vector Vsb97-1 and the alternative term of the vector Vsb1-1.
  • The information processing device outputs information on the common sentence that is included in the text data of the term corresponding to the alternative term vector “Vsb1-1” and that corresponds to the vector Vco97-1 output from the learning model 96, together with information on the English translation of that common sentence.
  • As described above, the information processing apparatus inputs the vector of the analysis query 97 to the trained learning model 96 and calculates the vector of the common sentence corresponding to the term of the analysis query 97. It also identifies the vector of a similar alternative term based on the vector of the analysis query 97. As a result, it can calculate the vector of the common sentence shared between the term of the analysis query 97 and the alternative term similar to it. By using the calculated common sentence vector, it is possible to extract the common sentence of the alternative term similar to the term of the analysis query 97 and the English translation associated with that common sentence.
  • The information processing device may also register the relationship between the vector of the common sentence and the vector of the re-translated sentence in the common sentence/re-translated sentence table 98.
  • The re-translated sentence vector is calculated by comparing the vector of each sentence constituting the alternative term with the vector of the common sentence and subtracting one from the other.
  • FIG. 15 is a diagram showing an example of the configuration of an information processing apparatus according to the second embodiment.
  • the information processing apparatus 200 has a communication section 210, an input section 220, a display section 230, a storage section 240, and a control section 250.
  • the descriptions of the communication unit 210, the input unit 220, and the display unit 230 are the same as the descriptions of the communication unit 110, the input unit 120, and the display unit 130 described in the first embodiment.
  • The storage unit 240 has a bilingual table 241, a compressed file table 242, a transposed index table 243, dictionary information 244, learning data 95, a learning model 96, an analysis query 97, and a common sentence/re-translated sentence table 98.
  • the storage unit 240 is realized by, for example, a semiconductor memory device such as a RAM or flash memory, or a storage device such as a hard disk or an optical disk.
  • The bilingual table 241 is a table that holds a plurality of pairs of Japanese sentences and English sentences that are translation results of the Japanese sentences. Other explanations about the bilingual table 241 are the same as the explanations about the bilingual table 141 described in the first embodiment.
  • the compressed file table 242 has compressed files of Japanese sentences and compressed files of English sentences. Other explanations regarding the compressed file table 242 are the same as those concerning the compressed file table 142 described in the first embodiment.
  • the transposed index table 243 has transposed indexes for Japanese sentences and transposed indexes for English sentences.
  • FIG. 16 is a diagram (2) showing an example of the transposed index table.
  • The transposed index table 243 includes a Japanese text transposed index 243a, an English text transposed index 243b, a Japanese term transposed index 243c, an English term transposed index 243d, a Japanese sentence transposed index 243e, an English sentence transposed index 243f, a Japanese word transposed index 243g, and an English word transposed index 243h.
  • The descriptions of these indexes are the same as the descriptions of the corresponding indexes shown in FIG. 8 of the first embodiment.
  • the dictionary information 244 is dictionary information that defines compression codes corresponding to Japanese words. Other explanations about the dictionary information 244 are the same as the explanations about the dictionary information 144 described in the first embodiment.
  • the alternative term vector table T1 is a table that holds vectors of alternative terms.
  • FIG. 17 is a diagram showing an example of the data structure of an alternative term vector table. As shown in FIG. 17, the alternative term vector table T1 includes a plurality of Japanese term vectors (Japanese term vectors).
  • the learning data 95 corresponds to the learning data 95 described with reference to FIG.
  • the training data 95 defines relationships between vectors of terms and vectors of common sentences.
  • a vector of terms indicates a vector of terms included in the target sentence.
  • the description of the terms is the same as the description of the terms described in the first embodiment.
  • A common sentence is a sentence that is shared among the sentences included in a plurality of terms.
  • a vector of common sentences is a vector of such common sentences.
  • the learning model 96 is a learning model that undergoes machine learning based on the learning data 95.
  • the analysis query 97 is a query specified from the outside.
  • the analysis query 97 is set with Japanese terms to be translated.
  • the control unit 250 has a preprocessing unit 251 , a learning unit 252 and a translation unit 253 .
  • The control unit 250 is implemented by, for example, a CPU or an MPU. The control unit 250 may also be implemented by an integrated circuit such as an ASIC or an FPGA.
  • the preprocessing unit 251 generates the transposed index table 243, the alternative term vector table T1, and the learning data 95 in the process of generating the compressed file table 242 based on the bilingual table 241 and the dictionary information 244. An example of the processing of the preprocessing unit 251 will be described below.
  • the processing by which the preprocessing unit 251 generates the compressed file table 242 and the transposed index table 243 is the same as that of the first embodiment, so description thereof will be omitted.
  • Each time the preprocessing unit 251 calculates a Japanese term vector in the course of generating the Japanese term transposed index 243c, it registers the Japanese term vector in the alternative term vector table T1. By repeating this process, it generates the alternative term vector table T1. When alternative term candidates are specified in advance, the preprocessing unit 251 may instead register the vectors of the specified alternative terms in the alternative term vector table T1 when it calculates them.
  • The preprocessing unit 251 accepts a specification of the Japanese terms to be set in the learning data 95, and of the common sentences included in those Japanese terms, from among the Japanese sentences included in the bilingual table 241.
  • When the preprocessing unit 251 calculates a Japanese term vector while generating the Japanese term transposed index 243c, and calculates Japanese sentence vectors while generating the Japanese sentence transposed index 243e, it registers the Japanese term vector and the vectors of the common sentences included in that Japanese term in the learning data 95. By repeating this process, it generates the learning data 95.
  • The learning unit 252 executes machine learning of the learning model 96 based on the learning data 95. As described with reference to FIG. 13, the learning unit 252 executes learning by error backpropagation so that the output when the term vector (Japanese term vector) is input to the learning model 96 approaches the vector of the common sentence. The learning unit 252 adjusts the parameters of the learning model 96 (performs machine learning) by repeatedly executing the above processing based on the relationship between the term vectors included in the learning data 95 and the common sentence vectors.
  • When the analysis query 97 is received, the translation unit 253 translates the terms included in the analysis query 97.
  • the translation unit 253 receives the analysis query 97 from the input unit 220 or the communication unit 210 and stores it in the storage unit 240 . An example of the processing of the translation unit 253 will be described below.
  • the translation unit 253 performs morphological analysis on the text data of the Japanese term included in the analysis query 97, and divides the text data into multiple words.
  • the translation unit 253 identifies vectors of words based on the words included in the text data and the dictionary information 244 .
  • the translation unit 253 calculates the vector of each sentence by integrating the vector of each word.
  • the translation unit 253 calculates the vector of the term (analysis query 97) by integrating the vectors of each sentence.
  • the translation unit 253 calculates a vector of common sentences corresponding to the analysis query 97 by inputting the vector of the analysis query 97 into the learning model 96 .
  • The translation unit 253 also compares the vector of the analysis query 97 with the vector of each alternative term included in the alternative term vector table T1 to identify the vector of the alternative term similar to the vector of the analysis query 97.
  • The vector of the alternative term that is similar to the vector of the analysis query 97 is referred to as a "similar vector."
  • the translation unit 253 compares the similarity vector with the Japanese term transposed index 243c to identify the offset of the term of the similarity vector.
  • Based on the offset of the identified term and the Japanese sentence transposed index 243e, the translation unit 253 narrows down the offset range of the sentences included in the term of the similar vector, and compares the Japanese sentence vectors within that range with the vector of the common sentence to identify the Japanese sentence vector similar to the common sentence vector.
  • A Japanese sentence vector similar to the common sentence vector is referred to as a "similar Japanese sentence vector".
  • A definition table is set that defines the correspondence between each Japanese sentence vector included in the Japanese sentence transposed index 243e and each English sentence vector included in the English sentence transposed index 243f.
  • The translation unit 253 identifies the English sentence vector corresponding to the similar Japanese sentence vector based on the definition table.
  • The English sentence vector corresponding to the similar Japanese sentence vector is referred to as a "similar English sentence vector".
  • the translation unit 253 compares the similar English sentence vector and the English sentence transposed index 243f to identify the sentence offset of the similar English sentence vector.
  • the translation unit 253 acquires encoded English sentence information (encoded array) from the English sentence compressed file in the compressed file table 242 based on the specified offset.
  • the translation unit 253 decodes the encoded sentence information based on the encoded sentence information (encoded array) and the dictionary information 244 to generate translation result information.
  • the translation unit 253 causes the display unit 230 to display the information of the translation result. Also, the information of the translation result may be transmitted to the external device that is the transmission source of the analysis query 97 .
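The retrieval steps of the translation unit 253 can be sketched end to end as follows. The sketch uses string identifiers in place of file offsets, cosine similarity as the similarity measure, and a dict as the definition table; these are all assumptions for illustration, and the common sentence vector is supplied directly rather than produced by the learning model 96.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, candidates):
    """Return the key of the candidate vector most similar to the query."""
    return max(candidates, key=lambda key: cosine(query, candidates[key]))


def find_translation(query_vec, common_vec, term_vectors, term_sentences,
                     ja_sentence_vectors, ja_to_en):
    # 1. Similar vector: the alternative term closest to the query.
    term_id = most_similar(query_vec, term_vectors)
    # 2. Narrow the search to the sentences contained in that term.
    in_range = {sid: ja_sentence_vectors[sid] for sid in term_sentences[term_id]}
    # 3. Similar Japanese sentence vector: closest to the common sentence vector.
    ja_id = most_similar(common_vec, in_range)
    # 4. Definition table: Japanese sentence -> corresponding English sentence.
    return term_id, ja_id, ja_to_en[ja_id]


# Hypothetical tables standing in for T1, the transposed indexes, and the definition table.
term_vectors = {"t1": [1.0, 0.0], "t2": [0.0, 1.0]}
term_sentences = {"t1": ["s1", "s2"], "t2": ["s3"]}
ja_sentence_vectors = {"s1": [1.0, 0.2], "s2": [0.2, 1.0], "s3": [0.1, 1.0]}
ja_to_en = {"s1": "e1", "s2": "e2", "s3": "e3"}

result = find_translation([0.9, 0.1], [0.1, 0.9], term_vectors, term_sentences,
                          ja_sentence_vectors, ja_to_en)
```

In this toy data the sentence "s3" is also close to the common sentence vector, but it is never considered because it lies outside the similar term; this is the effect of narrowing the range before comparing sentence vectors.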
  • FIG. 18 is a flow chart showing processing of the learning phase of the information processing apparatus according to the second embodiment.
  • the preprocessing unit 251 of the information processing device 200 executes preprocessing based on the bilingual table 241 to generate the compressed file table 242, the transposed index table 243, and the learning data 95 (step S301).
  • the learning unit 252 of the information processing device 200 executes machine learning of the learning model 96 based on the learning data 95 (step S302).
  • FIG. 19 is a flowchart showing analysis phase processing of the information processing apparatus according to the second embodiment.
  • the translation unit 253 of the information processing device 200 receives the analysis query 97 and stores it in the storage unit 240 (step S401).
  • the translation unit 253 calculates the vector of the analysis query 97 (step S402).
  • the translation unit 253 inputs the vector of the analysis query 97 to the learning model 96 to calculate the vector of the common sentence (step S403).
  • The translation unit 253 compares the vector of the analysis query 97 with each Japanese term vector in the alternative term vector table T1 to identify the similar vector (step S404). Based on the alternative term vector, the translation unit 253 calculates the similar vector and the re-translated sentence vector, and identifies the similar Japanese sentence vector based on the common sentence vector, the Japanese term transposed index 243c, and the Japanese sentence transposed index 243e (step S405).
  • The translation unit 253 identifies the similar English sentence vector corresponding to the similar Japanese sentence vector using the common sentence vector and the re-translated sentence vector (step S406).
  • The translation unit 253 identifies the offset based on the similar English sentence vector and the English sentence transposed index 243f, and acquires the code array of the English sentence from the compressed English sentence file (step S407).
  • the translation unit 253 decodes and converts the code array of each sentence (step S408).
  • the translation unit 253 outputs the translation result (step S409).
  • As described above, the information processing apparatus 200 inputs the vector of the analysis query 97 to the trained learning model 96 and calculates the vector of the common sentence corresponding to the term of the analysis query 97. It also identifies the vector of a similar alternative term based on the vector of the analysis query 97. As a result, it can calculate the vector of the common sentence shared between the term of the analysis query 97 and the alternative term similar to it. By using the calculated common sentence vector, it is possible to extract the common sentence of the alternative term similar to the term of the analysis query 97 and the English translation associated with that common sentence.
  • The information processing apparatus 200 also registers the relationship between the common sentence vector and the re-translated sentence vector in the common sentence/re-translated sentence table 98.
  • The re-translated sentence vector is calculated by comparing the vector of each sentence constituting the alternative term with the vector of the common sentence and subtracting one from the other.
  • FIG. 20 is a diagram illustrating an example of a hardware configuration of a computer that implements functions similar to those of the information processing apparatus of the embodiment.
  • the computer 300 has a CPU 301 that executes various arithmetic processes, an input device 302 that receives data input from the user, and a display 303 .
  • the computer 300 also has a communication device 304 and an interface device 305 for exchanging data with an external device or the like via a wired or wireless network.
  • the computer 300 also has a RAM 306 that temporarily stores various information, and a hard disk device 307 . Each device 301 - 307 is then connected to a bus 308 .
  • the hard disk device 307 has a preprocessing program 307a, a learning program 307b, and a translation program 307c.
  • The CPU 301 reads each of the programs 307a to 307c and loads them into the RAM 306.
  • the preprocessing program 307a functions as a preprocessing process 306a.
  • Learning program 307b functions as learning process 306b.
  • the translation program 307c functions as a translation process 306c.
  • the processing of the preprocessing process 306a corresponds to the processing of the preprocessing unit 151 (251).
  • the processing of the learning process 306b corresponds to the processing of the learning unit 152 (252).
  • the processing of the translation process 306c corresponds to the processing of the translation unit 153 (253).
  • each program does not necessarily have to be stored in the hard disk device 307 from the beginning.
  • for example, each program may be stored in a “portable physical medium” such as a flexible disk (FD), CD-ROM, DVD, magneto-optical disk, or IC card that is inserted into the computer 300 .
  • the computer 300 may then read each of the programs 307a to 307c from the medium and execute it.
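
The retranslation vector calculation and its registration in the common sentence/re-translated sentence table 98, described before the hardware configuration, can be illustrated with a minimal sketch. The sketch rests on assumptions that the description does not confirm: the direction of the subtraction (sentence vector minus common sentence vector), the plain-list vector representation, the dictionary used in place of table 98, and the names `subtract` and `build_retranslation_table` are all hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def subtract(a: Vector, b: Vector) -> Vector:
    """Return the element-wise difference a - b of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return [x - y for x, y in zip(a, b)]


def build_retranslation_table(
    common_vec: Vector, sentence_vecs: List[Vector]
) -> Dict[Tuple[float, ...], List[Vector]]:
    """Associate a common sentence vector with its retranslation vectors.

    Assumption: each retranslation vector is the sentence vector minus the
    common sentence vector. The dictionary is a hypothetical stand-in for
    the common sentence/re-translated sentence table 98.
    """
    retranslation_vecs = [subtract(v, common_vec) for v in sentence_vecs]
    # A tuple is used as the key because lists are not hashable.
    return {tuple(common_vec): retranslation_vecs}


# Usage with made-up three-dimensional vectors.
common = [1.0, 2.0, 3.0]
sentences = [[1.5, 2.0, 4.0], [0.0, 2.5, 3.0]]
table = build_retranslation_table(common, sentences)
```

Keying the table by the common sentence vector mirrors the registered relationship between the two kinds of vectors; an actual implementation could equally use an index or identifier as the key.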

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
PCT/JP2021/023207 2021-06-18 2021-06-18 Translation method, translation program, and information processing device Ceased WO2022264404A1 (ja)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2021/023207 WO2022264404A1 (ja) 2021-06-18 2021-06-18 Translation method, translation program, and information processing device
JP2023528915A JPWO2022264404A1 2021-06-18 2021-06-18

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/023207 WO2022264404A1 (ja) 2021-06-18 2021-06-18 Translation method, translation program, and information processing device

Publications (1)

Publication Number Publication Date
WO2022264404A1 (ja) 2022-12-22

Family

ID=84526992

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/023207 Ceased WO2022264404A1 (ja) 2021-06-18 2021-06-18 Translation method, translation program, and information processing device

Country Status (2)

Country Link
JP (1) JPWO2022264404A1
WO (1) WO2022264404A1

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005025474A (ja) * 2003-07-01 2005-01-27 Advanced Telecommunication Research Institute International Machine translation device, computer program, and computer
JP2007317000A (ja) * 2006-05-26 2007-12-06 Nippon Telegr & Teleph Corp <Ntt> Machine translation device, method therefor, and program
JP2017142758A (ja) * 2016-02-12 2017-08-17 Nippon Telegraph and Telephone Corporation Word reordering learning method, word reordering method, device, and program

Also Published As

Publication number Publication date
JPWO2022264404A1 2022-12-22

Similar Documents

Publication Publication Date Title
Torfi et al. Natural language processing advancements by deep learning: A survey
US11176462B1 (en) System and method for prediction of protein-ligand interactions and their bioactivity
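CN109145294B (zh) Text entity recognition method and apparatus, electronic device, and storage medium
JP7621805B2 (ja) System and method for semi-supervised extraction of text classification information
CN114528845B (zh) Abnormal log analysis method and apparatus, and electronic device
CN111581229A (zh) SQL statement generation method and apparatus, computer device, and storage medium
CN110232192A (zh) Electric power term named entity recognition method and apparatus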
Polunin et al. JACOBI4 software for multivariate analysis of biological data
CN118313460B (zh) Method for performing queries with a large model in a resource-constrained environment
WO2020005616A1 (en) Generation of slide for presentation
Bazaga et al. Translating synthetic natural language to database queries with a polyglot deep learning framework
CN116561275A (zh) Object understanding method, apparatus, device, and storage medium
Zanibbi et al. Math search for the masses: Multimodal search interfaces and appearance-based retrieval
Chen et al. Personalized expert recommendation systems for optimized nutrition
Liu et al. Visual context learning based on cross-modal knowledge for continuous sign language recognition
ZAHIDI et al. Comparative study of the most useful Arabic-supporting natural language processing and deep learning libraries
CN120449866B (zh) Chinese text error correction method, apparatus, and device based on a context-fused chain of thought
Wang et al. Construction of bilingual knowledge graph based on meteorological simulation
WO2022264404A1 (ja) Translation method, translation program, and information processing device
Andrabi et al. A Comprehensive Study of Machine Translation Tools and Evaluation Metrics
Zhang et al. Multilingual mixture attention interaction framework with adversarial training for cross-lingual SLU
Kalajdjieski et al. Recent advances in sql query generation: A survey
CN115168550A (zh) Question intent matching method and terminal
Hamed et al. A database for deriving diachronic universals
Pan et al. Learning explicit radical representations for zero-shot Chinese character recognition

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21946079

Country of ref document: EP

Kind code of ref document: A1

WWE WIPO information: entry into national phase

Ref document number: 2023528915

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21946079

Country of ref document: EP

Kind code of ref document: A1