WO2021235586A1

WO2021235586A1 - Electronic device for translating text sequence and operation method thereof

Info

Publication number: WO2021235586A1
Application number: PCT/KR2020/007017
Authority: WO
Inventors: 수레이만 모하매드 알자파리아야; 콰셈 무헤딘 제이터무헤딘; 자말 아부아마르아날레; 와리드 자이카트루바
Original assignee: Samsung Electronics Co., Ltd. (삼성전자 주식회사)
Priority date: 2020-05-21
Filing date: 2020-05-29
Publication date: 2021-11-25
Also published as: KR20210144975A

Abstract

Disclosed is a method, in a first electronic device, for translating a text sequence, the method comprising: by encoding a first text group, among at least one text group included in a text sequence of a first language, which does not include a first token indicating the end of the text sequence, acquiring first context information corresponding to the first text group; by decoding the first context information, acquiring a second text group of a second language, corresponding to the first text group; detecting a second token in the second text group; and, on the basis that the second token has been detected, outputting the second text group as a translation result of the first text group.

Description

Electronic device for translating a text sequence and method for operating the same

The present disclosure relates to an electronic device for translating a text sequence of a first language into a second language, and an operating method thereof.

As automatic speech recognition and machine translation technologies develop, speech translation services are being provided that recognize a speech signal, automatically translate it, and output the result.

When a speech translation service is used in a lecture or in a conversation with a foreigner, each time a voice signal is received from the speaker, its translation should be output as soon as possible so that the listener can easily follow the conversation or lecture.

Accordingly, there is a need to provide a method of outputting a translation result for a voice signal as quickly as possible so that a listener can quickly recognize the content of the speaker's utterance according to the continuously received voice signal.

To solve the above-described problem, the present disclosure provides an electronic device for translating a text sequence of a first language into a second language, and an operating method thereof.

In addition, there is provided a computer-readable recording medium in which a program for executing the method in a computer is recorded.

FIG. 1 is a block diagram illustrating an example of a method of translating a text sequence of a first language into a second language according to an embodiment.

FIG. 2 is a diagram illustrating an example of translating a text sequence according to an embodiment.

FIG. 3 is a block diagram illustrating an internal configuration of a first electronic device according to an exemplary embodiment.

FIG. 4 is a block diagram illustrating an internal configuration of a first electronic device according to an exemplary embodiment.

FIG. 5 is a flowchart illustrating a method of translating a text sequence according to an embodiment.

FIG. 6 is a block diagram illustrating an example of training an artificial intelligence model for translating a text sequence according to an embodiment.

FIG. 7 is a block diagram illustrating an internal configuration of a second electronic device according to an exemplary embodiment.

FIG. 8 is a block diagram illustrating an internal configuration of a second electronic device according to an exemplary embodiment.

As a technical means for solving the above problem, a first aspect of the present disclosure provides a method of translating a text sequence in a first electronic device, comprising: obtaining first context information corresponding to a first text group by encoding the first text group, which does not include a first token indicating the end of the text sequence; obtaining a second text group of a second language corresponding to the first text group by decoding the first context information; detecting a second token in the second text group; and, when the second token is detected, outputting the second text group as a translation result for the first text group.

Also, a second aspect of the present disclosure may provide a method of training an artificial intelligence model for translating a text sequence in a second electronic device, the method comprising: obtaining a text sequence of a first language and a text sequence of a second language corresponding to the text sequence of the first language; dividing the text sequence of the second language into sections and inserting a second token at each division position; identifying, for each section of the text sequence of the second language, the corresponding section of the text sequence of the first language; and training the artificial intelligence model, based on the identified correspondence, so that encoding each section of the text sequence of the first language and decoding the encoded result outputs the corresponding section of the text sequence of the second language.
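The training-data preparation of the second aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names `split_at` and `build_training_pairs` are hypothetical, the alignment between source and target sections is assumed to be given as cut indices, and it is assumed that intermediate target sections end with `<sep>` while the last one ends the sentence with `<eos>`.

```python
from typing import List, Sequence, Tuple

SEP = "<sep>"  # second token: marks a section that may be output early
EOS = "<eos>"  # first token: marks the end of the sentence


def split_at(tokens: Sequence[str], cuts: Sequence[int]) -> List[List[str]]:
    """Split tokens into sections at the given sorted cut indices."""
    bounds = [0, *cuts, len(tokens)]
    return [list(tokens[a:b]) for a, b in zip(bounds, bounds[1:])]


def build_training_pairs(src: Sequence[str], tgt: Sequence[str],
                         src_cuts: Sequence[int], tgt_cuts: Sequence[int]
                         ) -> List[Tuple[List[str], List[str]]]:
    """Pair each first-language section with its second-language section.

    Hypothetical helper: the cut indices (the identified correspondence)
    are assumed to come from an external alignment step.
    """
    src_secs = split_at(src, src_cuts)
    tgt_secs = split_at(tgt, tgt_cuts)
    if len(src_secs) != len(tgt_secs):
        raise ValueError("each target section needs a corresponding source section")
    pairs = []
    for i, (s, t) in enumerate(zip(src_secs, tgt_secs)):
        last = i == len(src_secs) - 1
        # Assumption: the final section carries <eos>; the others end with <sep>.
        pairs.append((s + ([EOS] if last else []), t + ([EOS] if last else [SEP])))
    return pairs
```

For example, with the source "look the weather is nice" and second-language glosses "look that weather is nice" (the glosses after "that weather" are hypothetical), cuts `[1, 3]` on both sides yield three training pairs, the first being `(["look"], ["look", "<sep>"])`.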

In addition, a third aspect of the present disclosure provides a first electronic device for translating a text sequence, comprising: a memory storing data necessary for translating the text sequence; at least one processor configured to obtain first context information corresponding to a first text group, among at least one text group included in a text sequence of a first language, that does not include a first token indicating the end of the text sequence, by encoding the first text group, to obtain a second text group of a second language corresponding to the first text group by decoding the first context information, and to detect a second token in the second text group; and an output unit configured to output the second text group as a translation result for the first text group when the second token is detected.

Also, a fourth aspect of the present disclosure may provide a second electronic device for training an artificial intelligence model for translating a text sequence, comprising: at least one processor configured to obtain a text sequence of a first language and a text sequence of a second language corresponding to the text sequence of the first language, divide the text sequence of the second language into sections and insert a second token at each division position, identify, for each section of the text sequence of the second language, the corresponding section of the text sequence of the first language, and train the artificial intelligence model, based on the identified correspondence, so that encoding each section of the text sequence of the first language and decoding the encoded result outputs the corresponding section of the text sequence of the second language; and a memory storing the trained artificial intelligence model.

In addition, a fifth aspect of the present disclosure may provide a recording medium in which a program for performing the method of the first aspect or the second aspect is stored.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art can easily implement them. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted for clarity, and similar reference numerals denote similar parts throughout the specification.

Throughout the specification, when a part is "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another element interposed therebetween. In addition, when a part "includes" a certain component, this means that other components may be further included, rather than excluded, unless otherwise stated.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.

Referring to FIG. 1, the first electronic device 1000 may translate a text sequence of a first language into a second language and output the result. According to an embodiment, the text sequence of the first language to be translated may be obtained by performing speech recognition on a voice signal of the first language received by the first electronic device 1000. The first electronic device 1000 according to an embodiment is not limited to speech recognition and may obtain the text sequence of the first language in various ways.

Also, the first electronic device 1000 may translate the text sequence of the first language into the second language, convert the translated text sequence of the second language into a voice signal, and output the voice signal. Without being limited to this example, the first electronic device 1000 may convert the translated text sequence of the second language into various other forms and output it.

As a voice signal of the first language is received over time, the first electronic device 1000 according to an embodiment may sequentially obtain a text sequence, which is a set of speech-recognized texts. For example, the first electronic device 1000 may divide the voice signal into segments according to its tone or according to pause sections detected in it, and perform speech recognition on each segment, thereby obtaining a text sequence in which the recognized texts are listed in order.
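Dividing a voice signal at pause sections can be sketched with a simple frame-energy rule. This is a minimal illustration under assumptions that are not in the source: the signal is represented by per-frame energy values, a pause is taken to be at least `min_pause` consecutive frames below `threshold`, and `split_on_pauses` is a hypothetical helper. A production system would more likely use a trained voice-activity detector.

```python
from typing import List, Optional, Sequence, Tuple


def split_on_pauses(energy: Sequence[float], threshold: float,
                    min_pause: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges of speech, end exclusive.

    A pause is a run of at least `min_pause` consecutive frames whose
    energy is below `threshold`; both are hypothetical tuning parameters.
    Shorter dips stay inside the current segment.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first frame of the current speech segment
    quiet = 0                    # length of the current run of low-energy frames
    for i, e in enumerate(energy):
        if e >= threshold:
            if start is None:
                start = i
            quiet = 0
        else:
            quiet += 1
            if start is not None and quiet >= min_pause:
                # Close the segment just before the pause began.
                segments.append((start, i - quiet + 1))
                start = None
    if start is not None:
        # Trailing segment: drop any trailing low-energy frames.
        segments.append((start, len(energy) - quiet))
    return segments
```

Each returned range would then be passed to speech recognition, and the recognized texts appended in order to form the text sequence.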

According to an embodiment, a text sequence may be obtained that includes, at its end, a first token (e.g., an <eos> token) indicating the end of the text sequence, for example, the end of a sentence. Accordingly, the text sequence may be divided into sentence units according to the first token, and translation may be performed for each divided text sequence.

The text sequence according to an embodiment may include at least one text arranged in order, and as the first token is inserted into the text sequence, the text sequence may be divided into sentence units.
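Dividing a token stream into sentence units at the first token can be sketched as below. It is a minimal sketch assuming the first token is the literal string `<eos>` and that recognized text arrives as a flat list of tokens; `sentences` is a hypothetical helper name.

```python
from typing import Iterable, Iterator, List

EOS = "<eos>"  # first token: end of a sentence


def sentences(tokens: Iterable[str]) -> Iterator[List[str]]:
    """Yield one sentence at a time, each ending with the <eos> token.

    Tokens after the last <eos> (an unfinished sentence) are yielded last.
    """
    current: List[str] = []
    for tok in tokens:
        current.append(tok)
        if tok == EOS:
            yield current
            current = []
    if current:
        yield current
```

Because it is a generator, each sentence can be handed to the translator as soon as its `<eos>` arrives, without waiting for the rest of the stream.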

According to an embodiment, when a text sequence of the first language corresponding to one sentence is encoded, the words preceding the first token are encoded first, in order, and the first token is encoded last.

Encoding and decoding according to an embodiment may be performed in units of words, but are not limited thereto and may be performed in various units (e.g., phrases, morphemes, idioms).

According to an embodiment, context information, which is a result of encoding a text sequence of the first language corresponding to one sentence, may be decoded to obtain a text sequence of the second language. The text sequence of the second language according to an embodiment may include a first token positioned at the end of the text sequence, similarly to the text sequence of the first language.

The context information according to an embodiment may include information in which the words included in a text sequence of the first language are sequentially encoded. For example, the context information may include vector values obtained by sequentially encoding the words included in the text sequence of the first language. In addition to the words, the context information may further include the encoded first token.

According to an embodiment, the encoder and the decoder used for encoding and decoding may be composed of at least one artificial intelligence model that processes words sequentially (e.g., a recurrent neural network (RNN) or a long short-term memory (LSTM)).

The artificial intelligence model used for encoding in the encoder according to an embodiment may output context information by sequentially processing the words included in a text sequence of the first language. When the encoder's model processes a plurality of words in order, each current word is processed based on the result of the previous encoding step. For example, word 2 is encoded in the next step based on the processing result of word 1, which was encoded first, so that context information reflecting all the words in order may be output.
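Sequential encoding, where each step combines the current word with the state left by the previous step, can be sketched with a toy recurrent cell. This is a minimal sketch, not the disclosed model: it swaps the LSTM for a scalar-weighted, element-wise Elman-style RNN step, and `embed`, `rnn_step`, `encode`, and the weights `w`, `u` are hypothetical stand-ins for learned components.

```python
import math
from typing import List, Sequence


def embed(word: str, dim: int = 4) -> List[float]:
    # Hypothetical embedding: a deterministic vector derived from character
    # codes, standing in for a learned embedding table.
    code = sum(map(ord, word))
    return [((code * (k + 1)) % 97) / 97.0 for k in range(dim)]


def rnn_step(h: List[float], x: List[float], w: float = 0.5, u: float = 0.9) -> List[float]:
    # One element-wise Elman-style step: h_t = tanh(w * x_t + u * h_{t-1}).
    return [math.tanh(w * xi + u * hi) for xi, hi in zip(x, h)]


def encode(words: Sequence[str], dim: int = 4) -> List[List[float]]:
    # Returns one context vector per prefix: contexts[k - 1] encodes words[:k] in order.
    h = [0.0] * dim
    contexts = []
    for word in words:
        h = rnn_step(h, embed(word, dim))
        contexts.append(h)
    return contexts
```

Encoding `"look the weather is nice <eos>".split()` yields six vectors, one per prefix, which corresponds to context information 1 through 6 in the FIG. 1 example; because each step depends on the previous state, swapping the order of two words changes the resulting context.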

Also, the artificial intelligence model used for decoding in the decoder according to an embodiment may sequentially output words of the second language based on the context information output by the encoder. When the decoder's model outputs words of the second language in order, each current word is output based on the word output in the previous decoding step, and the text sequence of the second language made up of the output words may be obtained as the translation result. For example, because word 1 was output as the decoding result of the previous step, word 2 may be output as the decoding result of the current step.
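The autoregressive decoding loop, where each step is fed the word output by the previous step, can be sketched as follows. This is a minimal sketch assuming a hypothetical `step(context, prev_word)` callable that stands in for one decoder LSTM step; a real LSTM decoder would also carry a hidden state between steps, which is omitted here.

```python
from typing import Callable, Hashable, List

GO = "<go>"  # start-of-decoding token fed to the first step
STOP_TOKENS = ("<sep>", "<eos>", "<continuous>")


def greedy_decode(context: Hashable, step: Callable[[Hashable, str], str],
                  max_len: int = 20) -> List[str]:
    # The first step is fed <go>; every later step is fed the word output by
    # the previous step, together with the same context information.
    prev = GO
    out: List[str] = []
    for _ in range(max_len):
        word = step(context, prev)
        out.append(word)
        if word in STOP_TOKENS:
            break
        prev = word
    return out


# Usage with a hypothetical lookup table standing in for a trained decoder.
table = {("ctx2", GO): "that", ("ctx2", "that"): "weather", ("ctx2", "weather"): "<sep>"}
print(greedy_decode("ctx2", lambda c, p: table[(c, p)]))
```

With this table the loop outputs `that`, then `weather`, then stops at `<sep>`, mirroring the word-by-word chaining described above.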

According to an embodiment, the artificial intelligence model used by the encoder may be trained in advance to output context information suitable for obtaining the text sequence of the second language from the text sequence of the first language. Also, according to an embodiment, the artificial intelligence model used by the decoder may be trained in advance to output, by decoding the context information, the text sequence of the second language corresponding to the text sequence of the first language.

According to an embodiment, since decoding is performed after encoding, encoding and decoding may be performed by one pre-trained AI model. The present invention is not limited thereto, and encoding and decoding may be performed by a plurality of artificial intelligence models, respectively.

The text sequence according to an embodiment may be translated in units of sentences or phrases divided according to the first token. For example, with respect to a text sequence of a first language corresponding to one sentence, context information that is an encoded result may be decoded to obtain a text sequence of a second language. According to an embodiment, the context information obtained by sequentially encoding at least one word included in the text sequence of the first language may be decoded.

The translation of the text sequence is not limited to the above-described example in which the text sequence is translated in units of sentences, and the translation of the text sequence of the first language may be performed according to various units divided by the first token. In the present specification, it is assumed that the unit of the text sequence divided by the first token is the sentence unit, but the present disclosure is not limited thereto, and may be divided into various units.

According to an embodiment, since the encoding and decoding results may change depending on the order of the preceding and following words, even if encoding and decoding are performed each time a word of the first language is obtained, it is preferable that the decoded result be output only after all words of the sentence-level text sequence of the first language have been encoded and decoded.

However, according to an embodiment, when the result of decoding the context information encoded from the sequentially obtained word(s) of the first language includes a predetermined token, that result may be output even though it is not the decoding result for the complete sentence.

According to an embodiment, even if the decoded result is not based on context information for the complete sentence, it may be output as a translation result when it is decoded from context information encoded in units of sections divided by a predetermined token. According to an embodiment, the predetermined token may be a second token inserted into a text sequence of the second language at positions where that sequence is divided according to context or sentence type.

For example, the second token is inserted into the text sequence of the second language at positions divided according to context or sentence type, and the decoder is trained in advance on text sequences of the second language that include the second token, so that the second token can be obtained as part of the text sequence of the second language during decoding. According to an embodiment, for a section divided according to context or sentence type, the decoded result is unlikely to change significantly because of words obtained later from the text sequence of the first language; by outputting the decoded result of such a section first, a faster translation result can be provided to the user.

The second token according to an embodiment is not limited to being inserted at positions divided according to context or sentence type. The second token may be inserted into the text sequence of the second language, and learned in advance by the decoder, according to whether a section is one in which a decoding result output early is relatively unlikely to be altered by words of the first language obtained later, so that the user can still correctly understand the translation result even though it was output first.

Accordingly, according to an embodiment, before the complete sentence of the first language is obtained, the result of the translation may be quickly output in real time.

According to an embodiment, instead of outputting the decoded result after all the words of the text sequence of the first language are encoded, each time the words included in the sequentially input text sequence of the first language are sequentially encoded, Decoding may be performed. For example, whenever words included in the text sequence of the first language are sequentially encoded, decoding may be performed on the obtained context information.

Therefore, according to an embodiment, instead of performing decoding only after the words included in the text sequence of the first language and, finally, the first token have all been encoded, decoding may be performed based on the context information obtained before the first token is encoded.

However, according to an embodiment, although decoding may be performed each time a word of the first language is obtained and encoded, the decoded result is output only when the second token is detected. Accordingly, when the second token is detected in a decoding result, the words of the second language obtained up to the second token may be output as a translation result before the first token of the text sequence of the first language is processed.
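The output rule described above (decode after every new word, emit only when the second token appears, then drop the already-translated words from later encoding) can be sketched as a small loop. This is a minimal sketch, not the disclosed implementation: `simultaneous_translate` is a hypothetical name, and `translate` is a hypothetical callable standing in for one encode-then-decode pass over the pending source words.

```python
from typing import Callable, Iterable, List, Sequence

SEP, EOS = "<sep>", "<eos>"


def simultaneous_translate(words: Iterable[str],
                           translate: Callable[[Sequence[str]], List[str]]
                           ) -> List[List[str]]:
    """Emit partial translations as soon as the decoder produces <sep>."""
    pending: List[str] = []        # source words not yet covered by an emitted section
    emitted: List[List[str]] = []
    for word in words:
        pending.append(word)
        decoded = translate(pending)
        if SEP in decoded:
            # Output everything before <sep>, then exclude those source words.
            emitted.append(decoded[:decoded.index(SEP)])
            pending = []
        elif EOS in decoded:
            # End of sentence: output the rest and stop translating this sentence.
            emitted.append(decoded[:decoded.index(EOS)])
            break
    return emitted
```

With a toy table where `("look",)` decodes to `["look", "<sep>"]`, `("the",)` to `["that", "<continuous>"]`, and `("the", "weather")` to `["that", "weather", "<sep>"]`, the sections "look" and "that weather" are emitted before `<eos>` is ever read.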

Accordingly, according to an embodiment, since the translated result is output in units of sections that are shorter than the sentence or phrase units in which the speech-recognized text sequence is obtained, the translation result may be output more quickly.

Also, since the first electronic device 1000 according to an embodiment provides the translation result to the user quickly, the result may be provided without significant delay from the point at which the voice signal of the first language was spoken. Accordingly, according to an embodiment, the convenience of a user receiving the interpretation and translation service may be increased.

The first electronic device 1000 according to an embodiment may be implemented in various forms. For example, the first electronic device 1000 described herein may include a digital camera, a smart phone, a laptop computer, a tablet PC, an electronic book terminal, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), a navigation system, an MP3 player, a vehicle, and the like, but is not limited thereto. The first electronic device 1000 described herein may be a wearable device that can be worn by a user. Wearable devices include accessory-type devices (e.g., watches, rings, wristbands, ankle bands, necklaces, eyeglasses, contact lenses), head-mounted devices (HMDs), textile- or clothing-integrated devices (e.g., electronic clothing), body-attachable devices (e.g., a skin pad), and bio-implantable devices (e.g., an implantable circuit). Hereinafter, for convenience of description, a case in which the first electronic device 1000 is a smart phone will be described as an example.

According to an embodiment, the first electronic device 1000 may interpret or translate a conversation between a first user and a second user who use different languages; by translating a voice signal of the first language into the second language, it may output a voice signal of the second language. Here, interpretation converts a voice signal in the first language into 'speech', that is, a voice signal in the second language, and translation converts a voice signal in the first language into 'text' in the second language.

According to an embodiment, both interpretation and translation may include an operation in which, after a voice signal in the first language is converted into text in the first language, the text in the second language is obtained as a translated result. Accordingly, the method of translating text according to an embodiment may be used in both interpretation and translation.

According to an embodiment, the first electronic device 1000 may translate text by obtaining a text sequence of the first language (step 110), obtaining context information through encoding (step 120), obtaining a text sequence of the second language through decoding (step 130), and outputting the text sequence of the second language (step 140).

In the step of obtaining the text sequence of the first language 110 according to an embodiment, the first electronic device 1000 may obtain the text sequence of the first language to be translated. The text sequence of the first language according to an embodiment may include texts in which at least one text of the first language is arranged in the order in which it is obtained in the first electronic device 1000 .

In step 120 of obtaining context information through encoding according to an embodiment, the first electronic device 1000 may encode the texts included in the text sequence in translatable units (e.g., words, idioms) and obtain context information. For example, when the words of "look the weather is nice <eos>" are sequentially obtained as a text sequence of the first language, the first electronic device 1000 may obtain context information by sequentially encoding "look", "the", "weather", "is", "nice", and <eos>. For instance, context information 1 may be obtained by encoding "look", and context information 2 by sequentially encoding "look" and "the". Similarly, context information 6 may be obtained by sequentially encoding "look" through <eos>.

However, according to an embodiment, the context information may be decoded each time it is obtained, and subsequent encoding may be performed after excluding at least one already-translated word of the first language. For example, when the second token is detected in the decoding result for context information 1, "the" may be encoded without "look", instead of "look" and "the" being encoded in order. According to an embodiment, since the decoding result corresponding to "look" has already been output as a translation result when the second token was detected, it is preferable that "look" not be encoded again.

Likewise, if the second token is not detected in the result of decoding the context information corresponding to "the", "the" is not excluded from the subsequent encoding operation, and decoding may be performed on context information in which "the" and "weather" are encoded in order. On the other hand, if the second token is detected, "the" may be excluded from the subsequent encoding operation, and decoding may be performed on context information in which only "weather" is encoded.

According to an embodiment, in step 130 of obtaining the text sequence of the second language through decoding, the first electronic device 1000 may obtain the text sequence of the second language by decoding the context information obtained in step 120. According to an embodiment, as the context information is decoded, words of the second language may be obtained one after another, each current word being obtained based on the word obtained in the preceding step.

In step 140 of outputting the text sequence of the second language according to an embodiment, when the second token is detected among the words of the second language output as a result of decoding the context information, the first electronic device 1000 may output the previously obtained words of the second language as the translation result.

Also, to provide a simultaneous interpretation service, the first electronic device 1000 according to an embodiment may convert the text sequence of the second language output as the decoding result into a voice signal of the second language and output the voice signal. For example, in step 140, each time a text sequence of the second language is output, the first electronic device 1000 may convert the text into a voice signal using a text-to-speech (TTS) technique and output it.

According to an embodiment, in step 150, the first electronic device 1000 determines whether the first token is detected in the text sequence of the second language output as the decoding result in step 140, and may perform the encoding of step 120 again depending on the result.

According to an embodiment, when it is determined that the first token indicating the end of a sentence is detected in the text sequence of the second language in step 150 , the first electronic device 1000 does not perform encoding in step 120 , The translation operation for the text sequence of the first language including the one sentence obtained in step 110 may be terminated.

Alternatively, when it is determined that the first token is detected in the text sequence of the second language, the first electronic device 1000 according to an embodiment may encode, in step 120, a text sequence of the first language containing a new sentence, and perform the translation operation of steps 120 to 150 again.

When the text sequence of the second language according to an embodiment includes the first token, it may be determined that the text sequence of the second language is finished, and translation of the text sequence of the first language may be ended.

Also, the first electronic device 1000 according to an embodiment may receive the text of the new first language including one independent sentence, and may repeatedly perform steps 110 to 150 .

On the other hand, when it is determined in step 150 that the first token indicating the end of a sentence is not detected in the text sequence of the second language, the first electronic device 1000 may, in step 120, continue sequentially encoding the words included in the text sequence of the first language.

Referring to FIG. 2 , on the time axis, upper blocks indicate an operation of an encoder according to an embodiment, and lower blocks indicate an operation of a decoder according to an embodiment.

The encoder and the decoder according to an embodiment may process texts sequentially using LSTMs, as in the illustrated example. Without being limited to this example, the encoder and the decoder may use other types of recurrent neural networks (e.g., a standard RNN).

At t1, the encoder 210 may output first context information as a result of encoding "look", the first text group of the first language to be input.

t1 to t17 according to an embodiment indicate a time point at which an encoding or decoding operation is performed by each LSTM.

At t2 and t3, the decoder 220 may sequentially output "look" and the <sep> token as a result of decoding based on the first context information. According to an embodiment, at t2, the first LSTM 221 may output "look" based on <go> and the first context information. <go> is a token indicating the start of a sentence and may be input to the LSTM 221 as an initial value when decoding starts. Also, at t3, as "look", the output value of the first LSTM 221, is input to the second LSTM 222, <sep> may be output.

According to an embodiment, as the second token <sep> is output, "Look" may be output (223) as a translation result for the text group of the first language. Because "Look", output in response to the <sep> token, is determined to be unlikely to be changed significantly into another word by "the weather" and "is nice" of the first language to be encoded later, it may be output first and provided to the user.

According to an embodiment, when the currently translated result is determined to be unlikely to change significantly, outputting it first and providing it to the user improves user convenience. Therefore, in the training step according to an embodiment, a model for the translation operation may be trained so that the <sep> token is inserted at appropriate positions and a translation result is output before the <eos> token is encoded.

According to an embodiment, even though only "look" has been encoded, and before the text sequence of the first language recognized by the first electronic device 1000 has been encoded up to the sentence end indicated by the <eos> token, the translation result corresponding to "look" may be output first because the <sep> token is detected in the decoding result. Accordingly, as in simultaneous interpretation, where texts of the first language are input sequentially and it may take considerable time to obtain the whole text sequence of the first language, the translation result can be output before the complete sentence of the first language is obtained, which improves user convenience.

At t4, the encoder 232 may output second' context information as a result of encoding "the", which is input after "look". When the result decoded from the first context information corresponding to "look" includes <sep> and is output as a translation result, the encoder 232 according to an embodiment may exclude "look" and start encoding anew from the subsequently received "the".

At t5 and t6, the decoder 240 may sequentially output "that" and the <continuous> token as a result of decoding based on the second' context information. According to an embodiment, at t5, the first LSTM 241 may obtain "that" based on <go> and the second' context information. <go> may be input to the LSTM 241 as an initial value when decoding starts. Also, at t6, as the output value of the first LSTM 241 is input to the second LSTM 242, <continuous> may be output.

The <continuous> token according to an embodiment may indicate that the <sep> token is not included in the decoding result based on the second' context information. The disclosure is not limited to this example; instead of the <continuous> token, various other types of information may be obtained as the decoding result of the second LSTM 242. Because the <sep> token is not obtained as a decoding result, the result decoded based on the second' context information may not be output as a translation result.

According to an embodiment, the artificial intelligence model used for encoding and decoding may be trained in advance so that the decoded "that" is not output first as a translation result, because it is determined to be highly likely to be changed to another word by a word of the first language encoded afterward (e.g., "weather"). For example, the artificial intelligence model used for encoding and decoding may be pre-trained so that, in the decoding result of the second language, the <sep> token does not appear after "that".

According to an embodiment, because the <sep> token is not detected in the result decoded based on the second' context information (the result of encoding "the"), the encoder 230 may encode "weather" in the second LSTM 233 at t7. According to an embodiment, the encoded second context information may be output in accordance with the order in which "weather" appears after "the".

According to an embodiment, as "weather" is encoded in the second LSTM 233 of the encoder 230, second context information, in which "the" and "weather" are encoded in sequence, may be output.

At t8, t9, and t10, the decoder 250 may sequentially output "that", "weather", and a <sep> token as a result of decoding based on the second context information. According to an embodiment, at t8, the first LSTM 251 may output "that" based on <go> and the second context information. <go> may be input to the LSTM 251 as an initial value when decoding starts. Also, at t9, as "that", the output value of the first LSTM 251, is input to the second LSTM 252, "weather" may be output. Also, at t10, as "weather", the output value of the second LSTM 252, is input to the third LSTM 253, a <sep> token may be output.

According to an embodiment, as the second token <sep> is output, "weather" may be output (254) as the translation result for the text group of the first language. According to an embodiment, "that" may be determined to be an unnecessary word in the translated sentence, considering "look" (223), which was output first, and may therefore be removed so that only "weather" is output. The disclosure is not limited to this example, and "that" may also be output as part of the translation result.

At t11, the encoder 262 may output third' context information as a result of encoding "is", which is input after "weather". According to an embodiment, because the result decoded from the second context information corresponding to "the weather" includes <sep> and has been output as a translation result, the encoder 262 may exclude "the weather" and newly start encoding from "is", which is received afterward.

At t12 and t13, the decoder 270 may sequentially output "is" and a <continuous> token as a result of decoding based on the third' context information. According to an embodiment, at t12, the first LSTM 271 may output "is" based on <go> and the third' context information. <go> may be input to the LSTM 271 as an initial value when decoding starts. Also, at t13, as the output value of the first LSTM 271 is input to the second LSTM 272, <continuous> may be output.

The <continuous> token according to an embodiment may indicate that the <sep> token is not included in the decoding result based on the third' context information. The disclosure is not limited to this example; instead of the <continuous> token, various other types of information may be obtained as the decoding result of the second LSTM 272. Because the <sep> token is not obtained as a decoding result, the result decoded based on the third' context information may not be output as a translation result.

According to an embodiment, the artificial intelligence model used for encoding and decoding may be trained in advance so that the decoded "is" is not output first as a translation result, because it is determined to be highly likely to be changed to another word by a word of the first language encoded afterward (e.g., "nice"). For example, the model may be trained in advance so that, in the decoding result of the second language, the <sep> token does not appear after "is".

According to an embodiment, because the <sep> token is not detected in the result decoded based on the third' context information (the result of encoding "is"), the encoder 260 may encode "nice" in the second LSTM 263 at t14.

According to an embodiment, as a result of encoding "nice" in the second LSTM 263 of the encoder 260, third context information may be output.

The third context information according to an embodiment may be obtained by further encoding, in an LSTM, the <eos> token indicating the end of the sentence after "nice" has been encoded. Alternatively, as in the illustrated example, the decoder 280 may perform decoding based on the third context information output without further encoding the <eos> token.

At t15, t16, and t17, the decoder 280 may sequentially output "is", "good", and an <eos> token as a result of decoding based on the third context information. According to an embodiment, at t15, the first LSTM 281 may output "is" based on <go> and the third context information. <go> may be input to the LSTM 281 as an initial value when decoding starts. Also, at t16, as "is", the output value of the first LSTM 281, is input to the second LSTM 282, "good" may be output. Also, at t17, as "good", the output value of the second LSTM 282, is input to the third LSTM 283, the <eos> token may be output.

According to an embodiment, as the first token <eos>, which indicates the end of the sentence, is output, "good" may be output (284) as the translation result for the text group of the first language. According to an embodiment, "is" may be determined to be an unnecessary word in the translated sentence, considering "Look, the weather" (223, 254), which was output first, and may therefore be removed so that only "good" is output. The disclosure is not limited to this example, and "is" may also be output as part of the translation result.

FIG. 3 is a block diagram illustrating an internal configuration of the first electronic device 1000 according to an embodiment.

FIG. 4 is a block diagram illustrating an internal configuration of the first electronic device 1000 according to an embodiment.

Referring to FIG. 3, the first electronic device 1000 may include a processor 1300, a memory 1700, and an output unit 1200. However, not all of the components illustrated in FIG. 3 are essential components of the first electronic device 1000. The first electronic device 1000 may be implemented with more or fewer components than those illustrated in FIG. 3.

For example, as illustrated in FIG. 4, in addition to the processor 1300, the memory 1700, and the output unit 1200, the first electronic device 1000 may further include a user input unit 1100, a sensing unit 1400, a communication unit 1500, and an A/V input unit 1600.

The user input unit 1100 refers to a means by which a user inputs data for controlling the first electronic device 1000. For example, the user input unit 1100 may include a key pad, a dome switch, a touch pad (of a contact capacitive type, a pressure-resistive film type, an infrared sensing type, a surface ultrasonic conduction type, an integral tension measurement type, a piezoelectric effect type, etc.), a jog wheel, a jog switch, and the like, but is not limited thereto.

According to an embodiment, the user input unit 1100 may receive a user input for translating a text sequence of a first language into a second language.

The output unit 1200 may output an audio signal, a video signal, or a vibration signal, and may include a display unit 1210, a sound output unit 1220, and a vibration motor 1230.

The display unit 1210 displays and outputs information processed by the first electronic device 1000. According to an embodiment, the display unit 1210 may output the result of translating a text sequence.

Meanwhile, when the display unit 1210 and a touch pad form a layered structure to constitute a touch screen, the display unit 1210 may be used as an input device as well as an output device. The display unit 1210 may include at least one of a liquid crystal display, a thin-film-transistor liquid crystal display, an organic light-emitting diode display, a flexible display, a three-dimensional (3D) display, and an electrophoretic display. Also, depending on the implementation form of the first electronic device 1000, the first electronic device 1000 may include two or more display units 1210.

The sound output unit 1220 outputs audio data received from the communication unit 1500 or stored in the memory 1700. According to an embodiment, the sound output unit 1220 may output the result of translating a text sequence. For example, the sound output unit 1220 may output the translated text sequence after it has been converted into a voice signal.

The vibration motor 1230 may output a vibration signal. Also, the vibration motor 1230 may output a vibration signal when a touch is input to the touch screen. According to an embodiment, the vibration motor 1230 may output information related to the result of translating the text sequence.

The processor 1300 typically controls the overall operation of the first electronic device 1000. For example, the processor 1300 may execute programs stored in the memory 1700 to control the user input unit 1100, the output unit 1200, the sensing unit 1400, the communication unit 1500, the A/V input unit 1600, and the like.

The first electronic device 1000 may include at least one processor 1300 . For example, the first electronic device 1000 may include various types of processors such as a central processing unit (CPU), a graphics processing unit (GPU), and a neural processing unit (NPU).

The processor 1300 may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input/output operations. The instructions may be provided to the processor 1300 from the memory 1700, or may be received through the communication unit 1500 and provided to the processor 1300. For example, the processor 1300 may be configured to execute instructions according to program code stored in a recording device such as a memory.

The processor 1300 according to an embodiment may acquire first context information corresponding to a first text group by encoding the first text group, which is one of at least one text group included in the text sequence of the first language and does not include a first token indicating the end of the text sequence. Also, the processor 1300 may obtain a second text group of a second language corresponding to the first text group by decoding the first context information, and may determine whether a second token is included in the second text group.

The processor 1300 according to an embodiment may output the second text group as the translation result for the first text group when the second token is detected in the second text group.

According to an embodiment, the second token may be learned by being inserted into a text group of the second language according to the likelihood that the second text group will be changed to other text by at least one text group of the first language encoded after the first text group. For example, when the likelihood that the second text group will be changed by text appearing later is determined to be relatively low, the second token may be inserted immediately after the second text group. According to an embodiment, the artificial intelligence model used for decoding may be trained so that the second token appears as a decoding result at the position where it was inserted.
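The disclosure does not specify how the likelihood of a change is estimated. One possible proxy, assumed here and not taken from the source, is to treat a boundary as stable when the translation of a first-language prefix is already a prefix of the full-sentence translation. The prefix translations below are hand-written English glosses chosen to mirror FIG. 2, and `stable_prefix_lengths` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def stable_prefix_lengths(
    source: List[str],
    prefix_translations: Dict[int, List[str]],
    full_translation: List[str],
) -> List[Tuple[int, int]]:
    """Return (source_len, target_len) pairs at which the translation of the
    source prefix already agrees with the start of the full translation,
    i.e. positions where later input did not change the partial result."""
    boundaries: List[Tuple[int, int]] = []
    for n in range(1, len(source) + 1):
        partial = prefix_translations[n]
        if full_translation[: len(partial)] == partial:
            boundaries.append((n, len(partial)))
    return boundaries


source = ["look", "the", "weather", "is", "nice"]
# Hand-written glosses of how each source prefix might be translated.
prefix_translations = {
    1: ["look"],
    2: ["look", "that"],
    3: ["look", "weather"],
    4: ["look", "weather", "is"],
    5: ["look", "weather", "good"],
}
full_translation = ["look", "weather", "good"]

boundaries = stable_prefix_lengths(source, prefix_translations, full_translation)
print(boundaries)  # [(1, 1), (3, 2), (5, 3)]
```

Under this proxy, the stable boundaries fall after "look", "the weather", and "is nice", which matches the segmentation used in FIG. 2; a second token would then be inserted after each stable target position except the last, which receives the first token.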

If the second token is not detected in the second text group, the processor 1300 may encode a third text group that includes the first text group and at least one subsequent text in the text sequence of the first language. Also, the processor 1300 may obtain third context information as a result of encoding the third text group, and may obtain a fourth text group of the second language by decoding the third context information.

The sensing unit 1400 may detect a state of the first electronic device 1000 or a state around the first electronic device 1000 , and transmit the sensed information to the processor 1300 .

The sensing unit 1400 may include at least one of a geomagnetic sensor 1410, an acceleration sensor 1420, a temperature/humidity sensor 1430, an infrared sensor 1440, a gyroscope sensor 1450, a position sensor (e.g., GPS) 1460, a barometric pressure sensor 1470, a proximity sensor 1480, and an illuminance sensor 1490, but is not limited thereto.

The communication unit 1500 may include one or more components that allow the first electronic device 1000 to communicate with the server 2000 or an external device (not shown). For example, the communication unit 1500 may include a short-range communication unit 1510 , a mobile communication unit 1520 , and a broadcast receiving unit 1530 .

The short-range communication unit 1510 may include a Bluetooth communication unit, a BLE (Bluetooth Low Energy) communication unit, a near field communication (NFC) unit, a WLAN (Wi-Fi) communication unit, a Zigbee communication unit, an infrared data association (IrDA) communication unit, a Wi-Fi Direct (WFD) communication unit, an ultra-wideband (UWB) communication unit, an Ant+ communication unit, and the like, but is not limited thereto.

The mobile communication unit 1520 transmits/receives a radio signal to and from at least one of a base station, an external terminal, and a server on a mobile communication network. Here, the wireless signal may include various types of data according to transmission/reception of a voice call signal, a video call signal, or a text/multimedia message.

The broadcast receiver 1530 receives a broadcast signal and/or broadcast-related information from the outside through a broadcast channel. The broadcast channel may include a satellite channel and a terrestrial channel. According to an embodiment, the first electronic device 1000 may not include the broadcast receiver 1530 .

According to an embodiment, the communication unit 1500 may transmit/receive data required to translate a text sequence. For example, the communication unit 1500 may receive a text sequence of the first language to be translated from the outside.

The A/V (Audio/Video) input unit 1600 is for inputting an audio signal or a video signal, and may include a camera 1610 , a microphone 1620 , and the like. The camera 1610 may obtain an image frame such as a still image or a moving image through an image sensor in a video call mode or a shooting mode. The image captured through the image sensor may be processed through the processor 1300 or a separate image processing unit (not shown).

The microphone 1620 receives an external sound signal and processes it as electrical voice data. The microphone 1620 according to an embodiment may sequentially receive a user's voice signal corresponding to a text sequence of the first language. According to an embodiment, by performing voice recognition on the user's voice signal received by the microphone 1620, a text sequence of the first language may be obtained.

The memory 1700 may store programs for processing and control by the processor 1300, and may also store data input to or output from the first electronic device 1000.

The memory 1700 according to an embodiment may store data required to translate a text sequence. For example, the memory 1700 may store a learning model (e.g., RNN, LSTM) used in an encoder and a decoder for translating a text sequence.

The memory 1700 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (e.g., SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), a magnetic memory, a magnetic disk, and an optical disk.

Programs stored in the memory 1700 may be classified into a plurality of modules according to their functions, for example, a UI module 1710, a touch screen module 1720, and a notification module 1730.

The UI module 1710 may provide a specialized UI, GUI, or the like that interworks with the first electronic device 1000 for each application. The touch screen module 1720 may detect a touch gesture on the user's touch screen and transmit information about the touch gesture to the processor 1300 . The touch screen module 1720 according to some embodiments may recognize and analyze a touch code. The touch screen module 1720 may be configured as separate hardware including a controller.

Various sensors may be provided inside or near the touch screen to detect a touch or a proximity touch on the touch screen. A tactile sensor is an example of a sensor for detecting a touch on the touch screen. A tactile sensor refers to a sensor that detects the touch of a specific object with a sensitivity equal to or greater than that of a human. The tactile sensor may sense various information such as the roughness of the contact surface, the hardness of the contacting object, and the temperature of the contact point.

The user's touch gesture may include a tap, touch & hold, double tap, drag, pan, flick, drag and drop, swipe, and the like.

The notification module 1730 may generate a signal for notifying the occurrence of an event of the first electronic device 1000 .

Referring to FIG. 5, in operation 510, the first electronic device 1000 may obtain first context information corresponding to a first text group by encoding the first text group, which is one of at least one text group included in the text sequence of the first language and does not include the first token indicating the end of the text sequence.

The text sequence of the first language according to an embodiment may be acquired by performing voice recognition on the user's voice signal acquired by the first electronic device 1000. Because the texts included in the text sequence of the first language are acquired sequentially, the first text group according to an embodiment may include texts forming part of a sentence rather than a complete sentence.

The first context information according to an embodiment may be obtained by being encoded by an encoder that performs encoding using a pre-trained learning model.

In operation 520, the first electronic device 1000 may obtain a second text group of a second language by decoding the first context information. The second text group of the second language according to an embodiment may be obtained by decoding the first context information by a decoder that performs decoding using a pre-trained learning model.

In operation 530, the first electronic device 1000 may determine whether the second token is included in the second text group. According to an embodiment, the second token may be inserted into a text group of the second language according to the likelihood that the second text group will be changed to other text by at least one text group of the first language encoded after the first text group, and training may then be performed based on the second text group into which the second token has been inserted.

Also, in operation 540, when the second token is detected in the second text group, the first electronic device 1000 may output the second text group to the outside as a translation result. For example, the first electronic device 1000 may convert the texts of the second text group into a voice signal and output it through a speaker, or may display the texts of the second text group on a display.

Meanwhile, when the second token is not detected in the second text group, the first electronic device 1000 may encode a third text group that includes the first text group and at least one subsequent text in the text sequence of the first language. Also, the first electronic device 1000 may obtain third context information as a result of encoding the third text group, and may obtain a fourth text group of the second language by decoding the third context information.

Referring to FIG. 6, the second electronic device 2000 according to an embodiment may train an artificial intelligence model used to obtain a text sequence of a second language by translating a text sequence of a first language. According to an embodiment, the artificial intelligence models used to encode the text sequence of the first language and to decode the encoded result may be trained.

The second electronic device 2000, which trains the artificial intelligence model for translating a text sequence according to an embodiment, may be the same device as the first electronic device 1000, which translates the text sequence, but is not limited thereto and may be a different device.

In operations 610 and 620, the second electronic device 2000 may obtain a text sequence of a first language and a text sequence of a second language.

According to an embodiment, the artificial intelligence model may be trained so that the text sequence of the second language is obtained as the translation result of the text sequence of the first language. The text sequences of the first language and the second language according to an embodiment may each include one complete sentence.

In operation 630, the second electronic device 2000 may insert the second token into the text sequence of the second language. According to an embodiment, the second electronic device 2000 may divide the text sequence of the second language according to the likelihood that the currently decoded second text group will be changed to other text by at least one text group of the first language encoded afterward, and may insert the second token at each division position. For example, the second token may be inserted at positions where the text sequence of the second language is divided according to context or sentence format.

According to an embodiment, each section into which the text sequence of the second language is divided may end with the second token or the first token. Accordingly, when the second token is detected in a decoded result, the section of the text sequence of the second language in which the second token is detected may be output before encoding of all sections of the text sequence of the first language is completed. In addition, when the first token is detected in a decoded result, the sentence of the text sequence of the second language is determined to have ended, and the section in which the first token is detected may be output.

In operation 640, the second electronic device 2000 may identify the sections of the text sequence of the first language that correspond to the respective sections into which the text sequence of the second language is divided.

Accordingly, in operation 650, the second electronic device 2000 may train the artificial intelligence model, based on the identified correspondence, so that each section of the text sequence of the second language is output when the corresponding section of the text sequence of the first language is encoded and the encoded result is decoded. According to an embodiment, the result encoded by the encoder may be the context information described above.
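Operations 630 to 650 can be illustrated by building training pairs from aligned sections. The sketch below assumes the sections have already been divided and aligned (operation 640) and uses English glosses; `make_training_pairs` is a hypothetical helper, and ending every target section except the last with `<sep>` follows the description above that each section ends with the second or first token.

```python
from typing import List, Tuple

SEP, EOS = "<sep>", "<eos>"


def make_training_pairs(
    source_sections: List[List[str]],
    target_sections: List[List[str]],
) -> List[Tuple[List[str], List[str]]]:
    """Pair each first-language section with its aligned second-language
    section; every target section ends with <sep>, except the last, which
    ends with <eos>."""
    if len(source_sections) != len(target_sections):
        raise ValueError("each source section needs exactly one target section")
    last = len(target_sections) - 1
    pairs: List[Tuple[List[str], List[str]]] = []
    for i, (src, tgt) in enumerate(zip(source_sections, target_sections)):
        end_token = EOS if i == last else SEP
        pairs.append((list(src), list(tgt) + [end_token]))
    return pairs


pairs = make_training_pairs(
    [["look"], ["the", "weather"], ["is", "nice"]],  # first-language sections
    [["look"], ["weather"], ["good"]],               # aligned glosses
)
for src, tgt in pairs:
    print(src, "->", tgt)
# ['look'] -> ['look', '<sep>']
# ['the', 'weather'] -> ['weather', '<sep>']
# ['is', 'nice'] -> ['good', '<eos>']
```

Each pair is one encode/decode training example; because the encoder only ever sees a single section, this matches the inference behavior in which encoding restarts after each output.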

FIG. 7 is a block diagram illustrating an internal configuration of the second electronic device 2000 according to an embodiment.

FIG. 8 is a block diagram illustrating an internal configuration of the second electronic device 2000 according to an embodiment.

Referring to FIG. 7, the second electronic device 2000 may include a processor 2300 and a memory 2700. However, not all of the components shown in FIG. 7 are essential components of the second electronic device 2000. The second electronic device 2000 may be implemented with more or fewer components than those shown in FIG. 7.

For example, as shown in FIG. 8, in addition to the processor 2300 and the memory 2700, the second electronic device 2000 may further include a user input unit 2100, a sensing unit 2400, an A/V input unit 2600, a communication unit 2500, and an output unit 2200.

The user input unit 2100 refers to a means by which a user inputs data for controlling the second electronic device 2000. For example, the user input unit 2100 may include a key pad, a dome switch, a touch pad (of a contact capacitive type, a pressure-resistive film type, an infrared sensing type, a surface ultrasonic conduction type, an integral tension measurement type, a piezoelectric effect type, etc.), a jog wheel, a jog switch, and the like, but is not limited thereto.

According to an embodiment, the user input unit 2100 may receive a user input necessary for training an artificial intelligence model for translating a text sequence.

The output unit 2200 may output an audio signal, a video signal, or a vibration signal, and may include a display unit 2210, a sound output unit 2220, and a vibration motor 2230.

The display unit 2210 displays and outputs information processed by the second electronic device 2000. According to an embodiment, the display unit 2210 may output information related to the result of training an artificial intelligence model for translating a text sequence.

Meanwhile, when the display unit 2210 and a touch pad form a layered structure to constitute a touch screen, the display unit 2210 may be used as an input device as well as an output device. The display unit 2210 may include at least one of a liquid crystal display, a thin-film-transistor liquid crystal display, an organic light-emitting diode display, a flexible display, a three-dimensional (3D) display, and an electrophoretic display. In addition, depending on the implementation form of the second electronic device 2000, the second electronic device 2000 may include two or more display units 2210.

The sound output unit 2220 outputs audio data received from the communication unit 2500 or stored in the memory 2700. The vibration motor 2230 may output a vibration signal. Also, the vibration motor 2230 may output a vibration signal when a touch is input to the touch screen. According to an embodiment, the sound output unit 2220 and the vibration motor 2230 may output information related to the result of training an artificial intelligence model for translating a text sequence.

The processor 2300 typically controls the overall operation of the second electronic device 2000. For example, the processor 2300 may execute programs stored in the memory 2700 to control the user input unit 2100, the output unit 2200, the sensing unit 2400, the communication unit 2500, the A/V input unit 2600, and the like.

The second electronic device 2000 may include at least one processor 2300 . For example, the second electronic device 2000 may include various types of processors, such as a central processing unit (CPU), a graphics processing unit (GPU), and a neural processing unit (NPU).

The processor 2300 may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input/output operations. The instructions may be provided to the processor 2300 from the memory 2700, or may be received through the communication unit 2500 and provided to the processor 2300. For example, the processor 2300 may be configured to execute instructions according to program code stored in a recording device such as a memory.

The processor 2300 according to an embodiment may train an artificial intelligence model used in the encoder and the decoder for translating a text sequence. Based on a text sequence of the first language and the corresponding text sequence of the second language, the processor 2300 may train the artificial intelligence models (e.g., LSTM, RNN) used in the encoder and the decoder so that the text sequence of the second language is output as the translation result of the text sequence of the first language.

The text sequences of the first language and the second language according to an embodiment may each include one complete sentence.

The processor 2300 according to an embodiment may divide the text sequence of the second language and insert the second token at each division position. The second token according to an embodiment is a token inserted into a text group of the second language according to the likelihood that the text group will be changed to other text by at least one text group of the first language encoded afterward. For example, the second token may be inserted at a position where, given the sentence format or context, the currently decoded result is considered relatively unlikely to change significantly as a result of subsequent encoding.

According to an embodiment, text groups may be divided according to context by determining whether the words included in the text sequence belong to the same sentence component, such as a subject, a verb, an object, or a complement. A second token may then be inserted between two words determined to belong to different contexts.

The context according to an embodiment is not limited to the above-described sentence format, and may be determined according to various criteria for dividing a plurality of words into a plurality of groups.

For example, according to an embodiment, when the text sequence of the second language is "Look, the weather is nice," "Look" is a verb and "the weather" is a subject, so a second token, indicating that the text groups are divided because the preceding and following words belong to different sentence components, may be inserted between "Look" and "the weather." Likewise, because "nice" is determined to be a complement, a second token may be inserted between "the weather" and "nice." A first token indicating the end of the sentence may then be inserted after "nice."
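The sentence-component rule can be sketched as follows. The component labels are supplied by hand for illustration (a real system would need a tagger or parser, which the description does not specify), "is nice" is grouped under a single complement label as an assumption, and `insert_boundary_tokens` is a hypothetical helper.

```python
from itertools import groupby
from typing import List, Tuple

SEP, EOS = "<sep>", "<eos>"


def insert_boundary_tokens(tagged: List[Tuple[str, str]]) -> List[str]:
    """Insert <sep> between consecutive runs of words that carry different
    sentence-component labels, and <eos> after the final run."""
    runs = [
        [word for word, _ in run]
        for _label, run in groupby(tagged, key=lambda pair: pair[1])
    ]
    tokens: List[str] = []
    for i, words in enumerate(runs):
        tokens.extend(words)
        tokens.append(EOS if i == len(runs) - 1 else SEP)
    return tokens


# Component labels assigned by hand for illustration only.
tagged = [
    ("Look", "verb"),
    ("the", "subject"),
    ("weather", "subject"),
    ("is", "complement"),
    ("nice", "complement"),
]
tokens = insert_boundary_tokens(tagged)
print(" ".join(tokens))  # Look <sep> the weather <sep> is nice <eos>
```

`itertools.groupby` only merges adjacent items with equal keys, which is exactly the "runs of the same component" behavior wanted here; a label that reappears later in the sentence starts a new group.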

The processor 2300 according to an embodiment may identify the sections of the text sequence of the first language that correspond to the respective sections into which the text sequence of the second language is divided. Based on the identified correspondence, the processor 2300 may then train the artificial intelligence model used to encode each section of the text sequence of the first language and the artificial intelligence model used to decode the encoded result, so that the text of each corresponding section of the text sequence of the second language is output. According to an embodiment, the encoded result may be the context information described above.

Accordingly, according to an embodiment, the artificial intelligence model used for encoding and decoding may be trained so that "look" 223, "weather" 254, and "nice" 284 are output as the results of decoding the first context information, the second context information, and the third context information shown in FIG. 2, respectively.
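At translation time, the behavior recited in the claims of this document (output a decoded group once a token is detected; otherwise re-encode the group together with the next one; after output, continue with the remaining groups) can be sketched as a loop. In this minimal sketch, `translate` is a hypothetical stand-in for the trained encoder-decoder and is assumed to return the decoded second-language text with any emitted token at its end. Treating the first token the same way as the second token is also an assumption.

```python
from typing import Callable, Iterable, List, Tuple

SEG_TOKEN = "<seg>"  # hypothetical stand-in for the second token
EOS_TOKEN = "<eos>"  # hypothetical stand-in for the first token


def _strip_token(text: str) -> Tuple[str, bool]:
    """Remove a trailing token, reporting whether one was present."""
    for token in (SEG_TOKEN, EOS_TOKEN):
        if text.endswith(token):
            return text[: -len(token)].strip(), True
    return text.strip(), False


def stream_translate(
    groups: Iterable[str], translate: Callable[[str], str]
) -> List[str]:
    """Emit a translation as soon as the decoded text ends with a token."""
    outputs: List[str] = []
    pending: List[str] = []  # source groups not yet covered by an output
    last_text = ""
    for group in groups:
        pending.append(group)
        decoded = translate(" ".join(pending))
        last_text, has_token = _strip_token(decoded)
        if has_token:
            outputs.append(last_text)
            pending = []  # continue with the groups after the translated ones
        # Otherwise keep the pending groups and re-encode them with the next one.
    if pending:
        outputs.append(last_text)  # flush whatever remains at end of input
    return outputs
```

With a dictionary standing in for the model, `table = {"S1": "Look <seg>", "S2": "weather", "S2 S3": "weather nice <eos>"}`, the call `stream_translate(["S1", "S2", "S3"], table.__getitem__)` returns `["Look", "weather nice"]`: "weather" is held back because no token was detected, and the group is encoded again together with the following one.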

The sensing unit 2400 may detect a state of the second electronic device 2000 or a state around the second electronic device 2000 , and transmit the sensed information to the processor 2300 .

The sensing unit 2400 may include at least one of a geomagnetic sensor 2410, an acceleration sensor 2420, a temperature/humidity sensor 2430, an infrared sensor 2440, a gyroscope sensor 2450, a position sensor (e.g., GPS) 2460, a barometric pressure sensor 2470, a proximity sensor 2480, and an illuminance sensor 2490, but is not limited thereto.

The communication unit 2500 may include one or more components that allow the second electronic device 2000 to communicate with the server 2000 or an external device (not shown). For example, the communication unit 2500 may include a short-range communication unit 2510 , a mobile communication unit 2520 , and a broadcast receiving unit 2530 .

The short-range wireless communication unit 2510 may include a Bluetooth communication unit, a Bluetooth Low Energy (BLE) communication unit, a near field communication (NFC) unit, a WLAN (Wi-Fi) communication unit, a Zigbee communication unit, an infrared (IrDA, Infrared Data Association) communication unit, a Wi-Fi Direct (WFD) communication unit, an ultra wideband (UWB) communication unit, an Ant+ communication unit, and the like, but is not limited thereto.

The mobile communication unit 2520 transmits/receives a radio signal to and from at least one of a base station, an external terminal, and a server on a mobile communication network. Here, the wireless signal may include various types of data according to transmission/reception of a voice call signal, a video call signal, or a text/multimedia message.

The broadcast receiver 2530 receives a broadcast signal and/or broadcast-related information from the outside through a broadcast channel. The broadcast channel may include a satellite channel and a terrestrial channel. Depending on the implementation, the second electronic device 2000 may not include the broadcast receiver 2530.

According to an embodiment, the communication unit 2500 may transmit and receive data required for training an artificial intelligence model for translating a text sequence.

The A/V (Audio/Video) input unit 2600 is for inputting an audio signal or a video signal, and may include a camera 2610 , a microphone 2620 , and the like. The camera 2610 may obtain an image frame such as a still image or a moving image through an image sensor in a video call mode or a photographing mode. The image captured through the image sensor may be processed through the processor 2300 or a separate image processing unit (not shown).

The microphone 2620 receives an external sound signal and processes it as electrical voice data. For example, the microphone 2620 may receive a user's voice signal for conducting a call.

The memory 2700 may store a program for processing and control of the processor 2300 , and may also store data input to or output from the second electronic device 2000 .

The memory 2700 according to an embodiment may store the artificial intelligence model trained by the processor 2300 for translating a text sequence.

The memory 2700 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card type memory (e.g., SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), a magnetic memory, a magnetic disk, and an optical disk.

Programs stored in the memory 2700 may be classified into a plurality of modules according to their functions, for example, a UI module 2710, a touch screen module 2720, and a notification module 2730.

The UI module 2710 may provide a specialized UI, GUI, or the like that interworks with the second electronic device 2000 for each application. The touch screen module 2720 may detect a touch gesture on the user's touch screen and transmit information about the touch gesture to the processor 2300 . The touch screen module 2720 according to some embodiments may recognize and analyze a touch code. The touch screen module 2720 may be configured as separate hardware including a controller.

The notification module 2730 may generate a signal for notifying the occurrence of an event of the second electronic device 2000 .

According to an embodiment, a translation result for a text sequence that is sequentially obtained may be quickly output.

The device-readable storage medium may be provided in the form of a non-transitory storage medium. Here, 'non-transitory storage medium' only means that the medium is a tangible device and does not contain a signal (e.g., an electromagnetic wave); the term does not distinguish between a case in which data is semi-permanently stored in the storage medium and a case in which data is temporarily stored therein. For example, the 'non-transitory storage medium' may include a buffer in which data is temporarily stored.

According to one embodiment, the methods according to the various embodiments disclosed in this document may be provided as part of a computer program product. A computer program product may be traded between a seller and a buyer as a commodity. The computer program product may be distributed in the form of a device-readable storage medium (e.g., compact disc read-only memory (CD-ROM)), or distributed (e.g., downloaded or uploaded) online, either through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product (e.g., a downloadable app) may be at least temporarily stored in, or temporarily created on, a machine-readable storage medium such as a memory of a manufacturer's server, a server of an application store, or a relay server.

Also, in this specification, “unit” may be a hardware component such as a processor or circuit, and/or a software component executed by a hardware component such as a processor.

The foregoing description of the present invention is illustrative, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing the technical spirit or essential features of the present invention. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed form, and likewise, components described as distributed may be implemented in a combined form.

The scope of the present invention is defined by the following claims rather than by the above detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be interpreted as being included in the scope of the present invention.

Claims

A method for translating a text sequence in a first electronic device, the method comprising:

obtaining first context information corresponding to a first text group by encoding the first text group, which is among at least one text group included in a text sequence of a first language and does not include a first token indicating the end of the text sequence;

obtaining a second text group of a second language corresponding to the first text group by decoding the first context information;

detecting a second token in the second text group; and

outputting the second text group as a translation result for the first text group as the second token is detected.
The method of claim 1, wherein the second token is a token that is learned by being inserted into a text group of the second language according to the possibility that the second text group is changed to another text by at least one text group of the first language encoded after the first text group.
The method of claim 1, further comprising:

when the second token is not detected in the second text group, obtaining third context information corresponding to a third text group by encoding the third text group, which includes the first text group and at least one subsequent text in the text sequence of the first language;

obtaining a fourth text group of a second language corresponding to the third text group by decoding the third context information; and

translating a text sequence in the first language based on the fourth text group.
The method of claim 1, further comprising:

after outputting the second text group, translating the text sequence in the first language by sequentially encoding the at least one text group excluding the first text group.
A method for learning an artificial intelligence model for translating a text sequence in a second electronic device, the method comprising:

obtaining a text sequence of a first language and a text sequence of a second language corresponding to the text sequence of the first language;

segmenting the text sequence of the second language and inserting a second token at the segmented position;

identifying each section in which the text sequence of the first language is divided corresponding to each section in which the text sequence of the second language is divided; and

learning the artificial intelligence model, based on the identified correspondence, so that each section of the text sequence of the second language is output by encoding each section of the text sequence of the first language and decoding the encoded result.
The method of claim 5, wherein each divided section of the text sequence of the second language includes, at the end of the section, the second token indicating that the text sequence is divided or the first token indicating that the text sequence ends.
The method of claim 5, wherein the learning comprises learning the artificial intelligence model so that each section of the text sequence of the second language is output as a result of decoding the respective context information output as a result of encoding each section of the text sequence of the first language.
A first electronic device for translating a text sequence, comprising:

a memory for storing data necessary to translate the text sequence;

at least one processor configured to obtain first context information corresponding to a first text group by encoding the first text group, which is among at least one text group included in a text sequence of a first language and does not include a first token indicating the end of the text sequence, obtain a second text group of a second language corresponding to the first text group by decoding the first context information, and detect a second token in the second text group; and

and an output unit configured to output the second text group as a translation result for the first text group when the second token is detected.
The first electronic device of claim 8, wherein the second token is a token that is learned by being inserted into a text group of the second language according to the possibility that the second text group is changed to another text by at least one text group of the first language encoded after the first text group.
The first electronic device of claim 8, wherein the at least one processor is further configured to:

when the second token is not detected in the second text group, obtain third context information corresponding to a third text group by encoding the third text group, which includes the first text group and at least one subsequent text in the text sequence of the first language;

obtain a fourth text group of the second language corresponding to the third text group by decoding the third context information; and

translate the text sequence of the first language based on the fourth text group.
The first electronic device of claim 8, wherein the at least one processor is further configured to:

after outputting the second text group, translate the text sequence of the first language by sequentially encoding the at least one text group except for the first text group.
A second electronic device for learning an artificial intelligence model for translating a text sequence, the second electronic device comprising:

at least one processor configured to obtain a text sequence of a first language and a text sequence of a second language corresponding to the text sequence of the first language, divide the text sequence of the second language and insert a second token at each divided position, identify each section of the text sequence of the first language corresponding to each divided section of the text sequence of the second language, and learn the artificial intelligence model, based on the identified correspondence, so that each section of the text sequence of the second language is output by encoding each section of the text sequence of the first language and decoding the encoded result; and

a memory storing the learned artificial intelligence model.
The second electronic device of claim 12, wherein each divided section of the text sequence of the second language includes, at the end of the section, the second token indicating that the text sequence is divided or the first token indicating that the text sequence ends.
The second electronic device of claim 12, wherein the at least one processor is configured to learn the artificial intelligence model so that each section of the text sequence of the second language is output as a result of decoding the respective context information output as a result of encoding each section of the text sequence of the first language.
A computer-readable recording medium in which a program for implementing the method of any one of claims 1 to 7 is recorded.