CN112084295A - Cross-language task training - Google Patents

Cross-language task training

Info

Publication number
CN112084295A
Authority
CN
China
Prior art keywords
language
sentence
cross
training
question
Prior art date
Legal status
Pending
Application number
CN201910447514.4A
Other languages
Chinese (zh)
Inventor
梁耀波
段楠
公明
寿林钧
姜大昕
周明
Current Assignee
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC
Priority to CN201910447514.4A
Priority to PCT/US2020/024541 (WO2020242567A1)
Publication of CN112084295A

Classifications

    • G06F 40/216: Parsing using statistical methods
    • G06F 16/3329: Natural language query formulation or dialogue systems
    • G06F 16/3344: Query execution using natural language analysis
    • G06F 40/284: Lexical analysis, e.g. tokenisation or collocates
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/30: Semantic analysis
    • G06F 40/35: Discourse or dialogue representation
    • G06F 40/45: Example-based machine translation; Alignment
    • G06F 40/56: Natural language generation
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

In an embodiment of the disclosure, a cross-language task training method is provided. After a training sentence in one language is obtained, a word in the training sentence is masked, and a corresponding word in another language is determined for the masked word. A cross-language model is then pre-trained using the mask sentence in the one language and the corresponding word in the other language. The pre-trained cross-language model has multi-language understanding and processing capability and can be further trained for specific tasks. According to embodiments of the disclosure, when a large training corpus exists in one language but little or no training corpus exists in another language, the cross-language model can be pre-trained in a cross-language training manner so that it is also applicable to the other language, thereby enabling model training for various languages.

Description

Cross-language task training
Background
Natural language processing refers to techniques for processing human natural language with a computer, enabling the computer to understand human language. A computer is trained on manually labeled corpora to generate semantic representations of natural language. Natural language processing is a popular direction in the field of artificial intelligence and can be applied to semantic analysis, information retrieval, machine translation, automatic question answering, chat bots, and the like.
A language model is a probability distribution over sequences of words and is a basis of natural language processing techniques. In general, a language model can be constructed by training on a large corpus. For example, a neural-network-based language model may use a three-layer feed-forward neural network whose parameters are optimized during training by back-propagation. A trained language model learns relationships between sentences or words and can thus be used to predict the next word or sentence, and so on.
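Formally, a language model assigns a probability to a word sequence by factorizing it into a product of conditional probabilities; this is the standard formulation and is given here only for reference:

```latex
P(w_1, w_2, \ldots, w_n) = \prod_{t=1}^{n} P\left(w_t \mid w_1, \ldots, w_{t-1}\right)
```

A neural language model, such as the three-layer feed-forward network mentioned above, parameterizes each conditional factor and is fit by back-propagation.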
Disclosure of Invention
In an embodiment of the disclosure, a cross-language task training method is provided. After a training sentence in one language is obtained, a word in the training sentence is masked, and a corresponding word in another language is determined for the masked word. The cross-language model is then pre-trained using the mask sentence in the one language and the corresponding word in the other language. The pre-trained cross-language model has multi-language understanding and processing capability and can be further trained for specific tasks. According to embodiments of the disclosure, when a large training corpus exists in one language but little or no training corpus exists in another language, the cross-language model can be pre-trained in a cross-language training manner so that it is also applicable to the other language, thereby enabling model training for various languages.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the disclosure, nor is it intended to be used to limit the scope of the disclosure.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference characters designate like or similar elements, and wherein:
FIG. 1 illustrates a block diagram of a computing device/server in which one or more embodiments of the present disclosure may be implemented;
FIG. 2 illustrates a flow diagram of a method for pre-training a cross-language model in which the present disclosure may be implemented;
FIG. 3 illustrates a schematic diagram of a process for training a cross-language model for a particular task in which the present disclosure may be implemented;
FIG. 4 shows a schematic diagram of one example for pre-training a cross-language model, in accordance with an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of another example for pre-training a cross-language model, in accordance with an embodiment of the present disclosure;
FIG. 6 shows a schematic diagram of yet another example for pre-training a cross-language model, in accordance with an embodiment of the present disclosure;
FIG. 7 shows a schematic diagram of an example for further training a cross-language model, in accordance with an embodiment of the present disclosure;
FIG. 8 shows a schematic diagram of an example for fine-tuning a cross-language model, according to an embodiment of the present disclosure; and
FIG. 9 illustrates a diagram of an example of providing question answering in a search engine according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
Conventionally, training a language model for a given language usually requires a large training corpus in that language. However, not every language has a large corpus for every task. Because many languages have only few or no task-specific corpora, a task-specific language model cannot be trained for those languages at all.
Therefore, embodiments of the disclosure provide a cross-language task training method. According to embodiments of the disclosure, when a training corpus exists in one language but little or no training corpus exists in another language, the cross-language model can be pre-trained in a cross-language training manner so that it is also applicable to the other language, thereby enabling model training for various languages. Thus, with the cross-language training approach provided by embodiments of the disclosure, a multi-language model applicable to multiple languages can be trained without requiring large training corpora in all of those languages.
The basic principles and several example implementations of the present disclosure are explained below with reference to fig. 1-9. Fig. 1 illustrates a block diagram of a computing device/server 100 in which one or more embodiments of the present disclosure may be implemented. It should be understood that the computing device/server 100 illustrated in FIG. 1 is merely exemplary and should not constitute any limitation as to the functionality or scope of the embodiments described herein.
As shown in fig. 1, computing device/server 100 is in the form of a general purpose computing device. Components of computing device/server 100 may include, but are not limited to, one or more processors or processing units 110, memory 120, storage 130, one or more communication units 140, one or more input devices 150, and one or more output devices 160. The processing unit 110 may be a real or virtual processor and can perform various processes according to programs stored in the memory 120. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to increase the parallel processing capability of the computing device/server 100.
Computing device/server 100 typically includes a number of computer storage media. Such media may be any available media that is accessible by computing device/server 100 and includes, but is not limited to, volatile and non-volatile media, removable and non-removable media. Memory 120 may be volatile memory (e.g., registers, cache, Random Access Memory (RAM)), non-volatile memory (e.g., Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory), or some combination thereof. Storage 130 may be a removable or non-removable medium and may include a machine-readable medium, such as a flash drive, a magnetic disk, or any other medium that may be capable of being used to store information and/or data (e.g., training data for training) and that may be accessed within computing device/server 100.
The computing device/server 100 may further include additional removable/non-removable, volatile/nonvolatile storage media. Although not shown in FIG. 1, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, non-volatile optical disk may be provided. In these cases, each drive may be connected to a bus (not shown) by one or more data media interfaces. Memory 120 may include a computer program product 125 having one or more program modules configured to perform the various methods or acts of the various embodiments of the disclosure.
The communication unit 140 enables communication with other computing devices over a communication medium. Additionally, the functionality of the components of the computing device/server 100 may be implemented in a single computing cluster or multiple computing machines, which are capable of communicating over a communications connection. Thus, the computing device/server 100 may operate in a networked environment using logical connections to one or more other servers, network Personal Computers (PCs), or another network node.
The input device 150 may be one or more input devices such as a mouse, keyboard, trackball, or the like. Output device 160 may be one or more output devices such as a display, speakers, printer, or the like. Computing device/server 100 may also communicate with one or more external devices (not shown), such as storage devices, display devices, etc., as desired, through communication unit 140, with one or more devices that enable a user to interact with computing device/server 100, or with any device (e.g., network card, modem, etc.) that enables computing device/server 100 to communicate with one or more other computing devices. Such communication may be performed via input/output (I/O) interfaces (not shown).
As shown in FIG. 1, a training corpus for training a cross-language model may be stored in storage 130. The program product 125 may implement pre-training and fine-tuning of the cross-language model based on the training corpus stored in storage 130. A cross-language model trained according to embodiments of the present disclosure can be used for multiple tasks in multiple languages; for example, a Chinese question-answering model can be trained with only a small Chinese question-answering corpus. It should be understood that although English is used as the first language and Chinese as the second language in the following embodiments, the second language may be another language such as French, German, or Japanese. Further, the first language is a language having a large training corpus and may likewise be a language other than English.
FIG. 2 illustrates a flow diagram of a method 200 for pre-training a cross-language model in which the present disclosure may be implemented. It should be understood that the method 200 may be performed by the computing device/server 100 described with reference to fig. 1. For ease and clarity in explaining the method 200 of FIG. 2, reference is also made herein to an example 400 for pre-training a cross-language model shown in FIG. 4.
At 202, a first mask sentence in a first language is generated based on a mask applied to a first word in a first sentence in the first language. In some embodiments, the first language may be, for example, English. Referring to FIG. 4, the first sentence may be the English sentence "This is an example", which may serve as a corpus sentence drawn from an English encyclopedia, an English book corpus, or the like. With continued reference to FIG. 4, the English word "example" in the English sentence is masked, i.e., replaced with "[MASK]", thereby producing the English mask sentence 420 "This is an [MASK]". In the example of FIG. 4, it is assumed that a large amount of training data exists in English (the first language) and little or no training data exists in Chinese (the second language).
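As a minimal, non-limiting sketch of the masking in step 202, the following Python snippet replaces one word of an English training sentence with a mask token; the "[MASK]" string and whitespace tokenization are illustrative conventions and are not mandated by the disclosure.

```python
import random
from typing import Optional, Tuple

MASK_TOKEN = "[MASK]"

def mask_one_word(sentence: str, index: Optional[int] = None) -> Tuple[str, str]:
    """Replace one word of `sentence` with the mask token.

    Returns (mask_sentence, masked_word), e.g. ("This is an [MASK]", "example").
    """
    words = sentence.split()  # simple whitespace tokenization, for illustration only
    if index is None:
        index = random.randrange(len(words))  # pick a random word to mask
    masked_word = words[index]
    words[index] = MASK_TOKEN
    return " ".join(words), masked_word

mask_sentence, masked_word = mask_one_word("This is an example", index=3)
# mask_sentence == "This is an [MASK]", masked_word == "example"
```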
At 204, a second word in a second language corresponding to the first word is determined, the first language and the second language being different languages. In the example of FIG. 4, the second language may be, for example, Chinese, and the word 450 corresponding to the masked English word "example" is the Chinese word for "example". In some embodiments, the Chinese word 450 may be a manually labeled word. Alternatively, the word 450 may be obtained by translating the masked English word "example" with a translation system. It is to be appreciated that various translation systems, platforms, programs, interfaces, etc., whether currently existing or developed in the future, can be utilized in conjunction with embodiments of the present disclosure.
At 206, the cross-language encoder is pre-trained using the first mask sentence in the first language and the second word in the second language, and the cross-language encoder is to be further trained based on a particular task. As shown in FIG. 4, the cross-language model may be pre-trained using the English mask sentence 420 "This is an [MASK]" as an input to the cross-language model 410 and the Chinese word 450 for "example" as the output of the cross-language model 410, where the cross-language model includes a cross-language encoder, or a cross-language encoder and decoder. A cross-language model trained according to embodiments of the present disclosure can therefore have multi-language understanding and processing capabilities.
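The objective of step 206 can be sketched as follows: the mask sentence in the first language is encoded, and the encoder is trained to predict the second-language word at the masked position. The tiny Transformer encoder, vocabulary sizes, and token ids below are illustrative assumptions for the sketch, not the architecture or data specified by the disclosure.

```python
import torch
import torch.nn as nn

class TinyCrossLingualEncoder(nn.Module):
    """Minimal stand-in for the cross-language encoder: it scores target-language
    words for each position of a source-language input sentence."""

    def __init__(self, src_vocab_size: int, tgt_vocab_size: int, d_model: int = 64):
        super().__init__()
        self.embed = nn.Embedding(src_vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, tgt_vocab_size)  # scores over the target-language vocabulary

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(self.embed(token_ids))  # (batch, seq, d_model)
        return self.head(hidden)                      # (batch, seq, tgt_vocab)

# Toy ids standing in for "This is an [MASK]"; 0 is the [MASK] id, and 7 is the id of
# the Chinese word for "example" in a toy target-language vocabulary.
src_ids = torch.tensor([[1, 2, 3, 0]])
mask_pos = 3
tgt_word_id = torch.tensor([7])

model = TinyCrossLingualEncoder(src_vocab_size=100, tgt_vocab_size=100)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

logits = model(src_ids)[:, mask_pos, :]                   # prediction at the masked position
loss = nn.functional.cross_entropy(logits, tgt_word_id)   # cross-lingual word alignment loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```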
It should be understood that, in embodiments of the present disclosure, the term "pre-training" implies that the training process has at least two phases. In the first, pre-training phase, the cross-language model is trained with general training data so that it acquires multi-language understanding and processing capability; in the second phase of further training and/or fine-tuning (finetune), the cross-language model is further trained for a specific task, so that the trained model can handle that specific task, for example a question-answering task or a natural language inference task.
FIG. 3 illustrates a schematic diagram of a process 300 for training a cross-language model for a specific task in which the present disclosure may be implemented. As shown in FIG. 3, to train a cross-language model 340 for a particular task (e.g., a question-answering task), a training process with three phases 310-330 may be performed. Any known or future-developed machine learning and/or neural network techniques may be used in conjunction with embodiments of the present disclosure to implement the cross-language model 340. For example, the cross-language model 340 may be implemented with a Transformer model, a Recurrent Neural Network (RNN), or a Long Short-Term Memory (LSTM) network.
In the cross-language pre-training phase 310, pre-training may be performed based on the task-independent generic training data 305, training the cross-language model on at least two languages together, such that the cross-language model acquires understanding and processing capabilities for the at least two languages. Several different example training patterns for the cross-language pre-training phase 310 are illustrated below with reference to FIGS. 4-6.
In the cross-language task-related training phase 320, cross-language training may be further performed based on task-related training data 325, such that the trained cross-language model has cross-language task processing capability; the task-related training data 325 may include, for example, a question-answering corpus in a certain language. An example training pattern for the cross-language task-related training phase 320 is shown below with reference to FIG. 7.
In the monolingual task-related fine-tuning phase 330, monolingual fine-tuning may be performed based on the task-related training data 325 such that the trained cross-language model 340 for the particular task has task processing capability in the target language. Through further fine-tuning on the single target language, the trained model's task processing capability for that language can be further improved. For example, if the target language is Chinese, the cross-language model can be fine-tuned based on Chinese questions and Chinese answers. An example training pattern for the monolingual task-related fine-tuning phase 330 is shown below with reference to FIG. 8.
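The three phases of FIG. 3 can be summarized as the following training schedule. The method names pretrain_step and task_step and the translate callable are illustrative placeholders for whatever model interface and translation system are actually used; they are not defined by the disclosure.

```python
def train_for_target_task(model, generic_bilingual_data, task_data_src, translate):
    """Three-phase schedule sketched in FIG. 3 (interfaces are illustrative only)."""
    # Phase 310: task-independent cross-language pre-training,
    # e.g. predicting the second-language word for a masked first-language sentence.
    for mask_sentence, target_word in generic_bilingual_data:
        model.pretrain_step(mask_sentence, target_word)

    # Phase 320: cross-language task-related training,
    # e.g. first-language question paired with a translated second-language answer.
    for question, answer in task_data_src:
        model.task_step(question, translate(answer))

    # Phase 330: monolingual fine-tuning in the target language,
    # e.g. translated question paired with translated answer.
    for question, answer in task_data_src:
        model.task_step(translate(question), translate(answer))

    return model
```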
FIG. 4 illustrates a schematic diagram of one example 400 for pre-training a cross-language model in accordance with embodiments of the present disclosure. FIG. 4 illustrates a cross-language word alignment prediction process, in which the English word "example" in the English sentence "This is an example" is masked to generate the English mask sentence 420 "This is an [MASK]". Then, the English mask sentence 420 is represented as token embeddings, the position information 430 of the respective words as position embeddings, and the language information 440 as language embeddings, which are input together to the cross-language model 410, and pre-training of the cross-language model 410 is performed using the Chinese word 450 for "example" as the output.
In some embodiments, the cross-language model 410 may have an attention-based encoder-decoder architecture. As shown in FIG. 4, the cross-language model 410 may be trained using a large number of pairs of English mask sentences and corresponding Chinese words, such that the cross-language model acquires both English and Chinese understanding and processing capabilities. Additionally, the cross-language model 410 of FIG. 4 may use bidirectional encoder representations. In some embodiments, if subsequent model tasks relate to French, the cross-language model may instead be pre-trained using the corresponding French word as the output 450.
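One common way to combine the token, position, and language streams of FIG. 4 is to sum three embedding tables into a single input representation. The sketch below does exactly that; the dimensions and vocabulary sizes are arbitrary examples, not values taken from the disclosure.

```python
import torch
import torch.nn as nn

class InputEmbedding(nn.Module):
    """Sum of token, position, and language embeddings, mirroring the three
    embedding streams shown in FIG. 4 (sizes are illustrative)."""

    def __init__(self, vocab_size: int = 1000, max_len: int = 128,
                 num_languages: int = 2, d_model: int = 64):
        super().__init__()
        self.token = nn.Embedding(vocab_size, d_model)
        self.position = nn.Embedding(max_len, d_model)
        self.language = nn.Embedding(num_languages, d_model)

    def forward(self, token_ids: torch.Tensor, language_id: int) -> torch.Tensor:
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        lang_ids = torch.full_like(token_ids, language_id)
        return self.token(token_ids) + self.position(positions) + self.language(lang_ids)

# Toy ids for "This is an [MASK]", tagged as language 0 (e.g. English)
embeddings = InputEmbedding()(torch.tensor([[1, 2, 3, 0]]), language_id=0)
print(embeddings.shape)  # torch.Size([1, 4, 64])
```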
FIG. 5 shows a schematic diagram of another example 500 for pre-training a cross-language model, in accordance with an embodiment of the present disclosure. FIG. 5 shows a cross-language paraphrase classification process: the model input is the combination of the English sentence "This is an example" and its Chinese translation, and the output is a flag indicating whether the input English and Chinese sentences have the same meaning, e.g., "true" or "false". As shown in FIG. 5, the combination 520 of the English sentence "This is an example" and its Chinese translation is represented as token embeddings, the position information 530 of each word in the corresponding sentence as position embeddings, and the language information 540 as language embeddings, which are input together to the cross-language model 410, and the flag 550 (e.g., "true") is used as the output to pre-train the cross-language model 410. It should be appreciated that the pre-training approach of example 500 differs from the training approach of conventional translation systems because, in example 500, sentences in both languages are input to the cross-language model as model inputs, rather than an output in one language being generated from an input in the other language.
FIG. 6 shows a schematic diagram of yet another example 600 for pre-training a cross-language model, in accordance with an embodiment of the present disclosure. FIG. 6 shows a cross-language next-sentence prediction process: the model input is the combination of the English sentence "This is an example" and a Chinese sentence (meaning, e.g., "going to New York tomorrow"), and the output is a flag indicating whether the latter sentence is semantically the next sentence of the former, e.g., "true" or "false". As shown in FIG. 6, the combination 620 of the English sentence "This is an example" and the Chinese sentence is represented as token embeddings, the position information 630 of each word in the corresponding sentence as position embeddings, and the language information 640 as language embeddings, which are input together to the cross-language model 410, and the flag 650 (e.g., "true") is used as the output to pre-train the cross-language model 410. It should be appreciated that although only one example of each training pattern is shown in FIGS. 4-6, in practice a large amount of training data of the corresponding pattern is required for each training pattern.
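Both the paraphrase classification of FIG. 5 and the next-sentence prediction of FIG. 6 reduce to classifying a cross-lingual sentence pair with a binary flag. The following sketch builds such a training example; the [CLS]/[SEP] markers and the specific Chinese rendering are illustrative choices, not requirements of the disclosure.

```python
from typing import List, NamedTuple

class PairExample(NamedTuple):
    tokens: List[str]     # concatenated sentence pair
    languages: List[str]  # per-token language tag
    label: int            # 1 = "true", 0 = "false"

def make_pair_example(sent_a: str, lang_a: str,
                      sent_b: str, lang_b: str, label: bool) -> PairExample:
    """Build one cross-lingual pair example, as in FIG. 5 (same meaning?) or
    FIG. 6 (is sent_b semantically the next sentence of sent_a?)."""
    tokens_a, tokens_b = sent_a.split(), sent_b.split()
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    languages = [lang_a] * (len(tokens_a) + 2) + [lang_b] * (len(tokens_b) + 1)
    return PairExample(tokens, languages, int(label))

# FIG. 5 style: an English sentence paired with a Chinese translation, labeled "true"
example = make_pair_example("This is an example", "en", "这 是 一个 例子", "zh", label=True)
```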
FIG. 7 shows a schematic diagram of an example 700 for further training a cross-language model, in accordance with an embodiment of the present disclosure. As shown in FIG. 7, after pre-training of the cross-language model is completed, further task-related training may be performed according to the specific task. For example, FIG. 7 shows an example 700 for training a question-answering task. Since only a large English question-answering corpus is available and there is not enough Chinese question-answering corpus, a question-answer pair, such as the English question "Why is the sky blue" and the English answer "The atmosphere of the Earth scatters blue light the most", is obtained from the English question-answering corpus. The English answer can then be automatically translated with a translation system into the corresponding Chinese answer (meaning "The Earth's atmosphere scatters blue light the most"). As shown in FIG. 7, the English question and the Chinese answer may be used as the inputs 720; the cross-language model 410 then generates the input embedding 730 of each token and generates a contextual representation 740 of each token. In example 700, the output 750 is a true/false flag indicating whether the English question and the Chinese answer match.
In some embodiments, the English question may instead be translated into a Chinese question while the English answer is left unchanged, and the cross-language model 410 is then trained using the Chinese question and the English answer for question answering in the Chinese language.
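The two data-construction directions described for FIG. 7 can be summarized as follows; translate_to_zh stands in for any translation system and is not a specific API of the disclosure.

```python
def make_cross_lingual_qa_pairs(question_en: str, answer_en: str, translate_to_zh):
    """From one English question-answer pair, build the two cross-lingual training
    variants of FIG. 7: (English question, Chinese answer) and
    (Chinese question, English answer)."""
    return [
        (question_en, translate_to_zh(answer_en)),   # variant 1: translate only the answer
        (translate_to_zh(question_en), answer_en),   # variant 2: translate only the question
    ]

pairs = make_cross_lingual_qa_pairs(
    "Why is the sky blue",
    "The atmosphere of the Earth scatters blue light the most",
    translate_to_zh=lambda text: f"<zh translation of: {text}>",  # placeholder translator
)
```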
FIG. 8 shows a schematic diagram of an example 800 for fine-tuning a cross-language model, according to an embodiment of the present disclosure. As shown in FIG. 8, after the task-related further training of the cross-language model in FIG. 7 is completed, task-related fine-tuning may be performed according to the specific task. For example, FIG. 8 shows an example 800 for training a question-answering task. Since only a large English question-answering corpus is available and there is not enough Chinese corpus, a question-answer pair, such as the English question "Why is the sky blue" and the English answer "The atmosphere of the Earth scatters blue light the most", is first obtained from the English question-answering corpus. The English question and English answer can then be automatically translated with a translation system into the corresponding Chinese question (meaning "Why is the sky blue") and Chinese answer (meaning "The Earth's atmosphere scatters blue light the most"), which are used as the inputs 820; the cross-language model 410 then generates the input embedding 830 of each token and generates a contextual representation 840 of each token. In example 800, the output 850 is a true/false flag indicating whether the Chinese question and the Chinese answer match. That is, in the fine-tuning phase, a question in the target language and an answer in the target language may be used to fine-tune the cross-language model for question answering.
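A minimal sketch of the fine-tuning loop of FIG. 8 follows. It assumes the model maps a (question, answer) text pair to a single match/mismatch logit; that interface, the optimizer, and the learning rate are assumptions made for the sketch rather than details given in the disclosure.

```python
import torch
import torch.nn as nn

def finetune_monolingual(model: nn.Module, qa_pairs_en, translate_to_zh, lr: float = 1e-5):
    """Monolingual fine-tuning: translate both question and answer into the target
    language and train the model to score whether they match."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for question_en, answer_en, label in qa_pairs_en:  # label: 1 = match, 0 = mismatch
        question_zh = translate_to_zh(question_en)
        answer_zh = translate_to_zh(answer_en)
        logit = model(question_zh, answer_zh)           # assumed shape: (1,)
        loss = loss_fn(logit, torch.tensor([float(label)]))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```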
FIG. 9 shows a diagram of an example 900 of providing question answering in a search engine according to an embodiment of the present disclosure. For example, Chinese question answering may be handled using the cross-language model trained above with reference to FIG. 8: after a Chinese query 910 is received in the search engine, the cross-language model determines a Chinese answer 920, and the Chinese answer 920 is presented in a search results page of the search engine. In this way, a Chinese question-answering model can be trained and provided without a large Chinese question-answering corpus. It should be understood that Chinese is only one example of a target language, and that other languages may be used as the target language in conjunction with embodiments of the present disclosure.
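At query time, the scenario of FIG. 9 can be sketched as scoring candidate answers against the incoming query with the fine-tuned model and returning the best one; the scoring interface mirrors the fine-tuning sketch above and is likewise an assumption.

```python
import torch

def answer_query(model, query_zh: str, candidate_answers_zh: list) -> str:
    """Score each candidate Chinese answer against the Chinese query and return
    the highest-scoring one for display in the search results page."""
    with torch.no_grad():
        scores = [model(query_zh, answer).item() for answer in candidate_answers_zh]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return candidate_answers_zh[best]
```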
In some embodiments, a sentence pair may be obtained from an English natural language inference corpus, where the sentence pair includes an English precondition (premise) sentence and an English hypothesis sentence, and the precondition sentence and the hypothesis sentence are labeled with one of the following relationships: entailment, contradiction, or neutral. For example, the precondition sentence "You don't have to stay there" and the hypothesis sentence "You can leave" are in an entailment relationship. The English hypothesis sentence may be translated from English into Chinese, and the cross-language model may be trained using the English precondition sentence, the Chinese hypothesis sentence, and the sentence relationship for natural language inference in the Chinese language. Alternatively, the English precondition sentence may be translated from English into Chinese, for example, "You don't have to stay there" is translated into its Chinese equivalent (meaning "You do not need to stay there"), and the cross-language model is trained using the Chinese precondition sentence, the English hypothesis sentence "You can leave", and the sentence relationship (entailment) for natural language inference in the Chinese language.
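The two natural language inference variants described above can be generated in the same way as the question-answering data; translate_to_zh is again a placeholder for any translation system, and the label encoding is an illustrative choice.

```python
LABELS = {"entailment": 0, "contradiction": 1, "neutral": 2}

def make_cross_lingual_nli_examples(premise_en: str, hypothesis_en: str,
                                    label: str, translate_to_zh):
    """Build the two cross-lingual NLI variants: translate only the hypothesis,
    or only the precondition (premise) sentence, keeping the labeled relationship."""
    return [
        (premise_en, translate_to_zh(hypothesis_en), LABELS[label]),  # English premise, Chinese hypothesis
        (translate_to_zh(premise_en), hypothesis_en, LABELS[label]),  # Chinese premise, English hypothesis
    ]

examples = make_cross_lingual_nli_examples(
    "You don't have to stay there",
    "You can leave",
    label="entailment",
    translate_to_zh=lambda text: f"<zh translation of: {text}>",
)
```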
Therefore, according to embodiments of the present disclosure, when a training corpus exists in one language but little or no training corpus exists in another language, the cross-language model can be pre-trained in a cross-language training manner so that it is also applicable to the other language, thereby enabling model training for various languages.
The methods and functions described herein may be performed, at least in part, by one or more hardware logic components. By way of example, and not limitation, illustrative types of hardware logic components that may be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Further, while operations are depicted in a particular order, this should be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Some example implementations of the present disclosure are listed below.
In one aspect, a computer-implemented method is provided. The method comprises the following steps: generating a first mask sentence in a first language based on a mask for a first word in the first sentence in the first language; determining a second word in a second language corresponding to the first word, wherein the first language is different from the second language; and pre-training a cross-language encoder using the first mask sentence in the first language and the second words in the second language, the cross-language encoder to be further trained based on the particular task.
In some embodiments, pre-training the cross-language encoder comprises: determining the first mask sentence in the first language as a first input; determining the second word in the second language as a first output; and pre-training the cross-language encoder using the first input and the first output.
In some embodiments, the method further comprises: obtaining a second sentence in a second language; combining a first sentence in a first language and a second sentence in a second language to obtain a second input; and pre-training the cross-language encoder using the second input and the second output, wherein the second output includes a flag indicating whether the first sentence and the second sentence have the same meaning.
In some embodiments, the method further comprises: obtaining a third sentence in the second language; combining the first sentence in the first language and the third sentence in the second language to obtain a third input; and pre-training the cross-language encoder using the third input and the third output, wherein the third output includes a flag indicating whether the third sentence is semantically a next sentence to the first sentence.
In some embodiments, the method further comprises: after pre-training of the cross-language encoder is completed, obtaining a first question-answer pair from a question-answer corpus of a first language, wherein the first question-answer pair comprises a first question and a first answer of the first language; and performing at least one of: translating the first answer from the first language to a second language, and further training a cross-language encoder using the first question in the first language and the first answer in the second language for question answering in the second language; and translating the first question from the first language to a second language, and further training the cross-language encoder using the first question in the second language and the first answer in the first language for question answering in the second language.
In some embodiments, the method further comprises: obtaining a second question-answer pair from the question-answer corpus, wherein the second question-answer pair comprises a second question and a second answer of the first language; translating the second question and the second answer from the first language to a second language, respectively; and fine-tuning the cross-language encoder using a second question in a second language and a second answer in the second language.
In some embodiments, the method further comprises: in response to receiving a user query in the second language in a search engine, determining an answer in the second language for the user query using the cross-language encoder; and presenting the answer in the second language in a search results page of the search engine.
In some embodiments, the method further comprises: obtaining a first sentence pair from a natural language inference library in a first language, wherein the first sentence pair comprises a first precondition sentence and a first hypothesis sentence in the first language, and the first precondition sentence and the first hypothesis sentence are labeled with one of the following relationships: entailment, contradiction and neutral; and performing at least one of: translating the first hypothesis sentence from the first language to the second language, and training the cross-language encoder using the first precondition sentence of the first language, the first hypothesis sentence of the second language, and the relationship for inference in the second language; and translating the first precondition sentence from the first language to the second language, and training the cross-language encoder using the first precondition sentence in the second language, the first hypothesis sentence in the first language, and the relationship for inference in the second language.
In some embodiments, the method further comprises: determining a third word in a third language corresponding to the first word; and pre-training a cross-language encoder using the first mask sentence in the first language and the third words in the third language for tasks associated with the third language.
In another aspect, an electronic device is provided. The electronic device includes: a processing unit; a memory coupled to the processing unit and storing instructions that, when executed by the processing unit, perform acts comprising: generating a first mask sentence in a first language based on a mask for a first word in the first sentence in the first language; determining a second word in a second language corresponding to the first word, the first language being different from the second language; and pre-training a cross-language encoder using the first mask sentence in the first language and the second words in the second language, wherein the cross-language encoder is to be further trained based on the particular task.
In some embodiments, pre-training the cross-language encoder comprises: determining the first mask sentence in the first language as a first input; determining the second word in the second language as a first output; and pre-training the cross-language encoder using the first input and the first output.
In some embodiments, the actions further comprise: obtaining a second sentence in a second language; combining a first sentence in a first language and a second sentence in a second language to obtain a second input; and pre-training the cross-language encoder using the second input and the second output, wherein the second output includes a flag indicating whether the first sentence and the second sentence have the same meaning.
In some embodiments, the actions further comprise: obtaining a third sentence in the second language; combining the first sentence in the first language and the third sentence in the second language to obtain a third input; and pre-training the cross-language encoder using the third input and the third output, wherein the third output includes a flag indicating whether the third sentence is semantically a next sentence to the first sentence.
In some embodiments, the actions further comprise: after pre-training of the cross-language encoder is completed, obtaining a first question-answer pair from a question-answer corpus of a first language, wherein the first question-answer pair comprises a first question and a first answer of the first language; and performing at least one of: translating the first answer from the first language to a second language, and further training a cross-language encoder using the first question in the first language and the first answer in the second language for question answering in the second language; and translating the first question from the first language to a second language, and further training the cross-language encoder using the first question in the second language and the first answer in the first language for question answering in the second language.
In some embodiments, the actions further comprise: obtaining a second question-answer pair from the question-answer corpus, wherein the second question-answer pair comprises a second question and a second answer of the first language; translating the second question and the second answer from the first language to a second language, respectively; and fine-tuning the cross-language encoder using a second question in a second language and a second answer in the second language.
In some embodiments, the actions further comprise: in response to receiving a user query in the second language in a search engine, determining an answer in the second language for the user query using the cross-language encoder; and presenting the answer in the second language in a search results page of the search engine.
In some embodiments, the actions further comprise: obtaining a first sentence pair from a natural language inference library in a first language, the first sentence pair comprising a first precondition sentence and a first hypothesis sentence in the first language, wherein the first precondition sentence and the first hypothesis sentence are labeled with one of the following relationships: entailment, contradiction and neutral; and performing at least one of: translating the first hypothesis sentence from the first language to the second language, and training the cross-language encoder using the first precondition sentence of the first language, the first hypothesis sentence of the second language, and the relationship for inference in the second language; and translating the first precondition sentence from the first language to the second language, and training the cross-language encoder using the first precondition sentence in the second language, the first hypothesis sentence in the first language, and the relationship for inference in the second language.
In some embodiments, the actions further comprise: determining a third word in a third language corresponding to the first word; and pre-training a cross-language encoder using the first mask sentence in the first language and the third words in the third language for tasks associated with the third language.
In yet another aspect, a computer program product is provided. The computer program product is stored in a computer storage medium and includes machine executable instructions that, when executed in a device, cause the device to: generating a first mask sentence in a first language based on a mask for a first word in the first sentence in the first language; determining a second word in a second language corresponding to the first word, wherein the first language is different from the second language; and pre-training a cross-language encoder using the first mask sentence in the first language and the second words in the second language, the cross-language encoder to be further trained based on the particular task.
In some embodiments, pre-training the cross-language encoder comprises: determining the first mask sentence in the first language as a first input; determining the second word in the second language as a first output; and pre-training the cross-language encoder using the first input and the first output.
In some embodiments, the machine executable instructions, when executed in the apparatus, further cause the apparatus to: obtaining a second sentence in a second language; combining a first sentence in a first language and a second sentence in a second language to obtain a second input; and pre-training the cross-language encoder using a second input and a second output, the second output including a flag indicating whether the first sentence and the second sentence have the same meaning.
In some embodiments, the machine executable instructions, when executed in the apparatus, further cause the apparatus to: obtaining a third sentence in the second language; combining the first sentence in the first language and the third sentence in the second language to obtain a third input; and pre-training the cross-language encoder using a third input and a third output, the third output including a flag indicating whether the third sentence is semantically a next sentence to the first sentence.
In some embodiments, the machine executable instructions, when executed in the apparatus, further cause the apparatus to: after pre-training of the cross-language encoder is completed, obtaining a first question-answer pair from a question-answer corpus of a first language, wherein the first question-answer pair comprises a first question and a first answer of the first language; and performing at least one of: translating the first answer from the first language to a second language, and further training a cross-language encoder using the first question in the first language and the first answer in the second language for question answering in the second language; and translating the first question from the first language to a second language, and further training the cross-language encoder using the first question in the second language and the first answer in the first language for question answering in the second language.
In some embodiments, the machine executable instructions, when executed in the apparatus, further cause the apparatus to: obtaining a second question-answer pair from the question-answer corpus, wherein the second question-answer pair comprises a second question and a second answer in the first language; translating the second question and the second answer from the first language to a second language, respectively; and fine-tuning the cross-language encoder using a second question in a second language and a second answer in the second language.
In some embodiments, the machine executable instructions, when executed in the apparatus, further cause the apparatus to: in response to receiving a user query in the second language in a search engine, determine an answer in the second language for the user query using the cross-language encoder; and present the answer in the second language in a search results page of the search engine.
In some embodiments, the machine executable instructions, when executed in the apparatus, further cause the apparatus to: obtaining a first sentence pair from a natural language inference library in a first language, wherein the first sentence pair comprises a first precondition sentence and a first hypothesis sentence in the first language, and the first precondition sentence and the first hypothesis sentence are labeled with one of the following relationships: entailment, contradiction and neutral; and performing at least one of: translating the first hypothesis sentence from the first language to the second language, and training the cross-language encoder using the first precondition sentence of the first language, the first hypothesis sentence of the second language, and the relationship for inference in the second language; and translating the first precondition sentence from the first language to the second language, and training the cross-language encoder using the first precondition sentence in the second language, the first hypothesis sentence in the first language, and the relationship for inference in the second language.
In some embodiments, the machine executable instructions, when executed in the apparatus, further cause the apparatus to: determining a third word in a third language corresponding to the first word; and pre-training a cross-language encoder using the first mask sentence in the first language and the third words in the third language for tasks associated with the third language.
Although the present disclosure has been described in language specific to structural and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (20)

1. A computer-implemented method, comprising:
generating a first mask sentence in a first language based on a mask for a first word in a first sentence in the first language;
determining a second word in a second language corresponding to the first word, the first language being different from the second language; and
pre-training a cross-language encoder using the first mask sentence in the first language and second words in the second language, the cross-language encoder to be further trained based on a particular task.
2. The method of claim 1, wherein pre-training the cross-language encoder comprises:
determining the first mask sentence in the first language as a first input;
determining the second word in the second language as a first output; and
pre-training the cross-language encoder using the first input and the first output.
3. The method of claim 1, further comprising:
obtaining a second sentence in the second language;
combining the first sentence in the first language and the second sentence in the second language to obtain a second input; and
pre-training the cross-language encoder using the second input and a second output, the second output including a flag indicating whether the first sentence and the second sentence have the same meaning.
4. The method of claim 1, further comprising:
obtaining a third sentence in the second language;
combining the first sentence in the first language and the third sentence in the second language to obtain a third input; and
pre-training the cross-language encoder using the third input and a third output, the third output including a flag indicating whether the third sentence is semantically a next sentence to the first sentence.
5. The method of claim 1, further comprising:
after completing the pre-training of the cross-language encoder, obtaining a first question-answer pair from a question-answer corpus of the first language, the first question-answer pair comprising a first question and a first answer in the first language; and
performing at least one of:
translating the first answer from the first language to the second language, and further training the cross-language encoder using the first question in the first language and the first answer in the second language for question answering in the second language, and
translating the first question from the first language to the second language, and further training the cross-language encoder using the first question in the second language and the first answer in the first language for question answering in the second language.
6. The method of claim 5, further comprising:
obtaining a second question-answer pair from the question-answer corpus, wherein the second question-answer pair comprises a second question and a second answer in the first language;
translating the second question and the second answer from the first language to the second language, respectively; and
fine-tuning the cross-language encoder using the second question in the second language and the second answer in the second language.
7. The method of claim 6, further comprising:
in response to receiving a user query in the second language in a search engine,
determining, using the cross-language encoder, an answer in a second language for the user query; and
presenting the answer in the second language in a search results page of the search engine.
8. The method of claim 1, further comprising:
obtaining a first sentence pair from a natural language inference library in the first language, the first sentence pair comprising a first precondition sentence and a first hypothesis sentence in the first language, the first precondition sentence and the first hypothesis sentence being labeled with one of the following relationships between them: entailment, contradiction and neutral; and
performing at least one of:
translating the first hypothesis sentence from the first language to the second language, and training the cross-language encoder using the first precondition sentence of the first language, the first hypothesis sentence of the second language, and the relationship for inference in the second language, and
translating the first precondition sentence from the first language to the second language, and training the cross-language encoder using the first precondition sentence of the second language, the first hypothesis sentence of the first language, and the relationship for inference in the second language.
9. The method of claim 1, further comprising:
determining a third word in a third language corresponding to the first word; and
pre-training a cross-language encoder using the first mask sentence in the first language and a third word in the third language for a task associated with the third language.
10. An electronic device, comprising:
a processing unit;
a memory coupled to the processing unit and storing instructions that, when executed by the processing unit, perform acts comprising:
generating a first mask sentence in a first language based on a mask for a first word in a first sentence in the first language;
determining a second word in a second language corresponding to the first word, the first language being different from the second language; and
pre-training a cross-language encoder using the first mask sentence in the first language and second words in the second language, the cross-language encoder to be further trained based on a particular task.
11. The device of claim 10, wherein pre-training the cross-language encoder comprises:
determining the first mask sentence in the first language as a first input;
determining the second word in the second language as a first output; and
pre-training the cross-language encoder using the first input and the first output.
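By way of illustration of claims 10 and 11, the Python sketch below masks one word of a first-language sentence and uses the corresponding second-language word as the prediction target; the toy bilingual lexicon and the sentence are hypothetical.

# Hypothetical bilingual lexicon mapping first-language (English) words to
# second-language (Chinese) words.
lexicon_en_zh = {"window": "窗户"}

first_sentence = "I opened the window"
first_word = "window"

tokens = first_sentence.split()
first_mask_sentence = " ".join("[MASK]" if t == first_word else t for t in tokens)  # first input
second_word = lexicon_en_zh[first_word]                                             # first output

print(first_mask_sentence, "->", second_word)
# The cross-language encoder would be pre-trained to predict the
# second-language word at the masked position.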
12. The device of claim 10, the acts further comprising:
obtaining a second sentence in the second language;
combining the first sentence in the first language and the second sentence in the second language to obtain a second input; and
pre-training the cross-language encoder using the second input and a second output, the second output including a flag indicating whether the first sentence and the second sentence have the same meaning.
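The sketch below is a hypothetical illustration of claim 12: a first-language sentence is combined with a second-language sentence into a single input, and the output flag records whether the two sentences have the same meaning; the example pairs and labels are illustrative only.

# Hypothetical sentence pairs: (first-language sentence, second-language
# sentence, 1 if same meaning else 0).
pairs = [
    ("I opened the window", "我打开了窗户", 1),
    ("I opened the window", "我关上了门", 0),
]
for first_sentence, second_sentence, same_meaning in pairs:
    second_input = "[CLS] " + first_sentence + " [SEP] " + second_sentence + " [SEP]"
    second_output = same_meaning
    print(second_input, "->", second_output)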
13. The device of claim 10, the acts further comprising:
obtaining a third sentence in the second language;
combining the first sentence in the first language and the third sentence in the second language to obtain a third input; and
pre-training the cross-language encoder using the third input and a third output, the third output including a flag indicating whether the third sentence is semantically a next sentence to the first sentence.
14. The device of claim 10, the acts further comprising:
after completing the pre-training of the cross-language encoder, obtaining a first question-answer pair from a question-answer corpus in the first language, the first question-answer pair comprising a first question and a first answer in the first language; and
performing at least one of:
translating the first answer from the first language to the second language, and further training the cross-language encoder using the first question in the first language and the first answer in the second language for question answering in the second language; and
translating the first question from the first language to the second language, and further training the cross-language encoder using the first question in the second language and the first answer in the first language for question answering in the second language.
15. The device of claim 14, the acts further comprising:
obtaining a second question-answer pair from the question-answer corpus, wherein the second question-answer pair comprises a second question and a second answer in the first language;
translating the second question and the second answer from the first language to the second language, respectively; and
fine-tuning the cross-language encoder using the second question in the second language and the second answer in the second language.
16. The device of claim 15, the acts further comprising:
in response to receiving a user query in the second language in a search engine,
determining, using the cross-language encoder, an answer in the second language for the user query; and
presenting the answer in the second language in a search results page of the search engine.
17. The device of claim 10, the acts further comprising:
obtaining a first sentence pair from a natural language inference corpus in the first language, the first sentence pair comprising a first premise sentence and a first hypothesis sentence in the first language, the first premise sentence and the first hypothesis sentence being labeled with one of the following relationships between them: entailment, contradiction, and neutral; and
performing at least one of:
translating the first hypothesis sentence from the first language to the second language, and training the cross-language encoder using the first premise sentence in the first language, the first hypothesis sentence in the second language, and the relationship for inference in the second language; and
translating the first premise sentence from the first language to the second language, and training the cross-language encoder using the first premise sentence in the second language, the first hypothesis sentence in the first language, and the relationship for inference in the second language.
18. The device of claim 10, the acts further comprising:
determining a third word in a third language corresponding to the first word; and
pre-training the cross-language encoder using the first mask sentence in the first language and the third word in the third language for a task associated with the third language.
19. A computer program product stored in a computer storage medium and comprising machine-executable instructions that, when run in a device, cause the device to:
generating a first mask sentence in a first language based on a mask for a first word in a first sentence;
determining a second word in a second language corresponding to the first word, the first language being different from the second language; and
pre-training a cross-language encoder using the first mask sentence in the first language and second words in the second language, the cross-language encoder to be further trained based on a particular task.
20. The computer program product of claim 19, wherein the machine-executable instructions further cause the device to:
obtaining a second sentence in the second language;
combining the first sentence in the first language and the second sentence in the second language to obtain a second input; and
pre-training the cross-language encoder using the second input and a second output, the second output including a flag indicating whether the first sentence and the second sentence have the same meaning.
CN201910447514.4A 2019-05-27 2019-05-27 Cross-language task training Pending CN112084295A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910447514.4A CN112084295A (en) 2019-05-27 2019-05-27 Cross-language task training
PCT/US2020/024541 WO2020242567A1 (en) 2019-05-27 2020-03-25 Cross-lingual task training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910447514.4A CN112084295A (en) 2019-05-27 2019-05-27 Cross-language task training

Publications (1)

Publication Number Publication Date
CN112084295A (en) 2020-12-15

Family

ID=70296090

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910447514.4A Pending CN112084295A (en) 2019-05-27 2019-05-27 Cross-language task training

Country Status (2)

Country Link
CN (1) CN112084295A (en)
WO (1) WO2020242567A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633947B (en) * 2020-12-30 2023-04-07 北京有竹居网络技术有限公司 Text generation model generation method, text generation method, device and equipment
CN112765977B (en) * 2021-01-11 2023-12-12 百果园技术(新加坡)有限公司 Word segmentation method and device based on cross-language data enhancement
CN113255328B (en) * 2021-06-28 2024-02-02 北京京东方技术开发有限公司 Training method and application method of language model
CN113435529B (en) * 2021-07-06 2023-11-07 北京百度网讯科技有限公司 Model pre-training method, model training method and image processing method
CN113887253A (en) * 2021-11-10 2022-01-04 北京有竹居网络技术有限公司 Method, apparatus, and medium for machine translation
CN114386391B (en) * 2022-01-11 2023-08-15 平安科技(深圳)有限公司 Sentence vector feature extraction method, device, equipment and medium based on artificial intelligence
CN114139532B (en) * 2022-01-30 2022-04-19 北京语言大学 Method and system for generating simple paraphrase based on multi-task framework

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325112A (en) * 2018-06-27 2019-02-12 北京大学 A kind of across language sentiment analysis method and apparatus based on emoji

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GUILLAUME LAMPLE et al.: "Cross-lingual Language Model Pretraining", ARXIV.ORG, Cornell University Library, 201 Olin Library, Cornell University, Ithaca, NY 14853, 22 January 2019 (2019-01-22), pages 1 - 10 *
HAOYANG HUANG et al.: "Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks", ARXIV.ORG, Cornell University Library, 201 Olin Library, Cornell University, Ithaca, NY 14853, 3 September 2019 (2019-09-03), pages 1 - 10 *
HAOYANG HUANG et al.: "Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks", ARXIV.ORG, Cornell University Library, 201 Olin Library, Cornell University, Ithaca, NY 14853, 30 September 2019 (2019-09-30), pages 1 - 10 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115982583A (en) * 2022-12-30 2023-04-18 北京百度网讯科技有限公司 Training method, device, equipment and medium for pre-training language model
CN116680575A (en) * 2023-08-04 2023-09-01 腾讯科技(深圳)有限公司 Model processing method, device, equipment and storage medium
CN116680575B (en) * 2023-08-04 2023-11-07 腾讯科技(深圳)有限公司 Model processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
WO2020242567A1 (en) 2020-12-03

Similar Documents

Publication Publication Date Title
CN112084295A (en) Cross-language task training
US20210390271A1 (en) Neural machine translation systems
Yao et al. An improved LSTM structure for natural language processing
WO2022007823A1 (en) Text data processing method and device
Shin et al. Constrained language models yield few-shot semantic parsers
Baniata et al. A Neural Machine Translation Model for Arabic Dialects That Utilizes Multitask Learning (MTL).
US5610812A (en) Contextual tagger utilizing deterministic finite state transducer
US20210232948A1 (en) Question responding apparatus, question responding method and program
US20140163951A1 (en) Hybrid adaptation of named entity recognition
US20230080671A1 (en) User intention recognition method and apparatus based on statement context relationship prediction
Zha et al. AlignScore: Evaluating factual consistency with a unified alignment function
RU2721190C1 (en) Training neural networks using loss functions reflecting relationships between neighbouring tokens
Goyal et al. Natural language generation through character-based rnns with finite-state prior knowledge
CN113723105A (en) Training method, device and equipment of semantic feature extraction model and storage medium
CN109189882A (en) Answer type recognition methods, device, server and the storage medium of sequence content
Greenstein et al. Japanese-to-english machine translation using recurrent neural networks
CN113360751A (en) Intention recognition method, apparatus, device and medium
CN114757210A (en) Translation model training method, sentence translation method, device, equipment and program
WO2021129411A1 (en) Text processing method and device
CN117795474A (en) Source code for domain specific language synthesized from natural language text
CN115114939B (en) Training method of translation model, sentence translation method, sentence translation device, sentence translation equipment and sentence translation program
KR102629063B1 (en) Question answering system by using constraints and information provision method thereof
Benkov Neural Machine Translation as a Novel Approach to Machine Translation
Sharif et al. An effective hybrid approach based on machine learning techniques for auto-translation: Japanese to English
Lin et al. Bilingual dictionary-based language model pretraining for neural machine translation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination