CN111783435A - Shared vocabulary selection method and device and storage medium

Info

Publication number: CN111783435A
Application number: CN201910204303.8A
Authority: CN (China)
Prior art keywords: vocabulary, candidate, target, word, source
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Other languages: Chinese (zh)
Inventors: 童毅轩, 张永伟, 董滨, 姜珊珊, 张佳师
Current Assignee: Ricoh Co Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Ricoh Co Ltd
Application filed by Ricoh Co Ltd
Priority to: CN201910204303.8A
Publication of: CN111783435A

Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides a shared vocabulary selection method, a shared vocabulary selection device and a storage medium. The shared vocabulary selection method provided by the embodiment of the invention can select shared vocabulary pairs used jointly by the encoder and the decoder of a neural machine translation model, thereby reducing the number of model parameters, shortening the training time of the subsequent neural machine translation model, and reducing the amount of data required to train the neural machine translation model.

Description

Shared vocabulary selection method and device and storage medium
Technical Field
The present invention relates to the technical field of neural machine translation in Natural Language Processing (NLP), and in particular, to a method, an apparatus, and a storage medium for selecting a shared vocabulary.
Background
Neural Machine Translation (NMT) refers to a machine translation method that directly uses a neural network to perform translation modeling in an end-to-end manner. Unlike approaches that use deep learning techniques only to improve an individual module of traditional statistical machine translation, neural machine translation completes the translation work in a simple and intuitive way: the source language sentence is first encoded into a dense vector by a neural network called the Encoder, and the target language sentence is then decoded from that vector by a neural network called the Decoder. The neural network model described above is generally referred to as the "Encoder-Decoder" structure.
Currently, common NMT models include the sequence-to-sequence (seq2seq) model, the convolutional sequence-to-sequence (convS2S) model, and the Transformer model. In the prior art, training a neural machine translation model usually takes a long time. In addition, improving the translation performance of the trained neural machine translation model usually relies on a large amount of training data.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a method, an apparatus, and a storage medium for selecting a shared vocabulary, which can select a shared vocabulary shared by an encoder and a decoder of a neural machine translation model, thereby simplifying the subsequent training process of the neural machine translation model and improving the translation performance of the trained neural machine translation model.
To solve the above technical problem, a method for selecting a shared vocabulary according to an embodiment of the present invention includes:
selecting a plurality of candidate vocabulary pairs from a source vocabulary list and a target vocabulary list, wherein the source vocabulary list is a vocabulary list formed by source vocabularies at an encoder end of a neural machine translation model, and the target vocabulary list is a vocabulary list formed by target vocabularies at a decoder end of the neural machine translation model; each of the candidate vocabulary pairs comprises a candidate source vocabulary in the source vocabulary and a candidate target vocabulary in the target vocabulary;
initializing a sharing tendency parameter for each candidate vocabulary pair respectively, pre-training the neural machine translation model by using source sentences and target sentences, and updating model parameters including the sharing tendency parameters to obtain a first neural machine translation model and model parameters thereof, wherein, in the pre-training process, for a candidate target vocabulary existing in the target sentence, the decoder word vector of the candidate target vocabulary and the encoder word vector of the candidate source vocabulary corresponding to the candidate target vocabulary are weighted and summed according to the sharing tendency parameter of the candidate vocabulary pair, and the weighted sum is input to the decoder;
and selecting a shared vocabulary pair from the plurality of candidate vocabulary pairs according to the sharing tendency parameter of each candidate vocabulary pair obtained by pre-training.
Preferably, during the pre-training process:
for non-candidate target words existing in the target sentence, inputting decoder word vectors of the non-candidate target words into the decoder, wherein the non-candidate target words are the rest words except the candidate target words in the target vocabulary;
for a vocabulary of the source sentence, an encoder word vector for the vocabulary is input to the encoder.
Preferably, the step of selecting a plurality of candidate vocabulary pairs from the source vocabulary and the target vocabulary comprises:
selecting the same vocabulary existing in the source vocabulary and the target vocabulary as a candidate vocabulary pair;
or, alternatively,
selecting a first vocabulary from the source vocabulary, and selecting a second vocabulary from the target vocabulary, and combining the first vocabulary and the second vocabulary into a candidate vocabulary pair, wherein the first vocabulary and the second vocabulary have the same or similar meanings in a preset dictionary.
Preferably, the step of selecting a shared vocabulary pair from the plurality of candidate vocabulary pairs according to the pre-trained shared tendency parameter of each candidate vocabulary pair includes:
and selecting the candidate vocabulary pair with the sharing tendency parameter larger than a preset threshold value as the sharing vocabulary pair.
Preferably, the step of, for a candidate target vocabulary existing in the target sentence, performing weighted summation on the decoder word vector of the candidate target vocabulary and the encoder word vector of the candidate source vocabulary corresponding to the candidate target vocabulary according to the sharing tendency parameter of the candidate vocabulary pair, and inputting the result to the decoder, includes:
mapping the sharing tendency parameter of the candidate vocabulary pair to a first weight with a value between 0 and 1 according to a preset activation function;
according to the first weight, weighting the encoder word vector of the candidate source word corresponding to the candidate target word to obtain a first intermediate vector; weighting the decoder word vector of the candidate target vocabulary according to a second weight to obtain a second intermediate vector, wherein the second weight is negatively related to the first weight;
and calculating the vector sum of the first intermediate vector and the second intermediate vector to obtain a word vector of the candidate target vocabulary and inputting the word vector to the decoder.
Preferably, after selecting the shared vocabulary pair, the method further comprises:
updating an encoder word vector and a decoder word vector of the neural machine translation model, wherein for each vocabulary of the shared vocabulary pair in the source vocabulary and the target vocabulary, the same word vector is used;
and training the neural machine translation model according to the updated encoder word vector and decoder word vector to obtain a second neural machine translation model.
The embodiment of the present invention further provides a device for selecting a shared vocabulary, including:
a first selection unit, configured to select a plurality of candidate vocabulary pairs from a source vocabulary and a target vocabulary, wherein the source vocabulary is a vocabulary formed by source vocabularies at the encoder end of a neural machine translation model, and the target vocabulary is a vocabulary formed by target vocabularies at the decoder end of the neural machine translation model; each of the candidate vocabulary pairs comprises a candidate source vocabulary in the source vocabulary and a candidate target vocabulary in the target vocabulary;
a first training unit, configured to initialize a sharing tendency parameter for each candidate vocabulary pair respectively, pre-train the neural machine translation model by using source sentences and target sentences, and update model parameters including the sharing tendency parameters to obtain a first neural machine translation model and model parameters thereof, wherein, in the pre-training process, for a candidate target vocabulary existing in the target sentence, the decoder word vector of the candidate target vocabulary and the encoder word vector of the candidate source vocabulary corresponding to the candidate target vocabulary are weighted and summed according to the sharing tendency parameter of the candidate vocabulary pair, and the weighted sum is input to the decoder; the encoder word vectors are word vectors obtained by pre-training the vocabularies in the source vocabulary; the decoder word vectors are word vectors obtained by pre-training the vocabularies in the target vocabulary;
and the second selection unit is used for selecting a shared vocabulary pair from the candidate vocabulary pairs according to the sharing tendency parameter of each candidate vocabulary pair obtained by pre-training.
Preferably, the first training unit is further configured to, in the pre-training process: for non-candidate target words existing in the target sentence, inputting decoder word vectors of the non-candidate target words into the decoder, wherein the non-candidate target words are the rest words except the candidate target words in the target vocabulary; for a vocabulary of the source sentence, an encoder word vector for the vocabulary is input to the encoder.
Preferably, the first selection unit includes:
the first selection subunit is used for selecting the same vocabulary existing in the source vocabulary and the target vocabulary as a candidate vocabulary pair;
or, alternatively,
and the second selection subunit is used for selecting a first vocabulary from the source vocabulary, selecting a second vocabulary from the target vocabulary, and combining the first vocabulary and the second vocabulary into a candidate vocabulary pair, wherein the meanings of the first vocabulary and the second vocabulary in a preset dictionary are the same or similar.
Preferably, the second selecting unit is further configured to select a candidate vocabulary pair with the sharing tendency parameter being greater than a preset threshold as the shared vocabulary pair.
Preferably, the first training unit includes:
a word vector calculation unit, configured to map the sharing tendency parameter of the candidate vocabulary pair to a first weight with a value between 0 and 1 according to a preset activation function; weight the encoder word vector of the candidate source vocabulary corresponding to the candidate target vocabulary by the first weight to obtain a first intermediate vector; weight the decoder word vector of the candidate target vocabulary by a second weight to obtain a second intermediate vector, wherein the second weight is negatively correlated with the first weight; and compute the vector sum of the first intermediate vector and the second intermediate vector to obtain the word vector of the candidate target vocabulary, which is input to the decoder.
Preferably, the shared vocabulary selecting means further includes:
a word vector updating unit, configured to update the encoder word vectors and decoder word vectors of the neural machine translation model, wherein, for the two vocabularies of each shared vocabulary pair in the source vocabulary and the target vocabulary, the same word vector is used;
and the second training unit is used for training the neural machine translation model according to the updated encoder word vector and decoder word vector to obtain a second neural machine translation model.
The embodiment of the present invention further provides a device for selecting a shared vocabulary, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the method of selecting a shared vocabulary as described above.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for selecting a shared vocabulary as described above are implemented.
Compared with the prior art, the shared vocabulary selecting method, the device and the storage medium provided by the embodiment of the invention select the shared vocabulary pair for the encoder and the decoder of the neural machine translation model, so that the model parameters can be reduced, the training time of the subsequent neural machine translation model can be reduced, and the data volume required for training the neural machine translation model can be reduced. In addition, the embodiment of the invention can also improve the generalization capability of the trained neural machine translation model and improve the translation performance.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without inventive labor.
FIG. 1 is a flowchart illustrating a method for selecting a shared vocabulary according to an embodiment of the present invention;
FIG. 2 is a diagram of an example of a neural machine translation model provided by an embodiment of the present invention;
FIG. 3 is a diagram of an exemplary vocabulary sharing layer according to an embodiment of the present invention;
FIG. 4 is another flowchart of a method for selecting a shared vocabulary according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a shared vocabulary selecting apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of another structure of the apparatus for selecting shared words according to the embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments. In the following description, specific details such as specific configurations and components are provided only to help the full understanding of the embodiments of the present invention. Thus, it will be apparent to those skilled in the art that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In various embodiments of the present invention, it should be understood that the sequence numbers of the following processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
To assist in understanding embodiments of the present invention, a brief description of related concepts to which embodiments of the present invention may relate is provided.
1) Word, character, and vocabulary
A word is the smallest unit of a language that can be used independently, characterized by its position and role in the syntactic structure. For example, in English a word (word) may include one or more letters of the English alphabet, and in the written form of an English sentence, spaces or punctuation marks usually separate the words. In Chinese, a word may include one or more Chinese characters, and in the written form of a Chinese sentence there is typically no boundary between words.
Character: the characters in this document generally refer to the letters in English, Chinese characters, and various punctuation marks (e.g., periods, commas, etc.).
Vocabulary (vocabulary), which may also be referred to as vocabulary elements or subword elements (subwords), is a unit of textual representation that is interposed between characters and words. For example, for the english word "homework", it includes 8 characters, and may be split into 2 sub-word units, respectively "home" and "work", and may also be split into 3 sub-word units, respectively "home", "me", and "work". For the Chinese word "life detector", it includes 5 characters, and may be split into 2 sub-word units, respectively "life" and "detector", and may also be split into 3 sub-word units, respectively "life", "detection" and "instrument".
2) Source sentence, target sentence, and parallel corpus
Parallel corpora are the corpora required for training a neural machine translation model. A parallel corpus generally includes a source sentence corpus and a target sentence corpus. The source sentence corpus comprises a plurality of source sentences in the source language, and the target sentence corpus comprises a plurality of target sentences in the target language; each source sentence has a corresponding target sentence, and each source sentence together with its corresponding target sentence constitutes one parallel sentence pair of the parallel corpus.
3) Neural machine translation model, encoder, decoder, encoder word vector and decoder word vector
The neural machine translation model generally includes a neural network called an Encoder (Encoder) and a neural network called a Decoder (Decoder). The encoder encodes a source language sentence (also referred to herein simply as a source sentence) into a dense vector, and then decodes a target language sentence (also referred to herein simply as a target sentence) from the vector using a decoder.
The encoder side uses a source vocabulary consisting of words in the source language (referred to herein as source words) and training the words in the source vocabulary results in an encoder word vector for use by the encoder side. Similarly, the decoder side uses a target vocabulary consisting of words in the target language (referred to herein as target words), and training the words in the target vocabulary results in a decoder word vector for use by the decoder side.
In the neural machine translation model, an independent word vector search layer is respectively arranged at an encoder end and a decoder end and is used for converting an ID sequence formed by vocabulary IDs of a source sentence or a target sentence into a corresponding word vector sequence. The word vector search layer at the encoder end needs to use the word vectors of the encoder to perform the word vector conversion; the word vector search layer at the decoder side needs to perform the word vector conversion by using the decoder word vector. That is, in the neural machine translation model, the vocabulary in the source sentence will be converted into the encoder word vector corresponding to the vocabulary through the word vector search layer at the encoder side, and the vocabulary in the target sentence will be converted into the decoder word vector corresponding to the vocabulary through the word vector search layer at the decoder side.
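To make the role of these word vector search layers concrete, the following is a minimal Python sketch (not part of the original disclosure); the array names, the 512-dimensional word vectors, and the toy ID sequences are illustrative assumptions:

import numpy as np

EMB_DIM = 512                  # assumed word vector dimension (matches the word2vec example later in the text)
SRC_VOCAB_SIZE = 30000         # "first number" of source vocabularies
TGT_VOCAB_SIZE = 20000         # "second number" of target vocabularies

# one row of word vector per vocabulary ID; row 0 is reserved for unknown vocabularies
encoder_word_vectors = np.random.randn(SRC_VOCAB_SIZE + 1, EMB_DIM).astype(np.float32)
decoder_word_vectors = np.random.randn(TGT_VOCAB_SIZE + 1, EMB_DIM).astype(np.float32)

def word_vector_search(id_sequence, word_vectors):
    """Convert the ID sequence of a sentence into the corresponding word vector sequence."""
    return word_vectors[np.asarray(id_sequence)]

source_ids = [12, 7, 305, 0]   # toy source sentence IDs; 0 denotes an unknown vocabulary
encoder_input = word_vector_search(source_ids, encoder_word_vectors)   # shape (4, 512)
target_ids = [9, 3, 0]
decoder_input = word_vector_search(target_ids, decoder_word_vectors)   # shape (3, 512)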
Referring to fig. 1, a flow diagram of a method for selecting a shared vocabulary according to an embodiment of the present invention is shown, where the method for selecting a shared vocabulary can select a shared vocabulary shared by an encoder and a decoder of a neural machine translation model, so as to simplify a subsequent training process of the neural machine translation model and improve the translation performance of the trained neural machine translation model. Specifically, the neural machine translation model is a sequence-to-sequence (seq2seq) model, a convolution sequence-to-sequence (convS2S) model or a transformer model, and of course, the embodiment of the present invention may also be applied to other types of neural machine translation models, which is not specifically limited in the present invention.
As shown in fig. 1, a method for selecting a shared vocabulary according to an embodiment of the present invention may include:
step 101, a plurality of candidate vocabulary pairs are selected from a source vocabulary and a target vocabulary.
In the above steps, the source vocabulary is a vocabulary formed by source vocabularies at an encoder end of the neural machine translation model, and the target vocabulary is a vocabulary formed by target vocabularies at a decoder end of the neural machine translation model; each of the candidate vocabulary pairs includes a source vocabulary in the source vocabulary (referred to herein as a candidate source vocabulary) and a target vocabulary in the target vocabulary (referred to herein as a candidate target vocabulary).
Generally, before training the neural machine translation model, vocabularies of a source language (e.g., japanese) and a target language (e.g., chinese) need to be obtained in advance. Here, the source vocabulary provides the vocabulary of the source language for use by the encoder side of the neural machine translation model; the target vocabulary provides the vocabulary of the target language for use by the decoder side of the neural machine translation model.
One way to obtain the source vocabulary and the target vocabulary is as follows:
raw corpora, including source and target languages, are obtained from various corpora and/or the internet. Then, performing data preprocessing on the original corpora to obtain the source vocabulary and the target vocabulary, specifically, the data preprocessing may include:
A) Data cleaning.
Removing noise from the original corpora. The noise typically includes Uniform Resource Locators (URLs), email addresses, and symbols such as "<" and ">" introduced by web pages. Xml tags introduced by web pages, such as "<html>", "<title>" and "<body>", are removed, and only the text between the tags is retained.
B) Data segmentation.
Dividing the original text in the original corpora into sentences according to periods. Then, each sentence can be further segmented into words using a tool such as the open-source text analysis tool kytea, obtaining the word sequence constituting the sentence. Then, the word sequence can be split into sub-word units (i.e., vocabularies) using a Byte Pair Encoding (BPE) algorithm tool or the like.
C) Data ID-ization
To simplify processing, the vocabulary may be assigned an ID at both the encoder and decoder sides. For example, from the vocabularies divided in the step B), according to the occurrence frequency of the vocabularies, a first number (e.g., 30000) of different vocabularies are selected as the vocabularies (i.e., source vocabulary) at the encoder end of the neural machine translation model, and a second number (e.g., 20000) of different vocabularies are selected as the vocabularies (i.e., target vocabulary) at the decoder end of the neural machine translation model.
At the encoder side, each vocabulary is assigned a unique identifier (ID). For example, 1 is assigned as the ID of the 1st vocabulary of the 30000 vocabularies described above, 2 is assigned as the ID of the 2nd vocabulary, and so on. For an unknown vocabulary that is not among the first number of vocabularies described above, 0 is assigned as its ID. Then, the segmented vocabularies are replaced with their corresponding IDs.
Similarly, ID is assigned to 20000 vocabularies at decoder side in the same way, and 0 is assigned as its ID to unknown vocabularies that are not in the second number of vocabularies.
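As a rough illustration of the frequency-based vocabulary selection and ID assignment in steps B) and C), the following sketch assumes that segmentation (kytea, BPE) has already produced sub-word sequences; the function and variable names are hypothetical, not taken from the disclosure:

from collections import Counter

def build_vocab(segmented_sentences, size):
    """Keep the `size` most frequent vocabularies and assign IDs 1..size; ID 0 is reserved for unknown vocabularies."""
    counter = Counter(tok for sentence in segmented_sentences for tok in sentence)
    return {tok: i + 1 for i, (tok, _) in enumerate(counter.most_common(size))}

def to_ids(sentence, vocab):
    """Replace each vocabulary in a sentence with its ID, using 0 for unknown vocabularies."""
    return [vocab.get(tok, 0) for tok in sentence]

# toy segmented corpora; in practice these come from steps A) and B)
source_corpus = [["home", "work", "is", "fun"], ["home", "work"]]
target_corpus = [["作业", "很", "有趣"], ["作业"]]

source_vocab = build_vocab(source_corpus, 30000)   # encoder-side source vocabulary
target_vocab = build_vocab(target_corpus, 20000)   # decoder-side target vocabulary
print(to_ids(source_corpus[0], source_vocab))      # e.g. [1, 2, 3, 4]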
In the embodiment of the present invention, a shared vocabulary commonly used at the encoder and decoder sides needs to be selected from the source vocabulary and the target vocabulary, and in the above step 101, a plurality of candidate vocabulary pairs are first selected, each candidate vocabulary pair includes a vocabulary in the source vocabulary and a vocabulary in the target vocabulary, which form a vocabulary pair, and there is a corresponding relationship between the two vocabularies.
For example, when the source and target languages are Japanese and Chinese respectively, there are a large number of words with the same written form in Japanese and Chinese whose meanings are identical or close. When the source and target languages are English and Chinese respectively, there are a large number of symbols with the same form (a symbol can also be used as a vocabulary). A candidate vocabulary pair can therefore be selected by: selecting the same vocabulary existing in both the source vocabulary and the target vocabulary as a candidate vocabulary pair.
For another example, when the related dictionary provides meanings or corresponding relations of vocabularies in the source and target languages, a first vocabulary can be selected from the source vocabulary, and a second vocabulary can be selected from the target vocabulary, and the first vocabulary and the second vocabulary are combined into a candidate vocabulary pair, wherein the meanings of the first vocabulary and the second vocabulary in the preset dictionary are the same or similar.
Of course, the above examples are only a few of the possible alternatives of candidate vocabulary pairs that may be employed in the embodiments of the present invention. Any manner of selecting words that may have the same or similar meaning in the source and target languages may be applied to the embodiments of the present invention.
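Either selection strategy above can be sketched in a few lines of Python; the bilingual dictionary format below is a hypothetical illustration rather than a format prescribed by this disclosure:

def candidate_pairs_same_form(source_vocab, target_vocab):
    """Pair vocabularies that appear with the same written form in both the source and target vocabularies."""
    return [(form, form) for form in set(source_vocab) & set(target_vocab)]

def candidate_pairs_from_dictionary(source_vocab, target_vocab, bilingual_dict):
    """Pair a first vocabulary from the source vocabulary with a second vocabulary from the target
    vocabulary when a preset dictionary says their meanings are the same or similar."""
    return [(src, tgt) for src, tgt in bilingual_dict.items()
            if src in source_vocab and tgt in target_vocab]

source_vocab = {"学生": 1, "先生": 2, "行く": 3}   # toy Japanese source vocabulary
target_vocab = {"学生": 1, "老师": 2, "去": 3}     # toy Chinese target vocabulary
print(candidate_pairs_same_form(source_vocab, target_vocab))                        # [('学生', '学生')]
print(candidate_pairs_from_dictionary(source_vocab, target_vocab, {"先生": "老师"}))  # [('先生', '老师')]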
Step 102, respectively initializing a sharing tendency parameter for each candidate vocabulary pair, pre-training the neural machine translation model by using a source sentence and a target sentence, and updating model parameters including the sharing tendency parameter to obtain a first neural machine translation model and model parameters thereof.
Here, a sharing tendency parameter corresponding to each candidate vocabulary pair is initialized, and this parameter can be updated in the model training process. Specifically, it may be initialized to a floating point number with value -1. One sharing tendency parameter is set for each candidate vocabulary pair, and both vocabularies in the candidate vocabulary pair use the sharing tendency parameter of that pair.
In the training of a conventional neural machine translation model, the word vector input to the encoder is the encoder word vector looked up for each vocabulary in the source sentence, and the word vector input to the decoder is the decoder word vector looked up for each vocabulary in the target sentence. In step 102, the word vector input at the decoder end is changed: specifically, during pre-training, for a candidate target vocabulary appearing in the target sentence, the decoder word vector of the candidate target vocabulary and the encoder word vector of the candidate source vocabulary corresponding to it are weighted and summed according to the sharing tendency parameter of the candidate vocabulary pair, and the result is input to the decoder. Here, the candidate source vocabulary corresponding to the candidate target vocabulary is the candidate source vocabulary in the candidate vocabulary pair to which the candidate target vocabulary belongs.
For example, assume that vocabulary 5 in the source vocabulary and vocabulary 9 in the target vocabulary form a candidate vocabulary pair. Then, in the model pre-training process of step 102, if vocabulary 9 appears in the target sentence, the decoder word vector of vocabulary 9 and the encoder word vector of vocabulary 5 are weighted and summed, and the result is input to the decoder.
Here, an example of the above weighting process is provided in the embodiment of the present invention, specifically:
1) mapping the sharing tendency parameter of the candidate vocabulary pair to a first weight with a value between 0 and 1 according to a preset activation function;
2) weighting the encoder word vector of the candidate source vocabulary corresponding to the candidate target vocabulary by the first weight to obtain a first intermediate vector, and weighting the decoder word vector of the candidate target vocabulary by a second weight to obtain a second intermediate vector, wherein the second weight is negatively correlated with the first weight;
3) computing the vector sum of the first intermediate vector and the second intermediate vector to obtain the word vector of the candidate target vocabulary, which is input to the decoder.
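Assuming the preset activation function is the sigmoid (the "σ" operator of FIG. 3 below) and the second weight is simply one minus the first weight (the "1−" operator), the three steps amount to the following sketch; it is illustrative, not the exact implementation of the disclosure:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def shared_decoder_input(share_param, enc_vec, dec_vec):
    """Weighted summation of the encoder and decoder word vectors of a candidate vocabulary pair,
    controlled by the pair's sharing tendency parameter."""
    w1 = sigmoid(share_param)                          # step 1): first weight in (0, 1)
    first_intermediate = w1 * enc_vec                  # step 2): weighted encoder word vector
    second_intermediate = (1.0 - w1) * dec_vec         # second weight, negatively related to the first
    return first_intermediate + second_intermediate    # step 3): vector sum fed to the decoder

enc_vec = np.ones(512, dtype=np.float32)    # encoder word vector of the candidate source vocabulary
dec_vec = np.zeros(512, dtype=np.float32)   # decoder word vector of the candidate target vocabulary
out = shared_decoder_input(-1.0, enc_vec, dec_vec)   # with the initial value -1: ≈ 0.27*enc_vec + 0.73*dec_vec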
In addition, in the pre-training process, for a non-candidate target word existing in the target sentence, a decoder word vector of the non-candidate target word can be directly input to the decoder, wherein the non-candidate target word is a residual word except the candidate target word in the target vocabulary; for a vocabulary of the source sentence, an encoder word vector for the vocabulary is input to the encoder.
In step 102, the pre-training is performed, and all word vectors and the sharing tendency parameters are updated during pre-training until a predetermined training end condition is met, for example a predetermined number of training iterations is reached or the loss on a validation set no longer decreases, so that the first neural machine translation model and its model parameters can be obtained, where the model parameters include the sharing tendency parameter of each candidate vocabulary pair.
And 103, selecting a shared vocabulary pair from the candidate vocabulary pairs according to the sharing tendency parameter of each candidate vocabulary pair obtained by pre-training.
Here, a preset threshold may be set, and then a candidate vocabulary pair having the sharing tendency parameter larger than the preset threshold may be selected as the shared vocabulary pair. For example, a candidate vocabulary pair having a sharing tendency parameter larger than 0 is used as the shared vocabulary pair.
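For instance, the selection in step 103 can then be a simple threshold filter over the learned sharing tendency parameters (threshold 0 as in the example above; the parameter values below are made up for illustration):

def select_shared_pairs(candidate_pairs, share_params, threshold=0.0):
    """Keep the candidate vocabulary pairs whose learned sharing tendency parameter exceeds the threshold."""
    return [pair for pair, p in zip(candidate_pairs, share_params) if p > threshold]

candidate_pairs = [("学生", "学生"), ("先生", "老师"), ("行く", "去")]
share_params = [2.3, -0.8, 0.5]                             # values obtained from pre-training (illustrative)
print(select_shared_pairs(candidate_pairs, share_params))   # [('学生', '学生'), ('行く', '去')]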
Through the steps, the embodiment of the invention selects the shared vocabulary pair which can be shared by the encoder and the decoder of the neural machine translation model, thereby reducing the model parameters, reducing the training time of the subsequent neural machine translation model and reducing the data volume required by the training of the neural machine translation model. In addition, after the shared vocabulary pair is adopted, the generalization capability of the trained neural machine translation model can be improved, and the translation performance of the neural machine translation model is improved.
FIG. 2 is a diagram of an example of a neural machine translation model according to an embodiment of the present invention; as shown in FIG. 2, it is a sequence-to-sequence neural machine translation model with a vocabulary sharing layer. In this neural machine translation model, the encoder is used to convert the Japanese input into a fixed-length sentence vector, and the decoder is used to decode the information in the sentence vector into a Chinese sentence. Typically, both the encoder and the decoder require an independent word vector search layer to convert the input vocabulary ID sequence into the corresponding word vector sequence. However, as shown in FIG. 2, the embodiment of the present invention introduces a vocabulary sharing layer to generate the decoder-side word vectors. During model pre-training, the sharing tendency parameter of each candidate vocabulary pair is learned in the vocabulary sharing layer, which then determines whether the decoder word vector or the encoder word vector is used, where the latter case corresponds to vocabulary sharing.
In the neural machine translation model shown in fig. 2: the module "LSTM" represents a long short-term memory unit used for modeling the input sequence information; the model can use the Adam algorithm to update the model parameters during pre-training, and training continues until the loss on the validation set no longer decreases.
It can be seen that, in the embodiment of the present invention, a vocabulary sharing layer is introduced at the decoder end of the existing neural machine translation model to replace the word vector search layer at the decoder end, and the word vector search layer is still retained at the encoder end. Fig. 3 further shows a specific example of the word vector sharing layer in fig. 2, which is used to perform the weighting process in step 102. It should be noted that the above example is only one weighting method that can be adopted by the embodiment of the present invention, and is not intended to limit the present invention.
In the vocabulary sharing layer shown in FIG. 3:
a) the operator "+" represents vector addition. For n-dimensional vectors A and B, the sum vector C of A and B is also an n-dimensional vector, and the element C_i of vector C is:
C_i = A_i + B_i, i ∈ (1, 2, …, n)    (1)
b) the operator "×" represents multiplication of a vector by a scalar. For an n-dimensional vector A and a real number b, the product vector C of A and b is also an n-dimensional vector, and the element C_i of vector C is:
C_i = A_i × b, i ∈ (1, 2, …, n)    (2)
c) the operator "1−" corresponds to the mapping function f(x), as shown in equation (3):
f(x) = 1 − x    (3)
d) the operator "σ" corresponds to the activation function g(x), as shown in equation (4):
g(x) = 1 / (1 + e^(−x))    (4)
e) the word vectors may be obtained in advance by training with the tool "word2vec"; for example, the dimension of the word vectors may be set to 512 during training. If the pre-trained word vectors do not contain a certain vocabulary of the source vocabulary or the target vocabulary, that vocabulary can be randomly initialized as a word vector, for example a 512-dimensional vector;
f) for unknown words not contained in the source vocabulary/target vocabulary, the unknown words can be randomly initialized into a word vector;
g) all word vectors are updated together in the pre-training process;
h) the vocabularies common to the encoder and the decoder are taken as the shared candidate set; here, the common vocabularies may be vocabularies with the same written form;
i) the sharing tendency parameter of a candidate vocabulary pair may be a floating point number with an initial value of -1. Both vocabularies in a candidate vocabulary pair use the sharing tendency parameter of that pair, which quantifies the tendency of the two vocabularies to be shared. The sharing tendency parameter is updated during model pre-training;
j) the output of the vocabulary sharing layer is used as the decoder-side word vector input of the sequence-to-sequence model.
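Putting operators a) to j) together, the vocabulary sharing layer of FIG. 3 could be sketched as a small PyTorch module as below; the class name, the per-target-vocabulary parameter layout and the use of the sigmoid for "σ" are assumptions for illustration, not the exact implementation of the disclosure:

import torch
import torch.nn as nn

class VocabularySharingLayer(nn.Module):
    """Generates the decoder-side word vector input, mixing decoder and encoder word vectors
    of candidate vocabulary pairs according to learnable sharing tendency parameters."""

    def __init__(self, enc_embedding, dec_embedding, tgt2src):
        super().__init__()
        self.enc_embedding = enc_embedding            # nn.Embedding over the source vocabulary
        self.dec_embedding = dec_embedding            # nn.Embedding over the target vocabulary
        # tgt2src[t] = ID of the candidate source vocabulary paired with target vocabulary t, or -1 if t is not a candidate
        self.register_buffer("tgt2src", tgt2src)
        # one sharing tendency parameter per target vocabulary, initialized to -1 (entries of non-candidates stay unused)
        self.share_param = nn.Parameter(torch.full((dec_embedding.num_embeddings,), -1.0))

    def forward(self, tgt_ids):                       # tgt_ids: (batch, length)
        dec_vec = self.dec_embedding(tgt_ids)
        src_ids = self.tgt2src[tgt_ids]
        is_candidate = (src_ids >= 0).unsqueeze(-1)
        enc_vec = self.enc_embedding(src_ids.clamp(min=0))
        w1 = torch.sigmoid(self.share_param[tgt_ids]).unsqueeze(-1)   # operator "σ": first weight
        mixed = w1 * enc_vec + (1.0 - w1) * dec_vec                   # operators "×", "1-" and "+"
        return torch.where(is_candidate, mixed, dec_vec)              # non-candidates keep their decoder word vector

# usage sketch: target vocabulary 9 forms a candidate pair with source vocabulary 5
enc_emb = nn.Embedding(30001, 512)
dec_emb = nn.Embedding(20001, 512)
tgt2src = torch.full((20001,), -1, dtype=torch.long)
tgt2src[9] = 5
layer = VocabularySharingLayer(enc_emb, dec_emb, tgt2src)
decoder_inputs = layer(torch.tensor([[9, 3, 0]]))     # shape (1, 3, 512)

Consistent with item g) above, both embeddings and share_param would be updated together during pre-training, for example with the Adam optimizer mentioned for FIG. 2.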
after obtaining the shared vocabulary pairs, embodiments of the present invention may further optimize the training of the target model based on the shared vocabulary pairs. As shown in fig. 4, another method for selecting a shared vocabulary according to an embodiment of the present invention may include:
in step 401, a plurality of candidate vocabulary pairs are selected from the source vocabulary and the target vocabulary.
Step 402, respectively initializing a sharing tendency parameter for each candidate vocabulary pair, pre-training the neural machine translation model by using a source sentence and a target sentence, and updating model parameters including the sharing tendency parameter to obtain a first neural machine translation model and model parameters thereof.
Step 403, selecting a shared vocabulary pair from the plurality of candidate vocabulary pairs according to the sharing tendency parameter of each candidate vocabulary pair obtained by pre-training.
The above steps 401 to 403 are similar to the steps 101 to 103 in FIG. 1, and are not described herein again for brevity.
Step 404, updating an encoder word vector and a decoder word vector of the neural-machine translation model, wherein for each vocabulary of the shared vocabulary pair in the source vocabulary and the target vocabulary, the same word vector is used.
Here, based on the obtained shared vocabulary pair, the same word vector is used for both the source vocabulary and the target vocabulary in the shared vocabulary pair, without using the encoder word vector and the decoder word vector, respectively. Preferably, the encoder word vectors of the source words may all be used, i.e. in the decoder word vectors, the word vectors of the target words in the shared word pair are replaced by the encoder word vectors of the source words in the shared word pair.
Step 405, training the neural machine translation model according to the updated encoder word vector and decoder word vector to obtain a second neural machine translation model.
Here, according to the existing model training approach, word vector search layers may be used at the encoder and decoder ends, respectively, to convert the vocabularies in the source sentence and the target sentence into encoder word vectors and decoder word vectors, which are then input to the encoder and the decoder, respectively, to perform the training of the model; when the training satisfies a predetermined termination condition, the training is terminated, resulting in the second neural machine translation model as the target model.
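Steps 404 and 405 essentially tie the word vectors of each shared vocabulary pair before re-training. The following is a minimal sketch of the word vector update, assuming the encoder word vector is kept for both sides of a shared pair as suggested above; the names are illustrative:

import torch
import torch.nn as nn

enc_emb = nn.Embedding(30001, 512)    # encoder word vectors
dec_emb = nn.Embedding(20001, 512)    # decoder word vectors

def tie_shared_word_vectors(enc_embedding, dec_embedding, shared_pairs):
    """For each shared vocabulary pair (source ID, target ID), overwrite the decoder word vector
    of the target vocabulary with the encoder word vector of the source vocabulary."""
    with torch.no_grad():
        for src_id, tgt_id in shared_pairs:
            dec_embedding.weight[tgt_id] = enc_embedding.weight[src_id]

# e.g. source vocabulary 5 and target vocabulary 9 were selected as a shared pair
tie_shared_word_vectors(enc_emb, dec_emb, [(5, 9)])
# training then proceeds in the usual way with separate word vector search layers,
# yielding the second neural machine translation model

A full implementation might instead share the underlying parameter so the two sides remain identical throughout training; the copy above only equalizes the word vectors before training starts.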
Through the steps, the embodiment of the invention adopts the shared vocabulary pairs in the neural machine translation model, reduces the model parameters, can reduce the training time of the neural machine translation model, and can also reduce the data volume required by the training of the neural machine translation model. In addition, the neural machine translation model obtained based on the method has better generalization capability.
Based on the above method, an embodiment of the present invention further provides an apparatus for implementing the above method, and referring to fig. 5, an apparatus 500 for selecting a shared vocabulary according to an embodiment of the present invention includes:
a first selecting unit 501, configured to select a plurality of candidate vocabulary pairs from a source vocabulary and a target vocabulary, where the source vocabulary is a vocabulary formed by source vocabularies at an encoder end of a neural machine translation model, and the target vocabulary is a vocabulary formed by target vocabularies at a decoder end of the neural machine translation model; each of the candidate vocabulary pairs comprises a candidate source vocabulary in the source vocabulary and a candidate target vocabulary in the target vocabulary;
a first training unit 502, configured to initialize a sharing tendency parameter for each candidate vocabulary pair respectively, pre-train the neural machine translation model by using source sentences and target sentences, and update model parameters including the sharing tendency parameters to obtain a first neural machine translation model and model parameters thereof, wherein, in the pre-training process, for a candidate target vocabulary existing in the target sentence, the decoder word vector of the candidate target vocabulary and the encoder word vector of the candidate source vocabulary corresponding to the candidate target vocabulary are weighted and summed according to the sharing tendency parameter of the candidate vocabulary pair, and the weighted sum is then input to the decoder;
a second selecting unit 503, configured to select a shared vocabulary pair from the multiple candidate vocabulary pairs according to the pre-trained sharing tendency parameter of each candidate vocabulary pair.
Through the above units, the shared vocabulary selection apparatus 500 according to the embodiment of the present invention can select a shared vocabulary pair shared by the encoder and decoder ends of the neural machine translation model, thereby reducing model parameters, reducing the training time of the subsequent neural machine translation model, and reducing the data amount required for training the neural machine translation model.
Preferably, the first training unit 502 is further configured to, in the pre-training process: for non-candidate target words existing in the target sentence, inputting decoder word vectors of the non-candidate target words into the decoder, wherein the non-candidate target words are the rest words except the candidate target words in the target vocabulary; for a vocabulary of the source sentence, an encoder word vector for the vocabulary is input to the encoder.
Preferably, the first selection unit 501 may include the following units:
the first selection subunit is used for selecting the same vocabulary existing in the source vocabulary and the target vocabulary as a candidate vocabulary pair;
or, alternatively,
and the second selection subunit is used for selecting a first vocabulary from the source vocabulary, selecting a second vocabulary from the target vocabulary, and combining the first vocabulary and the second vocabulary into a candidate vocabulary pair, wherein the meanings of the first vocabulary and the second vocabulary in a preset dictionary are the same or similar.
Preferably, the second selecting unit 503 is further configured to select a candidate vocabulary pair with the sharing tendency parameter being greater than a preset threshold as the shared vocabulary pair.
Preferably, the first training unit 502 includes:
a word vector calculation unit, configured to map the sharing tendency parameter of the candidate vocabulary pair to a first weight with a value between 0 and 1 according to a preset activation function; weight the encoder word vector of the candidate source vocabulary corresponding to the candidate target vocabulary by the first weight to obtain a first intermediate vector; weight the decoder word vector of the candidate target vocabulary by a second weight to obtain a second intermediate vector, wherein the second weight is negatively correlated with the first weight; and compute the vector sum of the first intermediate vector and the second intermediate vector to obtain the word vector of the candidate target vocabulary, which is input to the decoder.
Preferably, the shared vocabulary selecting means 500 further includes:
a word vector updating unit, configured to update the encoder word vectors and decoder word vectors of the neural machine translation model, wherein, for the two vocabularies of each shared vocabulary pair in the source vocabulary and the target vocabulary, the same word vector is used;
and the second training unit is used for training the neural machine translation model according to the updated encoder word vector and decoder word vector to obtain a second neural machine translation model.
Through the word vector updating unit and the second training unit, the second neural machine translation model with higher generalization capability can be obtained through training, and the time and data required by training can be reduced.
Referring to fig. 6, a block diagram of a hardware structure of a device for selecting a shared vocabulary according to an embodiment of the present invention is further provided, as shown in fig. 6, the device 600 for selecting a shared vocabulary includes:
a processor 602; and
a memory 604, in which memory 604 computer program instructions are stored,
wherein the computer program instructions, when executed by the processor, cause the processor 602 to perform the steps of:
selecting a plurality of candidate vocabulary pairs from a source vocabulary list and a target vocabulary list, wherein the source vocabulary list is a vocabulary list formed by source vocabularies at an encoder end of a neural machine translation model, and the target vocabulary list is a vocabulary list formed by target vocabularies at a decoder end of the neural machine translation model; each of the candidate vocabulary pairs comprises a candidate source vocabulary in the source vocabulary and a candidate target vocabulary in the target vocabulary;
initializing a sharing tendency parameter for each candidate vocabulary pair respectively, pre-training the neural machine translation model by using source sentences and target sentences, and updating model parameters including the sharing tendency parameters to obtain a first neural machine translation model and model parameters thereof, wherein, in the pre-training process, for a candidate target vocabulary existing in the target sentence, the decoder word vector of the candidate target vocabulary and the encoder word vector of the candidate source vocabulary corresponding to the candidate target vocabulary are weighted and summed according to the sharing tendency parameter of the candidate vocabulary pair, and the weighted sum is input to the decoder;
and selecting a shared vocabulary pair from the plurality of candidate vocabulary pairs according to the sharing tendency parameter of each candidate vocabulary pair obtained by pre-training.
Further, as shown in fig. 6, the shared vocabulary selecting apparatus 600 may further include a network interface 601, an input device 603, a hard disk 605, and a display device 606.
The various interfaces and devices described above may be interconnected by a bus architecture. The bus architecture may be any architecture that includes any number of interconnected buses and bridges. Various circuits of one or more Central Processing Units (CPUs), represented in particular by processor 602, and one or more memories, represented by memory 604, are coupled together. The bus architecture may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like. It will be appreciated that a bus architecture is used to enable communications among the components. The bus architecture includes a power bus, a control bus, and a status signal bus, in addition to a data bus, all of which are well known in the art and therefore will not be described in detail herein.
The network interface 601 may be connected to a network (e.g., the internet, a local area network, etc.), collect source sentence corpus and target sentence corpus from the network, and store the collected corpora in the hard disk 605.
The input device 603 can receive various commands input by an operator and send the commands to the processor 602 for execution. The input device 603 may include a keyboard or a pointing device (e.g., a mouse, trackball, touch pad, touch screen, etc.).
The display device 606 may display a result obtained by the processor 602 executing the instruction, for example, displaying a progress of model training and displaying a translation result of a sentence to be translated.
The memory 604 is used for storing programs and data necessary for operating the operating system, and data such as intermediate results in the calculation process of the processor 602.
It will be appreciated that memory 604 in embodiments of the invention may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), or a flash memory. Volatile memory can be Random Access Memory (RAM), which acts as external cache memory. The memory 604 of the apparatus and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some embodiments, memory 604 stores the following elements, executable modules or data structures, or a subset thereof, or an expanded set thereof: an operating system 6041 and application programs 6042.
The operating system 6041 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application 6042 includes various applications such as a Browser (Browser) and the like for implementing various application services. A program implementing the method of an embodiment of the present invention may be included in the application 6042.
The methods disclosed in the above embodiments of the present invention may be implemented in the processor 602 or implemented by the processor 602. The processor 602 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 602. The processor 602 may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in the memory 604, and the processor 602 reads the information in the memory 604 and performs the steps of the above method in combination with the hardware thereof.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
In particular, the computer program, when executed by the processor 602, may further implement the steps of:
during the pre-training process: for non-candidate target words existing in the target sentence, inputting decoder word vectors of the non-candidate target words into the decoder, wherein the non-candidate target words are the rest words except the candidate target words in the target vocabulary; for a vocabulary of the source sentence, an encoder word vector for the vocabulary is input to the encoder.
In particular, the computer program, when executed by the processor 602, may further implement the steps of:
selecting the same vocabulary existing in the source vocabulary and the target vocabulary as a candidate vocabulary pair;
or, alternatively,
selecting a first vocabulary from the source vocabulary, and selecting a second vocabulary from the target vocabulary, and combining the first vocabulary and the second vocabulary into a candidate vocabulary pair, wherein the first vocabulary and the second vocabulary have the same or similar meanings in a preset dictionary.
In particular, the computer program, when executed by the processor 602, may further implement the steps of:
and selecting the candidate vocabulary pair with the sharing tendency parameter larger than a preset threshold value as the sharing vocabulary pair.
In particular, the computer program, when executed by the processor 602, may further implement the steps of:
mapping the sharing tendency parameter of the candidate vocabulary pair to a first weight with a value between 0 and 1 according to a preset activation function;
according to the first weight, weighting the encoder word vector of the candidate source word corresponding to the candidate target word to obtain a first intermediate vector; weighting the decoder word vector of the candidate target vocabulary according to a second weight to obtain a second intermediate vector, wherein the second weight is negatively related to the first weight;
and calculating the vector sum of the first intermediate vector and the second intermediate vector to obtain a word vector of the candidate target vocabulary and inputting the word vector to the decoder.
In particular, the computer program, when executed by the processor 602, may further implement the steps of:
updating an encoder word vector and a decoder word vector of the neural machine translation model, wherein for each vocabulary of the shared vocabulary pair in the source vocabulary and the target vocabulary, the same word vector is used;
and training the neural machine translation model according to the updated encoder word vector and decoder word vector to obtain a second neural machine translation model.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method for selecting the shared vocabulary according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (13)

1. A method for selecting a shared vocabulary, comprising:
selecting a plurality of candidate vocabulary pairs from a source vocabulary list and a target vocabulary list, wherein the source vocabulary list is a vocabulary list formed by source vocabularies at an encoder end of a neural machine translation model, and the target vocabulary list is a vocabulary list formed by target vocabularies at a decoder end of the neural machine translation model; each of the candidate vocabulary pairs comprises a candidate source vocabulary in the source vocabulary list and a candidate target vocabulary in the target vocabulary list;
initializing a sharing tendency parameter for each candidate vocabulary pair, pre-training the neural machine translation model by using a source sentence and a target sentence, and updating model parameters including the sharing tendency parameters to obtain a first neural machine translation model and model parameters thereof, wherein in the pre-training process, for a candidate target vocabulary existing in the target sentence, the decoder word vector of the candidate target vocabulary and the encoder word vector of the candidate source vocabulary corresponding to the candidate target vocabulary are weighted and summed according to the sharing tendency parameter of the candidate vocabulary pair, and the resulting word vector is input to a decoder;
and selecting a shared vocabulary pair from the plurality of candidate vocabulary pairs according to the sharing tendency parameter of each candidate vocabulary pair obtained by pre-training.
2. The method of claim 1, wherein during the pre-training process:
for non-candidate target words existing in the target sentence, inputting decoder word vectors of the non-candidate target words into the decoder, wherein the non-candidate target words are the words in the target vocabulary other than the candidate target words;
for each word of the source sentence, inputting an encoder word vector of the word into the encoder.
3. The method of claim 1, wherein the step of selecting a plurality of candidate vocabulary pairs from the source vocabulary and the target vocabulary comprises:
selecting a vocabulary that exists in both the source vocabulary and the target vocabulary as a candidate vocabulary pair;
or, alternatively,
selecting a first vocabulary from the source vocabulary and a second vocabulary from the target vocabulary, and combining the first vocabulary and the second vocabulary into a candidate vocabulary pair, wherein the first vocabulary and the second vocabulary have the same or similar meanings in a preset dictionary.
4. The method of claim 1, wherein the step of selecting a shared vocabulary pair from the plurality of candidate vocabulary pairs according to the sharing tendency parameter of each candidate vocabulary pair obtained by pre-training comprises:
selecting a candidate vocabulary pair whose sharing tendency parameter is larger than a preset threshold value as a shared vocabulary pair.
5. The method of claim 1, wherein the step of, for a candidate target vocabulary existing in the target sentence, weighting and summing the decoder word vector of the candidate target vocabulary and the encoder word vector of the candidate source vocabulary corresponding to the candidate target vocabulary according to the sharing tendency parameter of the candidate vocabulary pair comprises:
mapping the sharing tendency parameter of the candidate vocabulary pair to a first weight in the value range of 0 to 1 according to a preset activation function;
weighting, according to the first weight, the encoder word vector of the candidate source vocabulary corresponding to the candidate target vocabulary to obtain a first intermediate vector; weighting the decoder word vector of the candidate target vocabulary according to a second weight to obtain a second intermediate vector, wherein the second weight is negatively related to the first weight;
and calculating the vector sum of the first intermediate vector and the second intermediate vector to obtain the word vector of the candidate target vocabulary, and inputting the word vector into the decoder.
6. The method of any of claims 1 to 5, wherein after selecting a shared vocabulary pair, the method further comprises:
updating the encoder word vectors and decoder word vectors of the neural machine translation model, wherein the same word vector is used for the two vocabularies of each shared vocabulary pair in the source vocabulary and the target vocabulary;
and training the neural machine translation model according to the updated encoder word vector and decoder word vector to obtain a second neural machine translation model.
7. An apparatus for selecting a shared vocabulary, comprising:
a first selection unit, configured to select a plurality of candidate vocabulary pairs from a source vocabulary and a target vocabulary, wherein the source vocabulary is a vocabulary formed by source vocabularies at an encoder end of a neural machine translation model, and the target vocabulary is a vocabulary formed by target vocabularies at a decoder end of the neural machine translation model; each of the candidate vocabulary pairs comprises a candidate source vocabulary in the source vocabulary and a candidate target vocabulary in the target vocabulary;
a first training unit, configured to initialize a sharing tendency parameter for each candidate vocabulary pair, pre-train the neural machine translation model by using a source sentence and a target sentence, and update model parameters including the sharing tendency parameters to obtain a first neural machine translation model and model parameters thereof, wherein in the pre-training process, for a candidate target vocabulary existing in the target sentence, the decoder word vector of the candidate target vocabulary and the encoder word vector of the candidate source vocabulary corresponding to the candidate target vocabulary are weighted and summed according to the sharing tendency parameter of the candidate vocabulary pair, and the resulting word vector is input to the decoder; the encoder word vectors are word vectors obtained by pre-training the vocabularies in the source vocabulary; the decoder word vectors are word vectors obtained by pre-training the vocabularies in the target vocabulary;
and a second selection unit, configured to select a shared vocabulary pair from the plurality of candidate vocabulary pairs according to the sharing tendency parameter of each candidate vocabulary pair obtained by pre-training.
8. The selection apparatus of claim 7, wherein
the first training unit is further configured to, during the pre-training process: for non-candidate target words existing in the target sentence, input decoder word vectors of the non-candidate target words into the decoder, wherein the non-candidate target words are the words in the target vocabulary other than the candidate target words; and for each word of the source sentence, input an encoder word vector of the word into the encoder.
9. The selection apparatus of claim 7, wherein the first selection unit comprises:
a first selection subunit, configured to select a vocabulary that exists in both the source vocabulary and the target vocabulary as a candidate vocabulary pair;
or, alternatively,
a second selection subunit, configured to select a first vocabulary from the source vocabulary and a second vocabulary from the target vocabulary, and combine the first vocabulary and the second vocabulary into a candidate vocabulary pair, wherein the first vocabulary and the second vocabulary have the same or similar meanings in a preset dictionary.
10. The selection apparatus of claim 7, wherein
the second selection unit is further configured to select a candidate vocabulary pair whose sharing tendency parameter is larger than a preset threshold value as a shared vocabulary pair.
11. The selection apparatus of claim 7, wherein the first training unit comprises:
a word vector calculation unit, configured to map the sharing tendency parameter of the candidate vocabulary pair to a first weight in the value range of 0 to 1 according to a preset activation function; weight, according to the first weight, the encoder word vector of the candidate source vocabulary corresponding to the candidate target vocabulary to obtain a first intermediate vector; weight the decoder word vector of the candidate target vocabulary according to a second weight to obtain a second intermediate vector, wherein the second weight is negatively related to the first weight; and calculate the vector sum of the first intermediate vector and the second intermediate vector to obtain the word vector of the candidate target vocabulary and input the word vector to the decoder.
12. The selection apparatus according to any one of claims 7 to 11, further comprising:
a word vector updating unit, configured to update the encoder word vectors and decoder word vectors of the neural machine translation model, wherein the same word vector is used for the two vocabularies of each shared vocabulary pair in the source vocabulary and the target vocabulary;
and a second training unit, configured to train the neural machine translation model according to the updated encoder word vectors and decoder word vectors to obtain a second neural machine translation model.
13. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method for selecting a shared vocabulary according to any one of claims 1 to 6.
CN201910204303.8A 2019-03-18 2019-03-18 Shared vocabulary selection method and device and storage medium Pending CN111783435A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910204303.8A CN111783435A (en) 2019-03-18 2019-03-18 Shared vocabulary selection method and device and storage medium


Publications (1)

Publication Number Publication Date
CN111783435A true CN111783435A (en) 2020-10-16

Family

ID=72754753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910204303.8A Pending CN111783435A (en) 2019-03-18 2019-03-18 Shared vocabulary selection method and device and storage medium

Country Status (1)

Country Link
CN (1) CN111783435A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107590135A (en) * 2016-07-07 2018-01-16 三星电子株式会社 Automatic translating method, equipment and system
US20180046618A1 (en) * 2016-08-10 2018-02-15 Samsung Electronics Co., Ltd. Parallel processing-based translation method and apparatus
CN107038159A (en) * 2017-03-09 2017-08-11 清华大学 A kind of neural network machine interpretation method based on unsupervised domain-adaptive
US20180329883A1 (en) * 2017-05-15 2018-11-15 Thomson Reuters Global Resources Unlimited Company Neural paraphrase generator
CN107368476A (en) * 2017-07-25 2017-11-21 深圳市腾讯计算机系统有限公司 The method and relevant apparatus that a kind of method of translation, target information determine
CN108874785A (en) * 2018-06-01 2018-11-23 清华大学 A kind of translation processing method and system
CN109165391A (en) * 2018-07-27 2019-01-08 纤瑟(天津)新材料科技有限公司 A kind of neural network machine translation system and method using radical information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZI-YI DOU et al.: "Exploiting Deep Representations for Neural Machine Translation", arXiv, pages 1-10 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464651A (en) * 2020-11-18 2021-03-09 南京邮电大学 Comprehensive position coding method for vocabulary sequence data
CN112464651B (en) * 2020-11-18 2023-06-23 南京邮电大学 Comprehensive position coding method for vocabulary sequence data
CN114154519A (en) * 2022-02-08 2022-03-08 北京大学 Neural machine translation model training method based on weighted label smoothing
CN114154519B (en) * 2022-02-08 2022-04-26 北京大学 Neural machine translation model training method based on weighted label smoothing


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination