CN115081461A - Lightweight machine translation method based on convolutional neural network and translation model

Info

Publication number: CN115081461A
Application number: CN202210529030.6A
Authority: CN (China)
Prior art keywords: sequence, translation, neural network, convolutional neural, word segmentation
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 徐新涛, 赵志远, 陈刚, 申荣铉, 边昳, 鲁华祥
Original and current assignee: Institute of Semiconductors of CAS
Application filed by Institute of Semiconductors of CAS, with priority to CN202210529030.6A

Classifications

    • G06F40/58 Processing or translation of natural language; use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06F40/284 Natural language analysis; lexical analysis, e.g. tokenisation or collocates
    • G06N3/08 Neural networks; learning methods

Abstract

The disclosure provides a lightweight machine translation method based on a convolutional neural network and a translation model. The method comprises the following steps: in response to a text translation request, acquiring the sentence to be translated carried in the request; preprocessing the sentence to be translated to obtain a word segmentation sequence with position information; inputting the word segmentation sequence into a convolutional neural network and outputting a local feature sequence; and inputting the local feature sequence into a translation model and outputting a text translation result. In addition, the disclosure also provides a lightweight machine translation apparatus, an electronic device, and a readable storage medium based on the convolutional neural network and the translation model.

Description

Lightweight machine translation method based on convolutional neural network and translation model
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to a machine translation method and device based on a convolutional neural network and a translation model.
Background
Machine translation is the process of translating a source language into a target language using a computer, and is an important branch of computational linguistics. With the rapid development of deep learning, researchers have introduced neural networks into language models, which better handle the representation of both common and rare words. In implementing the disclosed concept, the inventors found at least the following problems in the related art: there is generally no lightweight machine translation method suitable for edge devices, and lightweighting a translation model typically reduces its translation accuracy.
Disclosure of Invention
In view of the above, the present disclosure provides a lightweight machine translation method that improves the accuracy of machine translation while remaining lightweight. Specifically, a method, apparatus, device, medium, and program product for lightweight machine translation based on a convolutional neural network and a translation model are provided.
According to a first aspect of the present disclosure, there is provided a lightweight machine translation method based on a convolutional neural network and a translation model, including: in response to a text translation request, acquiring a sentence to be translated carried in the text translation request; preprocessing the sentence to be translated to obtain a word segmentation sequence with position information; inputting the word segmentation sequence into a convolutional neural network and outputting a local feature sequence; and inputting the local feature sequence into a translation model and outputting a text translation result.
According to an embodiment of the present disclosure, the convolutional neural network includes an input layer, a normalization layer, a dimension extension layer, a convolution computation layer, an activation function layer, a full connection layer, and a residual connection layer.
According to an embodiment of the present disclosure, inputting the word segmentation sequence into a convolutional neural network and outputting a local feature sequence includes: inputting the word segmentation sequence into the input layer to obtain an original input sequence; expanding the processing dimension of the convolutional neural network using the dimension extension layer so that the processing dimension matches the dimension of the original input sequence; normalizing the original input sequence using the normalization layer to obtain a normalized word segmentation sequence; processing the normalized word segmentation sequence using the convolution computation layer to obtain an initial local feature sequence; applying nonlinear processing to the initial local feature sequence using the activation function layer to obtain a nonlinear local feature sequence; transforming the dimensionality of the nonlinear local feature sequence using the full connection layer to obtain a nonlinear local feature sequence with a target dimensionality; and processing the nonlinear local feature sequence with the target dimensionality together with the original input sequence using the residual connection layer to obtain the local feature sequence.
According to an embodiment of the present disclosure, the preprocessing the sentence to be translated to obtain a word segmentation sequence with position information includes: performing word segmentation processing on the sentence to be translated by using a word segmentation algorithm to obtain an initial word segmentation sequence; and carrying out position coding processing on the initial word segmentation sequence, and embedding position information into the initial word segmentation sequence to obtain the word segmentation sequence with the position information.
According to an embodiment of the present disclosure, the translation model comprises an encoder sub-model and a decoder sub-model; wherein inputting the local feature sequence into a translation model and outputting a text translation result comprises: inputting the local feature sequence into the encoder sub-model and outputting a coding sequence result; and inputting the coding sequence result into the decoder sub-model and outputting the text translation result.
According to an embodiment of the present disclosure, the convolutional neural network and the translation model are obtained by the following training process: acquiring a training sample data set, wherein the training sample data set comprises sample sentences and label information corresponding to the sample sentences; preprocessing a sample sentence to obtain a sample word segmentation sequence with position information; inputting the sample word segmentation sequence into an initial convolutional neural network and outputting a sample local feature sequence; inputting the sample local feature sequence into an initial translation model and outputting a predicted translation result; and adjusting model parameters of the initial translation model and the initial convolutional neural network according to the predicted translation result and the label information to obtain the convolutional neural network and the translation model.
Another aspect of the present disclosure provides a lightweight machine translation apparatus based on a convolutional neural network and a translation model, including: a first obtaining module for acquiring, in response to a text translation request, a sentence to be translated carried in the text translation request; a processing module for preprocessing the sentence to be translated to obtain a word segmentation sequence with position information; a first input module for inputting the word segmentation sequence into a convolutional neural network and outputting a local feature sequence; and a second input module for inputting the local feature sequence into a translation model and outputting a text translation result.
Another aspect of the present disclosure also provides an electronic device including: one or more processors; storage means for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the above-described method for lightweight machine translation based on convolutional neural networks and translation models.
Another aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described convolutional neural network and translation model-based lightweight machine translation method.
Another aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above-described convolutional neural network and translation model-based lightweight machine translation method.
According to the embodiments of the present disclosure, because the translation model essentially lacks the capability to extract local features during translation, and because lightweighting the translation model reduces translation precision, a convolutional neural network is arranged in front of the translation model: the convolutional neural network extracts the local features of the word segmentation sequence, and the translation model processes those local features to obtain the final translation result for the text translation request.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:
fig. 1 schematically illustrates a system architecture diagram of a convolutional neural network and translation model based lightweight machine translation method and apparatus according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a convolutional neural network and translation model based lightweight machine translation method, in accordance with an embodiment of the present disclosure;
FIG. 3 schematically illustrates a block diagram of a model used in a convolutional neural network and translation model based lightweight machine translation method, according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a block diagram of a convolutional neural network, in accordance with an embodiment of the present disclosure;
FIG. 5 schematically shows a schematic diagram of a translation model according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates an overall structure diagram of a model used in a convolutional neural network and translation model-based lightweight machine translation method according to another embodiment of the present disclosure;
fig. 7 schematically shows a block diagram of a lightweight machine translation apparatus based on a convolutional neural network and a translation model according to an embodiment of the present disclosure; and
fig. 8 schematically illustrates a block diagram of an electronic device suitable for implementing a convolutional neural network and translation model based lightweight machine translation method, in accordance with an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.).
It should be noted that the lightweight machine translation method and apparatus based on the convolutional neural network and the translation model of the present disclosure may be applied in the field of artificial intelligence, and may also be applied in any field other than artificial intelligence.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of the personal information of the users involved all comply with the provisions of relevant laws and regulations, necessary confidentiality measures are taken, and public order and good customs are not violated. In the technical solution of the present disclosure, the user's authorization or consent is obtained before the user's personal information is acquired or collected.
Introducing neural networks into language models allows the representation of common and rare words to be handled better. For example, an RNN (Recurrent Neural Network) model can adapt to any sentence length, recursively process context, and produce a final result. However, RNN models and their derivatives, such as the GRU (Gated Recurrent Unit) model and the LSTM (Long Short-Term Memory) model, need to learn long-distance dependencies for each input word vector; they generally use an embedding layer to map a sentence into an embedding space and then use hidden layers to sequentially compute over the knowledge obtained from previous steps. In this process, multiple hidden layers are traversed in order, and the computation inside a single hidden layer is also sequential, so parallel operation is impossible. Furthermore, an RNN model must finish processing all previous inputs before the next input can be processed, which is a bottleneck when processing long sequences. When an RNN model relates information at two arbitrary input or output positions, the number of operations required grows with the distance between the positions, making it harder to extract complex dependencies between remote positions. RNN models are therefore difficult to parallelize, are not amenable to hardware acceleration, and yield unsatisfactory translation quality.
To address this lack of parallelism, the Transformer model (the translation model) discards the recursive structure of the RNN by stacking self-attention layers and point-wise fully connected layers, so Transformer-based translation methods have the advantage of high parallelism. However, Transformer models have large parameter counts: for example, BERT-large has 334M parameters, BERT-base has 109M, and IB-BERT-large has 293M. Because the number of parameters is large and parameter sparsity is low, such models are generally not deployed at the edge, and no Transformer algorithm well suited to edge devices exists.
In view of this, embodiments of the present disclosure provide a lightweight machine translation method based on a convolutional neural network and a translation model, which improves machine translation accuracy while being suitable for edge deployment. Specifically, the method comprises the following steps: in response to a text translation request, acquiring the sentence to be translated carried in the text translation request; preprocessing the sentence to be translated to obtain a word segmentation sequence with position information; inputting the word segmentation sequence into a convolutional neural network and outputting a local feature sequence; and inputting the local feature sequence into a translation model and outputting a text translation result.
Fig. 1 schematically shows a system architecture diagram of a convolutional neural network and translation model based lightweight machine translation method and apparatus according to an embodiment of the present disclosure.
As shown in fig. 1, an application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use terminal devices 101, 102, 103 to interact with a server 105 over a network 104 to receive or send translation requests and the like. Various client applications may be installed on the terminal devices 101, 102, 103, such as a translation-type application, a web browser application, a search-type application, an instant messaging tool, a mailbox client, social platform software, etc. (by way of example only).
The terminal devices 101, 102, and 103 may be various electronic devices having a display screen, including but not limited to a smart phone, a tablet computer, a laptop computer, a desktop computer, and the like, and the terminal devices 101, 102, and 103 may display translation results obtained by the lightweight machine translation method.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for translation requests sent by users with the terminal devices 101, 102, 103. The background management server may analyze and translate the received data such as the translation request, and feed back a processing result (for example, a translation result obtained according to the translation request of the user, a web page, information, or data obtained or generated according to the translation result, or the like) to the terminal device.
It should be noted that the lightweight machine translation method based on the convolutional neural network and the translation model provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the lightweight machine translation apparatus based on the convolutional neural network and the translation model provided by the embodiment of the present disclosure may be generally disposed in the server 105. The lightweight machine translation method based on the convolutional neural network and the translation model provided by the embodiment of the disclosure can also be executed by a server or a server cluster which is different from the server 105 and can communicate with the terminal devices 101, 102, 103 and/or the server 105. Correspondingly, the lightweight machine translation apparatus based on the convolutional neural network and the translation model provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
A convolutional neural network and translation model-based lightweight machine translation method according to the disclosed embodiment will be described in detail below with reference to fig. 2 to 6 based on the system architecture described in fig. 1.
Fig. 2 schematically illustrates a flow diagram of a convolutional neural network and translation model based lightweight machine translation method according to an embodiment of the present disclosure.
As shown in fig. 2, this embodiment may include operations S210 to S240.
In operation S210, in response to the text translation request, a to-be-translated sentence carried in the text translation request is acquired.
In operation S220, the sentence to be translated is preprocessed to obtain a word segmentation sequence having location information.
In operation S230, the word segmentation sequence is input into the convolutional neural network, and a local feature sequence is output.
In operation S240, the local feature sequence is input into the translation model, and a text translation result is output.
According to an embodiment of the present disclosure, the text translation request may be generated by the terminal device according to the content to be translated. The content to be translated may include a sentence to be translated input by the user, or an uploaded picture to be translated, while the terminal device runs the translation client application.
According to the embodiments of the present disclosure, preprocessing converts the sentence to be translated into a symbol sequence that can be input into the convolutional neural network and the translation model; the preprocessing may comprise a word segmentation operation and a position encoding operation. After preprocessing, the sentence to be translated yields a word segmentation sequence with position information.
According to the embodiments of the present disclosure, the convolutional neural network performs local feature extraction on the word segmentation sequence, extracting a span of fixed length at a time, and outputs the processed sequence features.
According to the embodiments of the present disclosure, the translation model further processes the sequence features output by the convolutional neural network: the features are operated on by an encoder sub-model in the translation model and then handed to a decoder sub-model for decoding, finally yielding the translation result corresponding to the sentence to be translated.
According to the embodiments of the present disclosure, the convolutional neural network is arranged in front of the translation model; the local features of the word segmentation sequence are extracted by the convolutional neural network, and the translation model processes the local features extracted by the convolutional neural network to obtain the final translation result for the text translation request.
According to an embodiment of the present disclosure, operation S220 may further include the operations of: performing word segmentation processing on a sentence to be translated by using a word segmentation algorithm to obtain an initial word segmentation sequence; and carrying out position coding processing on the initial word segmentation sequence, and embedding the position information into the initial word segmentation sequence to obtain the word segmentation sequence with the position information.
According to an embodiment of the present disclosure, the word segmentation algorithm may be a BPE (Byte Pair Encoding) algorithm. The sentence to be translated is segmented using the BPE algorithm, the resulting symbol sequence may be used as the initial word segmentation sequence, and the vocabulary size of the word segmentation algorithm may be set to 10000 (by way of example only).
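As a concrete illustration, the following minimal sketch prepares a BPE tokenizer with a vocabulary of 10000. The patent does not name a tokenization library; the use of the HuggingFace tokenizers package and the corpus file name below are assumptions.

    from tokenizers import Tokenizer
    from tokenizers.models import BPE
    from tokenizers.pre_tokenizers import Whitespace
    from tokenizers.trainers import BpeTrainer

    # Train a BPE vocabulary of size 10000 on a hypothetical corpus file.
    tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = Whitespace()
    trainer = BpeTrainer(vocab_size=10000, special_tokens=["[UNK]"])
    tokenizer.train(files=["corpus.txt"], trainer=trainer)

    # The resulting symbol sequence serves as the initial word segmentation sequence.
    initial_sequence = tokenizer.encode("sentence to be translated").ids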
According to the embodiments of the present disclosure, the initial word segmentation sequence produced by the word segmentation algorithm further undergoes a position encoding operation: position information is embedded into the initial word segmentation sequence obtained after segmentation, yielding the word segmentation sequence with position information.
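The patent does not specify the position encoding formula. A common choice, shown here purely as an assumption, is the sinusoidal encoding of the original Transformer, added to the embedded word segmentation sequence:

    import math

    import torch

    def positional_encoding(seq_len: int, d_model: int = 512) -> torch.Tensor:
        # One row per position; even columns use sine, odd columns use cosine.
        pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                        * (-math.log(10000.0) / d_model))
        pe = torch.zeros(seq_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        return pe

    # embedded = embedding(initial_sequence)                       # (Len, 512)
    # with_position = embedded + positional_encoding(embedded.size(0))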
Fig. 3 schematically shows a block diagram of a model used in a convolutional neural network and translation model-based lightweight machine translation method according to an embodiment of the present disclosure.
As shown in fig. 3, the model used in the lightweight machine translation method based on the convolutional neural network and the translation model may be obtained by cascading the convolutional neural network and the translation model in sequence. After a sample sentence or a sentence to be translated is input, it passes through word segmentation and position encoding and is then fed into the convolutional neural network; the result produced by the convolutional neural network is input into the translation model. Activation function layers are arranged in both the convolutional neural network and the translation model, and the translation model finally outputs a predicted translation result or a text translation result.
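A minimal sketch of this cascade follows; the class name LightweightTranslator and the constructor interface are assumptions, and the two submodules are sketched in later sections.

    from torch import nn

    class LightweightTranslator(nn.Module):
        # Fig. 3 cascade: convolutional front end followed by the translation model.
        def __init__(self, conv_front_end: nn.Module, translation_model: nn.Module):
            super().__init__()
            self.conv_front_end = conv_front_end        # local feature extraction
            self.translation_model = translation_model  # encoder-decoder translation

        def forward(self, seq):
            # seq: word segmentation sequence with position information
            return self.translation_model(self.conv_front_end(seq))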
FIG. 4 schematically shows a block diagram of a convolutional neural network, according to an embodiment of the present disclosure.
As shown in fig. 4, the convolutional neural network may include an input layer, a normalization layer, a dimension extension layer, a convolution computation layer, an activation function layer, a full connection layer, and a residual connection layer.
According to an embodiment of the present disclosure, operation S230 may further include the following operations: inputting the word segmentation sequence into the input layer to obtain an original input sequence; expanding the processing dimension of the convolutional neural network using the dimension extension layer so that the processing dimension matches the dimension of the original input sequence; normalizing the original input sequence using the normalization layer to obtain a normalized word segmentation sequence; processing the normalized word segmentation sequence using the convolution computation layer to obtain an initial local feature sequence; applying nonlinear processing to the initial local feature sequence using the activation function layer to obtain a nonlinear local feature sequence; transforming the dimensionality of the nonlinear local feature sequence using the full connection layer to obtain a nonlinear local feature sequence with a target dimensionality, where the target dimensionality may be the same as the input dimensionality of the translation model; and processing the nonlinear local feature sequence with the target dimensionality together with the original input sequence using the residual connection layer to obtain the local feature sequence.
According to the embodiments of the present disclosure, the normalization performed by the normalization layer prevents gradients from exploding or vanishing during subsequent computation, reduces training difficulty, and increases the precision of the convolutional neural network model; the residual connection layer fuses the local feature sequence with the original input sequence, avoiding gradient explosion during computation and carrying part of the input information through to the output.
According to the embodiments of the present disclosure, if machine translation uses only the translation model, the attention computation it relies on is essentially a spatial mapping and cannot extract local features: the original sentence, with position information added, is handed directly to the attention module for computation, so the relationships among several locally adjacent words in a sentence are ignored. Based on this observation, a convolutional neural network structure capable of extracting the features of local words can be arranged in front of the translation model, strengthening local feature extraction, overcoming the neglect of local word relationships that arises when only the translation model is used, and improving the final machine translation precision.
According to an embodiment of the present disclosure, the translation model includes an encoder sub-model and a decoder sub-model; inputting the local feature sequence into the translation model and outputting a text translation result includes the following operations: inputting the local feature sequence into the encoder sub-model and outputting a coding sequence result; and inputting the coding sequence result into the decoder sub-model and outputting the text translation result.
According to an embodiment of the present disclosure, the input to the encoder sub-model may be the local feature sequence, or sequence features further extracted from the local feature sequence by the translation model. After the encoder sub-model operates on these features, the decoder performs cyclic decoding on the coding sequence result until the sentence to be translated is fully translated, finally producing the translation result corresponding to the sentence to be translated.
According to the embodiments of the present disclosure, the convolutional neural network and the translation model are obtained by the following training process: acquiring a training sample data set, wherein the training sample data set comprises sample sentences and label information corresponding to the sample sentences; preprocessing a sample sentence to obtain a sample word segmentation sequence with position information; inputting the sample word segmentation sequence into an initial convolutional neural network and outputting a sample local feature sequence; inputting the sample local feature sequence into an initial translation model and outputting a predicted translation result; and adjusting model parameters of the initial translation model and the initial convolutional neural network according to the predicted translation result and the label information to obtain the convolutional neural network and the translation model.
According to the embodiment of the disclosure, a training sample data set may be obtained from an existing machine translation data set, or an existing sentence and a translation result corresponding to the sentence may be used as the training sample data set, where the existing sentence and the translation result corresponding to the sentence are respectively used as a sample sentence and label information corresponding to the sample sentence, and the label information may be used to represent a target translation result corresponding to the sample sentence to be translated in a training process.
According to the embodiments of the present disclosure, in the model used in the lightweight machine translation method based on the convolutional neural network and the translation model, the input and output of the convolutional neural network and the translation model are both symbol sequences, so the sample sentences in the training sample data set are preprocessed into sequences that can be input into the model. Specifically, the BPE word segmentation algorithm may be used to segment a sample sentence to obtain a sample word segmentation sequence, with the vocabulary size of the word segmentation algorithm set to 10000; the segmented sample word segmentation sequence then undergoes a position encoding operation, embedding position information into the sample word segmentation sequence to obtain the sample word segmentation sequence with position information.
According to the embodiments of the present disclosure, the model used in the lightweight machine translation method based on the convolutional neural network and the translation model may be constructed as shown in fig. 3, by cascading a convolutional neural network and a translation model in sequence. During training, the initial convolutional neural network extracts the local features of the sample word segmentation sequence with position information to obtain a local feature sequence, and the initial translation model then further processes the local feature sequence extracted by the convolutional neural network to obtain the predicted translation result corresponding to the sample sentence; the model parameters of the initial convolutional neural network and the initial translation model are adjusted according to the predicted translation result until the loss function over the predicted translation result and the label information converges, and the model obtained at convergence is taken as the trained model.
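The following training-step sketch illustrates this procedure. The cross-entropy loss, the Adam optimizer, the learning rate, and the data pipeline are assumptions not fixed by the patent; model stands for the cascade of the initial convolutional neural network and the initial translation model, and train_loader is a hypothetical loader of preprocessed sample and label sequences.

    import torch
    from torch import nn

    criterion = nn.CrossEntropyLoss()  # loss choice is an assumption
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    for sample_seq, label_seq in train_loader:
        optimizer.zero_grad()
        logits = model(sample_seq)                          # predicted translation
        loss = criterion(logits.view(-1, logits.size(-1)),  # compare prediction
                         label_seq.view(-1))                # against label info
        loss.backward()                                     # adjust model parameters
        optimizer.step()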
According to the embodiments of the present disclosure, when constructing the lightweight machine translation model based on the convolutional neural network and the translation model, hyperparameters required for training, such as the learning rate, the number of steps of the learning-rate warmup stage, and the training batch size, may be set; preferably, the warmup stage is set to 8000 steps, which makes the predicted translation results obtained during training more accurate.
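The patent fixes only the 8000 warmup steps; the Noam-style schedule below, which such warmup settings commonly accompany, is an assumption:

    import torch

    d_model, warmup_steps = 512, 8000

    def lr_lambda(step: int) -> float:
        # Linear warmup for the first 8000 steps, then inverse-square-root decay.
        step = max(step, 1)
        return (d_model ** -0.5) * min(step ** -0.5, step * warmup_steps ** -1.5)

    # Wraps the optimizer from the training sketch above (base lr taken as 1.0).
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)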
According to an embodiment of the present disclosure, the initial convolutional neural network may include a normalization layer, a dimension extension layer, a convolution calculation layer, an activation function layer, a full connection layer, and a residual connection layer, as shown in fig. 4.
According to the embodiments of the present disclosure, the sequence obtained through the word segmentation algorithm and the position encoding operation is first normalized, and a sentence encoding result containing position information is output; performing normalization before local feature extraction prevents gradients from exploding or vanishing during computation, thereby reducing training difficulty and increasing model precision.
According to the embodiments of the present disclosure, the dimension extension layer increases the processing dimension of the initial convolutional neural network. For example, an input sample sentence is represented as a two-dimensional matrix, while the two-dimensional convolution can only process a three-dimensional matrix, so a channel dimension must be added; the dimension extension layer sets this dimension to 1, representing a single-channel input. Specifically, the sentence encoding result may be input into a two-dimensional initial convolutional neural network for processing, where the input channel count of the initial convolutional neural network structure is 1 and the output channel count equals the word vector length.
According to the embodiments of the present disclosure, the output matrix sizes of the initial convolutional neural network and the initial translation model need to remain consistent to facilitate passing results between them; specifically, the number of output channels of the convolution kernel may be set to 512. Furthermore, convolution kernels for processing sentences are conventionally one-dimensional, whereas the convolution kernel in the disclosed embodiment is two-dimensional. Specifically, to overcome the inability of a small convolution kernel to extract information across the whole word vector, the kernel size is set to 5 × 512 so that it covers the entire word vector dimension; the receptive field of a kernel of this size completely covers the whole word vector.
According to the embodiments of the present disclosure, a convolution kernel of size 5 × 512 is used for local feature extraction; the stride may be 1, and padding of two zeros may be applied along the row (sequence) dimension of the matrix, with no zero padding along the column dimension. A kernel of this size extracts local information of the sample sentence without altering it, and a 5 × 512 kernel extracts the information of 5 words at a time. When the convolution computation is completed, the channel dimension becomes 512 and the word vector dimension becomes 1, so at this point the output matrix of the initial convolutional neural network contains only the feature information of 512 channels and no word vector information. This is also an essential difference from the conventional translation model, which performs no such feature processing on the original information of the sentence but hands it over directly, risking the neglect of relationships among several locally adjacent words in a sentence.
According to an embodiment of the present disclosure, the dimension before processing by the initial convolutional neural network may be (Len, 512), where Len is the length of the sample sentence, and the dimension after processing may be (512, Len). In one embodiment, the dimension of the processed output may be (512, Len, 1); the redundant dimension of size 1 is removed to give (512, Len), which is then transposed to obtain (Len, 512). The result of the convolution computation is taken as the initial sample local feature sequence.
According to an embodiment of the present disclosure, the activation function layer may introduce a non-linear operation for the initial convolutional neural network so that the extracted initial sample local feature sequence becomes a non-linear sample local feature sequence.
According to the embodiments of the present disclosure, the size of the full connection layer may be set to 512 × 512; this layer spatially maps the nonlinear sample local feature sequence to obtain the nonlinear sample local feature sequence of the target dimension. The target dimension may be the same as the input dimension of the initial translation model.
According to the embodiments of the present disclosure, the sample local feature sequence with the target dimensionality is residually connected with the originally input information to obtain the output of the initial convolutional neural network. The residual connection layer in the present application avoids gradient explosion during training and carries part of the input information through to the output.
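Putting the layers of fig. 4 and the concrete sizes above together, one possible PyTorch sketch of the convolutional front end is as follows; the class name, the ReLU activation, and any ordering detail not fixed by the text are assumptions.

    import torch
    from torch import nn

    class ConvFrontEnd(nn.Module):
        # Convolutional front end of fig. 4; input and output are (Len, 512).
        def __init__(self, d_model: int = 512, kernel_len: int = 5):
            super().__init__()
            self.norm = nn.LayerNorm(d_model)  # normalization layer
            # 5 x 512 kernel: 5 words at a time, covering the whole word vector;
            # padding (2, 0) pads two zeros along the sequence (row) dimension only.
            self.conv = nn.Conv2d(1, d_model, kernel_size=(kernel_len, d_model),
                                  stride=1, padding=(2, 0))
            self.act = nn.ReLU()                   # activation function layer (assumed)
            self.fc = nn.Linear(d_model, d_model)  # 512 x 512 full connection layer

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            residual = x                           # original input sequence
            h = self.norm(x)
            h = h.unsqueeze(0).unsqueeze(0)        # dimension extension: (1, 1, Len, 512)
            h = self.conv(h)                       # -> (1, 512, Len, 1)
            h = self.act(h)                        # nonlinear local feature sequence
            h = h.squeeze(0).squeeze(-1).transpose(0, 1)  # (512, Len) -> (Len, 512)
            h = self.fc(h)                         # map to the target dimensionality
            return h + residual                    # residual connection layer

A quick shape check: with x of shape (Len, 512), the convolution output is (1, 512, Len, 1), matching the (512, Len, 1) intermediate described above before the size-1 dimension is removed and the matrix is transposed back to (Len, 512).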
FIG. 5 schematically shows a schematic diagram of a translation model according to an embodiment of the present disclosure.
As shown in fig. 5, the translation model in the embodiments of the present disclosure may include an encoder sub-model and a decoder sub-model, and may further include a word vector embedding layer, a position encoding layer, and an activation layer. The encoder sub-model and decoder sub-model may each be stacked 4 to 6 layers deep; in a preferred embodiment, 7 layers each are used. The encoder sub-model may comprise three-head self-attention modules; alternatively, the number of heads may be set to 5 to 8 and adjusted adaptively to actual needs. The encoder sub-model may also include a residual-and-normalization layer, a full connection layer, and an activation function layer. The decoder sub-model may comprise a masked multi-head attention module, a multi-head self-attention module, a residual-and-normalization layer, a full connection layer, an activation function layer, and a linear layer.
According to an embodiment of the present disclosure, the sample local feature sequence output by the initial convolutional neural network is input to the three-head self-attention modules in the 7 stacked encoder sub-models; the result sequence obtained after the normalization layer is then residually connected with the sample local feature sequence (the first residual connection); the result of the first residual connection may be input to a full connection layer of size (512, 1024) and then residually connected with the result of the first residual connection (the second residual connection), finally giving the output of the encoder sub-model, namely the sample coding sequence.
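One such encoder layer could be sketched as below. A hand-rolled attention is used because three heads of 128 dimensions each (per the attention-matrix reduction described later) cannot be expressed by splitting 512 evenly over 3 heads; the scaling factor and the output projection back to 512 are assumptions.

    import torch
    from torch import nn

    class ThreeHeadSelfAttention(nn.Module):
        # Three self-attention heads of dimension 128 each (sizes assumed).
        def __init__(self, d_model: int = 512, n_heads: int = 3, d_head: int = 128):
            super().__init__()
            self.n_heads, self.d_head = n_heads, d_head
            self.qkv = nn.Linear(d_model, 3 * n_heads * d_head)
            self.out = nn.Linear(n_heads * d_head, d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (Len, 512)
            length = x.size(0)
            q, k, v = self.qkv(x).chunk(3, dim=-1)

            def split(t: torch.Tensor) -> torch.Tensor:
                return t.view(length, self.n_heads, self.d_head).transpose(0, 1)

            q, k, v = split(q), split(k), split(v)            # (heads, Len, 128)
            scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
            ctx = scores.softmax(dim=-1) @ v                  # (heads, Len, 128)
            return self.out(ctx.transpose(0, 1).reshape(length, -1))

    class EncoderLayer(nn.Module):
        # One encoder sub-model layer: attention, then the (512, 1024) full
        # connection layer, with the two residual connections described above.
        def __init__(self, d_model: int = 512, d_ff: int = 1024):
            super().__init__()
            self.attn = ThreeHeadSelfAttention(d_model)
            self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
            self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                    nn.Linear(d_ff, d_model))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = self.norm1(x + self.attn(x))   # first residual connection
            return self.norm2(x + self.ff(x))  # second residual connection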
According to the embodiments of the present disclosure, the sample coding result is input into the 7 stacked decoder sub-models, which output the first decoded character; the first decoded character is fed back into the decoder sub-model and processed again together with the sample local feature sequence to obtain the second decoded character; the second decoded character is fed back in the same way to obtain the third decoded character, and this process repeats until the whole sample sentence has been translated. The translation result is the sequence of the first, second, third, and subsequent decoded characters.
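A sketch of this cyclic decoding follows. Greedy decoding is assumed (the patent does not specify the search strategy), and the decoder call signature, special token ids, and maximum length are placeholders.

    import torch

    def cyclic_decode(decoder, encoder_out: torch.Tensor,
                      bos_id: int = 1, eos_id: int = 2, max_len: int = 128) -> list:
        # Feed each decoded character back until the sentence is fully translated.
        tokens = [bos_id]
        for _ in range(max_len):
            tgt = torch.tensor(tokens).unsqueeze(1)      # (decoded_len, 1)
            logits = decoder(tgt, encoder_out)           # hypothetical decoder call
            next_id = int(logits[-1].argmax(dim=-1))     # most likely next character
            if next_id == eos_id:                        # translation complete
                break
            tokens.append(next_id)
        return tokens[1:]                                # drop the start marker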
Fig. 6 schematically shows an overall structure diagram of a model used in a convolutional neural network and translation model-based lightweight machine translation method according to another embodiment of the present disclosure.
Fig. 6 shows the convolutional neural network and the translation model cascaded in sequence. For the structure of the convolutional neural network and the partial structure of the translation model, refer to the descriptions of fig. 3 to fig. 5, which are not repeated here. After a sample sentence or a sentence to be translated is input, it undergoes word segmentation and position encoding, and the normalization layer and convolution computation layer of the convolutional neural network then produce the extracted initial sample local feature sequence or local feature sequence; the results are then passed through the masked multi-head attention module, multi-head self-attention module, residual-and-normalization layer, full connection layer, activation function layer, and linear layer in the 7 stacked decoder sub-models, and the predicted translation result or sample translation result is output.
According to the embodiments of the present disclosure, reducing the number of heads of the multi-head self-attention modules of the translation model, preferably to three, does not reduce translation precision; because the attention matrices shrink (specifically, to 128 dimensions), the scale of the attention computation matrices decreases, which in turn reduces the parameter count of the translation model. This makes the machine translation model lightweight and yields a machine translation method suitable for application at the edge.
According to the embodiments of the present disclosure, the encoder of the translation model is set to 4 to 6 layers, or preferably 7 stacked layers; the attention computation over the input is performed repeatedly through the encoder stack, which strengthens the information extraction capability.
According to the embodiments of the present disclosure, the parameters of the model used in the lightweight machine translation method based on the convolutional neural network and the translation model can be kept within 38M: specifically, about 36.42M for the translation model and about 1.57M for the convolutional neural network. The introduced convolutional neural network strengthens local feature extraction and effectively improves translation accuracy at the cost of only a small increase in parameter scale.
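A quick way to verify such a parameter budget (a sketch; conv_front_end and translation_model refer to the hypothetical modules above):

    from torch import nn

    def param_millions(module: nn.Module) -> float:
        # Total trainable parameters of a module, in millions.
        return sum(p.numel() for p in module.parameters() if p.requires_grad) / 1e6

    # Expected under the patent's figures: about 36.42M for the translation model
    # and about 1.57M for the convolutional front end, roughly 38M in total.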
It should be noted that, in the flowcharts of the present disclosure, unless an execution order between different operations is explicitly stated or required by the technical implementation, the operations need not be executed in the order shown, and multiple operations may be executed simultaneously.
Based on the lightweight machine translation method based on the convolutional neural network and the translation model, the disclosure also provides a lightweight machine translation apparatus based on the convolutional neural network and the translation model. The apparatus will be described in detail below with reference to fig. 7.
Fig. 7 schematically shows a block diagram of a lightweight machine translation apparatus based on a convolutional neural network and a translation model according to an embodiment of the present disclosure.
As shown in fig. 7, the lightweight machine translation apparatus 700 based on the convolutional neural network and the translation model according to the embodiment includes a first obtaining module 710, a first processing module 720, a first input module 730, and a second input module 740.
The first obtaining module 710 is configured to, in response to the text translation request, obtain a sentence to be translated, where the sentence is carried in the text translation request.
And the processing module 720 is used for preprocessing the sentence to be translated to obtain a word segmentation sequence with position information.
The first input module 730 is configured to input the word segmentation sequence into the convolutional neural network, and output a local feature sequence.
And the second input module 740 is configured to input the local feature sequence into the translation model, and output a text translation result.
According to the embodiments of the present disclosure, the convolutional neural network is arranged in front of the translation model; the local features of the word segmentation sequence are extracted by the convolutional neural network, and the translation model processes the local features extracted by the convolutional neural network to obtain the final translation result for the text translation request.
According to an embodiment of the present disclosure, the first input module 730 further includes a first input unit, an extension unit, a first processing unit, a second processing unit, a third processing unit, a fourth processing unit, and a fifth processing unit.
And the first input unit is used for inputting the word segmentation sequence into the input layer to obtain an original input sequence.
And the extension unit is used for extending the processing dimension of the convolutional neural network by using the dimension extension layer so that the processing dimension is the same as that of the original input sequence.
And the first processing unit is used for carrying out normalization processing on the original input sequence by utilizing the normalization layer to obtain a normalized word segmentation sequence.
And the second processing unit is used for processing the normalized word segmentation sequence by utilizing the convolution calculation layer to obtain an initial local feature sequence.
And the third processing unit is used for carrying out nonlinear processing on the initial local characteristic sequence by utilizing the activation function layer to obtain a nonlinear local characteristic sequence.
And the fourth processing unit is used for transforming the dimensionality of the nonlinear local feature sequence by utilizing the full connection layer to obtain the nonlinear local feature sequence with the target dimensionality.
And the fifth processing unit is used for processing the nonlinear local feature sequence with the target dimensionality and the original input sequence by utilizing the residual connecting layer to obtain a local feature sequence.
According to an embodiment of the present disclosure, the first processing module 720 further includes a sixth processing unit and a seventh processing unit.
And the sixth processing unit is used for performing word segmentation processing on the sentence to be translated by using a word segmentation algorithm to obtain an initial word segmentation sequence.
And the seventh processing unit is used for carrying out position coding processing on the initial word segmentation sequence and embedding the position information into the initial word segmentation sequence to obtain the word segmentation sequence with the position information.
According to an embodiment of the present disclosure, the second input module 740 further includes a second input unit and a third input unit.
And the second input unit is used for inputting the local feature sequence into the encoder sub-model and outputting a coding sequence result.
And the third input unit is used for inputting the coding sequence result into the decoder submodel and outputting a text translation result.
According to an embodiment of the present disclosure, the lightweight machine translation device 700 based on the convolutional neural network and the translation model further includes a second obtaining module, a second processing module, a third input module, a fourth input module, and an adjusting module.
And the second obtaining module is used for obtaining a training sample data set, wherein the training sample data set comprises a sample statement and label information corresponding to the sample statement.
And the second processing module is used for preprocessing the sample sentence to obtain a sample word segmentation sequence with position information.
And the third input module is used for inputting the sample word segmentation sequence into the initial convolutional neural network and outputting a sample local feature sequence.
And the fourth input module is used for inputting the sample local feature sequence into the initial translation model and outputting a predicted translation result.
And the adjusting module is used for adjusting model parameters of the initial translation model and the initial convolutional neural network according to the predicted translation result and the label information to obtain the convolutional neural network and the translation model.
According to the embodiment of the present disclosure, any plurality of the first obtaining module 710, the first processing module 720, the first input module 730, and the second input module 740 may be combined and implemented in one module, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the first obtaining module 710, the first processing module 720, the first input module 730, and the second input module 740 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware, and firmware, or in a suitable combination of any of them. Alternatively, at least one of the first obtaining module 710, the first processing module 720, the first input module 730 and the second input module 740 may be at least partially implemented as a computer program module, which when executed may perform a corresponding function.
It should be noted that the apparatus portion of the embodiments of the present disclosure corresponds to the method portion: for details of the lightweight machine translation apparatus based on the convolutional neural network and the translation model, refer to the description of the lightweight machine translation method based on the convolutional neural network and the translation model, which is not repeated here.
Fig. 8 schematically illustrates a block diagram of an electronic device suitable for implementing a convolutional neural network and translation model based lightweight machine translation method, in accordance with an embodiment of the present disclosure.
As shown in fig. 8, an electronic device 800 according to an embodiment of the present disclosure includes a processor 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 801 may also include onboard memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing different actions of the method flows according to embodiments of the present disclosure.
In the RAM 803, various programs and data necessary for the operation of the electronic device 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to one another by a bus 804. The processor 801 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 802 and/or the RAM 803. Note that the programs may also be stored in one or more memories other than the ROM 802 and the RAM 803. The processor 801 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 800 may also include an input/output (I/O) interface 805, which is also connected to the bus 804. The electronic device 800 may also include one or more of the following components connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed into the storage section 808 as necessary.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments, or may exist separately without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to the embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include the ROM 802 and/or the RAM 803 described above and/or one or more memories other than the ROM 802 and the RAM 803.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flowcharts. When the computer program product runs in a computer system, the program code causes the computer system to implement the lightweight machine translation method based on the convolutional neural network and the translation model provided by the embodiments of the present disclosure.
When executed by the processor 801, the computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure. According to embodiments of the present disclosure, the systems, apparatuses, modules, units, and the like described above may be implemented by computer program modules.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, downloaded and installed through the communication section 809, and/or installed from the removable medium 811. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to wireless or wired media, or any suitable combination of the foregoing.
According to embodiments of the present disclosure, program code for carrying out the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In situations involving a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure may be combined in various ways, even if such combinations are not expressly recited in the present disclosure. In particular, the features recited in the various embodiments and/or claims of the present disclosure may be combined without departing from the spirit or teaching of the present disclosure. All such combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (10)

1. A lightweight machine translation method based on a convolutional neural network and a translation model comprises the following steps:
responding to a text translation request, and acquiring a sentence to be translated carried in the text translation request;
preprocessing the sentence to be translated to obtain a word segmentation sequence with position information;
inputting the word segmentation sequence into a convolutional neural network, and outputting a local feature sequence;
and inputting the local feature sequence into a translation model, and outputting a text translation result.
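For orientation only, and not as a limitation of the claim, the four claimed steps can be read as the following Python sketch; `preprocess`, `cnn`, `translation_model`, and the `generate` method are hypothetical placeholders rather than recited elements.

```python
def translate(text_translation_request: dict, preprocess, cnn, translation_model) -> str:
    # Step 1: obtain the sentence to be translated carried in the request.
    sentence = text_translation_request["sentence"]
    # Step 2: preprocess into a word segmentation sequence with position information.
    word_seq = preprocess(sentence)
    # Step 3: convolutional neural network -> local feature sequence.
    local_features = cnn(word_seq)
    # Step 4: translation model -> text translation result.
    return translation_model.generate(local_features)
```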
2. The method of claim 1, wherein the convolutional neural network comprises an input layer, a normalization layer, a dimension extension layer, a convolution computation layer, an activation function layer, a fully connected layer, and a residual connection layer.
3. The method of claim 2, wherein the inputting the word segmentation sequence into a convolutional neural network and outputting a local feature sequence comprises:
inputting the word segmentation sequence into the input layer to obtain an original input sequence;
expanding a processing dimension of the convolutional neural network using the dimension extension layer so that the processing dimension is the same as a dimension of the original input sequence;
normalizing the original input sequence by using the normalization layer to obtain a normalized word segmentation sequence;
processing the normalized word segmentation sequence by utilizing the convolution computation layer to obtain an initial local feature sequence;
carrying out nonlinear processing on the initial local feature sequence by utilizing the activation function layer to obtain a nonlinear local feature sequence;
transforming the dimensionality of the nonlinear local feature sequence by using the fully connected layer to obtain a nonlinear local feature sequence with a target dimensionality;
and processing the nonlinear local feature sequence with the target dimensionality and the original input sequence by utilizing the residual connection layer to obtain the local feature sequence.
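For illustration only, the following is a minimal PyTorch sketch of one possible realization of these steps; the layer sizes, the kernel width, and the use of a transpose to stand in for the dimension extension step are assumptions of this sketch, not limitations of the claim.

```python
import torch
import torch.nn as nn

class LocalFeatureBlock(nn.Module):
    """Sketch of the claim-3 pipeline: normalize -> convolve -> activate ->
    project -> residual-connect. All sizes are assumptions."""

    def __init__(self, d_model: int = 512, kernel_size: int = 3):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)                 # normalization layer
        self.conv = nn.Conv1d(d_model, d_model,           # convolution computation layer
                              kernel_size, padding=kernel_size // 2)
        self.act = nn.ReLU()                              # activation function layer
        self.fc = nn.Linear(d_model, d_model)             # fully connected layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -- the original input sequence from the input layer.
        residual = x
        h = self.norm(x)                                  # normalized word segmentation sequence
        # Stand-in for the dimension extension step: rearrange to the
        # (batch, channels, seq_len) layout the convolution processes.
        h = self.conv(h.transpose(1, 2))                  # initial local feature sequence
        h = self.act(h).transpose(1, 2)                   # nonlinear local feature sequence
        h = self.fc(h)                                    # nonlinear local features, target dimension
        return residual + h                               # residual connection layer -> local feature sequence
```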
4. The method of claim 1, wherein the preprocessing the sentence to be translated to obtain a word segmentation sequence with position information comprises:
performing word segmentation processing on the sentence to be translated by using a word segmentation algorithm to obtain an initial word segmentation sequence;
and carrying out position encoding processing on the initial word segmentation sequence, embedding position information into the initial word segmentation sequence, to obtain the word segmentation sequence with the position information.
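As a non-limiting illustration, one common way to embed position information is the additive sinusoidal position encoding; the claim itself does not mandate this scheme, and `tokenizer` and `embedding` below are hypothetical callables (e.g., a word segmentation algorithm and an `nn.Embedding` table).

```python
import math
import torch

def positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    # Sinusoidal table of shape (seq_len, d_model); assumes d_model is even.
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

def preprocess(sentence: str, tokenizer, embedding) -> torch.Tensor:
    token_ids = tokenizer(sentence)                   # initial word segmentation sequence
    embedded = embedding(torch.tensor(token_ids))     # (seq_len, d_model)
    # Embed position information to obtain the word segmentation
    # sequence with position information.
    return embedded + positional_encoding(embedded.size(0), embedded.size(1))
```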
5. The method of claim 1, wherein the translation model comprises an encoder sub-model and a decoder sub-model;
wherein the inputting the local feature sequence into a translation model and outputting a text translation result comprises:
inputting the local feature sequence into the encoder sub-model, and outputting an encoded sequence result;
and inputting the encoded sequence result into the decoder sub-model, and outputting the text translation result.
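For illustration only, the encoder/decoder split can be sketched with a standard `nn.Transformer` as a stand-in; the claim does not disclose the sub-model internals, so the architecture, dimensions, and vocabulary size below are assumptions.

```python
import torch.nn as nn

class TranslationModel(nn.Module):
    """Encoder/decoder split of claim 5, with nn.Transformer as a stand-in."""

    def __init__(self, d_model: int = 512, vocab_size: int = 32000):
        super().__init__()
        self.core = nn.Transformer(d_model=d_model, batch_first=True)
        self.generator = nn.Linear(d_model, vocab_size)

    def forward(self, local_features, tgt_embeddings):
        # Encoder sub-model: local feature sequence -> encoded sequence result.
        memory = self.core.encoder(local_features)
        # Decoder sub-model consumes the encoded sequence result plus shifted
        # target embeddings (a causal mask would be added during training).
        decoded = self.core.decoder(tgt_embeddings, memory)
        return self.generator(decoded)  # token logits -> text translation result
```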
6. The method of claim 1, wherein the convolutional neural network and the translation model are obtained by training in the following manner:
acquiring a training sample data set, wherein the training sample data set comprises a sample sentence and label information corresponding to the sample sentence;
preprocessing the sample sentence to obtain a sample word segmentation sequence with position information;
inputting the sample word segmentation sequence into an initial convolutional neural network, and outputting a sample local feature sequence;
inputting the sample local feature sequence into an initial translation model, and outputting a predicted translation result;
and adjusting model parameters of the initial translation model and the initial convolutional neural network according to the predicted translation result and the label information to obtain the convolutional neural network and the translation model.
7. A lightweight machine translation device based on a convolutional neural network and a translation model comprises:
the first obtaining module is used for responding to a text translation request and obtaining a sentence to be translated carried in the text translation request;
the first processing module is used for preprocessing the sentence to be translated to obtain a word segmentation sequence with position information;
the first input module is used for inputting the word segmentation sequence into a convolutional neural network and outputting a local feature sequence;
and the second input module is used for inputting the local feature sequence into a translation model and outputting a text translation result.
8. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-6.
9. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 6.
10. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 6.
CN202210529030.6A 2022-05-16 2022-05-16 Lightweight machine translation method based on convolutional neural network and translation model Pending CN115081461A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210529030.6A CN115081461A (en) 2022-05-16 2022-05-16 Lightweight machine translation method based on convolutional neural network and translation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210529030.6A CN115081461A (en) 2022-05-16 2022-05-16 Lightweight machine translation method based on convolutional neural network and translation model

Publications (1)

Publication Number Publication Date
CN115081461A true CN115081461A (en) 2022-09-20

Family

ID=83248075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210529030.6A Pending CN115081461A (en) 2022-05-16 2022-05-16 Lightweight machine translation method based on convolutional neural network and translation model

Country Status (1)

Country Link
CN (1) CN115081461A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115600646A (en) * 2022-10-19 2023-01-13 Beijing Baidu Netcom Science and Technology Co., Ltd. (CN) Language model training method, device, medium and equipment
CN115600646B (en) * 2022-10-19 2023-10-03 Beijing Baidu Netcom Science and Technology Co., Ltd. Language model training method, device, medium and equipment

Similar Documents

Publication Publication Date Title
CN110188765B (en) Image semantic segmentation model generation method, device, equipment and storage medium
CN111444340B (en) Text classification method, device, equipment and storage medium
US20220129731A1 (en) Method and apparatus for training image recognition model, and method and apparatus for recognizing image
CN107273503B (en) Method and device for generating parallel text in same language
CN107783960B (en) Method, device and equipment for extracting information
JP7112536B2 (en) Method and apparatus for mining entity attention points in text, electronic device, computer-readable storage medium and computer program
US20180329886A1 (en) Artificial intelligence based method and apparatus for generating information
CN110837733B (en) Language model training method and system of self-reconstruction mode and electronic equipment
CN113822428A (en) Neural network training method and device and image segmentation method
WO2023160472A1 (en) Model training method and related device
CN109558605B (en) Method and device for translating sentences
CN108268629B (en) Image description method and device based on keywords, equipment and medium
CN112270200B (en) Text information translation method and device, electronic equipment and storage medium
US20230281390A1 (en) Systems and methods for enhanced review comprehension using domain-specific knowledgebases
US20220188636A1 (en) Meta pseudo-labels
US20240078385A1 (en) Method and apparatus for generating text
US11036996B2 (en) Method and apparatus for determining (raw) video materials for news
CN116166271A (en) Code generation method and device, storage medium and electronic equipment
CN111597807B (en) Word segmentation data set generation method, device, equipment and storage medium thereof
CN111027333B (en) Chapter translation method and apparatus
CN113779225A (en) Entity link model training method, entity link method and device
CN116796287A (en) Pre-training method, device, equipment and storage medium for graphic understanding model
CN109271526A (en) Method for text detection, device, electronic equipment and computer readable storage medium
CN115081461A (en) Lightweight machine translation method based on convolutional neural network and translation model
CN111160036A (en) Method and device for updating machine translation model based on neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination